debugging a PC, am I on the right track?

Dr. Slack
Sat Feb 05 2011, 09:29AM Print
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
Hi guys,

I could take this to a more specialist PC forum, but I find that people here are better thinkers, so I hope you don't mind if I pitch it here.

I don't know whether it's better to tell the story as it has evolved, in case I've missed something. Or just cut to the chase and present it as it appears now, to save you a lot of reading and me a lot of typing. I'll try the second first, after noting that it has been getting gradually worse, over a period of about 1 month.

The present symptom: the PC just switches "off" at some point early on during the OS load splash, whether that OS is XP Home SP3 (I get the crawling bar for a moment), or an Ubuntu Lucid (loader + live) CD (I get the little man at the bottom for a couple of seconds, but it dies before the animated dots appear). I have used this disc before, on this machine, to sort Vista/XP permissions. I can load XP repair, which I've never seen fail (so it's not overheating, or the PSU!). I can run CHKDSK from it; sometimes it finds bad disk sectors, sometimes not. It used to be possible to sidestep the fail for a while by booting into XP safe mode then doing a warm restart, after which it might run for hours. However, after a nuke and reload of XP from scratch to the same HD, this workaround no longer works.

I have removed the HD and replaced it with another, was 160GB SATA, is now 80GB ATA slaved to the DVD. I have removed the PCIe video card to use the onboard graphics, this didn't seem to make any difference to the timing of the fail during the splash. I have reseated the RAM. I haven't done anything with the BIOS yet.

My present thinking is that, as it dies a little way into the splash, it's probably access to particular address locations, ones that don't get visited by XP repair, but are visited by both Ubuntu and XP, whether it's using a mobo or PCIe graphics adapter. So I think it's the effect of a "higher level" driver, after the display is enabled and shows something successfully. So is it the mobo? Would resetting the BIOS help? Are there any "check mobo" tools available?

Needless to say, due to lack of time and building levels of frustration, I haven't tried all possible combinations of hardware and software. For instance, Linux fails with a formatted but not bootable 80GB ATA, but I haven't tried it with the bootable 160GB SATA, or HD absent. I haven't rebuilt XP with only the mobo graphics present.

The system is an el-cheapo Emachines E4252, dual core Athlon, 1GB. It was a refurb, rather than new; I acquired it perhaps 18 months ago. It came with Vista preloaded, but I've put an OEM XP SP3 on it. I used generic (huge) Nvidia and Realtek drivers from their websites, which seemed to give me full video and audio functionality. I'm not inclined to think that there's anything in the BIOS specific to Vista which would cause XP not to work, as it's worked fine for 18 months. I later added a cheap GT430 Nvidia graphics card to boost my daughter's Sims 3 display speed, which seemed to work OK for a month after that. As that was the last thing I did before the symptoms appeared, I have spent a long time thinking it's this, but I doubt it now.

I thought there was a facility when XP starts to be able to load the drivers one at a time from safe mode, but I can't find how to do this. Did it get removed with SP3? Was it on 98 or 95 and has never been on XP? Google has been no help with this so far. Is there this sort of facility with Lucid?

I could just replace the mobo, or the whole PC, but I'm looking for suggestions that will help me pinpoint exactly what has failed. Any help gratefully received.

Steve Conner
Sat Feb 05 2011, 09:46AM
Steve Conner Registered Member #30 Joined: Fri Feb 03 2006, 10:52AM
Location: Glasgow, Scotland
Posts: 6706
I've had similar problems caused by an overheating CPU or GPU. Running XP Repair console doesn't take a lot of processing power, so the chip doesn't get hot.

To be precise: I fixed two systems with problems like this. One of them, the CPU heatsink had been installed with no thermal grease. The other, the GPU was running too hot. On a warm boot, it would crash when it got to loading the graphics driver. I discovered that blowing on the GPU fixed it.

If you suspect bad memory, you can run Memtestx86 or whatever it's called, but that might make the CPU get hot too.

If you can get the "windows did not load successfully" screen to appear, there's an option there to log all of the drivers loaded at boot time. Usually the last one in the log is the culprit. That was how I found my GPU problem.
Dr. Slack
Sat Feb 05 2011, 10:54AM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
Thanks Steve

The reason I didn't think it was overheating is that it will fail in the following circumstance - power off for ages, boot from Lucid, fail in seconds, yet run for hours otherwise. It fails in much the same time whether I'm using the onboard GeForce 5100 hardware, or a PCIe GT430 card.

A little more info, it *will* run in safe mode, in 1280/1024 resolution, but it is in a basic mode, it's slooooow and says "new hardware found, want to find video drivers".

I can't get the boot logging option after it fails, I just get 5 options for F8, but once it has booted to safe mode successfully, a reboot gives more F8 options. I enabled boot logging, and found ntbtlog.txt and setupapi.log deposited in the windows folder. Not what I expected: they're huge, it's obviously several sessions appended, and the section headed with today's date and time is 33kB. This is the last few dozen lines of the ntbtlog.txt file. I guess that hoping it would say "loading this driver - uggh!" was somewhat forlorn. The thing that I didn't expect is that this "did not load audio codecs, did not load legacy stuff" is repeated over and over, which may be relevant in itself.

I'll try regreasing the sinks, and looking for memtest86

Did not load driver Communications Port
Did not load driver Printer Port
Did not load driver Realtek RTL8139 Family PCI Fast Ethernet NIC
Did not load driver Audio Codecs
Did not load driver Legacy Audio Drivers
Did not load driver Media Control Devices
Did not load driver Legacy Video Capture Devices
Did not load driver Video Codecs
Did not load driver WAN Miniport (L2TP)
Did not load driver WAN Miniport (IP)
Did not load driver WAN Miniport (PPPOE)
Did not load driver WAN Miniport (PPTP)
Did not load driver Packet Scheduler Miniport
Did not load driver Packet Scheduler Miniport
Did not load driver Direct Parallel
Loaded driver \SystemRoot\System32\Drivers\Cdfs.SYS
Did not load driver Processor
Did not load driver Processor
Did not load driver Communications Port
Did not load driver Printer Port
Did not load driver Realtek RTL8139 Family PCI Fast Ethernet NIC
Did not load driver Audio Codecs
Did not load driver Legacy Audio Drivers
Did not load driver Media Control Devices
Did not load driver Legacy Video Capture Devices
Did not load driver Video Codecs
Did not load driver WAN Miniport (L2TP)
Did not load driver WAN Miniport (IP)
Did not load driver WAN Miniport (PPPOE)
Did not load driver WAN Miniport (PPTP)
Did not load driver Packet Scheduler Miniport
Did not load driver Packet Scheduler Miniport
Did not load driver Direct Parallel
Loaded driver \SystemRoot\System32\Drivers\Fastfat.SYS


AFAIK through google, FastFat is for FAT based USB external drives, which is how I got the log files off the machine to something that could read them more easily, so it's not clear that that is the fault.
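The "last driver in the log" check suggested earlier in the thread can be automated once the file grows to many appended sessions. Below is a minimal Python sketch, assuming only the two line prefixes visible in the excerpt above ("Loaded driver" and "Did not load driver"). The function name `summarize_boot_log` and the shortened sample are illustrative, not part of any Windows tool.

```python
from collections import Counter

LOADED = "Loaded driver "
NOT_LOADED = "Did not load driver "


def summarize_boot_log(text):
    """Return (last_loaded, loaded, skipped) for ntbtlog-style text.

    last_loaded is the final 'Loaded driver' entry (or None),
    loaded is the list of all loaded drivers in order,
    skipped counts how often each 'Did not load driver' name appears.
    """
    loaded = []
    skipped = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(LOADED):
            loaded.append(line[len(LOADED):])
        elif line.startswith(NOT_LOADED):
            skipped[line[len(NOT_LOADED):]] += 1
    last_loaded = loaded[-1] if loaded else None
    return last_loaded, loaded, skipped


# A shortened sample built from lines quoted in the post above.
SAMPLE = r"""
Did not load driver Direct Parallel
Loaded driver \SystemRoot\System32\Drivers\Cdfs.SYS
Did not load driver Processor
Did not load driver Processor
Did not load driver Audio Codecs
Did not load driver Direct Parallel
Loaded driver \SystemRoot\System32\Drivers\Fastfat.SYS
"""

last_loaded, loaded, skipped = summarize_boot_log(SAMPLE)
print("last loaded:", last_loaded)
print("skipped more than once:", sorted(n for n, c in skipped.items() if c > 1))
```

To run this on the real file, the text would first have to be read with whatever encoding the file actually uses, which is not shown in the thread. As noted above, the last loaded driver is only a hint: here it is Fastfat.SYS, which is more likely a side effect of plugging in the USB stick than the cause of the crash.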

a few warnings/errors from setupapi, and the last few lines
it's not obvious to me whether they are relevant or not


...
#E077 Could not locate a non-empty section [iis_common_install] when calculating disk space in "C:\WINDOWS\INF\iis.inf". Error 0xe0000102: The required line was not found in the INF.
#E077 Could not locate a non-empty section [iis_inetmgr_install] when calculating disk space in "C:\WINDOWS\INF\iis.inf". Error 0xe0000102: The required line was not found in the INF.
...

#I121 Device install of "PCI\VEN_10DE&DEV_0DE1&SUBSYS_082810DE&REV_A1\4&228469D0&0&0048" finished successfully.
[2011/02/04 16:07:11 3064.124]
#-198 Command line processed: "C:\NVIDIA\HDAudioWHQLDriver\1.0.15.0\International\setup.exe" -s /s
#E412 Per-machine codesigning policy settings appear to have been tampered with. Error 13: The data is invalid.

...
#I123 Doing full install of "USBSTOR\DISK&VEN_UDISK&PROD_PDU09_4G_9AI2.0&REV_0.00\0000000000032A&0".
#W100 Query-removal during install of "USBSTOR\DISK&VEN_UDISK&PROD_PDU09_4G_9AI2.0&REV_0.00\0000000000032A&0" was vetoed by "STORAGE\RemovableMedia\7&1688b6db&0&RM" (veto type 5: PNP_VetoOutstandingOpen).
#W104 Device "USBSTOR\DISK&VEN_UDISK&PROD_PDU09_4G_9AI2.0&REV_0.00\0000000000032A&0" required reboot: Query remove failed (install) CfgMgr32 returned: 0x17: CR_REMOVE_VETOED.
...

#-166 Device install function: DIF_INSTALLINTERFACES.
#-011 Installing section [volume_install.Interfaces] from "c:\windows\inf\volume.inf".
#I054 Interfaces installed.
#-166 Device install function: DIF_INSTALLDEVICE.
#I123 Doing full install of "STORAGE\REMOVABLEMEDIA\7&1688B6DB&0&RM".
#I121 Device install of "STORAGE\REMOVABLEMEDIA\7&1688B6DB&0&RM" finished successfully.
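Picking the warnings and errors out of a large setupapi.log by eye is tedious, so a small filter may help. The Python sketch below assumes, from the excerpt above, that each entry starts with `#`, a one-character class (`E`, `W`, `I` or `-`) and a numeric code, and that `E` and `W` mark errors and warnings. The name `find_problems` is hypothetical and the sample lines are copied from the post.

```python
import re

# '#', one class character (E, W, I or '-'), a numeric code, then the message.
ENTRY = re.compile(r"^#([EWI-])(\d+)\s+(.*)$")


def find_problems(text):
    """Return a list of (severity, code, message) for E and W entries only."""
    found = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match and match.group(1) in ("E", "W"):
            found.append((match.group(1), match.group(2), match.group(3)))
    return found


SAMPLE = r"""
#I121 Device install of "STORAGE\REMOVABLEMEDIA\7&1688B6DB&0&RM" finished successfully.
#E412 Per-machine codesigning policy settings appear to have been tampered with. Error 13: The data is invalid.
#-166 Device install function: DIF_INSTALLDEVICE.
#W104 Device "USBSTOR\DISK&VEN_UDISK&PROD_PDU09_4G_9AI2.0&REV_0.00\0000000000032A&0" required reboot: Query remove failed (install) CfgMgr32 returned: 0x17: CR_REMOVE_VETOED.
"""

found = find_problems(SAMPLE)
for severity, code, message in found:
    print(severity, code, message[:60])
```

This only narrows down what to read. Whether a given entry matters for the crash still needs judgment; the USB storage warnings, for instance, probably just record the memory stick used to copy the logs.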
Nicko
Sat Feb 05 2011, 06:36PM
Nicko Registered Member #1334 Joined: Tue Feb 19 2008, 04:37PM
Location: Nr. London, UK
Posts: 615
First thing - reset bios to "safe defaults". It may have been played with to overclock it.

Second - vacuum out the CPU heatsink (or use compressed air) and the PSU - I use a small paint brush to dislodge the dust whilst hoovering - as the PSU and CPU fans are sucking in air all the time, they accumulate a whole load of dust - I've seen many PSUs & CPU heatsinks that were locked solid with crud - PSUs & CPUs overheat very quickly if blocked...

Third, if there is more than one DIMM, swap them around - see if that moves the problem - just re-seating the DIMMs can often help.

Fourth - run memtest86 (must be booted from a CD or floppy)

Fifth - is your PSU BIG enough for the monster graphics cards etc - it could just be the PSU about to die - try another PSU...

Sixth - enable POST testing - see your bios manual for this - you may need to buy a card for this - they are very cheap and very useful to have around

Seventh - try the above and then we'll go there!

Cheers
Andyman
Sun Feb 06 2011, 08:36AM
Andyman Registered Member #1083 Joined: Mon Oct 29 2007, 06:16PM
Location: Upland, California
Posts: 256
Possibly try removing/moving around the RAM cards. That's been known to fix a similar problem before for me.
Dr. Slack
Sun Feb 06 2011, 12:31PM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
Thanks all,

memtest86 reports no RAM errors. I do have two sticks but haven't tried running with just one or swapped yet. I did reseat them though. I need to move the machine to where it's easier to work on. It's had little run time, and is still really clean in there, but it's not going to hurt to blow out what little dust there is. I'll do both of those today.

Nicko, I've used the BIOS "safe defaults" option, it didn't appear to change anything, so I guess I was already on them. I'm not an overclocker BTW. The settings as reported by memtest were 351M 5,5,5,15 RAM DDR-2, 2109M Athlon, 17290 L1 and 3220 L2. Are there any relevant ones I've missed?

Since my last post it has clocked up 8+ hours running in XP safe mode (sloooow 1280x1024 graphics) and memtest86, yet fails routinely a few seconds after the splash from a Linux or full XP boot, whether warm or cold (like actual machine cold after 30 mins off) using the on-board weedy graphics ("just enough for Aero" GeForce5100 chipset). I am assuming it's not temperature, as the fail time is so independent of it.

I would be convinced it's the mobo, perhaps high address connections to the graphics, except that the timing of the fail is exactly the same whether I'm using the on-board or the PCIe graphics. Does the interface from the CPU to graphics function use the same physical hardware or address ranges at some point, after the OS has recognised that it's capable of more than bogo VGA?
Bjørn
Sun Feb 06 2011, 01:23PM
Bjørn Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
I have managed to hunt out a few broken CPUs/chipsets using Prime95.

If that does not find anything either, you might have a hairline crack in the PCB somewhere that only causes problems when certain functions are in use. If you suspect that, then it is time to recycle the thing before you go mad, because it will be almost impossible to pin down.
Conundrum
Sun Feb 06 2011, 03:19PM
Conundrum Registered Member #96 Joined: Thu Feb 09 2006, 05:37PM
Location: CI, Earth
Posts: 4059
yeah, this helped when I had a failed CPU once.

I've had laptops where the onboard graphics fails, made by a certain "Famous Brand"... this can cause all manner of strange problems.

-A
Dr. Slack
Mon Feb 07 2011, 09:22AM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
If Prime95 errors, will it do so in a way that distinguishes between mobo and CPU being faulty? Is it the case that if it runs, then it pretty much exonerates the CPU?

If so, that only leaves the mobo, and they aren't that expensive. I was just a bit reluctant to buy a new one before nailing the fault as "very likely" to be there, it would pi$$ me off a treat to buy something new and still have the fault present. I don't have a spare lying around to try before I buy.

Swapping and testing RAM and HD has ruled them out, XP and Linux rules out OS and drivers, swapping and measuring the PSU shows it's not that. The heatsinks are clean, and the fault is very temperature insensitive. It's unlikely to be the graphics as it fails the same for onboard or PCIe. I wonder how common mode their data/control path is on the mobo: does the onboard graphics use the PCI mechanism? They are both Nvidia, but it's a fault that's developed gradually, just the way that cracks or solder joint fails might, and the way software doesn't (usually).

Just thinking out loud really. I think if the CPU survives the Prime95 workout, then it's worth a new mobo?
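The process of elimination in this post can be written down as simple set arithmetic. The Python sketch below is only an illustration of that reasoning: the suspect names and the wording of each piece of evidence are paraphrased from the thread, and treating every item as fully ruled out is a simplifying assumption (the post itself only calls the graphics "unlikely").

```python
# Every component that could plausibly cause the shutdown (illustrative labels).
suspects = {
    "RAM", "hard disk", "OS/drivers", "PSU",
    "overheating", "graphics", "CPU", "motherboard",
}

# Each observation from the thread, mapped to what it is taken to exonerate.
evidence = {
    "memtest86 clean, RAM swapped and tested": {"RAM"},
    "same failure with a different hard disk": {"hard disk"},
    "fails under both XP and a Linux live CD": {"OS/drivers"},
    "PSU swapped and measured": {"PSU"},
    "fails from a cold start, heatsinks clean": {"overheating"},
    "same timing with onboard and PCIe graphics": {"graphics"},
}

# Subtract everything exonerated so far from the list of suspects.
ruled_out = set().union(*evidence.values())
remaining = suspects - ruled_out

print(sorted(remaining))  # ['CPU', 'motherboard']
```

A Prime95 run that passes would then add another entry exonerating the CPU, leaving only the motherboard.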

----------------- edit --------------------------------

It's survived several hours of Prime95, big and small FFTs, so the CPU and RAM are given a few more trust points.

I notice that the mobo has several bulging capacitors near the CPU, and I know this is not a good sign, I should have spotted it earlier. On the one hand, I could spend an hour replacing these caps. But OTOH, I would have thought the sort of saggy power problem that they would cause wouldn't result in such a specific fail as the one I'm seeing. Perhaps I'll just stop burning time and spend £32 on a new, and preferably better-known brand, mobo.

------------------ edit ---------------------------------

So I bought a new mobo from my favourite supplier eBuyer. I bought it on super-saver 3-5 working day delivery, because they always arrive quicker than that, and because I'm too tight to pay extra for next day. Well, it appears that 5 working days sometimes *means* 5 working days, because I still haven't got it, which is a pity as I was hoping to muck about with it at the weekend.
Dr. Slack
Tue Feb 15 2011, 08:39PM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
bump

Thanks for your moral support. The new Gigabyte mobo is doing its thang, I just need to find a suitable way to recycle the old one. Perhaps I'll start with 10kV from 25uF?

One thing that did not impress me about the old one is that where the Gigabyte has a metal pressure plate under the CPU, the old one had plastic, which was banana-shaped with the heatsink pressure.

I'm not sure that caps have advanced hugely in the last two years, perhaps higher frequency switchmode controllers and low voltage FETs have, as the new board has nothing like the uFage in ranks of 'lytics alongside the CPU, the old board was covered in them.