Tuesday, September 25, 2007

Buy a Good Power Supply

If you'd like to read the story, feel free. It's not a real page turner, so I'll provide my advice first.

  1. Buy a good power supply. Cheap power supplies are not reliable and can be very tough to troubleshoot unless you are fortunate enough to own a PSU tester that actually works (mine didn't).
  2. Don't blow dust and heat directly at the bottom of your computer (i.e. make sure you don't have a heating vent in a place that directly affects the running temperature of your PC)
  3. When you have many components fail simultaneously, at least suspect the PSU. In my case, since I knew the motherboard and processor were dead, I figured they were the reason the drives had failed, and it took me far too long to look at the PSU.
  4. If you have a suitable, working spare PSU, try that and see if your problems go away.
  5. Buy Seagate or Maxtor hard drives. Their RMA process is incredible. They have my business for the rest of my computing days.

And here's the story...

I have an HTPC running Snapstream Media's Beyond TV. It's a fantastic application that is easy to get up and running on Windows. Though it's not free and there are certainly alternatives, it doesn't come with subscription fees. For various historical reasons I won't get into in this post, I have been using it for a while, and switching to MythTV would be painful.

So, the rig is:
High-end Asus Motherboard running a Pentium D 850 over-clocked to 3.6GHz.
(2) 300GB Seagate Hard Drives
(1) 400GB Seagate Hard Drive
Creative Labs Sound Blaster Audigy 2 ZS
Hauppauge PVR-150MCE
Plextor Mpeg-4 USB TV Capture Device
AMD/ATI X1800XT PCIe Video (I'm outputting 1080i to a Sony HDTV)
Tons of custom software I've written to do various tasks that are shortcomings of BeyondTV.
Antec 500W power supply (not great, but I need 500W to overclock this chip with what this box has installed)
1GB of DDR2 memory able to run at the speed necessary for overclocking this chip.

About a month ago, the box started rebooting at random. As this machine is very overclocked, I assume that I'm having heat issues. The first step was to put everything back to spec. Yes, I know "overclocking bad". In this case, I disagree. The system was incredibly stable for over a year and the parts were hand picked for the task (including a very nice Zalman CPU cooler).

Things seem OK for a few days, until another sudden reboot. I had no time to troubleshoot it, so I left it to its own devices for about a week. Unfortunately, when I did get to it ... after hours of troubleshooting ... I discovered the following had happened:
The processor is dead.
The motherboard has a black mark next to a place where a capacitor belonged and there are pieces of it throughout the case, so the board is toast too. The capacitor in question handled power to the processor, which explains my dead chip.
Two of my drives are clicking, data is unreadable. I don't back this machine up because in my short-sightedness I thought "gee, it's only TV shows, I can just record them again later". Unfortunately, one of the dead drives had all of that custom software on it and I hadn't backed up that folder in over a month. Oops!

So I put a few things on order, and send away for warranty repair on a few others. I decide to go AMD with an Athlon X2 4200+ and a suitable Asus motherboard and, for good measure, pick up a Maxtor 500GB hard drive. Still figuring this was a heat issue, I go a little crazy on cable organization and end up with a very clean, airflow-friendly rig.

...Except, my hard drives are clicking, I'm getting read/write failures and the one good drive that I had left is now reporting a S.M.A.R.T. failure (useless feature). I'm at a loss at this point. I've replaced almost everything in this box and it's still failing. I go down the path of troubleshooting drivers, even installing different operating systems (I always wanted a Myth box!). Nothing resolves the issue.

Suddenly it occurs to me that the drives only fail when all three are plugged in to power. They didn't even need to be plugged into the motherboard, just power. It dawns on me that I have a dead PSU. Well, not actually dead, but failing under heavy load conditions.
Something I had failed to notice is that while BeyondTV is recompressing video and I'm simultaneously watching video, my system becomes totally unstable. Similarly in Linux when I'm running the graphics test and load testing, the system becomes unstable.
Under idle or even minor load, the system is fine (mostly).
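
If you want to poke at a drive while the box is busy, something like the following can help show whether reads and writes start failing under load. This is only a minimal Python sketch, not what I ran at the time: the function name `write_and_verify` and its parameters are made up for illustration. It writes random data to a scratch file, reads it back, and compares SHA-256 digests. Keep in mind that the operating system may serve the read from its cache, so a pass proves little; a mismatch or an I/O error while the machine is under load is the interesting result.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, total_mb=64, chunk_mb=4):
    """Write random data to a scratch file in `directory`, read it back,
    and return True if the SHA-256 digests of both passes match.

    Hypothetical helper for illustration; I/O errors raised by the OS
    are allowed to propagate, since they are a failure signal too.
    """
    chunk_size = chunk_mb * 1024 * 1024
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory, suffix=".scratch")
    try:
        # Write pass: hash each block as it goes to disk.
        with os.fdopen(fd, "wb") as f:
            for _ in range(total_mb // chunk_mb):
                block = os.urandom(chunk_size)
                written.update(block)
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the drive

        # Read pass: hash whatever comes back from the file.
        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk_size), b""):
                read_back.update(block)

        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)  # always clean up the scratch file
```

Calling `write_and_verify("/path/on/the/suspect/drive")` in a loop while video is recompressing would be one way to use it; the path here is a placeholder.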

Back to Newegg, this time I purchased a very overpriced Pure80 600W power supply.

Now, there's a twist here that I completely missed when I installed the system the first time. The Antec power supply is located in the back/right side of the case (as is typical). When the case is positioned in my stand, the back right part of the case sits immediately above a heat register which during the summer blows cool air and a lot of dust, but in the winter blows a lot of piping hot air. It's a credit to this particular power supply that it didn't die any earlier. I have since blocked off this particular vent to prevent any airflow or dust.

Monday, September 3, 2007

Blank Screen after Setup is Inspecting your computer's hardware configuration in Windows XP

It's been a little while since I've been able to write, but this one hit me again this weekend, and again I spent hours trying to figure out what was going on.

The symptom is this:
Blank Screen occurs when "Setup is Inspecting your Computer's Hardware Configuration"
or
Blank Screen before Setup even starts.
Further, waiting it out does nothing. The drive eventually spins down and the PC is unresponsive. Of course, the monitor never goes to DPMS mode, so it looks like the video card is still receiving a signal.

This saga was part of my other adventure.

The cause of the black screen is simple:
Windows XP's Setup utility cannot properly read the hard disk partition tables. I'm not sure if it chokes on every Linux installation or only on the specific LVM setup I had done with Fedora 7, but in my experience a hang immediately after Setup starts usually means unreadable partition tables.

A couple of fixes to try:
Unplug every hard drive and every USB, Compact Flash, SD, or other "hard drive like" device, except for the one you intend to boot from and install the operating system to. Try to rerun Setup. If successful, plug the other drives back in one at a time after Windows is installed and updated.
If that doesn't work, you still have options, but the only ones I can present to you will cause your data to be destroyed, so here's hoping you have a backup.
  1. Download Knoppix (or a Linux based live CD that comes ... at least ... with fdisk).
  2. Burn the CD/DVD on another computer.
  3. Boot Knoppix ... wait.
  4. When the GUI comes up (or perhaps "if" the GUI comes up), hit CTRL+ALT+F2. This will get you to a "root shell" in text mode. (The reason I prefer this route is that it doesn't require a working mouse, which I didn't have since Knoppix couldn't initialize it.)
  5. type "fdisk /dev/sda", if the hard disk you're dealing with is SCSI or SATA, otherwise type "fdisk /dev/hda" if the hard disk you're dealing with is IDE/EIDE/PATA.
  6. type "d" (for delete), hit enter. If the disk has more than one partition, fdisk will ask for a partition number; type one, hit enter, and repeat "d" until all partitions are gone. You can verify that all partitions are deleted by typing "p" and seeing if any show up.
  7. type "w" (writes out the partition table), and shut down. Nothing is changed on the disk until you hit "w". After you do, you're going to lose all of the data on the partitions that you have deleted.
  8. Rerun Setup.
If you're comfortable with partitioning, skip step 6 and start by hitting "p". This will list out all partitions on that drive. You may find a particular partition that looks suspect. Try deleting the suspect partition and leaving the ones that look good, then rerun setup.
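
If you'd rather look at the raw partition table before deleting anything, here is a minimal Python sketch of reading one. It assumes a classic MBR-style table (four 16-byte primary entries at offset 446 of the first sector, ending in the 0x55AA signature) and does not follow extended partitions or handle GPT. The names `read_first_sector`, `parse_mbr`, and `KNOWN_TYPES` are made up for illustration, and the type list is far from complete. It only reads; it never writes to the disk.

```python
import struct

# A few well-known MBR partition type bytes (deliberately incomplete).
KNOWN_TYPES = {
    0x05: "Extended",
    0x07: "NTFS/HPFS",
    0x0B: "FAT32",
    0x0C: "FAT32 (LBA)",
    0x0F: "Extended (LBA)",
    0x82: "Linux swap",
    0x83: "Linux",
    0x8E: "Linux LVM",
}


def read_first_sector(path):
    """Read the first 512 bytes of a device or of a saved image file."""
    with open(path, "rb") as f:
        return f.read(512)


def parse_mbr(sector):
    """Return the non-empty primary partition entries of an MBR sector."""
    if len(sector) < 512:
        raise ValueError("need at least 512 bytes")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")

    entries = []
    for i in range(4):
        raw = sector[446 + 16 * i : 446 + 16 * (i + 1)]
        # Layout: status, CHS start (3 bytes, skipped), type,
        # CHS end (3 bytes, skipped), starting LBA, sector count.
        status, ptype, lba_start, num_sectors = struct.unpack("<B3xB3xII", raw)
        if ptype == 0:
            continue  # type 0 marks an unused slot
        entries.append({
            "slot": i + 1,
            "bootable": status == 0x80,
            "type": ptype,
            "type_name": KNOWN_TYPES.get(ptype, "unknown"),
            "lba_start": lba_start,
            "num_sectors": num_sectors,
        })
    return entries
```

For example, `parse_mbr(read_first_sector("/dev/sda"))`, run as root from a live CD that happens to ship Python, would list the primary partitions and flag any slot marked as Linux LVM. A `ValueError` about the signature is a hint that the table itself is damaged.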

Of course, you've booted to a Knoppix CD, so you might try using some of the tools that are included with Knoppix to diagnose your problems, recover some of your data and copy it to another computer or drive, or do any other number of recovery/hardware tests. In my case, the data was already gone, so wiping out the partitions was an easy choice.