Linux
Fedora 9 to 12 – Disk Partitions Issue?
by balleman on Dec.06, 2009, under Linux
Fedora 9 to 12
Over the past few days I’ve been upgrading my file/media server from Fedora 9 to Fedora 12. I did this with yum, upgrading from 9 to 10, 10 to 11, and 11 to 12 incrementally, removing conflicts as necessary. This actually went surprisingly well, and at the end, with just one reboot, I had gone from 9 to 12.
It may have booted, but there were a number of things I had to fix afterwards:
- Kernel mode setting (KMS) had to be disabled so that it would not break the NVIDIA binary driver (“blacklist nouveau” somewhere in /etc/modprobe.d/). Apparently this is supposed to be handled by installing rpmfusion’s kmod package, but that wasn’t the case for me.
- The X server would crash immediately upon a login. I eventually figured this out to be a missing gnome-session-xsession package. I’m not sure if this is something I had removed for dependency reasons earlier, or if it was split from another package at some point and yum missed it. Either way, it was a real pain to debug, but easy to fix.
- The kernel would not recognize the partitions on three of my disks. This was a major pain, and the main focus of this article.
- I don’t like MythTV 0.22’s new video gallery.
Fedora 12 (2.6.31.6-145.fc12.x86_64) Disk Partitions Issue?
So… sdb, sdc, and sde each have a single partition on them, but the kernel (per /proc/partitions and other means) would only report block devices for the whole disks sdb, sdc, and sde, not for the partitions sdb1, sdc1, and sde1 that should also have existed. Naturally, the first thing to consult would be dmesg:
sda: sda1 sda2 sda3 sdb: sdc: sdb1 sdc1 sdd: sde: sdd1 sdd2 sdd3 sde1 sdf: sdf1
At first glance, this looks really, really bad. sdb1 existing on sdc? That’s not supposed to happen. But after looking at it further, and having had some experience debugging multi-threaded things, I became convinced this was the mangled output of multiple parallel partition discovery processes. If that were the case, it looked like it should have been successful, but was not.
So, is this what happened? Googling turned up a bit on the so-called “fastboot” patches to the Linux kernel, at least portions of which have been accepted into recent mainline kernels. Supposedly these would only be enabled with the “fastboot” kernel parameter, but searching the source and docs for the latest kernel didn’t turn up this option. The async libata device discovery does indeed appear to be in 2.6.31 mainline, and I was unable to find a knob to turn it off. There were also references to this interfering with partition discovery. I started the process of rebuilding the kernel to disable this, to see if it fixed the problem, but gave up in favor of a workaround. I’m not fond of maintaining custom kernels – I think the last I did this was to support a boca card.
The workaround. I had noticed that the kernel could be instructed to reread the partition tables (partprobe, for instance) and the missing partitions would appear. I threw in a quick init script to do this and assemble the array early in the startup process:
mdadm --stop /dev/md2
sleep 1
partprobe
sleep 2
mdadm --assemble --scan /dev/md2
vgchange -ay
mount /storage
So… if I have time, I should complete that kernel rebuild and report this somewhere. In the meantime, I’m posting this for the benefit of others. Lucky for me the partitions affected did not contain my root partition, or this could have been less-work-aroundable.
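For anyone wanting to detect the condition rather than blindly rereading the tables, here is a rough sketch. It lists whole disks that appear in /proc/partitions without any partition entries. The function name and the sd-only name matching are assumptions for illustration, and a disk that is deliberately unpartitioned would be listed too.

```shell
#!/bin/sh
# Print each whole disk (sdX) that has no partition entries (sdXN) in a
# /proc/partitions-style file. The optional argument exists so the
# function can be tried against a saved copy; it defaults to the live file.
disks_without_partitions() {
    awk '
        $4 ~ /^sd[a-z]+$/       { disk[$4] = 1 }
        $4 ~ /^sd[a-z]+[0-9]+$/ { d = $4; sub(/[0-9]+$/, "", d); haspart[d] = 1 }
        END { for (d in disk) if (!(d in haspart)) print d }
    ' "${1:-/proc/partitions}" | sort
}
```

Something like `[ -n "$(disks_without_partitions)" ] && partprobe` would then reread the tables only when a disk comes up bare.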
MythVideo Thumbnails
by balleman on Jun.19, 2009, under Linux
After having been embarrassed by seeing Windows Media Center have thumbnails of all video content, I found this script to populate the thumbnails in MythVideo from within the video. A lot more useful for me than the manual IMDB lookup process that I think MythVideo has natively. Definitely spruces up the MythTV box!
PVR remote upgrade
by balleman on May.26, 2009, under Linux
I’ve had a Logitech Harmony 550 for a few months now, but I’ve only used it to replace the remote for the TV and receiver up to this point. More recently, but still a while ago, I bought a serial IR receiver (yeah, should have made it myself, but was lazy). This evening I finally put the pieces together and got MythTV working with it. Fortunately, someone already has a decent remote template and config files for the Harmony which I put to use. Between the Harmony and the receiver, it works at almost any angle, so I’m not losing as much flexibility from my RF-USB remote as I thought.
Brute-Force WoL
by balleman on Mar.21, 2009, under Linux
For some time now, wake-on-LAN has not been working on my home server PC. I had been using WoL to wake the machine via a cron job on my firewall before I got home from work, and also to power the machine on remotely if I needed to get a file or such. I had guessed a kernel change was to blame, but several updates since have not resolved it, and my search for related bugs only turned up advice that didn’t work. So, I threw in an Intel e1000 and cabled it up (using the on-board for most purposes, as the e1000 is only PCI), and WoL “just works” with the e1000. FWIW, the on-board is an Nvidia MCP55. Problem solved… or at least worked around.
PVR Booting with LCD off
by balleman on Feb.10, 2009, under Linux
When things are working well, my PVR box is supposed to wake-on-LAN about the time I get home, and be ready for use. But if the LCD was powered off when the box booted, the box wouldn’t be driving the display afterwards. I would have to kill and restart X with the LCD powered on. Apparently this is due to the NVIDIA driver needing to read the EDID from the LCD to figure out what resolution to use. I tried a variety of ways of forcing it to work without validating the resolution, but I ended up downloading the EDID to a file (using the nvidia-settings GUI) and having the driver run against that. Not that inelegant, I think.
Section "Device"
    Identifier "Videocard0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "GeForce 7300 GS"
    Option "UseEvents" "True"
    Option "CustomEDID" "DFP-0: /var/lib/mythtv/edid.bin"
    Option "ConnectedMonitor" "DFP"
    Option "MetaModes" "DFP: 1920x1080"
EndSection
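Since the driver now trusts whatever is in that file, a quick sanity check on the saved EDID seems worthwhile. The sketch below assumes the file is a binary dump (not the ASCII form) and only checks two things that are fixed by the EDID format: the size is a whole number of 128-byte blocks, and the first 8 bytes are the standard header. The function name is made up for illustration.

```shell
#!/bin/sh
# Return success if the file looks like a binary EDID dump: non-empty,
# a whole number of 128-byte blocks, and starting with the fixed
# 8-byte EDID header 00 FF FF FF FF FF FF 00.
edid_looks_valid() {
    [ -r "$1" ] || return 1
    size=$(( $(wc -c < "$1") ))
    [ "$size" -gt 0 ] && [ $((size % 128)) -eq 0 ] || return 1
    header=$(od -An -tx1 -N8 "$1" | tr -d ' \n')
    [ "$header" = "00ffffffffffff00" ]
}
```

For example, `edid_looks_valid /var/lib/mythtv/edid.bin || echo "EDID file looks wrong"` could run before X starts. It does not verify the checksum, so it catches an empty or truncated file rather than subtle corruption.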
Storage
by balleman on Mar.17, 2007, under Computers, Linux
Warning – long and boring. This is
as much for my reference as anything else.
It started Friday – several weeks
ago. In the evening, one of my drives, a 250GB SATA, threw some
errors. The RAID5 wasn’t terribly concerned, it corrected the reads
and was happy. There were probably less than a dozen errors, and it
didn’t kick the drive from the array. I made a note of it, but
didn’t bother kicking it out manually.
Early Saturday, a 400GB drive dropped
out of the array. This drive does this from time to time, going
utterly unresponsive but fine upon reboot. I re-added it to the
array, and it began to sync up. Chris and I went to the new Circuit
City in Chambersburg, as he needed to get a power supply for
debugging a box lockup issue. I decided to buy one of those
new-fangled DVD burner thingies, as it was probably about time I had
one. Upon getting home, my array was not happy. I had 3 active
members on a 5 device RAID5. Rebuilding the 400GB had sent the
ailing 250GB over the edge, kicking them both out of the array. It’s
a curious thing to see in /proc/mdstat. The metadevice stayed
active, but degraded. Ext3 freaked out and dropped to read-only. I
really would have expected the metadevice to deactivate under those
conditions… or better yet, be very reluctant to kick a drive from
an already degraded array. If only I had kicked the 250GB manually,
this would have been a bit less stressful. So, then the contingency
planning starts. Do I force the array back together, and try
resyncing the 400 again? Will the 250 be so badly corrupted that it
makes more sense to force the mostly-current 400 back in the array
instead of the 250? Should I dd the 250 to another drive, since dd
should at least keep going instead of giving up on the errored
sectors? Not pleasant thoughts or options. SMART data was
indicating that the temperature of the drive was over 60C–hotter
than the box’s CPU. I moved it to another machine for diagnostics,
which didn’t turn up anything. The Hardware_ECC_Recovered was
varying rapidly (not that that necessarily means anything…), so I
decided it was time to be replaced. I ordered a 500G (WD5000YS) and
another Promise SATA-II TX4 PCI card from Newegg. Later that night,
I put the 250G back in the box and tried the resync again. I watched
the resync all night (something like 4am), waiting for it to either
fail, or complete. I wanted to boot the 250G from the array at
completion, so this wouldn’t happen again. Yes, I could have and
should have scripted it. I was worried about my data! The resync
completed successfully with no errors. Seems the 250G was much
happier after it had flagged its bad sectors.
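For the record, the scripting I should have done might look roughly like this. It is a sketch, not something I ran that night: the function names are invented, the array and member arguments are placeholders, and the progress check looks at every array in /proc/mdstat rather than just the one being rebuilt.

```shell
#!/bin/sh
# Succeed while a /proc/mdstat-style file shows a resync or recovery
# progress line (e.g. "recovery = 12.6% (...)"). The optional argument
# defaults to the live /proc/mdstat.
resync_in_progress() {
    grep -Eq '(resync|recovery) *=' "${1:-/proc/mdstat}"
}

# Wait for the rebuild to finish, then fail and remove one member.
# Both arguments are placeholders, e.g. /dev/md0 and /dev/sdX1.
kick_after_resync() {
    array=$1
    member=$2
    while resync_in_progress; do
        sleep 60
    done
    mdadm "$array" --fail "$member" --remove "$member"
}
```

Calling `kick_after_resync /dev/md0 /dev/sdX1` and going to bed would have replaced the 4am vigil. Note that it leaves the array degraded again by design, so it only makes sense when a replacement drive is on the way.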
On Sunday, I really couldn’t do
anything about the array, so I started down the second storage path
of death for the week: the DVD-R drive. I installed it in my
desktop, fired up k3b, burned a backup DVD of several years of
photos, and it seemed fine. But I couldn’t mount it anywhere. Turns
out that it (k3b and/or growisofs) wants to burn DVD+Rs as unclosed
multi-session discs. Fine. Turned that off, and burned myself
another one. It was fine. It was nice to have something work for
once.
On Monday evening, feeling lucky from
the day before, I tried burning some more photos to DVD, but it was
not to be. IDE errors would start spewing into dmesg, growisofs
(which had elevated itself to a nice of -20) began consuming the
entire machine, making it unusable. I tried different speed
settings, just about any option k3b had to offer. I moved the IDE
cable to a different controller, tried changing cables, anything…
DMA settings, I looked for firmware, but the thing is a no-name OEM
drive probably originally from Lite-On, but their firmware won’t load
on it, and the site supposedly having the firmware genericizer was
down. Of course, I gave up at some point and burned something in
Windows which was fine… ARRGH!
Tuesday was supposed to be the day of
productivity. The new drive and controller arrived, and I installed
them. I spent a little time tooling the partition table and began
the resync. The mirror resync’d very quickly at 30-40MB/s. The
RAID5 resync stayed around 27MB/s when the system was idle, but
dropped considerably otherwise. The old setup would only resync
around 20MB/s, and was otherwise usable. But at 27MB/s, the system
crawled, yet wasn’t using up 100% CPU. I think this is the surreal
PCI bus exhaustion experience… 27*5=135MB/s, and 133MB/s is the maximum for
a 33MHz, 32-bit PCI bus. But many of my PCI devices (including the
northbridge), are 66MHz capable, and from what I’ve read, 33MHz
devices shouldn’t be holding back the 66MHz ones entirely, but I
couldn’t find out how to test/debug this further. Later I found out
that the 66MHz-capable bit doesn’t mean very much, and what you
really need is a 66MHz-capable PCI bus – which mine isn’t. Myth
wasn’t happy about this, as ivtv wasn’t getting read from fast
enough. The system otherwise felt very sluggish. I left the box up
to resync overnight. I fought with the DVD drive some more, too.
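The bus arithmetic is easy to check. This is only a back-of-the-envelope sketch: the helper name is made up, the clock values are the nominal 33.33MHz and 66.66MHz, and the result is the theoretical peak shared by every device on the bus, which real transfers never quite reach.

```shell
#!/bin/sh
# Theoretical peak of a parallel PCI bus in MB/s:
# clock (Hz) x bus width (bytes) / 10^6, using integer arithmetic.
pci_peak_mb_s() {
    echo $(( $1 * ($2 / 8) / 1000000 ))
}

pci_peak_mb_s 33333333 32   # 33MHz, 32-bit bus
pci_peak_mb_s 66666666 32   # 66MHz, 32-bit bus
echo $(( 27 * 5 ))          # five drives at the observed 27MB/s each
```

This prints 133, 266, and 135, so five drives at 27MB/s each already sit right at the ceiling of a 33MHz bus, while a true 66MHz bus would have had room to spare.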
Wednesday morning, I checked on the
status of the resync. But the box had locked up. I rebooted it and
checked the logs. The resync did complete, but sometime later, there
was an unhandled interrupt on the IRQ shared between a SATA
controller and the video card. Linux then disabled the IRQ, causing
all of the drives to fall out incrementally. I brought the box back
up, and had to force the array to be “clean” so that it would
re-assemble (echo clean > /sys/block/md0/md/array_state … there
appears to be no mdadm equivalent of this action. You then have to
write something to the device for the superblocks to get
updated.) I eventually and experimentally determined that running
SMART commands on the new 500G drive is what causes the unhandled
interrupt. It takes time for the problem to manifest though -
maybe it’s a race condition somewhere. I haven’t found anything in
the kernel mailing lists about this, so I will have to research
further and maybe post about it.
As for the DVD drive, I tried
different media with equally mixed results. I eventually returned it
to Circuit City and bought a better and cheaper Samsung from Newegg.
It seems to work much better… no spewing of IDE messages. I almost
made myself buy a SATA one, but I didn’t want to buy yet another SATA
controller and risk more problems with compatibility.
And the 250G seems fine now, so it
didn’t get tossed either. Sigh.
PVR-500 w/ Samsung tuners – FC5 to FC6
by balleman on Feb.13, 2007, under Linux
When I purchased and installed my PVR-500 this fall for MythTV stuff, it had very poor signal quality. I attributed this to my cable provider, and bought a ridiculous +24dB amplifier from WalMart to rectify the problem, which it did. After upgrading the box from Fedora Core 5 to Fedora Core 6, the reception on the box was awful. I confirmed that the change in kernel from 2.6.17 to 2.6.19, or the accompanying changes to the ivtv driver were the cause. After some research, it turns out that I have an “evil” Samsung tuner card, and that in 2.6.17 the internal amplifier on the tuner is not activated by the Video for Linux drivers. So, my original amplifier purchase was to compensate for a software problem. After removing the amplifier (not just turning it off as I had been foolishly trying as a test), the reception was mostly better. Some channels are better than before, but some are worse, and overall I think this was not a good change. However, there is no software setting for enabling/disabling the internal amp (apparently no good reason to turn it off), so I’m going to go with the internal amp instead of maintaining a custom kernel on the box, which is always a pain.
Frustrating Day
by balleman on Jul.12, 2006, under Happenings, Linux
Unfortunately, it seems that anything requiring cooperation and coordination is hard to do. I spent the day investigating a likely impossible solution to a problem that doesn’t end up needing solved. While at the same time still not having the ability to work on the problem that really needs solved. And no documentation or plan has been given to me other than 3rd party verbal snippets to the overarching project responsible for the whole situation. Sigh.
Luckily, other than this blog, I’ve been pretty successful at not letting the day ruin my evening. I got to visit with Chris a bit, following his trip to Illinois.
And one more note… recent kernel upgrades for FC5 require a new firmware image for the ipw2200 wireless driver, so remember that.
Computer Stuff
by balleman on Mar.18, 2006, under Computers, Linux
The storage capacity upgrade and RAID5-ification has been completed, following a week of computers and their components strewn throughout the house. Having a RAID5 include a linear md as a component was a bit challenging… had to make the kernel not try to assemble the RAID5 automagically, but wait for the mdadm.conf to do it. Unfortunately, that wasn’t the end of the computer fun this week. Chris had a drive fail in his firewall, and the machine employed various means to make it impossible to install an OS on another drive. Still not sure what its problems are, having spent the afternoon in futile efforts to fix it.
[balleman@oak ~]$ df -h /storage
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/storage0-lstorage0
1.5T 739G 729G 51% /storage
Updates
by balleman on Dec.04, 2005, under Happenings, Linux, Networking
No structure here… just some random goings-on:
For the last several kernel updates for FC4, my DVD sharing using GNBD hasn’t worked. I guess those special ioctl()s aren’t getting translated over or something. And NFS or SMB sharing an encrypted DVD just doesn’t do anything good at all (ignoring the fact that NFS seems really sucky with the latest FC4 updates). So, after months of not being able to watch DVDs, I gave up, and bought a USB drive cage to attach a DVD drive to Oak. Works perfectly. Despite performance issues and cabling evilness, I still can’t completely rule out a stack of USB drives RAIDed as a bulk storage solution, especially with all of the device mapper coolness in Linux. Too early to be thinking of that, though, as the computer storage fund hasn’t matured yet, despite the fact that Oak is at 99% capacity.
As Doug has mentioned, Asterisk and VoIP are still pretty neat. I’ve set up a teliax account, since they have pricing like nufone with a whole lot of rate centers worth of DIDs (they’re essentially a Level3 reseller). So, we just need to get some VPN’ing set up. I’m in desperate need of some UT, and I think VPN might be a useful substitute for a LAN party this winter.
My grandfather (on my Dad’s side) has been in the hospital on and off for more than a month now. Currently he has pneumonia, is very weak, and not entirely coherent. Your prayers would be appreciated for what could be a difficult Christmas season for the family.
Things at Ship are going fairly well. I did shoot myself in the foot with the “ip arp inspection” feature of the Sup720 this week though. Does 15 pps of ARP traffic seem like a good default threshold for shutting down trunk ports to you? Me neither. Of course, I asked that question after two ports had been err-disabled. Hopefully Tim and I will get to do a real test of some VMPS soon, too.