Didier Godefroy
2009-05-13 20:26:08 UTC
Hi all,
I posted earlier about my LSM issue, and now it's turning into a nightmare
with an AdvFS problem on top of it.
What happened in the first place was a crash when I deleted a big folder
that contained a corrupt file. I had found that corrupt file by accident: it
wouldn't show up with ls and would cause an error when running du on the
contents of the directory.
I deleted that big folder (over 600 MB), and when the deletion hit the
corrupt file, the system crashed.
The system wouldn't come back up because fsck would trigger the crash again
on every boot, causing a loop.
I removed a drive from the mirrored set containing the volume with that
corrupt file. That caused the LSM volume to be disabled and the mirror to be
relocated to the hot spare, but it allowed the boot to succeed.
Then the problem was that the volume wouldn't resync properly, and another
crash happened before syncing was done.
I then had a mirror with a stale plex; the volume stayed disabled and I
couldn't clear that status.
After I plugged the removed drive back in, the system could see it but LSM
couldn't, so it needed a reboot. Before rebooting again, I used scu to run a
complete surface scan of all drives, and they're all fine.
So I rebooted again with all the drives back in normally. The system booted
fine, the disabled volume was still in that state, and the previously removed
drive was back in LSM normally, although it wasn't put back into place
because everything that used to be on it had been relocated to the hot spare.
However, I was able to bring back the biggest plex from that drive by adding
it back to the mirror set, giving a mirror set with 3 plexes.
I was then able to change the state of the plexes so the volume would get a
needsync, and I triggered the resync; it was syncing fine.
And that's where the AdvFS problem showed up again: when I tried mounting
that volume, the system crashed again and I had to remove the drive again to
break out of the boot loop.
Once rebooted, I had volume syncing going on again, except on the failed
volume with the AdvFS issue and the removed plex from the pulled-out drive.
The mirror set on the 2 drives has 3 LSM volumes: 2 small ones and the
large one. The two small ones resync just fine; only the large one won't
resync by itself.
The big problem now is with the AdvFS domain that's supposed to be on that
large volume. While the resync was still going on, I tried running
showfdmn on it and there was another crash.
Basically, every time I try to access that AdvFS domain in any way, whether
by trying to mount it, looking up some info on it, or when the LSM sync
finishes on it and it then tries to mount, I get a crash.
I don't think this is an AdvFS domain panic; I think that wouldn't bring the
whole system down, and everything else would keep running.
The system just crashes every time the AdvFS data is "touched" in any way.
Now how can I get this fixed and back under control if it crashes without
provocation?
That system has now been down for more than half a day and it's getting
critical.
Is it possible to clean up and repair AdvFS corruption without causing these
system crashes?
Help please,
--
Didier Godefroy
mailto:***@ulysium.net