Wednesday, April 21, 2010

More ZFS awesomeness!

I recently had the "privilege" of interacting with my ZFS server at an administrative level. I use my ZFS server as a file server. I access it from my Apple, Linux, and Solaris hosts through NFS, and from my Windows hosts through CIFS. CIFS is Sun's implementation of the Microsoft SMB service. There is a known bug in OpenSolaris 2009.06 where the SMB service can hang, and the only way to fix it is to reboot. Think of the irony: how do you make a Unix server need a reboot? Install a service to support Windows. The bug is fixed in the next OpenSolaris release, but with Oracle's acquisition of Sun, who knows when that is going to see the light of day.

Back to the privilege I was talking about. My SMB service had hung yet again; honestly, it does not happen that often. Down to the server room I went. I shut down my running VirtualBox VMs and issued the reboot command. The server did not reboot, which was odd. I tried a shutdown command, and the server hung. This was starting to get weird; it had never done that before. At this point I had no choice but to power cycle it, and of course the server did not come back up. The screen was full of errors, and a quick Google search led me to the worst-case scenario: my disk was blown.

Now, I know many of you have heard me beat the drum about bad disks. It is why I built the ZFS server in the first place! My complaint has been about my Western Digital drives, which fail just after they turn a year old and are out of warranty. When I cracked the case and pulled my OS drive, I saw that the manufacture date was 21Nov2003. I bought the computer in December 2003, and that drive has been on-line pretty much 24x7 since then. One really can't complain about a hard drive that ran 24x7 for over 6 years! The drive was a Maxtor, and Maxtor was bought by Seagate. BTW, I only buy Seagate drives now, and I have not had one crash since leaving Western Digital.

I just happened to have another SATA disk lying around; doesn't everyone? I popped it in and reinstalled OpenSolaris. The whole time from crash, to OS disk swap, to OS install, to running again was less than an hour. The OS was back up and running, but what about my data?

My data is stored on a separate ZFS pool. I had been wanting to test a zpool import and export, and it looks like I was getting the chance. I ran the import command, shown below, and ZFS saw my RAID-Z pool with all of its disks.

# zpool import
  pool: tweetie
    id: 16215195261796119547
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tweetie     ONLINE
          raidz1    ONLINE
            c7d0    ONLINE
            c7d1    ONLINE
            c8d0    ONLINE
            c8d1    ONLINE
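That listing is plain text, so if you ever want a script to check a pool's state instead of eyeballing it, a small parser is enough. Below is a minimal Python sketch, assuming the layout shown in the listing: "key: value" header lines, wrapped continuation lines, then a "config:" device tree indented by nesting depth. The function name parse_zpool_import is hypothetical (it is not part of any ZFS tooling), and the column spacing in SAMPLE is a reconstruction of the listing, not a guaranteed output format.

```python
import re

# Hypothetical sample: the `zpool import` listing from this post, with the
# indentation reconstructed (the real spacing may differ).
SAMPLE = """\
  pool: tweetie
    id: 16215195261796119547
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tweetie     ONLINE
          raidz1    ONLINE
            c7d0    ONLINE
            c7d1    ONLINE
            c8d0    ONLINE
            c8d1    ONLINE
"""

# Header lines look like "<key>: <value>", possibly right-aligned with spaces.
HEADER = re.compile(r"^\s*(pool|id|state|status|action|see):\s*(.*)$")


def parse_zpool_import(text):
    """Parse one pool entry from `zpool import` output into a dict.

    Assumes the layout above; devices are returned as a flat list of
    (indent_depth, name, state) tuples, where deeper indent means nested.
    """
    info = {"devices": []}
    last_key = None
    in_config = False
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if in_config:
            name, state = line.split()
            depth = len(line) - len(line.lstrip())
            info["devices"].append((depth, name, state))
            continue
        if line.strip() == "config:":
            in_config = True  # everything after this is the device tree
            continue
        m = HEADER.match(line)
        if m:
            last_key = m.group(1)
            info[last_key] = m.group(2)
        elif last_key:
            # Continuation line, e.g. the wrapped "action:" message.
            info[last_key] += " " + line.strip()
    return info


if __name__ == "__main__":
    info = parse_zpool_import(SAMPLE)
    print(info["pool"], info["state"])
    print([name for _, name, _ in info["devices"]])
```

Keeping the devices as a flat list with their indent depth avoids building a tree while still letting you tell the pool, the raidz1 vdev, and the member disks apart.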


Next up, I ran the zpool import command with the -f flag to force the import. Note that if the server had not crashed and I had exported the pool first, this would not have been necessary, but hey, who is complaining?

# zpool import -f tweetie
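The rule here (force the import only when the pool was not exported cleanly) can be sketched as a small Python helper that builds the command strings without running them. The helper names import_command and planned_move are hypothetical, and the only status text checked is the "last accessed by another system" message from the listing; the sketch uses only the zpool import, zpool import -f, and zpool export commands discussed in this post.

```python
import shlex


def import_command(pool, status):
    """Build (but do not run) the zpool import command for a pool.

    Assumption: the only trigger for -f handled here is the
    "last accessed by another system" status; anything else is
    imported without forcing.
    """
    args = ["zpool", "import"]
    if "last accessed by another system" in status:
        args.append("-f")  # pool was never exported, e.g. after a crash
    args.append(pool)
    return shlex.join(args)


def planned_move(pool):
    """Commands for a deliberate move: export on the old host, import on the new."""
    return [
        shlex.join(["zpool", "export", pool]),
        shlex.join(["zpool", "import", pool]),
    ]


if __name__ == "__main__":
    print(import_command("tweetie", "The pool was last accessed by another system."))
    print(planned_move("tweetie"))
```

To actually run one of these, you would pass the argument list to subprocess.run rather than joining it into a string; building the string first just makes it easy to eyeball before touching the pool.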

And that was it. My file server was back on-line and ready to roll!