Setup Hardware RAID on a T2000
Tuesday, August 5th, 2008
Today I did some testing with a Sun Fire T2000. I have always used the default Solaris DiskSuite to set up software RAID partitions and was very curious to see how hardware RAID works on a T2000.
We start off by booting into single-user mode from the Solaris DVD, so at the ok prompt we type:
ok boot cdrom -s
Once booted from the DVD, we run raidctl to see what we have:
# raidctl
Controller: 0
Disk: 0.0.0
Disk: 0.1.0
OK, now let's try to set up the mirror:
# raidctl -c c0t0d0 c0t1d0
Creating RAID volume will destroy all data on spare space of member disks, proceed (yes/no)? y
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Physical disk 0 created.
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Physical disk 1 created.
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 created.
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Physical disk (target 1) is |out of sync||online|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||degraded|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||resyncing||degraded|
Volume c0t0d0 is created successfully!
It seems that the mirror has been created. Let's check:
# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     SYNC     N/A    RAID1
                0.0.0   136.6G          GOOD
                0.1.0   136.6G          GOOD
OK, so the new volume is c0t0d0 and it is currently syncing. Unfortunately, raidctl shows neither the percentage that has been synced nor any other indicator of how far along the sync is.
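Since there is no progress indicator, one workaround is to poll the Status column until it no longer reads SYNC. The following is a minimal, untested sketch of that idea. The function names volume_status and wait_for_sync and the 60-second interval are my own assumptions, and it expects a POSIX-style shell such as ksh or bash. It relies only on Status being the fourth whitespace-separated field of the volume row, as in the listing above.

```shell
# Print the Status field of one volume, reading `raidctl -l` output on stdin.
# Assumes the volume row starts with the volume name (e.g. c0t0d0) and that
# Status is the fourth whitespace-separated field.
volume_status() {
    while read -r name size stripe status rest; do
        if [ "$name" = "$1" ]; then
            echo "$status"
            return 0
        fi
    done
    return 1    # volume not found in the listing
}

# Hypothetical helper: block until the volume stops reporting SYNC.
# The polling interval is an arbitrary choice.
wait_for_sync() {
    while [ "$(raidctl -l "$1" | volume_status "$1")" = "SYNC" ]; do
        sleep 60
    done
}
```

Calling wait_for_sync c0t0d0 would then return once the volume reports something other than SYNC, which could be OPTIMAL but also DEGRADED, so the final status still needs checking.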
Do not forget that we now have to label our new volume!
# format (ignore the warnings)
format> label
Now start the Solaris installation. You will notice that the installer finds only one disk, c0t0d0, which is our RAID volume.
When the installation is finished and the system has booted, we check the status of our mirror with:
# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     OPTIMAL  N/A    RAID1
                0.1.0   136.6G          GOOD
                0.0.0   136.6G          GOOD
OK, our RAID volume is looking good now, so let's do some tests with removing disks.
Let's start by removing disk0 and checking whether we can still boot (from disk1).
/var/adm/messages now shows us the following:
Aug 4 08:53:05 tst01 Physical disk (target 0) is |missing|
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 Physical disk (target 0) is |out of sync||missing|
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 One or more disks on volume 0 changed.
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 One or more disks on volume 0 changed.
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 Volume 0 is |enabled||degraded|
Aug 4 08:53:09 tst01 SC Alert: [ID 209909 daemon.error] DISK at HDD0 has been removed
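On a busy system these lines are easy to miss between other messages. A small filter like the one below could pull out just the volume and disk state changes. This is only a sketch: the function name raid_events is hypothetical, and the patterns are based solely on the message wording shown above, so other firmware or driver versions may word things differently.

```shell
# Print only the RAID state-change lines from a syslog-style file, or from
# stdin when no file is given. The patterns match the "Volume N is ..." and
# "Physical disk (target N) is ..." wording seen in the excerpt above.
raid_events() {
    sed -n \
        -e '/Volume [0-9][0-9]* is /p' \
        -e '/Physical disk (target [0-9][0-9]*) is /p' \
        "$@"
}
```

It would be used as raid_events /var/adm/messages, optionally piped through tail -1 to see only the most recent state.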
So it detected that our disk has been removed. Let's see what raidctl has to say about that.
# raidctl
Device record is invalid
I must say I am a little surprised that it complains about the device record being invalid, but OK. Let's reboot the system and see whether it boots without any problems.
Rebooting with command: boot
Boot device: disk File and args:
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-T200/ufsboot
Loading: /platform/sun4v/ufsboot
SunOS Release 5.10 Version Generic_120011-14 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
os-io WARNING: /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is degraded
After the reboot, let's check the status of our RAID volume:
# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     DEGRADED N/A    RAID1
                0.1.0   136.6G          GOOD
                N/A     136.6G          FAILED
That looks better. As you can see, our first disk has the status FAILED. Let's put disk0 back and see if we can resync the volume.
The following messages appear in /var/adm/messages:
SC Alert: DISK at HDD0 has been inserted.
Aug 4 08:59:55 tst01 scsi: /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:59:55 tst01 Physical disk (target 0) is |out of sync||online|
Aug 4 08:59:55 tst01 scsi: /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:59:55 tst01 Volume 0 is |enabled||resyncing||degraded|
It detected that our disk has been reinserted. Let's see what happens with our volume:
# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     SYNC     N/A    RAID1
                0.1.0   136.6G          GOOD
                0.0.0   136.6G          GOOD
As you can see, after reinserting the disk, the volume started syncing automatically. I also ran the same test without rebooting, because the disks should be hot-pluggable. When I removed the disk for a few seconds and reinserted it, the system started syncing by itself just fine, so there is no need to reboot.
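Because the controller handles all of this silently, it may be worth checking the volume from time to time, for example from cron. Below is a minimal sketch of such a check. The function name raid_problems is hypothetical, and the parsing assumes the raidctl -l layout shown in this post: volume rows start with a cXtYdZ name and carry the status in the fourth field, while member disk rows start with an x.y.z address (or N/A for a missing disk) and carry it in the third.

```shell
# Read `raidctl -l` output on stdin and print one line for every volume that
# is not OPTIMAL and every member disk that is not GOOD. Prints nothing when
# everything is healthy. Header and separator rows match neither pattern.
raid_problems() {
    while read -r f1 f2 f3 f4 rest; do
        case "$f1" in
            c[0-9]*t[0-9]*d[0-9]*)        # volume row, e.g. c0t0d0
                [ "$f4" = "OPTIMAL" ] || echo "volume $f1: $f4" ;;
            [0-9]*.[0-9]*.[0-9]*|N/A)     # member disk row, e.g. 0.1.0
                [ "$f3" = "GOOD" ] || echo "disk $f1: $f3" ;;
        esac
    done
}
```

Running raidctl -l c0t0d0 | raid_problems and alerting on any output would flag both a degraded volume and one that is still resyncing, since SYNC is also reported as not OPTIMAL.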
My conclusion is that the hardware RAID functionality of the T2000 is a lot easier to use than the default Solaris DiskSuite. I have done a lot of RAID configurations in the past and have seen DiskSuite eat up our configurations a few times because of minor mistakes in our setup, so this is really a nice feature of the T2000 (most new SPARC systems have this feature by default).