Archive for the ‘Unix / Linux’ Category

Setup Hardware RAID on a T2000

Tuesday, August 5th, 2008

Today I did some testing with a Sun Fire T2000. I have always used the default Solaris DiskSuite to set up software RAID partitions and was very curious to see how hardware RAID works on a T2000.

We start off by booting into single-user mode from the Solaris DVD, so at the ok prompt we type:

ok boot cdrom -s

Once booted from the DVD, we run raidctl to see what we have:

# raidctl
Controller: 0
Disk: 0.0.0
Disk: 0.1.0

OK, now let's try to set up the mirror:

# raidctl -c c0t0d0 c0t1d0
Creating RAID volume will destroy all data on spare space of member disks, proceed (yes/no)? y
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Physical disk 0 created.
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Physical disk 1 created.
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 created.
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Physical disk (target 1) is |out of sync||online|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||degraded|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||resyncing||degraded|
Volume c0t0d0 is created successfully!

It seems that the mirror has been created. Let's check:

# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     SYNC     N/A    RAID1
                0.0.0   136.6G          GOOD
                0.1.0   136.6G          GOOD

OK, so the new volume is c0t0d0 and it is currently syncing. Unfortunately, it is not possible to view the percentage that has been synced or any other indicator of how far along the sync is.
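Since there is no progress figure, one workaround is to poll the Status column until it reads OPTIMAL. The following is only a sketch under assumptions: the helper names volume_status and wait_for_optimal and the 60-second interval are made up for illustration, and it expects a POSIX shell and an awk that accepts -v (on Solaris 10 that would probably be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
# Read `raidctl -l <volume>` output on stdin and print the Status column
# (4th field) of the line that starts with the volume name.
volume_status() {
    awk -v vol="$1" '$1 == vol { print $4; exit }'
}

# Hypothetical helper: poll raidctl until the volume reports OPTIMAL.
# The 60-second interval is an arbitrary choice.
wait_for_optimal() {
    vol=$1
    while :; do
        status=$(raidctl -l "$vol" | volume_status "$vol")
        if [ -z "$status" ]; then
            echo "no status found for $vol" >&2
            return 1
        fi
        echo "$(date) $vol: $status"
        [ "$status" = "OPTIMAL" ] && return 0
        sleep 60
    done
}
```

Calling `wait_for_optimal c0t0d0` would then print one line per minute until the resync is done.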

Do not forget that we now have to label our new volume!

# format (ignore warnings)
label

Now start the Solaris installation. You will notice that the installation finds only one disk, c0t0d0, which is our RAID volume.

When the installation is finished and the system has booted, we check the status of our mirror with:

# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     OPTIMAL  N/A    RAID1
                0.1.0   136.6G          GOOD
                0.0.0   136.6G          GOOD

OK, our RAID volume is looking good now. Let's do some tests with removing disks.

Let's start by removing disk0 and seeing whether we can still boot (from disk1).

/var/adm/messages now shows us the following:

Aug 4 08:53:05 tst01 Physical disk (target 0) is |missing|
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 Physical disk (target 0) is |out of sync||missing|
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 One or more disks on volume 0 changed.
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 One or more disks on volume 0 changed.
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 Volume 0 is |enabled||degraded|
Aug 4 08:53:09 tst01 SC Alert: [ID 209909 daemon.error] DISK at HDD0 has been removed
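
Messages like these can also be picked out of the log automatically. This is a rough sketch only: the function names raid_events and last_volume_degraded are invented for illustration, the patterns are based purely on the message wording shown above, and on Solaris 10 the awk calls would probably need to be nawk.

```shell
# Read a syslog file on stdin and print only the lines in which the mpt
# driver reports the state of a RAID volume or of a member disk.
raid_events() {
    awk '/Volume [0-9]+ is / || /Physical disk \(target [0-9]+\) is /'
}

# Exit 0 if the most recent "Volume N is ..." line on stdin contains
# "degraded" (this includes the resyncing state), exit 1 otherwise.
last_volume_degraded() {
    awk '/Volume [0-9]+ is / { last = $0 }
         END { exit ((last ~ /degraded/) ? 0 : 1) }'
}
```

For example, `raid_events < /var/adm/messages` lists the state changes, and `last_volume_degraded < /var/adm/messages && echo "volume degraded"` could be run from cron.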

So it detected that our disk has been removed. Let's see what raidctl has to say about that:

# raidctl
Device record is invalid

I must say I am a little surprised that it complains about the device record being invalid, but OK. Let's reboot the system and see if it boots without any problems.

Rebooting with command: boot
Boot device: disk File and args:
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-T200/ufsboot
Loading: /platform/sun4v/ufsboot
SunOS Release 5.10 Version Generic_120011-14 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
os-io WARNING: /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is degraded

After the reboot, let's check the status of our RAID volume:

# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     DEGRADED N/A    RAID1
                0.1.0   136.6G          GOOD
                N/A     136.6G          FAILED

That looks better. As you can see, our first disk has the status FAILED. Let's put disk0 back and see if we can resync the volume.

The following messages appear in /var/adm/messages:
SC Alert: DISK at HDD0 has been inserted.
Aug 4 08:59:55 tst01 scsi: /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:59:55 tst01 Physical disk (target 0) is |out of sync||online|
Aug 4 08:59:55 tst01 scsi: /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:59:55 tst01 Volume 0 is |enabled||resyncing||degraded|

It detected that our disk has been reinserted. Let's see what happens with our volume:

# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     SYNC     N/A    RAID1
                0.1.0   136.6G          GOOD
                0.0.0   136.6G          GOOD

As you can see, after reinserting the disk, the volume started syncing automatically. I also ran the same test without rebooting the system, because the disks should be hot-pluggable. When the disk is removed for a few seconds and reinserted without a reboot, the system starts syncing by itself just fine, so there is no need to reboot.
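
To confirm after such a swap that no member disk is still marked as failed, the disk rows of the raidctl listing can be filtered. This is a sketch that assumes the three-column disk rows shown in the listings above (disk, size, status); the function name bad_members is hypothetical, and on Solaris 10 nawk may be needed instead of awk.

```shell
# Read `raidctl -l <volume>` output on stdin and print "disk status" for
# every member-disk row whose status is not GOOD. Disk rows are recognised
# as three-field lines whose second field looks like a size (e.g. 136.6G).
bad_members() {
    awk 'NF == 3 && $2 ~ /^[0-9.]+[A-Z]$/ && $3 != "GOOD" { print $1, $3 }'
}
```

`raidctl -l c0t0d0 | bad_members` prints nothing when every member is GOOD, so any output at all can be treated as a reason to look at the box.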

My conclusion is that the hardware RAID functionality of the T2000 is a lot easier to use than the default Solaris DiskSuite. I did a lot of RAID configurations in the past and have seen DiskSuite eat up our configurations a few times because of minor mistakes in our setup, so this is really a nice feature of the T2000 (most new SPARC systems have this feature by default).

No Adobe Acrobat Reader for Solaris X86

Friday, August 31st, 2007

As I was test-driving an X2100-M2 (1 CPU, 2GB RAM) with the Sun Ray Server Software to see how well it runs compared with the E4500 (8 CPUs, 8GB RAM), I ran into a major problem. Adobe only has a Solaris SPARC version of their Acrobat Reader program; there is NO x86 version available.

I would like to ask everyone who wants to see Acrobat Reader for Solaris x86 to leave a note on Adobe's feature request form.
P.S. The performance of the X2100-M2 with the Sun Ray Server Software is amazing so far. It would save us huge electricity costs and gain a lot of performance.

Solaris 10 Kernel Corruption

Wednesday, August 29th, 2007

Last week we wanted to see how many amps our Sun E4500 was drawing. It has eight 400MHz SPARC CPUs and 8GB RAM. I had to shut down the server and put a measuring device in between to measure the power. It turned out that it was drawing about 3.5 amperes when idle.

After removing the device and rebooting, the server didn't come up anymore and gave the following error:

Boot device: sol10  File and args:
not found: rtt_ctx_end
not found: rtt_ctx_end
not found: rtt_ctx_end
not found: rtt_ctx_end
not found: rtt_ctx_start
not found: rtt_ctx_start
not found: rtt_ctx_start
not found: rtt_ctx_start
do_relocations: /platform/sun4u/kernel/cpu/sparcv9/SUNW,UltraSPARC-II do_relocate failed
krtld: error during initial load/link phase
panic - boot: exitto64 returned from client program
Program terminated

So it seems the kernel had a corrupted file (/platform/sun4u/kernel/cpu/sparcv9/SUNW,UltraSPARC-II). I booted from CD and checked the file. It did exist. I computed its checksum and compared it with that of the same file from another SPARC server. The checksum was different but the file size was the same, so I copied the file over from the other SPARC server and replaced it. However, it still didn't work.
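
A comparison like this can be scripted with the standard cksum utility, which prints a CRC and a byte count. The sketch below is only an illustration: the function name same_cksum is made up, and it assumes a POSIX shell and that the reference copy from the other server is reachable as a local path.

```shell
# Succeed (exit 0) if both files have the same CRC and byte count
# according to cksum, fail (exit 1) otherwise. Reading via stdin keeps
# the file name out of the cksum output, so the strings compare cleanly.
same_cksum() {
    a=$(cksum < "$1")
    b=$(cksum < "$2")
    [ "$a" = "$b" ]
}
```

A file with the same size but a different CRC, as in the case described above, makes same_cksum fail, so something like `same_cksum suspect-file reference-file || echo "files differ"` would flag it.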

In the end I ran an install from CD-ROM and chose an upgrade installation from the install menu. The upgrade took about 5 hours but fixed everything. All my configurations were kept unchanged (Sun Ray Server, metadbs, etc.). After the upgrade everyone was able to work on this server again.

Testing Fileserver, storage & DLT S4

Wednesday, August 15th, 2007

Today I spent most of my day building a test environment. The test environment has one Sun Fire X2100-M2 with a Sun StorageTek SCSI adapter connected to a Promise VTrak M210 with 8 x 320GB disks configured as hardware RAID6.
For networking, a Cisco ASA 5505 is used (because of its VPN functionality) together with a 3Com 24 x 10/100/1000 switch.

The X2100-M2 has 4 NICs. Two NICs are configured with aggregation, one is used for the Service Processor and the other one is not used.

All seems to be working. Solaris x64 is installed on the X2100-M2. The VTrak works perfectly and shows a 1.76TB RAID6 volume, which mounts without problems under Solaris 10. Tomorrow everything will be installed on location at the client, where it will be the heart of a network consisting of several Mac workstations.

Server Crash

Thursday, May 3rd, 2007

I haven't written much in my blog lately because I have been too busy with work. The provider that houses our 19″ rack and provides us with electricity and bandwidth for our hosting platform decided to upgrade its power facilities. Because of this, they would cut the power to our rack during the night of 24/25 April 2007, at 1:50 AM.

After shutting down all servers that night and turning them back on again, a problem occurred with one of our web servers. It halted at the boot loader screen (actually just before the daemon drawing). It seemed that the screen was not showing anything from that moment on, but the server was still booting (FreeBSD 5.3). After a while everything seemed to be working, but it appeared that the server had started in single-user mode. All services were down except ssh. After starting most of them manually, we tried to dig into the problem.

My colleague Bayu from Indonesia started recompiling userland and I waited for that. After it was recompiled and the server rebooted, the screen still did not show the startup procedure; however, after a while I got a login prompt. I went home as it was already 6:00 AM and went straight to bed when I got home at 6:35 AM.

At 9:00 I got a wake-up call from my boss saying that the server was not reachable anymore. I went downstairs to check from my home PC and he was right: nothing was working anymore and we were not able to log in to the server, although it was still pingable. I went to the datacenter.
Again the screen stopped just before the daemon should appear, and I couldn't do much other than boot from the CD-ROM. Bayu told me the last thing he did was reinstall exim because of some problems we had had with it earlier.

We were able to boot into fixit mode; however, nothing really helped. We took out the second HD (RAID1) so that we had a copy and ran fsck on the first disk. It fixed some errors, and we tried booting from it, but without success. We tried copying back an old kernel, but none worked. We decided to make a spare backup and were able to mount a drive from our backup server over NFS from fixit mode.

One very annoying problem was that fixit mode kept giving errors once in a while (commands would suddenly stop working with a segmentation fault and core dump), and the machine then had to be rebooted again and again.

It took very long to fix this. I went back home the next morning around 10 AM, straight to bed, while our guys from Indonesia kept working on restoring the backups that I had made onto a different machine (we had to switch to a spare machine with Debian 3.1). The complication was that the old machine ran BSD and the new one Linux, which might cause some compatibility issues.

By night we had everything back up and running. Since then I have been busy fixing small issues with mod_evasive, firehol, etc.

Evolution Problems

Thursday, April 12th, 2007

Using Ximian Evolution (the standard Solaris 10 version) on our Sun Rays has caused a lot of weird problems. Sometimes it shows errors saying it got disconnected from the server while the server is perfectly reachable. In the past, one user had to click the Evolution icon about 10 times before Evolution really started (I had to throw away all of Evolution's temp files from his home dir and set up Evolution again).

Today one of our users had a problem forwarding a mail as an attachment. It hung on sending. After switching from forward as attachment to forward inline, it worked fine. I tried setting up Thunderbird and forwarded the mail as an attachment from there; that went fine as well, so somehow Evolution was having a problem with this.

Wondering when Sun will upgrade Ximian Evolution 1.4.x to 2.x.