Archive for the ‘My Work’ Category

Setup Hardware RAID on a T2000

Tuesday, August 5th, 2008

Today I did some testing with a Sun Fire T2000. I have always used the default Solaris DiskSuite to set up software RAID partitions and was very curious to see how the hardware RAID works on a T2000.

We start off by booting into single-user mode from the Solaris DVD, so at the ok prompt we type:

ok boot cdrom -s

When booted from cdrom we type raidctl to see what we have.

# raidctl
Controller: 0
Disk: 0.0.0
Disk: 0.1.0

OK, now let’s try to set up the mirror:

# raidctl -c c0t0d0 c0t1d0
Creating RAID volume will destroy all data on spare space of member disks, proceed (yes/no)? y
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Physical disk 0 created.
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Physical disk 1 created.
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 created.
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Physical disk (target 1) is |out of sync||online|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||degraded|
/pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is |enabled||resyncing||degraded|
Volume c0t0d0 is created successfully!

It seems that the mirror has been created. Let’s check:

# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     SYNC     N/A    RAID1
                0.0.0   136.6G          GOOD
                0.1.0   136.6G          GOOD

OK, so the new volume is c0t0d0 and it is currently syncing. Unfortunately raidctl shows neither a percentage nor any other indication of how far the sync has progressed.
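Since there is no progress indicator, one option is to poll raidctl until the volume reports OPTIMAL. The following is only a minimal Ruby sketch, assuming the `raidctl -l` column order shown above. The helper names `volume_status` and `wait_until_optimal` and the 60-second interval are my own choices for illustration, not part of raidctl.

```ruby
# Extract the status column (SYNC, OPTIMAL, DEGRADED, ...) for one volume
# from the text printed by `raidctl -l <volume>`.
# Assumes the column order shown above: Volume, Size, Stripe Size, Status, ...
def volume_status(raidctl_output, volume)
  raidctl_output.each_line do |line|
    fields = line.split
    return fields[3] if fields[0] == volume
  end
  nil
end

# Poll raidctl until the volume reports OPTIMAL.
# The interval is an arbitrary choice, not something raidctl requires.
def wait_until_optimal(volume, interval = 60)
  loop do
    status = volume_status(`raidctl -l #{volume}`, volume)
    puts "#{Time.now} #{volume}: #{status.inspect}"
    break if status == 'OPTIMAL'
    sleep interval
  end
end

# Parsing the volume line from the output above
sample = "c0t0d0 136.6G N/A SYNC N/A RAID1\n0.0.0 136.6G GOOD\n0.1.0 136.6G GOOD\n"
puts volume_status(sample, 'c0t0d0')
```

Calling `wait_until_optimal('c0t0d0')` on the T2000 itself would then print one line per minute until the resync has finished.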

Do not forget that we now have to label our new volume!

#format (ignore warnings)
label

Now start the Solaris installation. You will notice that the installer finds only one disk, c0t0d0, which is our RAID volume.

When the installation is finished and the system has booted, we check the status of our mirror with:

# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     OPTIMAL  N/A    RAID1
                0.1.0   136.6G          GOOD
                0.0.0   136.6G          GOOD

OK, our RAID volume is looking good now, so let’s do some tests with removing disks.

Let’s start by removing disk0 and seeing whether we can still boot (from disk1).

/var/adm/messages now shows us the following:

Aug 4 08:53:05 tst01 Physical disk (target 0) is |missing|
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 Physical disk (target 0) is |out of sync||missing|
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 One or more disks on volume 0 changed.
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 One or more disks on volume 0 changed.
Aug 4 08:53:05 tst01 scsi: [ID 107833 kern.notice] /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:53:05 tst01 Volume 0 is |enabled||degraded|
Aug 4 08:53:09 tst01 SC Alert: [ID 209909 daemon.error] DISK at HDD0 has been removed

So it detected that our disk has been removed. Let’s see what raidctl has to say about that:
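The state flags in these mpt messages are wrapped in pipe characters, which makes them easy to pick out when watching /var/adm/messages from a script. A small Ruby sketch, assuming the message format shown above; `mpt_flags` and `volume_degraded?` are hypothetical helper names.

```ruby
# Pull the state flags out of an mpt driver message such as
#   Volume 0 is |enabled||degraded|
# Each flag is wrapped in a pair of pipe characters.
def mpt_flags(line)
  line.scan(/\|([^|]+)\|/).flatten
end

# True when a "Volume N is ..." message carries the degraded flag.
def volume_degraded?(line)
  return false unless line =~ /Volume \d+ is /
  mpt_flags(line).include?('degraded')
end

# Two of the log lines from above
lines = [
  'Aug 4 08:53:05 tst01 Physical disk (target 0) is |out of sync||missing|',
  'Aug 4 08:53:05 tst01 Volume 0 is |enabled||degraded|'
]
lines.each do |line|
  puts "#{mpt_flags(line).inspect} degraded volume: #{volume_degraded?(line)}"
end
```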

# raidctl
Device record is invalid

I must say I am a little surprised that it complains about the device record being invalid, but OK. Let’s reboot the system and see whether it boots without any problems.

Rebooting with command: boot
Boot device: disk File and args:
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-T200/ufsboot
Loading: /platform/sun4v/ufsboot
SunOS Release 5.10 Version Generic_120011-14 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
os-io WARNING: /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Volume 0 is degraded

After the reboot, let’s check the status of our RAID volume:

# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     DEGRADED N/A    RAID1
                0.1.0   136.6G          GOOD
                N/A     136.6G          FAILED

That looks better. As you can see, our first disk now has the status FAILED. Let’s put disk0 back and see whether the volume resyncs.

The following messages appear in /var/adm/messages:
SC Alert: DISK at HDD0 has been inserted.
Aug 4 08:59:55 tst01 scsi: /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:59:55 tst01 Physical disk (target 0) is |out of sync||online|
Aug 4 08:59:55 tst01 scsi: /pci@780/pci@0/pci@9/scsi@0 (mpt0):
Aug 4 08:59:55 tst01 Volume 0 is |enabled||resyncing||degraded|

It detected that our disk has been reinserted. Let’s see what happens with our volume:

# raidctl -l c0t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c0t0d0                  136.6G  N/A     SYNC     N/A    RAID1
                0.1.0   136.6G          GOOD
                0.0.0   136.6G          GOOD

As you can see, after reinserting the disk, the volume started syncing automatically. I ran the same test without rebooting the system, because the disks should be hot-pluggable. When the disk is removed for a few seconds and reinserted without a reboot, the system starts syncing by itself just fine, so there is no need to reboot.

My conclusion is that the hardware RAID functionality of the T2000 is a lot easier to use than the default Solaris DiskSuite. I have done a lot of RAID configurations in the past and have seen DiskSuite eat our configurations a few times because of minor mistakes in our setup, so this is a really nice feature of the T2000 (most new SPARC systems have this feature by default).

Searching for text files

Thursday, November 15th, 2007

For communication between a PHP application and a Ruby daemon I thought it might be handy to use simple plain text files with the fields separated by commas, much like CSV. This way a PHP frontend can create these CSV files while my Ruby daemon handles the background processing.

In my case I use it to create a web-based installer for some software. In the PHP frontend we choose a piece of software to install, PHP writes the request to a CSV file, and Ruby picks it up, downloads the software and does whatever is needed.

First we have the daemon part, which looks like this:

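#!/opt/csw/bin/ruby
# start.rb
require 'rubygems'
require 'daemons'

# RUBY_VERSION holds the interpreter version (the bare VERSION constant no
# longer exists in Ruby 1.9 and later); Gem::Version compares version
# numbers numerically instead of as plain strings
if Gem::Version.new(RUBY_VERSION) < Gem::Version.new('1.8.0')
  printf "Use Ruby 1.8.0 or later, your Ruby version: %s...bye\n", RUBY_VERSION
  exit(1)
end

Daemons.run('dir.rb')

Now we get the script itself (which I called dir.rb):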


#!/usr/bin/env ruby
# dir.rb

require 'date'
require 'net/ftp'

# Where to log, which dir to scan for csv's and where downloads are unpacked
LOG_FILE     = '/export/home/sander/ruby/test/log.txt'
SCAN_DIR     = '/export/home/sander/ruby/test/mydir'
DOWNLOAD_DIR = '/export/home/sander/ruby/bla/dir2/'

loop do
  # Open the log file for appending; the block closes it after every pass
  File.open(LOG_FILE, 'a') do |f|
    Dir.foreach(SCAN_DIR) do |entry|
      # Skip the current and parent directory entries
      next if entry == '.' || entry == '..'

      file = File.join(SCAN_DIR, entry)
      f.puts(DateTime.now.to_s + ' Processing ' + file)

      # Every line has the form: type,program,domain
      File.readlines(file).each do |line|
        fields  = line.chomp.split(',')
        type    = fields[0].to_s
        program = fields[1].to_s
        domain  = fields[2].to_s

        case type
        when 'install'
          case program
          when 'someprogram'
            # do some things to install someprogram on domain
            f.puts(DateTime.now.to_s + ' Installing someprogram')
            tarball = DOWNLOAD_DIR + program + '.tar'
            server  = 'hostname'
            ftpuser = 'username'
            ftppass = 'password'
            Net::FTP.open(server) do |ftp|
              ftp.login(ftpuser, ftppass)
              f.puts(DateTime.now.to_s + ' Downloading ' + program + '.tar')
              ftp.getbinaryfile(program + '.tar', tarball)
            end
            f.puts(DateTime.now.to_s + ' Extracting ' + program + '.tar')
            system('gtar', 'xfC', tarball, DOWNLOAD_DIR)
          when 'someotherprogram'
            f.puts(DateTime.now.to_s + ' Installing someotherprogram')
            # do some things to install someotherprogram on domain
          end
        when 'remove'
          f.puts('Removing: ' + program + ' on: ' + domain)
        end

        # Pause between jobs
        sleep 4
      end

      # The request file has been handled, so remove it
      File.delete(file)
    end
  end
  sleep 2
end
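The line format the script relies on (type, program, domain) can be sketched on its own, which also makes it easy to try out without FTP or tar. This is only an illustration: the `Job` struct and `parse_job` helper are hypothetical names, and example.com is a made-up domain.

```ruby
# One job per line: type,program,domain
Job = Struct.new(:type, :program, :domain)

# Turn one line of a request file into a Job.
# Returns nil for blank lines; missing fields become empty strings,
# and any fields after the third are ignored.
def parse_job(line)
  type, program, domain = line.chomp.split(',').map { |field| field.strip }
  return nil if type.nil? || type.empty?
  Job.new(type, program.to_s, domain.to_s)
end

# A line as the PHP frontend might write it (example.com is a placeholder)
job = parse_job("install,someprogram,example.com\n")
puts "#{job.type} #{job.program} on #{job.domain}"
```

Keeping the parsing in one small function means the daemon loop only has to decide what to do with each job.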

For any questions/remarks, please leave a comment.

No Adobe Acrobat Reader for Solaris X86

Friday, August 31st, 2007

As I am test-driving an X2100-M2 (1 CPU, 2GB RAM) with the Sun Ray server software to see how well it runs compared with the E4500 (8 CPUs, 8GB RAM), I ran into a major problem. Adobe only has a Solaris SPARC version of their Acrobat Reader program; there is NO x86 version available.

I would like to ask everyone who wants to see Acrobat Reader for Solaris x86 to leave a note on Adobe’s feature request form.
PS: the performance of the X2100-M2 with the Sun Ray server software is amazing so far. It would save us a lot in electricity costs and gain us a lot of performance.

Solaris 10 Kernel Corruption

Wednesday, August 29th, 2007

Last week we wanted to see how many amps our Sun E4500 was drawing. It has 8 400MHz SPARC CPUs and 8GB RAM. I had to shut down the server and put a measuring device between it and the power outlet. It turned out that it was drawing about 3.5 amperes while idle.

After removing the device and rebooting the server, this server didn’t come up anymore and gave the following error:

Boot device: sol10  File and args:
not found: rtt_ctx_end
not found: rtt_ctx_end
not found: rtt_ctx_end
not found: rtt_ctx_end
not found: rtt_ctx_start
not found: rtt_ctx_start
not found: rtt_ctx_start
not found: rtt_ctx_start
do_relocations: /platform/sun4u/kernel/cpu/sparcv9/SUNW,UltraSPARC-II do_relocate failed
krtld: error during initial load/link phase
panic - boot: exitto64 returned from client program
Program terminated

So it seems the kernel had a corrupted file (/platform/sun4u/kernel/cpu/sparcv9/SUNW,UltraSPARC-II). I booted from CD and checked the file. It did exist. I computed its checksum and compared it with that of the same file on another SPARC server. The checksum was different but the file size was the same, so I copied the file over from the other SPARC server to replace this one; however, it still didn’t work.

In the end I ran an install from CD-ROM and chose an upgrade installation from the install menu. The upgrade took about 5 hours but fixed everything. All my configurations were kept unchanged (Sun Ray server, metadbs, etc.). After the upgrade everyone was able to work on this server again.
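A comparison like this (same size, different checksum) can also be scripted. The sketch below uses Ruby’s Digest library with SHA-256, which is my own choice for illustration and not the tool used at the time; `compare_files` is a hypothetical helper name.

```ruby
require 'digest/sha2'

# Compare two files by size and by SHA-256 digest.
# Equal sizes with different digests mean the contents differ.
def compare_files(path_a, path_b)
  {
    :same_size   => File.size(path_a) == File.size(path_b),
    :same_digest => Digest::SHA256.file(path_a).hexdigest ==
                    Digest::SHA256.file(path_b).hexdigest
  }
end
```

Called with the suspect file and a copy fetched from the healthy server, it returns a hash with both results, so a corrupted file of unchanged size shows up as `same_size` true and `same_digest` false.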

Testing Fileserver, storage & DLT S4

Wednesday, August 15th, 2007

Today I spent most of my day building a test environment. The test environment has one Sun Fire X2100-M2 with a Sun StorageTek SCSI adapter connected to a Promise VTrak M210 with 8 x 320GB disks configured as hardware RAID6.
For networking a Cisco ASA 5505 is used (because of its VPN functionality) and a 3Com 24 x 10/100/1000 switch.

The X2100-M2 has 4 NICs. Two NICs are configured with aggregation, one is used for the Service Processor and the other one is not used.

Everything seems to be working: Solaris x64 is installed on the X2100-M2. The VTrak works perfectly and shows a 1.76TB RAID6 volume which mounts without problems under Solaris 10. Tomorrow everything will be installed on location at the client, where it will be the heart of a network consisting of several Mac workstations.

Ugly mail backup script

Tuesday, June 5th, 2007

I created a small Ruby script, run from a cron job, that automatically backs up a mailserver that uses the Maildir format on Solaris.

The script backs up the users’ home dirs (this is where the mail is located) to an NFS partition mounted from the backup server. The script should also delete all backups older than 7 days. This is where I had a problem, because I am not a real programmer “yet” and I didn’t want to figure out how to store the list of backups in a text file or a db. So here is what I did (I know it’s a pretty ugly way of doing it; if anyone wants to give me better code, please do so :-) )

#!/opt/csw/bin/ruby
# Ruby Mail Backup
# Created by Sander on 05-06-2007

require 'date'
require 'fileutils'

today    = Date.today
dateBack = today.to_s
dateDel1 = today - 7
dateDel2 = today - 50

# Deleting old backups (ugly workaround): try every date from 50 days ago
# up to, but not including, 7 days ago
while dateDel2 < dateDel1
  delstr = dateDel2.to_s
  puts 'removing ' + delstr
  # rm_f stays quiet when there is no backup for that date
  FileUtils.rm_f('/mnt/backup/mail/home' + delstr + '.tar')
  dateDel2 = dateDel2 + 1
end

# Making backup of all home dirs
system('tar cf /mnt/backup/mail/home' + dateBack + '.tar /export/home/')
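One possible way around guessing dates is to look at the backup files that actually exist and read the date out of each file name. A minimal sketch, assuming the homeYYYY-MM-DD.tar naming that the script above produces; `expired_backups` is a hypothetical helper name.

```ruby
require 'date'

# Return the backup files in dir whose date stamp (homeYYYY-MM-DD.tar)
# is more than keep_days before today. Files with other names are ignored.
def expired_backups(dir, keep_days, today = Date.today)
  cutoff = today - keep_days
  Dir.glob(File.join(dir, 'home*.tar')).select do |path|
    stamp = File.basename(path)[/\Ahome(\d{4}-\d{2}-\d{2})\.tar\z/, 1]
    next false unless stamp
    begin
      Date.strptime(stamp, '%Y-%m-%d') < cutoff
    rescue ArgumentError
      false # not a real calendar date, leave the file alone
    end
  end
end
```

In the backup script the delete loop could then become `expired_backups('/mnt/backup/mail', 7).each { |path| File.delete(path) }`, which no longer depends on the 50-day window.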

Helping Install Vista

Tuesday, May 15th, 2007

Yesterday I spent the whole evening helping a client install their new PC. It was a Compaq preinstalled with Vista Home Basic and Microsoft Works. I needed to install Office and move their e-mail, including the address book, from their old computer to the new one. They also wanted Norton (bought in June 2006) and their ADSL to work.

First I had to finish the preinstalled Vista setup, which took a lot of time. After that it was easy to get ADSL working: I just needed to plug the USB cable from the DSL modem into the PC and install the drivers from the CD-ROM. It worked straight away; however, Windows started to download its updates automatically (I hate all the automatic security settings and warnings in Windows) and the system became extremely slow.

After the updates finished, the system still got very slow whenever I opened IE or Firefox and downloaded something. Task Manager showed 100 percent CPU usage, but it didn’t show any process using that much CPU, which was confusing. After struggling for a while I noticed that the ADSL modem also has an Ethernet port. I decided to disconnect the USB cable and use Ethernet instead. This fixed the problem and Vista was very fast again. Conclusion: the ADSL modem’s USB driver was not compatible with Vista.

Norton 2006 was not compatible with Vista either, to the point that it refuses to install. It gives a message that you need Windows 2000 SP3, XP or higher.

Importing the old mail was a disaster. Coming from an old version of Office, I ended up with an Outlook that contained all the e-mails but hadn’t saved the passwords or the “SMTP requires authentication” setting, and the address book was acting strangely. In the end I installed Thunderbird and imported the settings from Outlook, which fixed the problem.

So my first impression of Vista is that it has a lot of compatibility issues with old software and older Windows releases.

Writing Ruby Daemons

Wednesday, May 9th, 2007

We have some bash scripts running via a cron job to set the permissions of certain directories. To improve this process I wrote a very simple Ruby daemon to handle it.

First I installed the daemons gem (# gem install daemons), next I created the following:

#control.rb
require 'rubygems'
require 'daemons'

Daemons.run('permissions.rb')

----------------------------------------------------------------------
#permissions.rb
loop do
  system('<some unix commands>')
  system('<some unix commands>')
  sleep 15
end

As you can see, I use the standard Unix tools by invoking system(''). I am still trying to find out whether it would be beneficial to use Ruby’s own File.chmod and File.chown (or their counterparts in the fileutils library) instead.
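For comparison, this is roughly what the pure-Ruby route could look like. It is only a sketch: `set_file_modes` is a hypothetical helper, and the directory and mode in the usage note are placeholders, since the actual Unix commands are not shown above.

```ruby
require 'fileutils'

# Apply a mode to every regular file directly inside dir,
# without spawning a shell for each call.
# Returns the list of files that were changed.
def set_file_modes(dir, mode)
  files = Dir.glob(File.join(dir, '*')).select { |path| File.file?(path) }
  FileUtils.chmod(mode, files)
  files
end
```

Inside the daemon loop a call such as `set_file_modes('/some/dir', 0640)` would replace one of the system() lines. FileUtils.chmod_R and FileUtils.chown_R cover the recursive cases, and failures surface as Ruby exceptions rather than as exit codes that system() silently returns.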

Server Crash

Thursday, May 3rd, 2007

I haven’t written much in my blog lately because I have been too busy with work. The provider that houses our 19″ rack and supplies the electricity and bandwidth for our hosting platform decided to upgrade its power facilities. Because of this, they would cut the power to our rack in the night of 24/25 April 2007, at 1:50 AM.

After shutting down all servers that night and turning them back on again, a problem occurred with one of our webservers. It halted at the boot loader screen (actually just before the daemon drawing). It seemed that the screen was not showing anything from that moment on, but the server was still booting (FreeBSD 5.3). After a while everything seemed to be working, but it appeared that the server had started in single-user mode. All services were down except ssh. After starting most of them manually we tried to dig into the problem.

My colleague Bayu from Indonesia started recompiling userland and I waited for that. After it was recompiled and the server rebooted, the screen still did not show the startup procedure; however, after a while I got a login prompt. I went home as it was already 6:00 AM and went straight to bed when I got home at 6:35 AM.

At 9:00 I got a wake-up call from my boss saying that the server was not reachable anymore. I went downstairs to check from my home PC and he was right: nothing was working anymore and we were not able to log in to the server, although it was still pingable. I went to the datacenter.
Again the screen stopped just before the daemon should appear and I couldn’t do much else than boot from the CD-ROM. Bayu told me the last thing he did was reinstall exim because of some problems we had had with it earlier.

We were able to boot into fixit mode; however, nothing really helped. We took out the second hard disk (RAID1) so that we had a copy and ran fsck on the first disk. It fixed some errors, and we tried booting from it, but without success. We tried copying back an old kernel, but none worked. We decided to make a spare backup and were able to mount a drive from our backup server over NFS from fixit mode.

One very annoying problem was that fixit mode kept giving errors once in a while (commands suddenly stopped working with a segmentation fault and core dump), after which the machine had to be rebooted again and again.

It took very long to fix this. I went back home the next morning around 10 AM, straight to bed, while our guys from Indonesia kept working on restoring the backups that I had made onto a different machine (we had to switch to a spare machine with Debian 3.1). The complication is that the old machine ran BSD and the new one Linux, which might cause some compatibility issues.

That night we had everything back up and running. Since then I have been busy fixing small issues with mod_evasive, firehol, etc.

Evolution Problems

Thursday, April 12th, 2007

Using Ximian Evolution (the standard Solaris 10 version) on our Sun Rays has caused a lot of weird problems. Sometimes it reports that it got disconnected from the server while the server is perfectly reachable. In the past one user had to click on the Evolution icon about 10 times before Evolution really started (I had to throw away all of Evolution’s temp files from his home dir and set up Evolution again).

Today one of our users had a problem forwarding a mail as an attachment. It hung on sending. After switching from forward as attachment to forward inline it worked fine. I tried setting up Thunderbird and forwarding the mail as an attachment from there, and that went fine as well, so somehow Evolution has a problem with this.

Wondering when Sun will upgrade Ximian evolution 1.4.x to 2.x