3ware RAID and tw_cli
We use RAID-1 mirroring, so all of the information here relates to RAID-1, though some of it may be useful for other RAID types.
You can only run tw_cli as root, so either su or sudo it.
You can either run it as a program with its own command line, i.e. cd to wherever you've installed it and then
[user@box name]# ./tw_cli
to get the command line, and then run the commands, e.g.
//box name> show
Or you can run it as a shell utility, e.g.
[user@box name]# ./tw_cli show
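For example, with sudo, and assuming it's installed in /usr/local/bin (an assumption; substitute your own path), the two styles look like this, with quit getting you back out of the interactive shell:
[user@box name]$ sudo /usr/local/bin/tw_cli show
[user@box name]$ sudo /usr/local/bin/tw_cli
//box name> show
//box name> quit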
We'll assume from here on in that we're running it at the shell command line.
Some commands and tw_cli's responses explained
First of all, let's get the big picture:
[user@box name]# ./tw_cli show
or
[user@box name]# ./tw_cli info
which both return, for example:
Ctl   Model        (V)Ports  Drives  Units  NotOpt  RRate  VRate  BBU
------------------------------------------------------------------------
c0    8006-2LP     2         2       1      1       3      -      -
This means controller c0 has two drives on two ports, forming one unit, and the 1 in the NotOpt column tells us that unit is not optimal, i.e. has a problem.
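Since NotOpt is just a count of not-optimal units, it makes a cheap health check for cron. A minimal sketch (the tw_cli path is our assumption, and the field number matches the column layout above):

#!/bin/sh
# Complain (and exit non-zero) if 'tw_cli show' reports any not-optimal
# units. NotOpt is the sixth field of each controller line (c0, c1, ...).
TW=/usr/local/bin/tw_cli    # adjust to wherever you've installed it

BAD=$($TW show | awk '/^c[0-9]/ && $6 != 0')
if [ -n "$BAD" ]; then
    echo "RAID problem on $(hostname):"
    echo "$BAD"
    exit 1
fi

Run it from root's crontab and cron will mail you the output whenever something is amiss.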
Now try
[user@box name]# ./tw_cli info c0
This asks for info on controller c0:
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    DEGRADED       -       -       -       139.735   ON     -

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     DEGRADED         u0     139.73 GB   293046768     WD-WMAP41084290
p1     OK               u0     139.73 GB   293046768     WD-WXC0CA9D2877
Very similar to this is
[user@box name]# ./tw_cli info c0 u0
This produces the same info as above but in a slightly more compact form:
Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-1    DEGRADED       -       -       -     -       139.735
u0-0     DISK      DEGRADED       -       -       p0    -       139.735
u0-1     DISK      OK             -       -       p1    -       139.735
Here u0-0 means unit u0, disk 0, which is on port p0.
Both of the above outputs show there are two drives in our RAID-1 array. Our array has only one unit, u0, which I think is standard for RAID-1. I think that other RAID configurations, such as RAID-10 or RAID-6, might have more than one unit per controller, but as this doesn't affect us, I haven't paid too much attention.
The RAID array is degraded, i.e. it is not functioning properly. In this case it is because the disk on port p0 is itself degraded. This probably means it has errors, but it may just mean it has stopped working properly for some other reason, so it may be worth trying to rebuild the array. You do this as follows:
[user@box name]# ./tw_cli maint remove c0 p0
This removes the degraded disk from the array, producing the following output:
Removing port /c0/p0 ... Done.
If we now run:
[user@box name]# ./tw_cli info c0 u0
we get a slightly different result:
Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-1    DEGRADED       -       -       -     -       139.735
u0-0     DISK      DEGRADED       -       -       -     -       139.735
u0-1     DISK      OK             -       -       p1    -       139.735
The only difference here is that disk u0-0 is no longer assigned to port p0. Now you have to find the disk again...
[user@box name]# ./tw_cli maint rescan c0
This produces the following output if it finds the disk (i.e. it hasn't stopped spinning or something):
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p0].
If you run it again, it won't find anything the second time around, cos it has already found it once! At this point it's possible that info c0 u0 still won't show the degraded disk as being on port p0, but
[user@box name]# ./tw_cli info c0
gives
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    DEGRADED       -       -       -       139.735   ON     -

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               -      139.73 GB   293046768     WD-WMAP41084290
p1     OK               u0     139.73 GB   293046768     WD-WXC0CA9D2877
so perhaps it's a latency thing, or perhaps they just differ in what they show at this stage.
Anyway, if the disk is going to work again, we now have to rebuild the array as follows:
[user@box name]# ./tw_cli maint rebuild c0 u0 p0
If it's happy with your request, this returns:
Sending rebuild start request to /c0/u0 on 1 disk(s) [0] ... Done.
You now need to check whether the rebuild is actually in progress:
[user@box name]# ./tw_cli /c0/u0 show rebuildstatus
which in our case returned
/c0/u0 is not rebuilding, its current state is DEGRADED
Which means it didn't work. I think the next step is to run fsck to try and repair the disk, and then try to rebuild the array again.
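To save piecing the sequence back together next time, here it is as one script. Treat it as a memo of the steps above rather than a proven tool (as per the rider at the end of this page, we've never actually watched a disk come back), and note that the c0/u0/p0 names are hard-wired to our setup:

#!/bin/sh
# Remove -> rescan -> rebuild for a degraded disk, as described above.
TW=/usr/local/bin/tw_cli    # adjust to your install

$TW maint remove c0 p0      # drop the degraded disk from the array
$TW maint rescan c0         # find it again (this only works once, as noted)
sleep 10                    # pause in case the port reassignment really is
                            # a latency thing
$TW maint rebuild c0 u0 p0  # ask for the rebuild

# Poll every five minutes until the controller reports it is not
# rebuilding (which is also what you get if the rebuild never started),
# then show where we ended up.
while ! $TW /c0/u0 show rebuildstatus | grep -q 'not rebuilding'; do
    sleep 300
done
$TW info c0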
A different kind of problem
On another of our boxes, running:
[user@box name]# ./tw_cli info c0
gives the following:
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    DEGRADED       -       -       -       139.688   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     NOT-PRESENT      -      -           -             -
p1     OK               u0     139.73 GB   293046768     WD-WMAP41398693
Here the disk at port p0 is listed as NOT-PRESENT, which suggests it has failed altogether and may have stopped spinning. Interestingly (I use this word in a fairly loose sense),
[user@box name]# ./tw_cli info c0 u0
gives the following:
Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-1    DEGRADED       -       -       -     -       139.688
u0-0     DISK      OK             -       -       p1    -       139.688
u0-1     DISK      DEGRADED       -       -       -     -       139.688
u0/v0    Volume    -              -       -       -     -       139.688
where the offending disk is listed as degraded, with no port assigned.
Anyway, running:
[user@box name]# ./tw_cli maint rescan c0
gives the bleak output:
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [none].
This suggests it hasn't just lost track of the disk, but that the disk really has failed. It may be unseated, of course, so get someone to remove it and plug it in again, if possible. Trying the following:
[user@box name]# ./tw_cli maint remove c0 p0
gives the output:
Removing port /c0/p0 ... Failed.
(0x0B:0x002E): Port empty
Yes. It's really not there, and it really can't find it. So either it has become unseated or it is dead.
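One more thing worth trying at this point, though it's an assumption on our part rather than something we've needed: if smartmontools is installed, smartctl can sometimes talk to a disk through the 3ware driver even when tw_cli is unhappy. For the 7000/8000-series cards the device is /dev/twe0, and the number after 3ware, is the port:
[user@box name]# smartctl -a -d 3ware,0 /dev/twe0
If SMART gets no answer either, that's another vote for dead rather than unseated.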
I *think* that another meaning for NOT-PRESENT might be that there is a disk there but it hasn't been added to any array, or that it has been failed out of an array but is otherwise still okay. In that case do this:
[user@box name]# ./tw_cli /c0/p0 export
This comes back with:
Removing /c0/p0 will take the disk offline.
Do you want to continue ? Y|N [N]:
Respond Y and if the disk is okay, you'll get:
Exporting port /c0/p0 ... Done.
Then you can add it to the array again with a maint rescan followed by a maint rebuild.
In our case it responded with:
Removing port /c0/p0 ... Failed.
(0x0B:0x002E): Port empty
Which confirms the deadness of the disk. I'd also like to add the rider that, as both of the boxes used in the examples above had disks which were dead, and we haven't yet had a situation where we've managed to rescue an array and bring a failed disk back to life, we can't vouch for the Lazarus techniques listed above from personal experience; they're here so we know what to try next time it happens. There's loads of other stuff you can do with tw_cli, so here are some useful links where I grabbed most of this information:
- Man page for tw_cli
- Useful description of what to do
- Something written based on the above
- Some examples and commands
- An example of how to rectify a NOT-PRESENT failed disk
- Possibly useful
- Another example
BB4 - New LSI controller
We now use
sas2ircu 0 display
for info on the RAID in BB4. This utility comes courtesy of Supermicro and there's a link here:
http://www.natecarlson.com/2010/08/23/lsi-command-line-utility-for-sas2-non-raid-controllers/
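A couple of other invocations that come in handy. We only use the display form day-to-day; list (which enumerates the controllers, giving you the index to use in place of the 0) and status (which reports just the volume state) are from sas2ircu's built-in help rather than from regular use on BB4, so treat them as lightly tested:
[user@box name]# sas2ircu list
[user@box name]# sas2ircu 0 status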