
I have the following setup:

A single server with two LSI MegaRAID SAS 9380-8e controllers, both connected to two 60-bay disk shelves, roughly following the design by Edmund White (see https://github.com/ewwhite/zfs-ha/wiki). The goal is to replicate that setup exactly, but the system is currently mid-migration.

After wiring the first shelf, all 60 disks were seen by both controllers, and multipathing was set up and works smoothly. When I added the second disk shelf, its 60 disks still carried an old RAID configuration, which both controllers dutifully reported. Using the first controller (c0), I removed the configuration from the disks and set them to JBOD. All 60 disks are now visible to the OS and can be registered with multipath, but they only report a single path (through the first controller). The second controller (c1) still reports all 60 disks as foreign (UGood F), and there seems to be no way to force it to rescan the devices or forget the current config for just this shelf:

# /opt/MegaRAID/storcli/storcli64 /c1 /e71 /sall show | head -n20
Controller = 1
Status = Success
Description = Show Drive Information Succeeded.


Drive Information :
=================

-----------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model            Sp 
-----------------------------------------------------------------------
71:0     74 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:1    107 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:2     72 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:3     95 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:4     90 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:5     77 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:6     73 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:7     76 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:8     83 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  

This is the same shelf as seen by the other controller:

# /opt/MegaRAID/storcli/storcli64 /c0 /e165 /sall show | head -n20
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.


Drive Information :
=================

-----------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model            Sp 
-----------------------------------------------------------------------
165:0   127 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:1   121 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:2   118 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:3   116 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:4   146 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:5   122 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:6   115 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:7   142 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:8   145 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
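
To compare the two views of the shelf at a glance, the State column can be tallied per controller. This is only a minimal sketch, assuming the storcli table layout shown above (EID:Slt in the first column, the state in the third, and the foreign flag F as a separate fourth field). The function name `count_drive_states` is made up for illustration and is not part of storcli:

```shell
#!/bin/sh
# Tally drive states from `storcli64 /cX /eY /sall show` output on stdin.
# Assumption: drive rows start with "<EID>:<Slt>" and the state is field 3,
# optionally followed by the foreign flag "F" in field 4 (as in "UGood F").
count_drive_states() {
    awk '
        $1 ~ /^[0-9]+:[0-9]+$/ {
            state = $3
            if ($4 == "F") state = state " F"
            count[state]++
        }
        END {
            for (s in count) printf "%s: %d\n", s, count[s]
        }
    ' | sort
}
```

It could be used as `/opt/MegaRAID/storcli/storcli64 /c1 /e71 /sall show | count_drive_states`, without the `head -n20`, so that all 60 slots are counted.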

But trying to clear the (wrong) info from the second controller does not work:

# /opt/MegaRAID/storcli/storcli64 /c1 /fall show
Controller = 1
Status = Success
Description = Couldn't find any foreign Configuration

# /opt/MegaRAID/storcli/storcli64 /c1 /fall delete
Controller = 1
Status = Success
Description = Couldn't find any foreign Configuration

# /opt/MegaRAID/storcli/storcli64 /c1 /fall import
Controller = 1
Status = Success
Description = Couldn't find any foreign Configuration

Forcing the disks into JBOD on the second controller does not work either:

# /opt/MegaRAID/storcli/storcli64 /c1 /e71 /sall set jbod | head -n20
Controller = 1
Status = Failure
Description = Set Drive JBOD Failed.

Detailed Status :
===============

-------------------------------------------------
Drive       Status  ErrCd ErrMsg                 
-------------------------------------------------
/c1/e71/s0  Failure   255 Operation not allowed. 
/c1/e71/s1  Failure   255 Operation not allowed. 
/c1/e71/s2  Failure   255 Operation not allowed. 
/c1/e71/s3  Failure   255 Operation not allowed. 
/c1/e71/s4  Failure   255 Operation not allowed. 
/c1/e71/s5  Failure   255 Operation not allowed. 
/c1/e71/s6  Failure   255 Operation not allowed. 
/c1/e71/s7  Failure   255 Operation not allowed. 
/c1/e71/s8  Failure   255 Operation not allowed. 
/c1/e71/s9  Failure   255 Operation not allowed. 

Is there any way to tell the RAID controller that those disks no longer have a foreign config and should be seen as JBODs?

Michael
  • Could you try `/cx rescan`? – Lenniey Sep 08 '17 at 09:50
  • This yields a syntax error. rescan is not a supported subcommand in storcli. – Michael Sep 08 '17 at 09:55
  • Sorry, I was on 3Ware...did all your disks come from the same old machine / vendor? Some controllers install their own firmware and can only be used by another one if you low-level-format the disk or remove the config from the old controller. Also I assume the controllers are all on the same firmware / BIOS etc.? – Lenniey Sep 08 '17 at 10:03
  • Does the controller have a JBOD mode? Why aren't you using a SAS HBA for ZFS? Are these a bunch of RAID0 arrays? – ewwhite Sep 08 '17 at 11:55
  • @ewwhite: Yes, there is a JBOD mode (see the sample output of controller c0 above). I am migrating from a different setup and had those 4 relatively new RAID controllers around. And it already works well with the first shelf. The problem when adding the second shelf was simply that the controller detected the existing config and pulled it in. – Michael Sep 08 '17 at 12:40
  • @Lenniey: Yes, all disks were connected to an LSI controller (same brand) before. Controller 0 also showed the foreign config at first. I changed the disks to JBOD on this controller and can access them from the OS. Only now, controller 1 is not updating its config to reflect the change. – Michael Sep 08 '17 at 12:44
  • I'd try disconnecting controller 1 (PCI-wise) + storage, rebooting, reconnecting and rebooting again. I have had so many strange troubles with RAID controllers of different vendors, HDD incompatibilities and whatever else you can think of, that this is usually my "workflow". Next would be to attach only a single disk, try to format (or initialize) it and see what happens. – Lenniey Sep 08 '17 at 12:56

2 Answers


Restart the out-of-sync controller (e.g. c1):

# /opt/MegaRAID/storcli/storcli64 /c1 restart
jeffre
  • I'd like to add that you may want to consult Broadcom about your configuration. Their support recently informed me that what you (and I) were doing is not supported: "You cannot have 2 MegaRAID controllers taking charge of the same set of drives on 1 enclosure even if there are 2 SAS expander chips on the backplane." – jeffre Apr 29 '19 at 19:31
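
The restart suggested in this answer could be followed by a check that the foreign flags are actually gone. This is a rough sketch, not a tested procedure: it assumes the drive table layout from the question, assumes the enclosure ID is unchanged after the restart, and uses made-up helper names (`foreign_drive_count`, `restart_and_check`). Restarting a controller presumably interrupts I/O through it, so this should only be considered while the other path is healthy or the disks are idle.

```shell
#!/bin/sh
# Path taken from the question; override STORCLI to point elsewhere.
STORCLI=${STORCLI:-/opt/MegaRAID/storcli/storcli64}

# Print how many drives on controller $1, enclosure $2 are flagged "UGood F".
# Assumption: state is field 3 and the foreign flag is field 4 of each drive row.
foreign_drive_count() {
    "$STORCLI" "/c$1" "/e$2" /sall show |
        awk '$1 ~ /^[0-9]+:[0-9]+$/ && $3 == "UGood" && $4 == "F" { n++ }
             END { print n + 0 }'
}

# Restart controller $1, then report how many drives in enclosure $2
# are still flagged foreign. Succeeds only if none remain.
restart_and_check() {
    "$STORCLI" "/c$1" restart || return 1
    remaining=$(foreign_drive_count "$1" "$2")
    echo "drives still flagged foreign: $remaining"
    [ "$remaining" -eq 0 ]
}
```

For the setup in the question this would be called as `restart_and_check 1 71`.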

It seems the JBOD option is disabled on the controller. Try this command: storcli64 /c0 show jbod

If JBOD is OFF, you can enable it with storcli64 /c0 set jbod=ON (storcli /c0 set jbod=)

Controller Properties :
=====================

----------------
Ctrl_Prop Value 
----------------
JBOD      ON    
----------------
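
The property check from this answer can also be scripted. The sketch below assumes the output contains a row with exactly two fields, `JBOD` and its value, as in the sample above. The function name `jbod_enabled` is hypothetical:

```shell
#!/bin/sh
# Succeeds if `storcli64 /cX show jbod` output on stdin reports JBOD as ON.
# Assumption: the property row has exactly two fields, "JBOD" and the value.
jbod_enabled() {
    awk '$1 == "JBOD" && NF == 2 { v = toupper($2) }
         END { exit !(v == "ON") }'
}
```

For example, `/opt/MegaRAID/storcli/storcli64 /c1 show jbod | jbod_enabled && echo "JBOD is enabled on c1"` would run the check against the controller that still shows the foreign drives.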