lpfc + multipath + ubuntu - path keeps switching

Question

I am having issues configuring multipath with an Emulex HBA (lpfc). Although I do not detect any data corruption, the SAN administrator has a tool that shows the paths being switched every 20 seconds or so. Here are the details:

# multipath -l
san01 (3600a0b80002a042200002cb44a9a29ca) dm-2 IBM     ,1815      FASt
[size=100G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 3:0:0:0 sdb 8:16  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 4:0:0:0 sdc 8:32  [active][undef]

The multiple paths are connected to the same LUN.

# /lib/udev/scsi_id -g -u -d /dev/sdb
3600a0b80002a042200002cb44a9a29ca
# /lib/udev/scsi_id -g -u -d /dev/sdc
3600a0b80002a042200002cb44a9a29ca

Here's the /etc/multipath.conf

defaults {
        udev_dir                /dev
        polling_interval        5
        selector                "round-robin 0"
        path_grouping_policy    failover
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        path_checker            readsector
        failback                immediate
        user_friendly_names     yes
}
multipaths {
        multipath {
                wwid    3600a0b80002a042200002cb44a9a29ca
                alias   san01
        }
}

fdisk -l

Disk /dev/sdb: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x61b4bf95

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       13054   104856223+  83  Linux

Disk /dev/sdc: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x61b4bf95

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       13054   104856223+  83  Linux

I increased the verbosity for lpfc and now I get the following on dmesg:

[ 2519.241119] lpfc 0000:07:00.0: 1:0336 Rsp Ring 0 error: IOCB Data: xff000018 x37a120c0 x0 x0 xeb x0 x1b108db xa29b16
[ 2519.241124] lpfc 0000:07:00.0: 1:(0):0729 FCP cmd x12 failed <0/0> status: x1 result: xeb Data: x1b1 x8db
[ 2519.241127] lpfc 0000:07:00.0: 1:(0):0730 FCP command x12 failed: x0 SNS x0 x0 Data: x8 xeb x0 x0 x0
[ 2519.241130] lpfc 0000:07:00.0: 1:(0):0716 FCP Read Underrun, expected 254, residual 235 Data: xeb x12 x0
[ 2519.241275] lpfc 0000:07:00.0: 1:0336 Rsp Ring 0 error: IOCB Data: xff000018 x37a14c48 x0 x0 xd2 x0 x1b208e6 xa29b16
[ 2519.241279] lpfc 0000:07:00.0: 1:(0):0729 FCP cmd x12 failed <0/0> status: x1 result: xd2 Data: x1b2 x8e6
[ 2519.241283] lpfc 0000:07:00.0: 1:(0):0730 FCP command x12 failed: x0 SNS x0 x0 Data: x8 xd2 x0 x0 x0
[ 2519.241286] lpfc 0000:07:00.0: 1:(0):0716 FCP Read Underrun, expected 254, residual 210 Data: xd2 x12 x0
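
The hexadecimal fields in these messages can be decoded with plain shell arithmetic. The sketch below rests on two assumptions that the log itself does not confirm: that `x12` in "FCP cmd x12" is the SCSI opcode (0x12 is INQUIRY in the SCSI command set), and that `xeb` / `xd2` are the residual counts repeated in hex. The function names are made up for illustration; this is not an lpfc tool.

```shell
#!/bin/sh
# Convert an lpfc-style hex field such as "xeb" to decimal.
hex_field_to_dec() {
    # Strip the leading "x" and let printf read the rest as hexadecimal.
    printf '%d\n' "0x${1#x}"
}

# Bytes actually transferred = expected length - residual.
bytes_transferred() {
    echo $(( $1 - $2 ))
}

hex_field_to_dec xeb        # residual field of the first underrun message
hex_field_to_dec xd2        # residual field of the second underrun message
bytes_transferred 254 235   # payload returned for the first command
bytes_transferred 254 210   # payload returned for the second command
```

If those assumptions hold, the two commands returned 19 and 44 bytes into a 254-byte buffer, which is what a short INQUIRY reply would look like. The log alone does not establish whether this is related to the path switching.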

Can someone see anything wrong with this config? Thank you.

Based on janneb's comments I changed the configuration in multipath.conf to:

defaults {
        udev_dir                /dev
        polling_interval        5
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        failback                immediate
        user_friendly_names     yes
}

Which now gives:

san01 (3600a0b80002a042200002cb44a9a29ca) dm-2 IBM     ,1815      FASt
[size=100G][features=0][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 3:0:0:0 sdb 8:16  [active][ready]
 \_ 4:0:0:0 sdc 8:32  [active][ready]

But it still goes [active][undef] after a while, then back to [ready].

Oh I just noticed something, when I run 'multipath -l' I get [undef], however if I run 'multipath -ll' I get [ready].

-l     show the current multipath topology from information fetched in sysfs and the device mapper
-ll    show the current multipath topology from all available information (sysfs, the device mapper, path checkers ...)
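
To compare the two listings over time, the per-path states can be pulled out of the output with a small filter. This is a minimal sketch that assumes the older multipath-tools layout shown in this thread (path lines of the form `\_ H:C:T:L sdX major:minor [dm state][checker state]`); `path_states` is a made-up helper name, and newer versions print a different layout.

```shell
#!/bin/sh
# Print "<device> <dm state> <checker state>" for every path line read from stdin.
path_states() {
    awk '
        # Path lines carry a host:channel:target:lun address in the second field.
        $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ {
            state = $5
            gsub(/\[/, "", state)    # drop opening brackets
            gsub(/\]/, " ", state)   # closing brackets become separators
            split(state, parts, " ")
            print $3, parts[1], parts[2]
        }
    '
}

# Sample taken from the "multipath -l" output earlier in the question.
path_states <<'EOF'
san01 (3600a0b80002a042200002cb44a9a29ca) dm-2 IBM     ,1815      FASt
[size=100G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 3:0:0:0 sdb 8:16  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 4:0:0:0 sdc 8:32  [active][undef]
EOF
```

On a live system the same filter would be fed from the tool, for example `multipath -ll | path_states`. Since `-l` does not consult the path checkers, an `undef` in its output may simply mean "not checked" rather than "path changed state".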

Is the setup wrong? How can I debug? Thanks.

Thank you janneb and zerolagtime for helping out.

Here is where it gets complicated. I thought I would not need to explain all this, and I am currently leaning towards a mix-up in the hardware setup.

There are actually two servers connected to the same LUN using FC. At the OS level only one server accesses the filesystem at a time (although the same LUN is exposed to both), since it is ext3 (not a clustered filesystem). If server 1 goes down, server 2 kicks in (linux-ha) and mounts the filesystem.

Server 1 (multipath -ll):

san01 (3600a0b80002a042200002cb44a9a29ca) dm-2 IBM     ,1815      FASt
[size=100G][features=0][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 3:0:0:0 sdb 8:16  [active][ready]
 \_ 4:0:0:0 sdc 8:32  [active][ready]

Server 2 (multipath -ll):

san01 (3600a0b80002a042200002cb44a9a29ca) dm-2 IBM     ,1815      FASt
[size=100G][features=0][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 3:0:0:0 sdb 8:16  [active][ready]
 \_ 4:0:0:0 sdc 8:32  [active][ready]

Server 1 port names:

# cat /sys/class/fc_host/host3/port_name 
0x10000000c96c5fdb
# cat /sys/class/fc_host/host4/port_name 
0x10000000c96c5df5
root@web-db-1:~#

Server 2 port names:

# cat /sys/class/fc_host/host3/port_name
0x10000000c97b0917
# cat /sys/class/fc_host/host4/port_name 
0x10000000c980a2d8
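
The port names (WWPNs) above are what the SAN side uses to identify each HBA port, so it can help to collect them in one pass when comparing notes with the SAN administrator. A minimal sketch follows; `list_fc_ports` is a made-up helper, and the optional directory argument exists only so the function can be tried against a copy of the sysfs tree.

```shell
#!/bin/sh
# Print "<host> <port_name>" for every FC host found under the given sysfs directory.
list_fc_ports() {
    base=${1:-/sys/class/fc_host}
    for f in "$base"/host*/port_name; do
        [ -r "$f" ] || continue          # no FC hosts: the glob stays unexpanded
        host=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$host" "$(cat "$f")"
    done
}

# Prints nothing on a machine without Fibre Channel HBAs.
list_fc_ports
```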

Is this setup wrong? Is the way the LUN is exposed to both servers wrong? I am thinking that the hardware hookup is incorrect; what could be wrong? Could server 1's path_checker be interfering with server 2's operation? Thanks.

janneb · Answer 1 · 2010-11-07T14:34:54.210

Your configuration looks weird; normally you'd have 4 paths to the same device (that is, 4 /dev/sdX devices per multipath device). The array controller typically is able to inform the host about the priority for each path, so you have 2 paths with higher priority and 2 with lower priority. Then dm-multipath multiplexes IO over the 2 high priority paths (the "selector" option with the default rr_min_io=100). Now, you have 2 path groups both with the same priority, so maybe dm-multipath is spreading IO over both of them, which might not be what your SAN admin wants you to do. Another weird thing is that the devices are marked with "undef" rather than "ready". Yet another strange thing is that your path numbering looks quite weird (everything goes along the same path?). Are you really sure everything is properly cabled together, properly zoned etc.?

A typical output from "multipath -ll" should look like

sanarch3 (3600508b4000683de0000c00000a20000) dm-6 HP,HSV200
[size=2.0T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=100][active]
 \_ 0:0:0:5 sdc 8:32  [active][ready]
 \_ 1:0:0:5 sdk 8:160 [active][ready]
\_ round-robin 0 [prio=20][enabled]
 \_ 0:0:1:5 sdg 8:96  [active][ready]
 \_ 1:0:1:5 sdo 8:224 [active][ready]

There you see 4 paths grouped into 2 priority groups, and IO is done over devices sdc and sdk while sdg and sdo are idle and used only during a failure.

EDIT: So the reason why you should see 4 paths is that you have 2 HBA ports and the array has 2 redundant controllers. Then you have 2 redundant networks with a final switch layer providing cross-network connections. Thus both HBAs see both controllers, hence 4 paths for each LUN. You can see that in my example above in the SCSI ID numbering, which goes as [host controller ID]:[channel ID]:[target controller ID]:[LUN ID]. What you can then see above is that the active paths are both on controller #0, since in this case controller #0 happens to "own" the LUN; IO is possible via the other controller but at a performance penalty, since the other controller would (depending on the controller implementation) need to forward the IO to the owning controller. Hence the controller reports that the paths that go to controller #0 have higher priority.

So from your question one sees that there is no path to the other controller at all. And, in case you don't have redundant controllers and networks, why bother with multipath in the first place?
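
The address format described in the EDIT can be split mechanically, which makes the missing controller easy to spot. A minimal sketch; `decode_hctl` is a made-up helper name, and the field labels follow the [host]:[channel]:[target]:[LUN] order given above.

```shell
#!/bin/sh
# Split a SCSI address "H:C:T:L" into labelled fields.
decode_hctl() {
    old_ifs=$IFS
    IFS=:
    set -- $1            # word-split the address on ":"
    IFS=$old_ifs
    printf 'host=%s channel=%s target=%s lun=%s\n' "$1" "$2" "$3" "$4"
}

# The four paths from the HSV200 example, then the two paths from the question.
for addr in 0:0:0:5 1:0:0:5 0:0:1:5 1:0:1:5 3:0:0:0 4:0:0:0; do
    decode_hctl "$addr"
done
```

In the example output the target field takes the values 0 and 1 (two controllers), while both paths from the question report target 0 only.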

Both sdb1 and sdc1 are attached to the same LUN, so in fact I have 1 LUN with 100GBs. /dev/sdb1 is one path and /dev/sdc1 is another. Can't I configure it like that? I have 2 cables coming out of the server, I assumed each device would be a path. — A4A, Nov 07 '10 at 09:21
Hello janneb, I added some more info to the original question, can you please take a look and let me know :) Thank you. — A4A, Nov 07 '10 at 10:26
If janneb's description is correct, each cable on the host represents two devices in the operating system. Think of it like this: the LUN has two targets T1 and T2. Your host has two HBA ports acting as initiators I1 and I2. The paths then represent I1->T1, I1->T2, I2->T1, and I2->T2. This is true if you have dual fabrics or if you only have one SAN switch. — zerolagtime, Nov 08 '10 at 01:30
If you have two, redundant SAN switches that are part of the same fabric (not recommended), then things can get REALLY confusing. So confusing, that I'm not sure I can provide a clear picture at the moment. — zerolagtime, Nov 08 '10 at 01:33
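
The initiator/target picture from the comments can be written out as a small enumeration. This is only an illustration of the counting; the labels I1, I2, T1 and T2 are the ones used in the comment above, and `enumerate_paths` is a made-up helper.

```shell
#!/bin/sh
# Enumerate every initiator/target pair; each pair is one path to the LUN.
enumerate_paths() {
    initiators=$1
    targets=$2
    for i in $initiators; do
        for t in $targets; do
            printf '%s->%s\n' "$i" "$t"
        done
    done
}

# Two HBA ports and two array targets give four paths.
enumerate_paths "I1 I2" "T1 T2"
```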

ppetraki · Answer 2 · 2012-01-31T13:37:28.570

The IBM SANs usually have well-defined multipath.conf examples in their documentation; did you not start there? I'll leave that part as an exercise for the reader. Also, your SAN admin owes you a little more support. Some quick points:

Path oscillations like the ones you describe are usually due to the path checker being misconfigured. In your two iterations you went from readsector0 to none, which probably means multipath is taking its default for that make and model, likely tur (test unit ready).
No priority checker is defined: no priority checker, no priorities.
A hardware handler is probably required; this is well defined in the documentation.
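
A quick text check can show which of the options mentioned in these points are set explicitly in a given multipath.conf. This is a minimal sketch with a made-up helper name (`check_options`); it only looks for lines that start with the option name and knows nothing about the built-in per-device defaults that multipath applies when an option is absent.

```shell
#!/bin/sh
# Report whether each option named on the command line is set in the config read from stdin.
check_options() {
    conf=$(cat)
    for key in "$@"; do
        if printf '%s\n' "$conf" | grep -Eq "^[[:space:]]*${key}[[:space:]]"; then
            echo "$key: set"
        else
            echo "$key: not set"
        fi
    done
}

# Sample: the revised defaults section from the question.
check_options path_checker prio_callout hardware_handler <<'EOF'
defaults {
        udev_dir                /dev
        polling_interval        5
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        failback                immediate
        user_friendly_names     yes
}
EOF
```

Against a real file the call would be `check_options path_checker prio_callout hardware_handler < /etc/multipath.conf`.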

The best IBM 1815 war story I found was this; in summary:

Install the rdac driver, modprobe scsi_dh_rdac, and add it to your initrd
Use the following multipath.conf:

blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    devnode "^sda"
    device {
        vendor "Maxtor*"
        product "OneTouch*"
    }
}
blacklist_exceptions {
    device {
            vendor  "IBM"
            product "1815*"
    }
}
defaults {
    failback                immediate
    no_path_retry           queue
    user_friendly_names     no
    path_grouping_policy    failover
}
devices {
    device {
            vendor                  "IBM"
            product                 "1815*"
            failback                manual
            hardware_handler        "1 rdac"
            path_checker            rdac
            prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
    }
}

Let us know how it turns out. Good luck!

score 0 · Answer 3 · answered Oct 09 '11 at 11:44

First of all, you defined multibus; are you sure your storage supports this? Ask your SAN admin whether your storage is truly active/active. Active/passive storage does not allow switching between controllers all the time; this has a cost for the storage and will give you problems on the client side as well. In the first config it was not defined, meaning you get the default built into multipath (check /usr/share/doc/multipath.conf.annotated, or look at the output of multipathd -k show config for a better view). In any case, always review the default config against your storage specs, because the defaults are not always the best: I've had some issues with HDS and RHEL.

The second thing: you said there is no integrity problem on the FS, but are you sure your FS is using the multipathed device? Assuming you use LVM, did you check the filter settings in lvm.conf? If the filter is not set correctly, LVM will use the underlying device directly instead of going through MPIO. This can be even more of a problem with active/passive storage, since LVM will force the use of a controller that may not be the preferred one. I hope it helps. Regards
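
One rough way to check the point about the filesystem is to look at the device name it is mounted from. The sketch below is a name-based heuristic only, with a made-up helper (`classify_device`): a /dev/mapper or /dev/dm-* name says the device goes through device-mapper, which could be multipath or an LVM volume, while a /dev/sd* name means a single SCSI path is being used directly.

```shell
#!/bin/sh
# Classify a block device path as device-mapper based or a plain SCSI disk path.
classify_device() {
    case $1 in
        /dev/mapper/*|/dev/dm-*) echo "device-mapper" ;;
        /dev/sd*)                echo "raw-scsi-path" ;;
        *)                       echo "unknown" ;;
    esac
}

classify_device /dev/mapper/san01
classify_device /dev/sdb1
```

For a mounted filesystem the device name can be taken from `df -P <mountpoint>`. With LVM in between, the mount shows the logical volume, so the physical volume reported by `pvs` is the name to classify.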
