Mixed OCFS2 clusters: direct access and iSCSI

Question

Good morning,
I have this configuration:
[Diagram of the configuration]

One "shared disk" with a single partition formatted as OCFS2
Hosts A and B with direct access to the "shared disk"
Host C with direct access to the "shared disk", acting as an iSCSI target: Linux tgt is configured on it to export the "shared disk"
Hosts D and E accessing the "shared disk" as iSCSI initiators through host C
Hosts A, B, D and E are members of the same OCFS2 cluster

I ran these tests:
Mounted the shared OCFS2 file system on hosts A and B only, at the same time. On each host the ocfs2 daemon correctly logs the cluster join when the partition is mounted.
Mounted the shared file system on hosts D and E only, at the same time. In this case each host uses the iSCSI path to the shared disk, and mounting the OCFS2 partition works correctly on both hosts simultaneously.
Mounted the shared file system on hosts A and D (A with direct access to the shared disk, D using the iSCSI path to the target on host C), without success. When one host mounts, the other does not seem to see it join the cluster, and data corruption occurs, with OCFS2 heartbeat errors in the logs.

This last case is the subject of my question. Could this be a caching problem on the iSCSI target (the block device cache or something else)? Is this cluster configuration possible with some tuning on the iSCSI side, or is it impossible?

Thank you

Matteo

matlocat · Answer 1 · 2020-03-05T10:38:41.073

After some digging into the problem, I think I may have found a solution.
Instead of using the /dev/sd* device file as the backing store on the iSCSI target, I used its /dev/sg* (SCSI generic)
counterpart, with this configuration:

<target iqn.2018-02.test.it:lun1>
   # Pass SCSI commands straight to the shared disk's SCSI generic node
   <backing-store /dev/sg0>
      bs-type sg
      device-type pt
   </backing-store>
</target>
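The passthrough backing store has to point at the sg node that belongs to the shared disk, and sg numbering does not necessarily follow sd lettering (sg nodes are also created for CD-ROMs, enclosures and other SCSI devices). A minimal Python sketch, assuming the usual sysfs layout `/sys/class/scsi_generic/sgN/device/block/sdX`, lists which block device each sg node fronts. `sg_to_block` is a hypothetical helper; where it is installed, `lsscsi -g` reports the same mapping.

```python
import os


def sg_to_block(sysfs_root="/sys"):
    """Map each /dev/sgN node to the block device(s) behind it, if any.

    Assumes the usual sysfs layout: <root>/class/scsi_generic/sgN/device/block/<sdX>.
    """
    base = os.path.join(sysfs_root, "class", "scsi_generic")
    mapping = {}
    if not os.path.isdir(base):
        return mapping  # no sg driver loaded, or not a Linux sysfs tree
    for sg in sorted(os.listdir(base)):
        block_dir = os.path.join(base, sg, "device", "block")
        if os.path.isdir(block_dir):
            mapping["/dev/" + sg] = ["/dev/" + b for b in sorted(os.listdir(block_dir))]
        else:
            mapping["/dev/" + sg] = []  # e.g. an enclosure with no block device
    return mapping


if __name__ == "__main__":
    for sg, blocks in sg_to_block().items():
        print(sg, "->", ", ".join(blocks) or "(no block device)")
```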

As I suspected, the problem may be related to the page cache of the block device on the target: sg sends SCSI commands directly to the device and may therefore bypass that cache, but I am not sure.
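The page-cache hypothesis can be illustrated with a toy model. This is a deliberately simplified Python sketch, not how tgtd or the Linux page cache are actually implemented: `SharedDisk`, `DirectPath`, `CachedPath`, the heartbeat slot numbers and the write-back behaviour are all hypothetical. Host A writes straight to the disk; host D's I/O goes through host C. If host C serves reads from a cached copy and holds writes back, each node keeps seeing the other's stale heartbeat block, which is consistent with the "other node does not see the join" symptom. A passthrough path behaves like `DirectPath` in this model.

```python
class SharedDisk:
    """The physical shared disk: a map from block number to value."""

    def __init__(self):
        self.blocks = {}

    def read(self, lba):
        return self.blocks.get(lba, 0)

    def write(self, lba, value):
        self.blocks[lba] = value


class DirectPath:
    """Every read and write reaches the disk (hosts A/B, or an uncached path)."""

    def __init__(self, disk):
        self.disk = disk

    def read(self, lba):
        return self.disk.read(lba)

    def write(self, lba, value):
        self.disk.write(lba, value)


class CachedPath:
    """I/O through a (simplified, write-back) page cache on host C."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}
        self.dirty = set()

    def read(self, lba):
        if lba not in self.cache:  # miss: fetch once, then keep serving the copy
            self.cache[lba] = self.disk.read(lba)
        return self.cache[lba]

    def write(self, lba, value):
        self.cache[lba] = value  # held in the cache until flush()
        self.dirty.add(lba)

    def flush(self):
        for lba in self.dirty:
            self.disk.write(lba, self.cache[lba])
        self.dirty.clear()


def heartbeat_scenario(initiator_path):
    """Host A writes directly; host D goes through `initiator_path`.

    Returns (value D reads in A's slot, value A reads in D's slot).
    """
    disk = SharedDisk()
    host_a = DirectPath(disk)
    host_d = initiator_path(disk)
    SLOT_A, SLOT_D = 0, 1  # hypothetical heartbeat block numbers
    host_d.read(SLOT_A)  # D polls A's slot before A is up
    host_a.write(SLOT_A, 1)  # A starts heartbeating
    host_d.write(SLOT_D, 1)  # D starts heartbeating
    return host_d.read(SLOT_A), host_a.read(SLOT_D)


if __name__ == "__main__":
    print("through cached target:", heartbeat_scenario(CachedPath))  # (0, 0)
    print("direct/passthrough:   ", heartbeat_scenario(DirectPath))  # (1, 1)
```

With the cached path neither node sees the other's heartbeat; with the direct path both do.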
After changing the configuration, I mounted the file system on hosts A, D and E successfully.
In /var/log/messages I found the join messages for A, D and E, as expected when OCFS2 is working.
I also created a test file and saw it updated on every host.
I will run more tests this week to check for data corruption and update the post accordingly.

Matteo

UPDATE
It seemed to work correctly, but when copying a large amount of data (more than 1 GB of files) to the shared disk from a host on the iSCSI side (an initiator), that host crashes and tgt on the target logs these errors:

mar 04 17:09:29 debian-shdisk-iscsi tgtd[1515]: tgtd: bs_sg_cmd_submit(228) failed to start cmd 0x0x559a50768510
mar 04 17:09:29 debian-shdisk-iscsi tgtd[1515]: tgtd: graceful_write(87) sg device 11 write failed, errno: 33

In the Linux 4.19 SCSI generic (sg) driver, errno 33 (EDOM) appears to be returned by a write to the command queue when that queue is full. tgtd does not seem to handle this case, and the failure surfaces as block I/O errors on the initiator side. Any suggestions?
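One way such a condition could be handled is to treat EDOM (errno 33 on Linux) from a command submission as back-pressure and retry after a growing delay instead of failing the command. The Python sketch below only illustrates that idea under stated assumptions: `submit_with_backoff` and `BusyQueue` are hypothetical names, this is not tgtd's code (tgtd is written in C) and not a patch for it, and a real implementation would more likely wait for outstanding commands to complete than sleep.

```python
import errno
import os
import time


def submit_with_backoff(submit, cmd, retries=5, base_delay=0.001):
    """Call submit(cmd); on EDOM ("queue full") sleep and retry with exponential backoff.

    Any other OSError, or EDOM after `retries` retries, is re-raised unchanged.
    """
    for attempt in range(retries + 1):
        try:
            return submit(cmd)
        except OSError as exc:
            if exc.errno != errno.EDOM or attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))


class BusyQueue:
    """Hypothetical stand-in for an sg device whose queue is full for the first `busy` calls."""

    def __init__(self, busy):
        self.busy = busy
        self.calls = 0

    def submit(self, cmd):
        self.calls += 1
        if self.calls <= self.busy:
            raise OSError(errno.EDOM, os.strerror(errno.EDOM))
        return len(cmd)  # pretend the whole command was accepted


if __name__ == "__main__":
    q = BusyQueue(busy=2)
    print(submit_with_backoff(q.submit, b"abc"), q.calls)  # 3 3
```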
