
I have read some answers related to this problem, such as "Will the OS crash if the system partition can't be accessed for a short period?", but I still cannot solve it.

When using iSCSI as a Storage Repository on XenServer and a DomU (VM) is under heavy disk I/O, losing the iSCSI connection (mainly due to a network problem or a storage failover) crashes the DomU filesystem (especially the ext3 Linux filesystem). In that case, the DomU's ext3 filesystem becomes read-only or unrecoverable.

How can I protect the VM's filesystem when the iSCSI connection is lost at Dom0?

This is my XenServer environment.

[root@cnode01-m ~]# iscsiadm -m session
tcp: [1] 10.32.1.240:3260,2 iqn.1986-03.com.sun:02:c5544ae6-9715-6f38-f83b-a446896ac614
tcp: [3569] 10.32.1.240:3260,2 iqn.1986-03.com.sun:02:5c41ce31-3fbb-c6aa-d479-947e85515ac7

[root@cnode01-m ~]# vgs
  VG                                                 #PV #LV #SN Attr   VSize   VFree  
  VG_XenStorage-1aeee13b-2a87-1d0d-1834-7b8c868009b0   1  40   0 wz--n-   6.35T   4.93T
  VG_XenStorage-28e2c663-dae5-9504-9733-e05063ff081d   1  57   0 wz--n-   6.35T   4.52T
  VG_XenStorage-365d6e13-5caa-1fea-9940-e1bb553e3513   1  42   0 wz--n-   6.35T   5.13T
  VG_XenStorage-4ea23f9a-f945-5d45-cbd2-f3eab3fe75b3   1  42   0 wz--n-   6.35T   5.40T
  VG_XenStorage-54d69165-2eed-c058-d587-1b84d488adea   1  37   0 wz--n-   6.35T   5.01T
  VG_XenStorage-598b7237-282b-ea61-8edc-5101a70ea001   1  63   0 wz--n-   6.35T   5.01T
  VG_XenStorage-6a063762-26de-a3f8-f18c-734fce25433a   1  49   0 wz--n-   6.35T   5.56T
  VG_XenStorage-6b7bea84-7269-fa88-7b95-23dce431e1aa   1  71   0 wz--n-   6.35T   4.80T
  VG_XenStorage-6d6d263b-243c-fb24-4f0c-28b226a22bab   1  47   0 wz--n-   6.35T   4.94T
  VG_XenStorage-76fe6d6d-a37a-698d-9af2-50ea3f55e127   1  44   0 wz--n-   6.35T   5.37T
  VG_XenStorage-80e2df33-268c-b8a6-cc02-71f27ebe3326   1  39   0 wz--n-   6.35T   5.80T
  VG_XenStorage-886070b7-34e8-eb96-0931-2c31952608a6   1  13   0 wz--n- 457.65G 369.31G
  VG_XenStorage-97136f70-cf33-2593-38e0-b8c09785a754   1  60   0 wz--n-   6.35T   5.14T
  VG_XenStorage-c910e9fd-8817-0b99-8c8d-1ee0883705de   1  37   0 wz--n-   6.35T   5.67T
  VG_XenStorage-cd709bcb-d46a-8483-acbf-49b2b0c59c06   1  58   0 wz--n-   6.35T   4.80T
  VG_XenStorage-e153d09a-716a-9764-8967-f704278d55bd   1  43   0 wz--n-   6.35T   4.45T
  VG_XenStorage-f8574b51-31d4-7b0e-c71e-8253e1cdd230   1  61   0 wz--n-   6.35T   4.20T

[root@cnode01-m ~]# ls -la /dev/sd[a-z]
brw-r----- 1 root disk  8,   0 Jun  8 17:37 /dev/sda
brw-r----- 1 root disk  8,  16 Aug  1 10:14 /dev/sdb
brw-r----- 1 root disk  8,  32 Jun  8 17:38 /dev/sdc
brw-r----- 1 root disk  8,  48 Jul 31 14:49 /dev/sdd
brw-r----- 1 root disk  8,  64 Jul 31 14:46 /dev/sde
brw-r----- 1 root disk  8,  80 Jul 31 14:51 /dev/sdf
brw-r----- 1 root disk  8,  96 Aug  3 13:52 /dev/sdg
brw-r----- 1 root disk  8, 112 Aug  3 10:53 /dev/sdh
brw-r----- 1 root disk  8, 128 Aug  2 13:40 /dev/sdi
brw-r----- 1 root disk  8, 144 Jul 30 00:17 /dev/sdj
brw-r----- 1 root disk  8, 160 Jul 30 00:17 /dev/sdk
brw-r----- 1 root disk  8, 176 Jul 30 00:17 /dev/sdl
brw-r----- 1 root disk  8, 192 Jul 30 00:17 /dev/sdm
brw-r----- 1 root disk  8, 208 Jul 30 00:17 /dev/sdn
brw-r----- 1 root disk  8, 224 Jul 30 00:17 /dev/sdo
brw-r----- 1 root disk  8, 240 Jul 30 00:17 /dev/sdp
brw-r----- 1 root disk 65,   0 Jul 30 00:17 /dev/sdq

This is my DomU (VM) environment.

[root@i-58-7172-VM ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       16G  1.5G   14G  11% /
/dev/xvda1             99M   30M   65M  32% /boot
tmpfs                 512M     0  512M   0% /dev/shm

When I put a heavy I/O load on the / partition of the VM and the iSCSI connection has a problem (network failure, iSCSI target failover), the / partition gets corrupted.

How can I solve this problem? Thank you in advance.
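One detail worth checking inside the DomU: how ext3 reacts to I/O errors is controlled by the errors= mount option (or the default stored in the superblock), which is why the root filesystem flips to read-only. A minimal sketch of inspecting and changing that policy, using the device name from the df output above; whether panic (so the VM reboots or HA restarts it) or another behavior is the right choice depends on your setup:

```shell
# Inside the DomU: show the current on-disk default error behavior
# (one of: continue, remount-ro, panic)
tune2fs -l /dev/mapper/VolGroup00-LogVol00 | grep -i 'errors behavior'

# Change the superblock default, e.g. to panic so the VM reboots
# instead of running on with a read-only root:
tune2fs -e panic /dev/mapper/VolGroup00-LogVol00

# The same policy can be set per mount in /etc/fstab:
#   /dev/mapper/VolGroup00-LogVol00  /  ext3  defaults,errors=panic  1 1
```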

Added

This is my iscsid.conf at Dom0

[root@cnode01-m ~]# more /etc/iscsi/iscsid.conf
node.startup = manual
node.session.timeo.replacement_timeout = 86400
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0
node.session.initial_login_retry_max = 4
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.session.iscsi.FastAbort = No

10G Ethernet and jumbo frames are already implemented at the storage layer. Citrix XenServer also has a command for pausing VMs when the storage service has a problem, but pausing and unpausing a VM makes its system clock inconsistent, which can have side effects, usually at the application layer, I think.

sw Han

1 Answer


First you should address the source of the issue: storage access. With iSCSI you can tweak iscsid.conf, increasing the queue depth, buffer sizes, and timeouts so that the connection can sustain longer outages. Besides that, implementing multipathing, 10G Ethernet (if the SAN supports it), and jumbo frames is a good idea.
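As a sketch of the iscsid tweaks mentioned above: timeouts can also be changed per node record with iscsiadm (the IQN and portal below are copied from the session list in the question; the values themselves are illustrative, not recommendations):

```shell
# Adjust the replacement timeout for an existing node record; the new
# value takes effect on the next login. The posted iscsid.conf uses
# 86400s (24h), which keeps in-flight I/O queued for a very long outage.
iscsiadm -m node \
  -T iqn.1986-03.com.sun:02:c5544ae6-9715-6f38-f83b-a446896ac614 \
  -p 10.32.1.240:3260 \
  -o update -n node.session.timeo.replacement_timeout -v 120

# Enable NOP-Out pings so dead connections are detected quickly
# (0, as in the posted iscsid.conf, disables them):
iscsiadm -m node \
  -T iqn.1986-03.com.sun:02:c5544ae6-9715-6f38-f83b-a446896ac614 \
  -p 10.32.1.240:3260 \
  -o update -n 'node.conn[0].timeo.noop_out_interval' -v 5
```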

I'm no Xen expert, but KVM has an option to pause a VM when the storage layer returns EIO or ENOSPC. It should be possible with Xen too if you dig into the options, IMO; and if not, I'd try filing a feature request with the developers.
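For reference, the KVM behavior described here is qemu's per-drive error policy; a hedged sketch of the command-line form (the disk path and VM sizing are placeholders, and this is not XenServer syntax):

```shell
# werror/rerror control what happens on a write/read error from the host
# storage: 'stop' pauses the guest instead of returning EIO to it, so the
# guest filesystem never sees the error and can resume once storage is back.
qemu-kvm \
  -drive file=/dev/mapper/example-vm-disk,if=virtio,format=raw,werror=stop,rerror=stop \
  -m 1024 -smp 1
```

In libvirt the same knob is exposed as the disk driver's error_policy attribute in the domain XML.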

dyasny
  • Thank you so much for your reply. I updated my question. – sw Han Aug 04 '11 at 00:26
  • By multipath, do you mean multiple paths between the XenServer host and the storage node, right? – sw Han Aug 04 '11 at 02:10
  • Could be - I'm no Xen expert. So, to start, you could look for the best values for your particular iSCSI target. As for pausing, while timing might be an issue, a corrupted filesystem is much more of an issue, IMO – dyasny Aug 04 '11 at 12:42