
I've got a really strange problem running XenServer 7.0 (I also tried 7.1) on a Dell M620 with a BCM57810 network card.

The whole setup runs flawlessly as long as there is no traffic. I have a Windows Server 2016 VM running and can reach it over RDC through a VyOS firewall, etc. On another virtual machine I want to run an ownCloud instance, so I add another IP to the network interface and forward the traffic to it. As soon as I access the ownCloud HTTP interface, the whole server crashes with a kernel panic and error messages relating to the Broadcom network driver:

device tap13.0 left promiscuous mode
device vif13.0 left promiscuous mode
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1a4/0x280()
NETDEV WATCHDOG: eth0 (bnx2x): transmit queue 0 timed out
Modules linked in: btrfs zlib_deflate raid6_pq xor xfs tun nfsv3 nfs fscache bnx2fc(O) cnic(O) uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt openvswitch(O) gre 8021q garp mrp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_multiport dm_multipath xt_conntrack nf_conntrack iptable_filter ipmi_devintf coretemp crc32_pclmul aesni_intel aes_x86_64 ablk_helper cryptd lrw lpc_ich mfd_core sg ipmi_si ipmi_msghandler wmi sb_edac edac_core hed shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd nls_utf8 isofs sunrpc ip_tables x_tables hid_generic usbhid hid sd_mod ahci libahci libata bnx2x(O) ehci_pci ehci_hcd mdio libcrc32c ptp megaraid_sas(O) pps_core scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh scsi_mod ipv6 autofs4
CPU: 6 PID: 0 Comm: swapper/6 Tainted: G           O 3.10.0+10 #1
Hardware name: Dell Inc. PowerEdge M620/0VHRN7, BIOS 2.5.4 01/27/2016
 0000000000000009 ffff8801354c3d58 ffffffff815427c7 ffff8801354c3d90
 ffffffff81054da1 ffff88012e210000 0000000000000000 0000000000000006
 ffff88012efe7100 ffff88012efe7080 ffff8801354c3df0 ffffffff81054e0c
Call Trace:
 <IRQ>  [<ffffffff815427c7>] dump_stack+0x19/0x1b
 [<ffffffff81054da1>] warn_slowpath_common+0x61/0x80
 [<ffffffff81054e0c>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff8149cd44>] dev_watchdog+0x1a4/0x280
 [<ffffffff8149cba0>] ? dev_deactivate_queue.constprop.29+0x60/0x60
 [<ffffffff81063cd3>] call_timer_fn+0x53/0x130
 [<ffffffff8149cba0>] ? dev_deactivate_queue.constprop.29+0x60/0x60
 [<ffffffff810658fd>] run_timer_softirq+0x22d/0x290
 [<ffffffff8105d48b>] __do_softirq+0xfb/0x240
 [<ffffffff8155255c>] call_softirq+0x1c/0x30
 [<ffffffff81014203>] do_softirq+0x43/0x80
 [<ffffffff8105d6d9>] irq_exit+0x49/0xa0
 [<ffffffff81384b55>] xen_evtchn_do_upcall+0x35/0x50
 [<ffffffff815525be>] xen_do_hypervisor_callback+0x1e/0xa0
 <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
 [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
 [<ffffffff8100a340>] ? xen_safe_halt+0x10/0x30
 [<ffffffff8101a844>] ? default_idle+0x44/0xd0
 [<ffffffff8101b038>] ? arch_cpu_idle+0x18/0x30
 [<ffffffff810a3532>] ? cpu_startup_entry+0x1c2/0x280
 [<ffffffff8152e11d>] ? cpu_bringup_and_idle+0x13/0x15
---[ end trace 3267d319304e6e4c ]---
ULP_STOP
bnx2fc: ERROR:bnx2fc_destroy_timer - Destroy compl not received!!
bnx2x: [bnx2x_stats_comp:211(eth0)]timeout waiting for stats finished
bnx2x: [bnx2x_stats_comp:211(eth0)]timeout waiting for stats finished
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[0]: txdata->tx_pkt_prod(17962) != txdata->tx_pkt_cons(17955)
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[24]: txdata->tx_pkt_prod(49476) != txdata->tx_pkt_cons(49474)
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[0]: txdata->tx_pkt_prod(17962) != txdata->tx_pkt_cons(17955)
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[24]: txdata->tx_pkt_prod(49476) != txdata->tx_pkt_cons(49474)
[bnx2x_state_wait:329(eth0)]timeout waiting for state 0
bnx2x: [bnx2x_del_all_macs:9335(eth0)]Failed to delete MACs: -16
bnx2x: [bnx2x_chip_cleanup:10164(eth0)]Failed to schedule DEL commands for UC MACs list: -16
[bnx2x_state_wait:329(eth0)]timeout waiting for state 9
[bnx2x_state_wait:329(eth0)]timeout waiting for state 2
bnx2x: [bnx2x_func_stop:9935(eth0)]FUNC_STOP ramrod failed. Running a dry transaction
bnx2x: [bnx2x_issue_dmae_with_comp:757(eth0)]DMAE timeout!
bnx2x: [bnx2x_write_dmae:806(eth0)]DMAE returned failure -1
bnx2x: [bnx2x_issue_dmae_with_comp:757(eth0)]DMAE timeout!
bnx2x: [bnx2x_write_dmae:806(eth0)]DMAE returned failure -1
bnx2x: [bnx2x_issue_dmae_with_comp:757(eth0)]DMAE timeout!
bnx2x: [bnx2x_write_dmae:806(eth0)]DMAE returned failure -1
bnx2x: [bnx2x_issue_dmae_with_comp:757(eth0)]DMAE timeout!
bnx2x: [bnx2x_write_dmae:806(eth0)]DMAE returned failure -1
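For anyone trying to read the bnx2x_clean_tx_queue lines above: my interpretation (an assumption, not a confirmed diagnosis) is that tx_pkt_prod counts packets the driver has posted to a TX queue and tx_pkt_cons counts packets the NIC has reported as completed, so the difference is how many packets were still stuck on that queue when the watchdog fired. The minimal Python sketch below pulls that gap out of the log; the function name stuck_tx_packets is hypothetical, and treating the counters as 16-bit values that can wrap is also an assumption.

```python
import re

# The two distinct stuck-queue lines, copied verbatim from the log above.
LOG = """\
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[0]: txdata->tx_pkt_prod(17962) != txdata->tx_pkt_cons(17955)
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[24]: txdata->tx_pkt_prod(49476) != txdata->tx_pkt_cons(49474)
"""

# Captures the device name, queue index and the producer/consumer counters.
PATTERN = re.compile(
    r"\((?P<dev>\w+)\)\]timeout waiting for queue\[(?P<q>\d+)\]: "
    r"txdata->tx_pkt_prod\((?P<prod>\d+)\) != txdata->tx_pkt_cons\((?P<cons>\d+)\)"
)


def stuck_tx_packets(log: str) -> dict[tuple[str, int], int]:
    """Return {(device, queue): packets posted but not yet completed}."""
    result = {}
    for m in PATTERN.finditer(log):
        # Assumption: counters are 16-bit and may wrap, so take the gap mod 2**16.
        gap = (int(m["prod"]) - int(m["cons"])) % 0x10000
        result[(m["dev"], int(m["q"]))] = gap  # repeated lines collapse to one key
    return result


if __name__ == "__main__":
    for (dev, q), pending in sorted(stuck_tx_packets(LOG).items()):
        print(f"{dev} queue {q}: {pending} packet(s) not completed by the NIC")
```

On this log it reports 7 packets pending on queue 0 and 2 on queue 24, which is consistent with the "transmit queue 0 timed out" watchdog message.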

The network diagram is as follows: [network diagram image not available]

Unfortunately, I cannot install the vendor driver because I don't have the kernel headers needed to compile it manually.

I tried disabling the virtual interfaces in the NIC configuration, without success. Setting disable_tpa or other module parameters didn't help either.

I hope someone has an idea.

Meiko Watu
  • Kind of related: https://support.citrix.com/article/CTX136517, but it doesn't help -.- – Meiko Watu Mar 14 '17 at 09:55
  • Another comment: even though the adapter is in the HCL, it seems that only Ethernet is supported, not fibre. http://hcl.vmd.citrix.com/networkadapters/77/Broadcom_Corporation_NetXtreme_II_BCM57810_10_Gigabit_Ethernet – Meiko Watu Mar 14 '17 at 11:58

1 Answer


I recently had the same trouble with XenServer 7.1 and an Ubuntu VM.

Server: Dell R730

NIC: Broadcom Limited NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)

In my case, the trouble was in the VLAN handling.

When I handled the VLANs in XenServer and connected four virtual NICs, each on a selected VLAN, to one VM, the whole physical server repeatedly crashed 7 to 10 minutes after that VM was started.

The workaround was to pass the whole eth0 interface to the VM and then handle the VLANs inside the VM itself (eth0.100, eth0.200, etc.).
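The answer doesn't say how the VLAN subinterfaces were created inside the VM. One common way on Linux is iproute2's "ip link add ... type vlan", which needs the 8021q module in the guest and the switch port delivering tagged frames for those VLANs (both assumptions about this setup). The minimal Python sketch below only builds and prints the commands for the VLAN IDs mentioned in the answer; the helper name vlan_commands is hypothetical, and on Ubuntu the same result could instead be made persistent through the distribution's network configuration.

```python
# Hypothetical helper: builds (but does not run) iproute2 commands that create
# one VLAN subinterface per ID on top of the parent interface inside the VM.
def vlan_commands(parent: str, vlan_ids: list[int]) -> list[str]:
    cmds = []
    for vid in vlan_ids:
        # 802.1Q VLAN IDs 0 and 4095 are reserved; usable IDs are 1..4094.
        if not 1 <= vid <= 4094:
            raise ValueError(f"invalid VLAN ID: {vid}")
        name = f"{parent}.{vid}"  # e.g. eth0.100, matching the naming in the answer
        cmds.append(f"ip link add link {parent} name {name} type vlan id {vid}")
        cmds.append(f"ip link set dev {name} up")
    return cmds


if __name__ == "__main__":
    # VLAN IDs 100 and 200 are the examples given in the answer.
    for cmd in vlan_commands("eth0", [100, 200]):
        print(cmd)
```

With this approach XenServer only sees untagged traffic on the VM's single VIF, and all tagging and untagging happens in the guest kernel, which is the split the workaround relies on.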