1

i have a ubuntu 18.04.1 server running with 8-core processor and 8 Gigabyte's of memory. it's a cloud server which is based on KVM virtualization. i am using haproxy 1.8.8 for load balancing on my servers. problem is when i run load-test on my server with ab or wrk tool i can see that only one of cpu core are filled up to 100% (core7), and that because of too many si(soft interupts), so i checked /proc/interrupts file :

      CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         30          0          0          0          0          0          0          0   IO-APIC   2-edge      timer
  1:          0          9          0          0          0          0          0          0   IO-APIC   1-edge      i8042
  6:          0          0          0          3          0          0          0          0   IO-APIC   6-edge      floppy
  8:          0          0          1          0          0          0          0          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC   9-fasteoi   acpi
 10:          0          0        102          0          0          0          0   24228261   IO-APIC  10-fasteoi   virtio0, eth1, eth0
 11:          0          0          0          0          0          0          0         32   IO-APIC  11-fasteoi   uhci_hcd:usb1
 12:         15          0          0          0          0          0          0          0   IO-APIC  12-edge      i8042
 14:          0          0          0          0          0          0          0          0   IO-APIC  14-edge      ata_piix
 15:          0          0          0          0          0          0    1453248          0   IO-APIC  15-edge      ata_piix
 24:          0          0          0          0          0          0          0          0   PCI-MSI 131072-edge      virtio1-config
 25:          0          0          0          0         15          0          0          0   PCI-MSI 131073-edge      virtio1-virtqueues
 26:          0          0          0          0       5791    2805745          0          0   PCI-MSI 114688-edge      ahci[0000:00:07.0]
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:   14654751    6657243    5811366    5270649   14966993    4797078    5687129    8545399   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          1   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:    2572806    2980772    2435576    2151656    1887449    2366833    2404309    1967901   Rescheduling interrupts
CAL:     638862     508650     531191     579853     596146     636037     652622     655700   Function call interrupts
TLB:      62859      43397      20200       6237       4423      11681      18652       4408   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:       4706       4706       4706       4706       4706       4706       4706       4706   Machine check polls
HYP:          0          0          0          0          0          0          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0          0          0          0          0   Nested posted-interrupt event

which shows me there is a lot of interrupts sending from NIC, so i hve tried to some how tune NIC to cause fewer interrupts,

options that i have tried and doesn't have any effect:

  1. i have tried to disable irqbalance,
  2. use smp_affinity to distribute irq 10 across multiple cores (that doesn't even worked, irq 10 just stucked with one core no matter how i change smp_affinity, although i have read some articles which say in this situation that won't make performance better)
  3. increasing size of MTU up to 9000
  4. increasing rx ring buffer size up to 2048
  5. and many many sysctl tunings!

i have also noticed that i have many rx error in my eth0 :

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
    inet 185.8.174.227  netmask 255.255.255.0  broadcast 185.8.174.255
    inet6 fe80::84f9:91ff:fe5e:c862  prefixlen 64  scopeid 0x20<link>
    ether 86:f9:91:5e:c8:62  txqueuelen 1000  (Ethernet)
    RX packets 19862876  bytes 8071862301 (8.0 GB)
    RX errors 1746656  dropped 0  overruns 0  frame 1746656
    TX packets 22127410  bytes 13038619281 (13.0 GB)
    TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

here is also my haproxy.cfg global section:

global
nbproc 2

#log /dev/log   local0
#log /dev/log   local1 notice
chroot /var/lib/haproxy
#stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
#stats timeout 30s
user haproxy
group haproxy
daemon
maxconn 10000
   # tune.ssl.default-dh-param 2048


# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/ssl/private

# Default ciphers to use on SSL-enabled listening sockets.
# For more information, see ciphers(1SSL). This list is from:
#  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
# An alternative list with additional directives can be obtained from
#  https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
#ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
#ssl-default-bind-options no-sslv3

so whats wrong with my server? any help would be accepted.

  • Have you duplicated this result on a different instance? The errors could be hardware failure. – John Mahowald Oct 26 '19 at 19:49
  • @JohnMahowald i have only one load-balancer server in my cluster, you mean that interrupts might be from a hardware failure? like a bad cable or ethernet device? – aliopalopina Oct 29 '19 at 17:39
  • @JohnMahowald also i have tested my app servers directly with out being proxied with my load- balancer an the result of nginx-uwsgi stack is three times better than my load-balancer, also i am utilizing most of my cpu cores equally in my app servers with direct testing – aliopalopina Oct 29 '19 at 17:43
  • RX frame errors could be hardware failures or something. Try the same software stack with no tuning differences on different physical hardware, such as a different data center region where available. I'm skeptical you are at the scale where interrupt tuning helps you, but keep testing. – John Mahowald Oct 31 '19 at 13:40
  • @JohnMahowald thanks for your tips, we contact our server providers and they said there was a hardware issue and they solved it!, maybe that was because of kvm virtualization that they are using !! – aliopalopina Nov 02 '19 at 07:03

0 Answers0