
We bought a Dell R720 server and installed CentOS 6.3 on it. We have another, older Dell server that also runs CentOS 6.3. When we ran a simple disk benchmark, the older server was about 10 times faster on it than the new one. The benchmark writes something to disk and flushes it in a loop. We want to track down why the new server is so slow. It has two disks, which we configured as RAID-0. df -h yields this:

[Older  server]

[xxx@xxx ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 97G 28G 64G 31% /
tmpfs 1.9G 11M 1.9G 1% /dev/shm
/dev/sda2 193G 103G 80G 57% /home

[New server]

[root@snap ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       116G  664M  109G   1% /
tmpfs            12G     0   12G   0% /dev/shm
/dev/sda1       7.7G  195M  7.2G   3% /boot
/dev/sdb2        77G  192M   73G   1% /home
/dev/sdb1       154G  232M  146G   1% /tmp
/dev/sda3        77G  2.4G   71G   4% /usr

How can I figure out what's making the newer server 10 times slower, and how can I fix it? Thanks.

EDIT: Adding lshw output.

[Older Server]
[duminda@snapoffice src]$ sudo ./lshw -class storage
[sudo] password for duminda: 
  *-storage               
       description: Serial Attached SCSI controller
       product: SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
       vendor: LSI Logic / Symbios Logic
       physical id: 0
       bus info: pci@0000:05:00.0
       logical name: scsi0
       version: 03
       width: 64 bits
       clock: 33MHz
       capabilities: storage pm pciexpress vpd msi msix bus_master cap_list rom
       configuration: driver=mpt2sas latency=0
       resources: irq:16 ioport:fc00(size=256) memory:df2b0000-df2bffff memory:df2c0000-df2fffff memory:df100000-df1fffff(prefetchable)
  *-ide:0
       description: IDE interface
       product: 5 Series/3400 Series Chipset 4 port SATA IDE Controller
       vendor: Intel Corporation
       physical id: 1f.2
       bus info: pci@0000:00:1f.2
       version: 05
       width: 32 bits
       clock: 66MHz
       capabilities: ide pm bus_master cap_list
       configuration: driver=ata_piix latency=0
       resources: irq:20 ioport:eca0(size=8) ioport:ec90(size=4) ioport:eca8(size=8) ioport:ec94(size=4) ioport:ecc0(size=16) ioport:ecd0(size=16)
  *-ide:1
       description: IDE interface
       product: 5 Series/3400 Series Chipset 2 port SATA IDE Controller
       vendor: Intel Corporation
       physical id: 1f.5
       bus info: pci@0000:00:1f.5
       logical name: scsi3
       version: 05
       width: 32 bits
       clock: 66MHz
       capabilities: ide pm bus_master cap_list emulated
       configuration: driver=ata_piix latency=0
       resources: irq:21 ioport:ecb0(size=8) ioport:ec98(size=4) ioport:ecb8(size=8) ioport:ec9c(size=4) ioport:ece0(size=16) ioport:ecf0(size=16)

[Newer Server]
[root@Snap src]# ./lshw -class storage
  *-storage               
       description: RAID bus controller
       product: MegaRAID SAS 2208 [Thunderbolt]
       vendor: LSI Logic / Symbios Logic
       physical id: 0
       bus info: pci@0000:03:00.0
       logical name: scsi0
       version: 05
       width: 64 bits
       clock: 33MHz
       capabilities: storage pm pciexpress vpd msi msix bus_master cap_list rom
       configuration: driver=megaraid_sas latency=0
       resources: irq:42 ioport:fc00(size=256) memory:ddffc000-ddffffff memory:ddf80000-ddfbffff memory:dd000000-dd01ffff(prefetchable)
  *-storage
       description: SATA controller
       product: C600/X79 series chipset 6-Port SATA AHCI Controller
       vendor: Intel Corporation
       physical id: 1f.2
       bus info: pci@0000:00:1f.2
       logical name: scsi5
       version: 05
       width: 32 bits
       clock: 66MHz
       capabilities: storage msi pm ahci_1.0 bus_master cap_list emulated
       configuration: driver=ahci latency=0
       resources: irq:124 ioport:ece8(size=8) ioport:ecf8(size=4) ioport:ecf0(size=8) ioport:ecfc(size=4) ioport:ecc0(size=32) memory:df8ff000-df8ff7ff

EDIT: Further info about disks:

[Older Server]
[duminda@snapoffice ~]$ find /sys/ -type f -name "model"
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host0/target0:1:0/0:1:0:0/model
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/model
/sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host0/port-0:1/end_device-0:1/target0:0:1/0:0:1:0/model
/sys/devices/pci0000:00/0000:00:1f.5/host3/target3:0:0/3:0:0:0/model
[duminda@snapoffice ~]$ cat /sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host0/target0:1:0/0:1:0:0/model
Virtual Disk    
[duminda@snapoffice ~]$ cat /sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/model
ST500NM0001     
[duminda@snapoffice ~]$ cat /sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/host0/port-0:1/end_device-0:1/target0:0:1/0:0:1:0/model
ST500NM0001     
[duminda@snapoffice ~]$ cat /sys/devices/pci0000:00/0000:00:1f.5/host3/target3:0:0/3:0:0:0/model
DVD+-RW TS-L633J

Googling ST500NM0001 gives these specs:

Storage Capacity: 500 GB
Maximum External Data Transfer Rate: 600 MBps (4.7 Gbps)
Rotational Speed: 7200 rpm
Buffer: 64 MB
Drive Interface: SAS
Drive Interface Standard: 6Gb/s SAS
Drive Type: Internal
Drive Width: 3.5"
Height: 1"
Width: 4"
Depth: 5.8"
Weight (Approximate): 1.34 lb
Limited Warranty: 3 Year

However, the newer server gives this:

[Newer Server]
[root@Snap ~]# find /sys/ -type f -name "model"
/sys/devices/pci0000:00/0000:00:02.2/0000:03:00.0/host0/target0:2:0/0:2:0:0/model
/sys/devices/pci0000:00/0000:00:02.2/0000:03:00.0/host0/target0:2:1/0:2:1:0/model
/sys/devices/pci0000:00/0000:00:1f.2/host5/target5:0:0/5:0:0:0/model
[root@Snap ~]# cat /sys/devices/pci0000:00/0000:00:02.2/0000:03:00.0/host0/target0:2:0/0:2:0:0/model
PERC H710       
[root@Snap ~]# cat /sys/devices/pci0000:00/0000:00:02.2/0000:03:00.0/host0/target0:2:1/0:2:1:0/model
PERC H710       
[root@Snap ~]# cat /sys/devices/pci0000:00/0000:00:1f.2/host5/target5:0:0/5:0:0:0/model
DVD+-RW DS-8A9SH

EDIT: The new server has 2 of these drives:

300GB  15K RPM, 6Gbps   SAS 3.5" Hot Plug Hard Drive

EDIT: I/O scheduler changes on the new server:

[snap@Snap ~]$ cat /sys/block/sda/queue/scheduler 
[noop] anticipatory deadline cfq 
[snap@Snap ~]$ cat /sys/block/sdb/queue/scheduler 
[noop] anticipatory deadline cfq 
[snap@Snap ~]$ time ./test_depth 

real    0m0.990s
user    0m0.239s
sys 0m0.352s
[snap@Snap ~]$ cat /sys/block/sda/queue/scheduler 
noop [anticipatory] deadline cfq 
[snap@Snap ~]$ cat /sys/block/sdb/queue/scheduler 
noop [anticipatory] deadline cfq 
[snap@Snap ~]$ time ./test_depth 

real    0m1.031s
user    0m0.172s
sys 0m0.444s
[snap@Snap ~]$ cat /sys/block/sda/queue/scheduler 
noop anticipatory [deadline] cfq 
[snap@Snap ~]$ cat /sys/block/sdb/queue/scheduler 
noop anticipatory [deadline] cfq 
[snap@Snap ~]$ time ./test_depth 

real    0m0.998s
user    0m0.150s
sys 0m0.448s
[snap@Snap ~]$ cat /sys/block/sda/queue/scheduler 
noop anticipatory deadline [cfq] 
[snap@Snap ~]$ cat /sys/block/sdb/queue/scheduler 
noop anticipatory deadline [cfq] 
[snap@Snap ~]$ time ./test_depth 

real    0m1.078s
user    0m0.228s
sys 0m0.350s
[snap@Snap ~]$ 

One run with each scheduler may not be enough, but it seems there is not much of a difference.

EDIT:

Reinstalled CentOS without logical volumes, using plain ext4 partitions. Still no performance improvement.

EDIT: Benchmark program - Very simple.

  (run with these environment variables set)
  export GLOG_logbufsecs=0
  export GLOG_log_dir=/tmp

  ====================

  #include <glog/logging.h>
  #include <iostream>


  int main(int argc, char **argv)
  {
      google::InitGoogleLogging(argv[0]);

      for (int i = 0; i <100000; ++i)
      {
          DLOG(INFO) << "TEST";
      }

      return 0;
  }
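
For clarity, stripped of glog, the write-and-flush pattern I mean looks roughly like the sketch below (an illustration only, not the program we actually benchmark; the file path and loop count are arbitrary):

  /* Illustrative sketch only: write a short line and flush it to disk in a loop. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  #define ITERATIONS 10000   /* arbitrary; adjust as needed */

  int main(void)
  {
      const char line[] = "TEST\n";
      int fd, i;

      fd = open("/tmp/flush_test.log", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (fd < 0) {
          perror("open");
          return 1;
      }
      for (i = 0; i < ITERATIONS; ++i) {
          if (write(fd, line, sizeof(line) - 1) < 0) {
              perror("write");
              return 1;
          }
          fsync(fd);   /* force the data out to disk on every iteration */
      }
      close(fd);
      return 0;
  }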

CPU INFORMATION

==========================================

[Old server CPU]
[duminda@snapoffice mdata]$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 30
model name  : Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz
stepping    : 5
cpu MHz     : 2393.786
cache size  : 8192 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips    : 4787.57
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

...... 3 more like this

================================================
[New server CPUs]
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 62
model name  : Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
stepping    : 4
cpu MHz     : 1999.988
cache size  : 20480 KB
physical id : 0
siblings    : 16
core id     : 0
cpu cores   : 8
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand lahf_lm ida arat xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips    : 3999.97
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

........ 31 more like this

========================

bonnie++ output

[Old server]

[root@snapoffice bonnie++-1.03e]# time ./bonnie++ -n 0 -d /tmp/duminda -r 512 -b -u duminda
Using uid:511, gid:511.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
snapoffice       1G 54925  97 105195  24 123526   8 65593  99 +++++ +++ 384.3   0
snapoffice,1G,54925,97,105195,24,123526,8,65593,99,+++++,+++,384.3,0,,,,,,,,,,,,,

real    1m20.473s
user    0m33.528s
sys 0m4.819s

[New server]

[root@snap ~]# time bonnie++ -n 0 -d /tmp -r 512 -u snap -b
Using uid:500, gid:500.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
snap.R720        1G 86507  99 217958  31 187624  20 88467  99 +++++ +++ +++++ +++
snap.R720,1G,86507,99,217958,31,187624,20,88467,99,+++++,+++,+++++,+++,,,,,,,,,,,,,

real    0m40.172s
user    0m22.907s
sys 0m4.516s

============================================

Memory

[Old server]
[duminda@snapoffice mdata]$ cat /proc/meminfo 
MemTotal:        3913604 kB
MemFree:         1272208 kB
Buffers:          196168 kB
Cached:          1459716 kB
SwapCached:        73752 kB
Active:           867288 kB
Inactive:        1396600 kB
Active(anon):     325104 kB
Inactive(anon):   293588 kB
Active(file):     542184 kB
Inactive(file):  1103012 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8191992 kB
SwapFree:        7683524 kB
Dirty:                80 kB
Writeback:             0 kB
AnonPages:        549976 kB
Mapped:            48912 kB
Shmem:             10684 kB
Slab:             247592 kB
SReclaimable:      86080 kB
SUnreclaim:       161512 kB
KernelStack:        7024 kB
PageTables:        79016 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    10148792 kB
Committed_AS:    7679752 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      285540 kB
VmallocChunk:   34359445552 kB
HardwareCorrupted:     0 kB
AnonHugePages:    204800 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        6756 kB
DirectMap2M:     4177920 kB

[New server]
[root@snap bonnie++-1.03e]# cat /proc/meminfo 
MemTotal:       24554684 kB
MemFree:        23312840 kB
Buffers:          217960 kB
Cached:           523140 kB
SwapCached:            0 kB
Active:           346236 kB
Inactive:         414888 kB
Active(anon):      20208 kB
Inactive(anon):       28 kB
Active(file):     326028 kB
Inactive(file):   414860 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      20479992 kB
SwapFree:       20479992 kB
Dirty:                 8 kB
Writeback:             0 kB
AnonPages:         20032 kB
Mapped:            14532 kB
Shmem:               220 kB
Slab:             163140 kB
SReclaimable:      86032 kB
SUnreclaim:        77108 kB
KernelStack:        6320 kB
PageTables:         3544 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    32757332 kB
Committed_AS:     120740 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      338928 kB
VmallocChunk:   34346663592 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        5056 kB
DirectMap2M:     2045952 kB
DirectMap1G:    23068672 kB

EDIT: Adding bounty

I want to know why my test program runs slowly on the new server and how to fix it (without removing the use of glog, because our programs make extensive use of it). Perhaps Matthew's answer points in the right direction?

Let me know if you need any more information...

nakiya
  • You know RAID0 is only for suckers, right? – EEAA Feb 12 '14 at 04:43
  • Yep, apparently there is no way to bypass the PERC H710 raid controller in the server and access the disks directly :) – nakiya Feb 12 '14 at 04:44
  • @nakiya - the point EEAA is making is that RAID0 is a really bad configuration for a production server. Lose one drive, lose all your data, and your chances of losing a drive are doubled. – Mark Henderson Feb 12 '14 at 04:55
  • You don't mention the interface type and rotational speeds of the disks. Were your old disks 15k SAS and your new disks 7.2k SATA? What about controller cache between the two systems, are both battery/flash backed with write-through disabled? How many disks were in the old server and what was the RAID layout there? – MDMarra Feb 12 '14 at 05:00
  • @MDMarra : Added lshw output on two systems. I am not sure about the last system, but in the newer one I configured RAID-0 with write-back. – nakiya Feb 12 '14 at 06:35
  • @MarkHenderson: Yeah, but performance is the key factor here. Anyway what we are using the disk for is logs only. Our processes are CPU intensive so we want the disks to be as fast as possible. – nakiya Feb 12 '14 at 06:37
  • `we want the disks to be as fast as possible` - Then replace them with SSDs. – HopelessN00b Feb 12 '14 at 13:00
  • The PERC controller is "hiding" the disks from the OS in the output you've pasted. Find out what disks you actually have. How do you purchase a server and not know what hard disks you've purchased, especially when you say disk performance is important? – MDMarra Feb 13 '14 at 01:23
  • @MDMarra: Updated with new server disk info. – nakiya Feb 13 '14 at 02:12
  • 5k disks? In 300GB capacity? That's gotta be a typo for 15K surely. Only 5k disks I know of are entry level, large-capacity drives (3-4tb). – Mark Henderson Feb 13 '14 at 02:18
  • @MarkHenderson: I guess you are right. Sadly, this is what is present in the quotation. – nakiya Feb 13 '14 at 02:28
  • @MarkHenderson: Ok, they are 15K rpm disks. I do not understand the downvotes lol. – nakiya Feb 13 '14 at 07:08
  • @nakiya Without knowing the full details of the differences between the servers, we cannot help you. In fact, without knowing the details of the benchmark, we can't help you. There are too many variables here. The question is too local to your environment and isn't easily answerable without giving a list of things to try. You haven't even provided CPU details! – ewwhite Feb 13 '14 at 07:13
  • @ewwhite: Edited the question with the benchmark program. Added cpu info. Will run bonnie++ when server is free and post comparison. Please let me know if you need any other info. ty. – nakiya Feb 13 '14 at 07:27
  • Um, how much RAM do you have in both of these servers? – ewwhite Feb 13 '14 at 07:55
  • @ewwhite: updated with `/proc/meminfo` – nakiya Feb 13 '14 at 07:59
  • Your example program doesn't build. Where is `cmdLine` defined? – Matthew Ife Feb 13 '14 at 22:34
  • @MatthewIfe: It is of no use. Removed. – nakiya Feb 14 '14 at 01:42
  • You've changed the premise of the question from 'why are my disks slow' to 'why is my code slow'. The original question 'why are my disks slow' was, in my opinion, tackled appropriately and you need to accept an answer. The question 'why is my code slow' is a question that should be answered by stackexchange, not serverfault. – Matthew Ife Feb 14 '14 at 09:09
  • @MatthewIfe: Guess you are right. I'm tired anyway. – nakiya Feb 14 '14 at 09:24

4 Answers

3

Can you provide more detail about your benchmark? Does your benchmark reflect real application usage scenarios? If it doesn't, it may not be the best way to gauge the performance of the disks. Other things to consider, though...

  • Filesystem choice and tuning.
  • Mount parameters (noatime, nobarriers, etc.).
  • Raw partitions versus LVM.
  • Partition alignment.
  • I/O scheduler.
  • RAID controller cache settings.
  • Disk type.

On the filesystem and tuning front, EL6.2+ has the tuned-adm framework, which helps set some basic I/O performance parameters for your hardware. This includes setting the deadline I/O scheduler and disabling write barriers where it makes sense. For your new system, run `yum install tuned tuned-utils` and then `tuned-adm profile enterprise-storage`.

For your partitioning scheme, your old server has fixed partition sizes while the new system has LVM. LVM is another layer of abstraction and may have an impact in this case.

Please make sure your RAID cache is configured properly. You typically want the cache to be biased towards writes.

Find out what disks are in the new server... but that may not matter, as your old server's disks are the slowest enterprise nearline SAS disks available. So your new system likely has drives capable of performance greater than or equal to your older server's disks.

ewwhite
  • Hi, our benchmark is pretty similar to our processes. The benchmark program was written because one of our processes was running very slowly on the new server; after some diagnosing we found it was stuck in disk writes, so we wrote the benchmark to reflect that scenario. – nakiya Feb 13 '14 at 02:15
  • @nakiya Well, your new disks are twice the rotational speed of your old disks. Your issue is probably one of the bullet points above. – ewwhite Feb 13 '14 at 02:21
  • `tuned-adm` command does not change performance. RAID is configured as RAID-0 write-back. – nakiya Feb 13 '14 at 03:33
  • @nakiya Seeing your benchmark, it's not a substantial measure and it's too short to be of any use. I don't think there's anything wrong with your system. – ewwhite Feb 13 '14 at 03:41
  • After increasing the number of writes in the benchmark to 10x, I get 1.391s vs 10.999s on the old and new servers respectively – nakiya Feb 13 '14 at 03:58
  • @nakiya Okay. Use the old server, then. – ewwhite Feb 13 '14 at 07:14
  • I tried almost all of your suggestions except partition alignment I guess. Any other suggestions? – nakiya Feb 13 '14 at 08:56
3

The benchmark process involves writing something into the disk and flushing it in a loop

Well, no it doesn't; compiling and running it indicates it's not flushing this data at all, so something's amiss.

I get the following from strace using the options you specified and the environment variables you used:

open("/tmp/glog.home.localdomain.matthew.log.INFO.20140213-224037.24470", O_WRONLY|O_CREAT|O_EXCL, 0664) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fcntl(3, F_GETFL)                       = 0x8001 (flags O_WRONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_WRONLY|O_APPEND|O_LARGEFILE) = 0
...

No open flags here indicating it's going to flush to disk...

How about on write?

write(3, "I0213 22:40:37.370820 24470 glog"..., 46) = 46
fadvise64(3, 0, 4096, POSIX_FADV_DONTNEED) = 0
gettid()                                = 24470
write(3, "I0213 22:40:37.370925 24470 glog"..., 46) = 46
fadvise64(3, 0, 4096, POSIX_FADV_DONTNEED) = 0
gettid()                                = 24470
write(3, "I0213 22:40:37.370987 24470 glog"..., 46) = 46
fadvise64(3, 0, 4096, POSIX_FADV_DONTNEED) = 0
gettid()                                = 24470
...

No flushing there either.

It might be plausible that on the new system it really does write dirty pages to disk after getting the POSIX_FADV_DONTNEED, whereas on the old system it does not. This needs to be eliminated from the test and the test made fairer, to avoid any possible issues relating to how the two systems might treat the disk.

Note that using fadvise this way is stupid, and asking for the pid every time instead of saving it is also stupid, but I digress..

Now, in all honesty I expect similar results to what you were seeing before; but this is a test which will actually flush to disk, unlike what your application is doing.

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sysexits.h>
#include <err.h>
#include <limits.h>

/* Deliberate choice here to force writes 4 times to the same sector */
#define BUFSZ 128

int main() {
  char buf[BUFSZ];
  char path[PATH_MAX];
  char name[NAME_MAX];
  char *home = NULL;
  int fd, i;
  memset(name, 0, NAME_MAX);
  memset(path, 0, PATH_MAX);
  memset(buf, 'A', BUFSZ);
  buf[BUFSZ-1] = '\n';

  /* Figure out some useful path to put this */
  home = getenv("HOME");
  if (!home)
    errx(EX_SOFTWARE, "No HOME environment variable set. I give in!");

  /* Dont use this without using open with O_EXCL! */
  strcpy(name, "writethis.bin.XXXXXX");
  mktemp(name);

  snprintf(path, PATH_MAX, "%s/%s", home, name);

  /* Open the file with flushy flags */
  fd = open(path, O_WRONLY|O_APPEND|O_EXCL|O_CREAT|O_SYNC,
                                          S_IRUSR|S_IWUSR);
  if (fd < 0)
    err(EX_OSERR, "Cannot open file");

  /* Just need an inode, dont want it showing up in VFS.. */
  if (unlink(path) < 0)
    err(EX_OSERR, "Unlink failed. Something horrible probably happened");

  /* Dont pollute cache */
  if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED))
    err(EX_OSERR, "Fadvise failed?!");

  /* Write */    
  for (i=0; i < 1000; i++) {
    if (write(fd, buf, BUFSZ) < 0)
      err(EX_OSERR, "Cannot write to file");
  }

  close(fd);
}

Please run this under `time` and give the results.
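
If you also want to isolate the fadvise behaviour specifically, something along these lines (my own sketch, not anything glog ships; the file path and loop count are arbitrary) reproduces just the write + POSIX_FADV_DONTNEED pattern from the strace above, so the two machines can be compared on that pattern alone:

/* Sketch: repeat the write + POSIX_FADV_DONTNEED pattern seen in the strace
   output above, without glog, to compare the two machines on it directly. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define ITERATIONS 100000

int main(void) {
  char line[46];           /* the strace shows 46-byte writes */
  int fd, i;

  /* Arbitrary scratch file; point it at the same filesystem the logs use. */
  fd = open("/tmp/fadvise_test.log", O_WRONLY|O_APPEND|O_CREAT, 0644);
  if (fd < 0) {
    perror("open");
    return 1;
  }
  memset(line, 'I', sizeof(line));
  line[sizeof(line) - 1] = '\n';

  for (i = 0; i < ITERATIONS; i++) {
    if (write(fd, line, sizeof(line)) < 0) {
      perror("write");
      return 1;
    }
    /* Same advice glog issues after every message. */
    posix_fadvise(fd, 0, 4096, POSIX_FADV_DONTNEED);
  }
  close(fd);
  return 0;
}

Run it under time on both boxes; if the gap follows this program rather than the O_SYNC one above, the difference lies in how each kernel handles the advice (and the resulting writeback) rather than in raw disk speed.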

Matthew Ife
  • Ran your program. In the old system it takes 10.157 seconds while in the new one it takes 0.082 seconds! So, does this mean this is a GLOG problem? – nakiya Feb 14 '14 at 02:06
  • Well, now you have conflicting results, but this test more accurately tests the assertion that you have a disk problem, so it *probably* bears more relevance as a benchmark. – Matthew Ife Feb 14 '14 at 02:08
  • Also, this is writing to /home not /tmp. Note that this may be changing the results of the test too! If you know how, alter the paths to write to /tmp instead. – Matthew Ife Feb 14 '14 at 02:11
  • @MatthewIfe: I changed the path to /tmp before running the program. – nakiya Feb 14 '14 at 02:12
  • My suggestion then is that your assertion that there is a disk problem is now at best non-existent or at worst *undefined*. If you want to be super sure, profile it using a more sophisticated tool like `fio` to try out different IO workloads. – Matthew Ife Feb 14 '14 at 02:14
  • I updated question with comparisons of two bonnie++ runs on the two servers which also suggest that there is nothing wrong with disk. Is it safe to assume that this is a GLOG related problem? – nakiya Feb 14 '14 at 02:21
  • Possibly, it's doing weird things on writes, but I don't know the root cause as to why fadvise behaves differently on the old system versus the new system. Ultimately that is probably irrelevant, since the way it's using these system calls is not the way they are meant to be utilized. – Matthew Ife Feb 14 '14 at 02:26
  • Running my program with `strace -ct` indicates that in the older server `write` and `fadvise64` take 0 usecs, but in the newer one these take 2 usecs and 1 usec respectively – nakiya Feb 14 '14 at 02:35
2

In almost all cases, our Dell server performance issues are related to the RAID card in use. Some of the cards they sell have abysmal Linux performance.

cf. http://hwraid.le-vert.net/wiki/LSIMegaRAIDSAS vs. http://hwraid.le-vert.net/wiki/LSIFusionMPTSAS2 to compare the two cards you have in those machines.

The latter is specifically a low-end card that probably has no hardware acceleration at all, while the former is a very good high-end card with good hardware on-board.

mikebabcock
  • Running `lspci -nn | grep RAID` on the new server gives `03:00.0 RAID bus controller [0104]: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] [1000:005b] (rev 05)` – nakiya Feb 14 '14 at 09:27
0

As many have stated, you may need to ensure your test is actually measuring what you think it is - the problem could very well lie within the test itself, and the perceived latency or slowness could be due to I/O being tied up within the kernel/OS before it even reaches the disks.

You could consider reviewing Dell's documentation on performance tuning within the BIOS settings of 12th-gen servers - most people don't know this documentation exists, but you would be surprised at the difference it can make.

http://en.community.dell.com/cfs-file.ashx/__key/telligent-evolution-components-attachments/13-4491-00-00-20-24-87-40/12g_5F00_bios_5F00_tuning_5F00_for_5F00_performance_5F00_power.pdf

JimNim
  • `and the perceived latency or slowness could be due to I/O being tied up within the kernel/OS before it even reaches the disks` : How to figure out if this is the case? – nakiya Feb 14 '14 at 07:02