High I/O system hanging

5

1

I have a Fedora 20 box on which I'm trying to import a massive amount of data (billions of rows) into Postgres. System spec:

16GB RAM, 6-core 3.2 GHz CPU, 500GB RAID0 PCIe SSD

Every time I/O load gets very high (such as when indexing), the box (soft) hangs after a while. Keyboard interrupts don't work, Caps Lock doesn't toggle, and the HDD light is off.

Before hanging, the UI becomes very slow and flickers when redrawing. I have tried tweaking the I/O scheduling and running off both the SSD RAID0 and a slow SATA disk. I also tried combinations of non-software-RAID storage and RAID0 in case md was the cause. Basically, under high I/O the system is unusable and will often crash.

Temps, RAM usage and CPU usage are all fine. Memcheck is fine, as is the CPU test.

I'm beginning to think it might be a motherboard fault. Is there anything else I can try software-wise? I'm pretty sure this amount of I/O lockup isn't normal.

Aiden Bell

Posted 2014-03-26T13:19:56.673

Reputation: 692

Is it only happening while importing this data? How exactly are you importing it into Postgres? – Ƭᴇcʜιᴇ007 – 2014-03-26T13:23:03.837

If I copy large files, the system is stalling, but recovers once disk IO finishes. The data is the OSM database dump through the import tool which is disk IO bound. Importing through local socket rather than network. I get 700MB/s write on my SSD but can't finish the import due to crashing. – Aiden Bell – 2014-03-26T13:24:27.913

Also as a note, I have kernel crash dumping enabled, but the system is so hung I never get a dump to debug. – Aiden Bell – 2014-03-26T13:33:41.227

So even a large dd from /dev/zero to a file will hang your system eventually (not due to filesystem full, etc)? – rickhg12hs – 2014-03-26T21:44:19.830

No, file transfers are just sluggish disk-to-disk, not from a pseudo-file. Interestingly, when doing big disk I/O, the system might look fine, but if you move the mouse or press a key, the whole thing stalls for ~30 seconds. I appreciate that I/O and the OS switching between I/Os is a bottleneck, but enough to bog down the system to a crash? – Aiden Bell – 2014-03-27T09:56:07.233

Update: Looking through logs, I may have a faulty SSD (not used for tablespace, but for OS). Many ata1.00: failed command: WRITE FPDMA QUEUED logs – Aiden Bell – 2014-03-27T11:51:04.993
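
For anyone wanting to check their own logs for the same symptom, here is a minimal Python sketch that counts "failed command" entries per ATA device. The regular expression is modelled only on the one message quoted in the comment above, the function name is made up for illustration, and feeding it the output of `dmesg` or `journalctl -k` is an assumption about where such messages end up on a given system.

```python
import re
from collections import Counter

# Modelled on the message quoted above:
#   "ata1.00: failed command: WRITE FPDMA QUEUED"
# Other kernels or drivers may word this differently (assumption).
FAILED_CMD = re.compile(r"(ata\d+\.\d+): failed command: (.+?)\s*$")


def count_failed_ata_commands(log_text):
    """Count failed ATA commands, keyed by (device, command)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED_CMD.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

A steadily growing count for one device would be a hint to look at that drive, its cable, or its controller before blaming the I/O load itself.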

My Fedora 20 system hangs too when there's intensive I/O, but I haven't figured out yet why. – Cristian Ciupitu – 2014-05-29T22:15:49.453

Well I ended up killing my mobo with IO, so that's worth checking. But removing the soft RAID helped things too. – Aiden Bell – 2014-05-30T14:31:55.583

Answers

0

Replaced the motherboard. That solved the issue for the import, but the system still hangs for short periods when doing intensive I/O. Removed the software RAID (md) and that seemed to help.

Aiden Bell

Posted 2014-03-26T13:19:56.673

Reputation: 692

0

It may be worth checking autovacuum. Since you are adding a lot of new rows, you could increase the autovacuum_vacuum_threshold parameter or disable autovacuum completely. Tools like iotop and iostat will give you more information about your system's I/O.

You can find more information about autovacuum here.
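
To illustrate the kind of per-device figures that iostat reports, here is a rough Python sketch that takes two snapshots of /proc/diskstats on Linux and derives throughput and utilisation from the difference. It is not how iostat itself is implemented, and the function names are invented for this example. The field positions follow the kernel's documented diskstats layout, where sectors are always counted in 512-byte units.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written, ms_doing_io)}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or unexpectedly short lines
        # parts[2] = device name, [5] = sectors read,
        # [9] = sectors written, [12] = milliseconds spent doing I/O
        stats[parts[2]] = (int(parts[5]), int(parts[9]), int(parts[12]))
    return stats


def io_rates(before, after, interval_s):
    """Per-device read/write MB/s and utilisation % between two snapshots."""
    rates = {}
    for dev, (r1, w1, t1) in after.items():
        if dev not in before:
            continue
        r0, w0, t0 = before[dev]
        rates[dev] = {
            "read_mb_s": (r1 - r0) * SECTOR_BYTES / 1e6 / interval_s,
            "write_mb_s": (w1 - w0) * SECTOR_BYTES / 1e6 / interval_s,
            "util_pct": min(100.0, (t1 - t0) / (interval_s * 1000.0) * 100.0),
        }
    return rates


def sample(interval_s=1.0, path="/proc/diskstats"):
    """Read the stats file twice, interval_s apart, and return the rates."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return io_rates(before, after, interval_s)
```

Calling `sample()` while the import is running gives a dictionary keyed by device name. A device sitting near 100% utilisation with low throughput would suggest the disk itself is struggling rather than simply being busy.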

Eduardo Ramos

Posted 2014-03-26T13:19:56.673

Reputation: 178