Xserver becomes non-responsive due to aggressive swapping

2

cat /proc/sys/vm/swappiness
1

free -h
              total        used        free      shared  buff/cache   available
Mem:           3.7G        579M        1.6G        1.1G        1.6G        1.8G
Swap:          1.0G        144K        1.0G

A Firefox bug is running amok: the X server becomes unresponsive for 10+ seconds, sometimes requiring a hard shutdown of the desktop. Here is the vmstat output as I trigger the bug and then finally manage to terminate the process:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- -----timestamp-----
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st                 EDT
 0  0      0 193940 2540804 1002716    0    0     0     0  827 2947  2  1 97  0  0 2015-05-16 16:13:30
 0  0      0 193940 2540804 1002740    0    0     0     0  785 3747  3  2 96  0  0 2015-05-16 16:13:31
 0  0      0 196064 2540804 1000400    0    0     0     0  865 4064  6  1 93  0  0 2015-05-16 16:13:32
 0  0      0 195692 2540804 1000524    0    0     0     0  790 4699  4  2 94  0  0 2015-05-16 16:13:33
 0  0      0 195532 2540804 1000644    0    0     0   857  866 4770  5  1 94  0  0 2015-05-16 16:13:34
 0  0      0 195284 2540804 1000660    0    0     0    48  743 3755  2  1 97  0  0 2015-05-16 16:13:35
 0  0      0 195284 2540804 1000700    0    0     0     0  758 4037  3  1 96  0  0 2015-05-16 16:13:36
 1  0    148 119156 2745740 893480    0  148  3356   148 10443 7868 33 15 51  1  0 2015-05-16 16:13:37
 0  2 225360 126552 2764572 867432    0 225212 24260 225252 15027 2811  4  7 49 40  0 2015-05-16 16:13:38
 0  3 427808 121044 2756804 875764    0 202448 20084 202812 1825 1717  9  7 52 32  0 2015-05-16 16:13:39
 0  2 549012 136656 2740740 876064    0 121204  5060 121204 1327 1573 10  4 60 27  0 2015-05-16 16:13:40
 0  0 613996 139208 2741352 878332    0 64984 15284 65048 1169 1586  2  2 81 15  0 2015-05-16 16:13:41
 0  2 765516 131056 2743152 878224    0 151520   644 151520  517  981  9  4 77 10  0 2015-05-16 16:13:42
 1  0 908672 184712 2691932 877260    0 143156  3676 143156  638 1094  3  3 62 32  0 2015-05-16 16:13:43
 1  1 906072 164200 2717744 873548 2160    0  9124     0 1137 2246  8  2 81 10  0 2015-05-16 16:13:44
 1  0 1028568 217116 2662856 877792 5632 128156 15956 128212 1344 2189  8  3 61 28  0 2015-05-16 16:13:45
 0  0 1028568 214064 2663532 879556    0    0   344     0  789  899  1  1 98  0  0 2015-05-16 16:13:46
 0  0 1028564 207536 2667876 881500    0    0  6456    12  962 2349  3  1 95  1  0 2015-05-16 16:13:47
 0  0 1028564 202708 2669568 884756    0    0  2284     0  733  874  2  1 97  1  0 2015-05-16 16:13:48
 0  0 1028564 199732 2672484 885092    0    0  3084     0  286  937  1  0 98  1  0 2015-05-16 16:13:49
 0  0 1028564 197004 2674716 885376    0    0  2440     0  259 1080  1  0 98  0  0 2015-05-16 16:13:50
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- -----timestamp-----
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st                 EDT
 1  0 1028564 196756 2674860 885660    0    0    20     0  235  904  1  1 98  0  0 2015-05-16 16:13:51
 0  0 1028564 196740 2674876 885668    0    0     0     0  318 1012  1  1 98  0  0 2015-05-16 16:13:52
 0  0 1028564 196740 2674880 885680    0    0   100    44  656 1003  1  0 98  1  0 2015-05-16 16:13:53
 0  0 1028564 195356 2675532 886456    0    0  1488     0  766 2099  2  2 96  0  0 2015-05-16 16:13:54
 1  0 1028460 168444 2682144 904076   96    0  7588     0 1334 5119  9  3 87  1  0 2015-05-16 16:13:55
 0  0 1028460 152100 2704240 899988    0    0  3216     0  937 2737  6  2 92  1  0 2015-05-16 16:13:56
 0  0 1028460 139956 2709516 905816    0    0   580     0  732 3588  4  1 94  0  0 2015-05-16 16:13:57
 0  0 1028460 139708 2709780 906108    0    0   128     0  814 3768  3  2 95  0  0 2015-05-16 16:13:58
 0  0 1028460 138456 2711344 906320    0    0    12    52  835 4109  3  1 96  0  0 2015-05-16 16:13:59
 0  0 1028460 139588 2711440 904452    0    0     0     0  856 3445  3  1 95  0  0 2015-05-16 16:14:00
 0  0 1028460 142440 2711440 901300    0    0     4   424  887 5238  3  1 96  0  0 2015-05-16 16:14:01
 1  0 1028460 115192 2730492 910708    0    0 18948     8 2228 4712  6  3 87  4  0 2015-05-16 16:14:02
 0  0 1028460 114080 2731276 911640    0    0   396     0  976 4185  6  2 92  0  0 2015-05-16 16:14:03
 1  7 1048572  84512 2776408 900196    0 20112 162296 21044 26462 8939 25 10 33 33  0 2015-05-16 16:14:04
 0 10 1048572 206548 2803588 751176    0    0 1305648   120 77614 206123  0  4 74 22  0 2015-05-16 16:14:23
 0  1 1048572  85332 2865180 812416  528  476 120320  2296 22658 7102 14  8 48 30  0 2015-05-16 16:14:24
 3  0   9944 3423940 143324 203924  340    0 171776   764 6222 17487  2 17 32 49  0 2015-05-16 16:14:25
 0  0   9944 3493676 103216 177288    0    0  3436 15072 1124 1624  5  1 88  6  0 2015-05-16 16:14:26
 0  0   9944 3493676 103280 177316    0    0    68     0  575  684  0  1 99  0  0 2015-05-16 16:14:27
 0  0   9944 3492900 103724 177316    0    0   604     0  596  765  1  0 99  0  0 2015-05-16 16:14:28
 0  0   9944 3492684 104184 177316    0    0   348     0  622  751  0  1 99  0  0 2015-05-16 16:14:29

You can see swap usage at 0 before the bug triggers and Firefox starts doing something strange with memory. The X server becomes unresponsive soon thereafter. When it finally responds to my single keybinding for kill -9 $(ps -aux | grep [f]irefox | awk '{print $2}'), you can see swap usage drop back to almost 0.

I don't think this is an OOMKiller problem, but something is seriously wrong with the way this is being handled by the kernel.

user3467349

Posted 2015-05-16T20:26:30.590

Reputation: 151

Answers

1

I would investigate your kernel's OOM-killer configuration (read: "how your distro apparently broke it" :P)
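As a rough starting point for that investigation, here is a sketch that lists which processes the kernel currently ranks as the most likely OOM-killer victims. It only reads /proc/<pid>/oom_score and /proc/<pid>/comm; the function name oom_ranking and the PROC_ROOT variable are made up for this example.

```shell
#!/bin/sh
# Sketch: show the processes with the highest OOM score (most likely victims).
# PROC_ROOT and oom_ranking are names invented for this example.
PROC_ROOT="${PROC_ROOT:-/proc}"

oom_ranking() {
    # $1 = how many entries to show (defaults to 5)
    for dir in "$PROC_ROOT"/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        # Processes can exit while we iterate, so tolerate read failures.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        printf '%s %s %s\n' "$score" "${dir##*/}" "$name"
    done | sort -rn | head -n "${1:-5}"
}

# Output columns: score, PID, command name.
oom_ranking 5
```

If Firefox is not near the top while it is eating memory, its score can be nudged by writing a value up to 1000 into /proc/<pid>/oom_score_adj, which makes that process the preferred victim.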

As an immediate practical solution that I am very sure will tangibly help... add more swap. Really.

I... have a bit too much experience with systems with insufficient RAM. :P

When Linux runs out of RAM, before it gets the OOM-killer out it tries REALLY hard to "make do" by constantly purging the various efficiency caches it keeps in memory. You can see this happening when the whole system practically freezes and the disk goes crazy - the kernel's continuously killing the disk cache.

To fix this... add some more swap space. It doesn't fix your problem, but it means your system will stay sufficiently usable that you might be able to track down what's going on.

Note that swap areas can be regular files: create a new file with dd in / or some corner where it won't be touched (don't remove or move it while it is active), run mkswap on it, then swapon it. To enable it automatically at boot, add the file to /etc/fstab AFTER the line that mounts the filesystem it's on.
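The swap-file steps can be sketched roughly like this. The path /swapfile and the 2 GiB size are assumptions picked for the example, and the run helper is not a standard tool, just a wrapper so the script prints the commands by default; set DRY_RUN=0 and run it as root to actually execute them.

```shell
#!/bin/sh
# Sketch: create and enable a swap file.
SWAPFILE="${SWAPFILE:-/swapfile}"   # assumed location, pick your own
SIZE_MB="${SIZE_MB:-2048}"          # assumed size in MiB
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_swapfile() {
    run dd if=/dev/zero of="$SWAPFILE" bs=1M count="$SIZE_MB"  # allocate the file
    run chmod 600 "$SWAPFILE"                                  # swap must not be world-readable
    run mkswap "$SWAPFILE"                                     # write the swap signature
    run swapon "$SWAPFILE"                                     # start using it
}

make_swapfile
```

For the fstab step, an entry along the lines of "/swapfile none swap sw 0 0" is the usual form, with the path adjusted to wherever you put the file. Afterwards free -h should show the larger swap total.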

You may also want to explore the fascinatingness that is zram-as-swap.
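For zram, a minimal sketch looks something like the following. It assumes the zram kernel module is available and that loading it creates /dev/zram0; the 512M size and the priority of 100 are arbitrary choices for the example. As written it only prints the commands; set DRY_RUN=0 and run as root to apply them.

```shell
#!/bin/sh
# Sketch: use a compressed RAM block device (zram) as swap.
ZRAM_DEV="${ZRAM_DEV:-zram0}"     # assumed device name
ZRAM_SIZE="${ZRAM_SIZE:-512M}"    # assumed uncompressed size
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_zram_swap() {
    run modprobe zram                                                # load the module
    run sh -c "echo $ZRAM_SIZE > /sys/block/$ZRAM_DEV/disksize"      # size the device
    run mkswap "/dev/$ZRAM_DEV"                                      # write the swap signature
    run swapon -p 100 "/dev/$ZRAM_DEV"                               # higher priority than disk swap
}

setup_zram_swap
```

Giving the zram device a higher priority than the disk-backed swap means the kernel fills the compressed RAM swap first and only then falls back to the slow disk.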

It may also be worth doing some target practice with Mozilla's FTP server: try a few old versions of FF (the Linux binaries run directly without installation, FYI) to see whether older versions misbehave the same way, i.e. whether this is a regression.

(Duh/commonsense/probably already done) Also consider killing extensions and tracking what websites you were at when this occurs.

i336_

Posted 2015-05-16T20:26:30.590

Reputation: 144

+1 for adding swap and / or zram – linuxdev2013 – 2015-05-17T00:25:20.730

The problem isn't what website this occurs on, the problem is that an application related bug shouldn't cause the entire system to hang. And I shouldn't need to add swap or more hardware (even if it were possible to add RAM) to get around bad kernel behaviour. – user3467349 – 2015-05-17T14:19:18.963

@user3467349: You do have a valid point from the perspective of "mathematically perfect design", but consider the architecture here. If a bug in Firefox is causing it to require a lot of RAM, the kernel needs to fulfill those requests. If the kernel runs out of physical RAM it will use swap. If it runs out of swap it will eat its efficiency caches. If those run out it will kill tasks as a last resort. If you want different behavior, concretely define what system actions you want to occur; you may be able to achieve them. (1/3) – i336_ – 2015-05-19T00:04:04.673

@user3467349: Personally I would recommend accepting that this bug is occurring and that the best thing to do about it is make the effort to fix it so it doesn't have to affect other people too. To that effect, I'd start by adding swap to the system to keep it going while these bugs are doing their worst, so you can analyse what's going on and submit information to the Mozilla team. (2/3) – i336_ – 2015-05-19T00:09:06.483

@user3467349: It may turn out that the issues are already known, or the technicalities of the particular issues you're dealing with may be too fiddly to be interesting to explore fully, but it still can't hurt to have a look. (3/3) – i336_ – 2015-05-19T00:11:19.000

@i336_ The X server freezes and the offending task (Firefox) does not get killed as it should. Also, my systemd journal is getting dumped, so I'm having trouble getting an exact diagnosis, but I am looking into it. Of course the bug might turn out to be some very narrow, configuration-specific rabbit hole (my system is pretty modern and standard though). – user3467349 – 2015-05-19T00:58:03.887

@user3467349: Wow, huh. As for OOM-killer configuration, I think that's usually configured in userspace; what distro are you using? – i336_ – 2015-05-19T04:38:38.423