
Whenever I run any sort of long-running command in the terminal, the program is abruptly killed and the terminal outputs the text Killed.

Any pointers? Maybe there is a log file with data explaining why the commands are being killed?

Update

Here is a snippet from dmesg that should hopefully shed some light on what's causing the issue. Another note that might be helpful: this is an Amazon EC2 instance.

May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184209] Call Trace:
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184218]  [<c01e49ea>] dump_header+0x7a/0xb0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184221]  [<c01e4a7c>] oom_kill_process+0x5c/0x160
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184224]  [<c01e4fe9>] ? select_bad_process+0xa9/0xe0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184227]  [<c01e5071>] __out_of_memory+0x51/0xb0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184229]  [<c01e5128>] out_of_memory+0x58/0xd0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184232]  [<c01e7f16>] __alloc_pages_slowpath+0x416/0x4b0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184235]  [<c01e811f>] __alloc_pages_nodemask+0x16f/0x1c0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184238]  [<c01ea2ca>] __do_page_cache_readahead+0xea/0x210
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184241]  [<c01ea416>] ra_submit+0x26/0x30
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184244]  [<c01e3aef>] filemap_fault+0x3cf/0x400
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184247]  [<c02329ad>] ? core_sys_select+0x19d/0x240
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184252]  [<c01fb65c>] __do_fault+0x4c/0x5e0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184254]  [<c01e4161>] ? generic_file_aio_write+0xa1/0xc0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184257]  [<c01fd60b>] handle_mm_fault+0x19b/0x510
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184262]  [<c05f80d6>] do_page_fault+0x146/0x440
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184265]  [<c0232c62>] ? sys_select+0x42/0xc0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184268]  [<c05f7f90>] ? do_page_fault+0x0/0x440
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184270]  [<c05f53c7>] error_code+0x73/0x78
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184274]  [<c05f007b>] ? setup_local_APIC+0xce/0x33e
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272161]  [<c05f0000>] ? setup_local_APIC+0x53/0x33e
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272163] Mem-Info:
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272164] DMA per-cpu:
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272166] CPU    0: hi:    0, btch:   1 usd:   0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272168] Normal per-cpu:
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272169] CPU    0: hi:  186, btch:  31 usd:  50
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272171] HighMem per-cpu:
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272172] CPU    0: hi:  186, btch:  31 usd:  30
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272176] active_anon:204223 inactive_anon:204177 isolated_anon:0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272177]  active_file:47 inactive_file:141 isolated_file:0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272178]  unevictable:0 dirty:0 writeback:0 unstable:0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272179]  free:10375 slab_reclaimable:1650 slab_unreclaimable:1856
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272180]  mapped:2127 shmem:3918 pagetables:1812 bounce:0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272186] DMA free:6744kB min:72kB low:88kB high:108kB active_anon:300kB inactive_anon:308kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15812kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:8kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272190] lowmem_reserve[]: 0 702 1670 1670
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272197] Normal free:34256kB min:3352kB low:4188kB high:5028kB active_anon:317736kB inactive_anon:317308kB active_file:144kB inactive_file:16kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:719320kB mlocked:0kB dirty:4kB writeback:0kB mapped:32kB shmem:0kB slab_reclaimable:6592kB slab_unreclaimable:7424kB kernel_stack:2592kB pagetables:7248kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:571 all_unreclaimable? yes
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272201] lowmem_reserve[]: 0 0 7747 7747
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272207] HighMem free:500kB min:512kB low:1668kB high:2824kB active_anon:498856kB inactive_anon:499092kB active_file:44kB inactive_file:548kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:991620kB mlocked:0kB dirty:0kB writeback:0kB mapped:8472kB shmem:15672kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:430 all_unreclaimable? yes
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272211] lowmem_reserve[]: 0 0 0 0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272215] DMA: 10*4kB 22*8kB 38*16kB 33*32kB 16*64kB 10*128kB 4*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 6744kB
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272223] Normal: 476*4kB 1396*8kB 676*16kB 206*32kB 23*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 34256kB
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272231] HighMem: 1*4kB 2*8kB 28*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 500kB
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272238] 4108 total pagecache pages
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272240] 0 pages in swap cache
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272242] Swap cache stats: add 0, delete 0, find 0/0
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272243] Free swap  = 0kB
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272244] Total swap = 0kB
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.276842] 435199 pages RAM
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.276845] 249858 pages HighMem
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.276846] 8771 pages reserved
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.276847] 23955 pages shared
May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.276849] 405696 pages non-shared
dwlz

5 Answers


You should be able to find out what killed your process by looking at the output of the dmesg command, or in the log files /var/log/kern.log, /var/log/messages, or /var/log/syslog.

There are a number of things that can cause a process to be summarily killed:

  • If it exceeds a hard ulimit for one of the various memory or CPU usage limits, which you can examine using ulimit -H -a
  • If the system is low on virtual memory, processes can get killed by the kernel oom-killer to free up memory (in your case, it's probably not this)
  • If the system has SELinux and/or PaX/grsecurity installed, a process could get killed if it tries to do something that's not allowed by the security policy, or if it tries to execute self-modified code

The logs or dmesg should tell you why the process was killed.
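If you just want the relevant lines, a quick grep over dmesg and the syslog files usually surfaces them. A minimal sketch (which of the log files exists depends on your distribution):

    # Kernel ring buffer: look for OOM killer or other kill messages
    dmesg | grep -iE 'killed process|out of memory|oom'

    # Persistent logs (not every file exists on every distribution)
    sudo grep -iE 'killed process|out of memory' /var/log/kern.log /var/log/messages /var/log/syslog 2>/dev/null

    # Hard resource limits for the current shell
    ulimit -H -a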

Heath
  • Thanks for your answer! Just checked out the log files you mentioned, but I can't seem to find much useful data. Check out the update to my question to see a glimpse. – dwlz May 14 '11 at 20:34
  • Yep, you're getting bit by the oom-killer; which means you've run out of memory. Try adding some swap space to your instance (even just a few hundred megs of swap can help a lot in a low-memory situation). – Heath May 15 '11 at 15:18
  • For others who wondered how to add swap to an EC2 instance, this answer helped me (after SSHing into the instance): https://stackoverflow.com/a/17173973/4900327 – Abhishek Divekar Jul 15 '17 at 10:23
  • Actually, it's not necessarily a memory-heavy process; it might be some long-running process that isn't taking up all the memory, correct? @Heath – boldnik Jul 07 '20 at 22:37

The logs you posted in your update indicate that your system is running out of memory, and the OOM killer is being invoked to kill off processes in order to maintain free memory when "all else fails". The OOM killer's selection algorithm may be preferentially targeting your "long running" processes. See the linked page for a description of the selection algorithm.

The obvious solution is more memory but you might be running out of memory due to a memory leak somewhere and adding more memory would likely only delay the OOM killer being invoked if that's the case. Check your process table for processes using the most memory with your favourite tool (top, ps, etc.) and go from there.
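For example, to see which processes are using the most memory, and which ones the kernel currently considers the best OOM candidates, something like the following works (this assumes GNU ps and a reasonably modern kernel that exposes /proc/PID/oom_score and /proc/PID/comm):

    # Top 10 processes by resident memory (RSS, in KB)
    ps -eo pid,rss,vsz,comm --sort=-rss | head -n 11

    # The kernel's current OOM "badness" score for each process
    # (a higher score means the process is more likely to be killed first)
    for p in /proc/[0-9]*; do
        printf '%s %s %s\n' "$(cat "$p/oom_score")" "${p#/proc/}" "$(cat "$p/comm")"
    done 2>/dev/null | sort -rn | head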

rthomson
  • The OOM killer has a definite preference for long running, low activity processes. Having it kill sshd on a production server makes debugging tricky. – mfarver May 15 '11 at 20:03
  • Sshd adjusts its own /proc/pid/oom_adj score so it can't be killed by the OOM killer (before it kills everything else). – yaplik May 17 '11 at 01:22
  • @yaplik This doesn't seem to apply any longer to recent distributions. As child processes inherit the value of oom_adj, a malicious user might cause a DoS by consuming all the memory without his/her processes being killed by the OOM killer. – ikso May 17 '11 at 14:27

As already explained by others, you're running out of memory, so the out-of-memory killer gets triggered and kills some process.

You can fix this in one of the following ways:

a) Upgrade your EC2 machine to a more powerful one: a 'small instance' has about 2.5x more memory (1.7 GB) than a 'micro instance' (0.64 GB), but costs additional money.

b) Add a swap partition: attach an additional EBS volume, then mkswap /dev/sdx and swapon /dev/sdx. This costs EBS storage and I/O fees.

c) Add a swap file: dd if=/dev/zero of=/swap bs=1M count=500, mkswap /swap, swapon /swap. This costs I/O fees and free space on the root EBS volume; see the sketch below.

Option c) should be sufficient, but keep in mind that a micro instance is not meant to run long-running CPU-intensive tasks due to its CPU limits (only short bursts are allowed).
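A minimal sketch of option c), assuming you have root access and want a 500 MB swap file at /swap (the path and size are just examples):

    # Create a 500 MB file, restrict its permissions, and enable it as swap
    sudo dd if=/dev/zero of=/swap bs=1M count=500
    sudo chmod 600 /swap
    sudo mkswap /swap
    sudo swapon /swap

    # Verify that the kernel now sees the swap space
    swapon -s
    free -m

    # Optionally make it persistent across reboots
    echo '/swap none swap sw 0 0' | sudo tee -a /etc/fstab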

yaplik

I had the same problem. My processes were being killed.

I found out that the Ubuntu AMI I was using did not have a swap space set up. When the memory is full and there is no swap space available, the kernel will unpredictably start killing processes to protect itself. Swap space prevents that. (This problem is especially relevant to the Micro instance because of the small 613 MB of memory.)

To check whether you have swap space set up, type: swapon -s
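For example (the -h flag of free assumes a reasonably recent procps; on older versions use free -m instead):

    # Lists active swap devices/files; empty output means no swap is configured
    swapon -s

    # Shows total, used, and free memory and swap in human-readable units
    free -h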

Set up swap space: http://www.linux.com/news/software/applications/8208-all-about-linux-swap-space

Other resources: http://wiki.sysconfig.org.uk/display/howto/Build+your+own+Core+CentOS+5.x+AMI+for+Amazon+EC2

Delicious
  • Worked for me! My dmesg contained only many "select _proccess_name_ to kill" one after the other and I had no /var/log/messages or any useful logs, but running "free -h" showed there was almost no memory left. Many thanks! – divieira Sep 14 '13 at 21:32

The log shows that you have no swap configured at all, so there is nothing to fall back on when RAM runs out:

    May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272240] 0 pages in swap cache
    May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272242] Swap cache stats: add 0, delete 0, find 0/0
    May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272243] Free swap  = 0kB
    May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272244] Total swap = 0kB

Can you split the job/process you are running into batches? Perhaps you can try running it in isolation after stopping the other processes?
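If the job reads a large list of inputs, one way to batch it is to feed the work through xargs in fixed-size chunks, so only a small portion is processed per invocation. A hypothetical sketch (process_files and input_list.txt are placeholders for your own command and data):

    # Run the (hypothetical) process_files command on 100 inputs at a time
    # instead of passing the entire list in a single invocation
    xargs -n 100 ./process_files < input_list.txt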