Mesos → Completed Tasks i Sandbox In stdout file, i can see killTask signal:
Received killTask for task sources.b4e2c8e6-5b42-11e7-aec0-024227901b13
Full snap of stdout file is as follows. You can see even after receiving killTask
signal my process is still running. i.e. my process doesn't terminates itself.
2017-06-27 14:16:08,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far 2017-06-27 14:16:18,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far 2017-06-27 14:16:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far 2017-06-27 14:16:38,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far 2017-06-27 14:16:48,337 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far 2017-06-27 14:16:58,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far 2017-06-27 14:17:08,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far 2017-06-27 14:17:18,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far 2017-06-27 14:17:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far 2017-06-27 14:17:38,334 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far 2017-06-27 14:17:48,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far 2017-06-27 14:17:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:18:08,334 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:18:18,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:18:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:18:38,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:18:48,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:18:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:19:08,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:19:18,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:19:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:19:38,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:19:48,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:19:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:20:08,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:20:18,334 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:20:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:20:38,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far 2017-06-27 14:20:48,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far Received killTask for task sources.b4e2c8e6-5b42-11e7-aec0-024227901b13 2017-06-27 14:20:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
Full Snap of stderr file is as follows:
I0627 19:42:51.959991 7613 fetcher.cpp:533] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":true,"value":"file:\/\/\/etc\/docker.tar.gz"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0\/frameworks\/0e528b66-37aa-4d7a-933e-4638aabf494a-0000\/executors\/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13\/runs\/219c102b-28ae-41d5-b98f-11829315119e"} I0627 19:42:51.963241 7613 fetcher.cpp:444] Fetching URI 'file:///etc/docker.tar.gz' I0627 19:42:51.963279 7613 fetcher.cpp:285] Fetching directly into the sandbox directory I0627 19:42:51.963295 7613 fetcher.cpp:222] Fetching URI 'file:///etc/docker.tar.gz' I0627 19:42:51.964923 7613 fetcher.cpp:207] Copied resource '/etc/docker.tar.gz' to '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e/docker.tar.gz' I0627 19:42:52.070482 7613 fetcher.cpp:123] Extracted '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e/docker.tar.gz' into '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e' I0627 19:42:52.070533 7613 fetcher.cpp:582] Fetched 'file:///etc/docker.tar.gz' to '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e/docker.tar.gz' I0627 19:42:56.096325 7643 exec.cpp:162] Version: 1.3.0 I0627 19:42:56.101958 7647 exec.cpp:237] Executor registered on agent 632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0 WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 100 221 100 138 100 83 8657 5207 --:--:-- --:--:-- --:--:-- 9200 E0627 19:51:03.219312 7652 process.cpp:951] Failed to accept socket: future discarded
Messages Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. and Failed to accept socket: future discarded seems to be culprit which are killing my container.
My question is who is killing my container after 5 to 10 minutes again and again?
I also updated the /etc/default/grub
file with
GRUB_CMDLINE_LINUX_DEFAULT="cgroup_enable=memory swapaccount=1"
and rebooted my system but no progress.
Any insights on this issue.
My Ubuntu VMWare Configuration is as:
- Cores assigned 3
- memory : 6GB
- HDD: 32 GB
- I am running only one container still it exits after few minutes.
[EDIT: Adding stderr file contents from mesos UI at: /var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000 executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284
]
Adding stderr file content of another job.
I0628 10:15:45.951104 4357 fetcher.cpp:533] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/29df799b-4797-41df-a005-465f211d286b-S0","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":true,"value":"file:\/\/\/etc\/docker.tar.gz"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/29df799b-4797-41df-a005-465f211d286b-S0\/frameworks\/0e528b66-37aa-4d7a-933e-4638aabf494a-0000\/executors\/sources.a634642c-5bbc-11e7-ba8b-024239f32c24\/runs\/1bda209c-c2b8-4bb5-a41b-26361e00a284"} I0628 10:15:45.953835 4357 fetcher.cpp:444] Fetching URI 'file:///etc/docker.tar.gz' I0628 10:15:45.953881 4357 fetcher.cpp:285] Fetching directly into the sandbox directory I0628 10:15:45.953974 4357 fetcher.cpp:222] Fetching URI 'file:///etc/docker.tar.gz' I0628 10:15:45.956663 4357 fetcher.cpp:207] Copied resource '/etc/docker.tar.gz' to '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284/docker.tar.gz' I0628 10:15:46.061069 4357 fetcher.cpp:123] Extracted '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284/docker.tar.gz' into '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284' I0628 10:15:46.061148 4357 fetcher.cpp:582] Fetched 'file:///etc/docker.tar.gz' to '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284/docker.tar.gz' I0628 10:15:49.898803 4389 exec.cpp:162] Version: 1.3.0 I0628 10:15:49.903390 4390 exec.cpp:237] Executor registered on agent 29df799b-4797-41df-a005-465f211d286b-S0 WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 100 221 100 138 100 83 5385 3239 --:--:-- --:--:-- --:--:-- 11500 W0628 10:15:49.903390 4389 logging.cpp:91] RAW: Received signal SIGTERM from process 3287 of user 0; exiting
No new logs created in /var/lib/mesos-master.ERROR
file today
Content of /var/log/mesos-master.WARNING
file:
Log file created at: 2017/06/28 10:04:56 Running on machine: ubuntu Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg W0628 10:04:56.387049 3193 authenticator.cpp:512] No credentials provided, authentication requests will be refused W0628 10:14:56.617103 3221 master.cpp:2011] Agent 632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0 (ubuntu) did not re-register within 10mins after master failover; marking it unreachable
Content of /var/log/mesos-slave.WARNING
file is same as that if mesos-slave.ERROR
file.
Content of /var/log/mesos-slave.ERROR
file:
Log file created at: 2017/06/28 10:05:00 Running on machine: ubuntu Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg E0628 10:05:00.712286 3287 shell.hpp:107] Command 'hadoop version 2>&1' failed; this is the output: sh: 1: hadoop: not found E0628 10:24:45.502921 3326 slave.cpp:4496] Failed to update resources for container 1bda209c-c2b8-4bb5-a41b-26361e00a284 of executor 'sources.a634642c-5bbc-11e7-ba8b-024239f32c24' running task sources.a634642c-5bbc-11e7-ba8b-024239f32c24 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/4469/cgroup: Failed to open file: No such file or directory E0628 10:33:45.789072 3327 slave.cpp:4496] Failed to update resources for container 858170ce-0775-48be-8c85-3a1dbf320569 of executor 'sources.e7e069ed-5bbd-11e7-ba8b-024239f32c24' running task sources.e7e069ed-5bbd-11e7-ba8b-024239f32c24 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/5215/cgroup: Failed to open file: No such file or directory
I observed that the message:
Failed to read /proc/5215/cgroup: Failed to open file: No such file or directory
comes only when container/task get killed. Whereas these files exists for current running containers. Thanks.