At job we have an old C program which works with industrial hand terminals from Honeywell.
That terminal has its own ssh client to connect to a linux redhat 6.6 server. Once it is connected to the linux box (using a certain user), a C program is launched by the bash shell with the following parameters
export TERM=vt200
stty raw icrnl -echo
$APLI_EXEC/program param1 param2
so the flow is like => client ssh --> ssh server-> bash --> c program
The application (or it seems) is working fine but sometimes (1-3-5 times per week) a randomly terminal stops receiving data from the server but the application receives the inputs from it. It is like if you writes Ctrl+S in a shell.
Debuging the application and the ssh process using strace I realized about something strange:
The app strace is fine
write(1, "1", 7) = 1
But the strace of the ssh process is not fine (I think.. and yes I saw the ioctl no echo param, but...)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(3, "\227\316\242\350\261\330)\300e\210\352\367\2VX\24\305\2474\272\371\34\273n{\323p.\211\17H\327"..., 16384) = 48
select(14, [3 9], [11], NULL, {900, 0}) = 1 (out [11], left {899, 999996})
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(11, "1", 1) = 1
ioctl(11, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 -opost -isig -icanon -echo ...}) = 0
select(14, [3 9], [], NULL, {900, 0} <<<<
file descriptor used by the ssh process:
lr-x------ 1 root root 64 Feb 15 17:12 9 -> pipe:[383586491]
lr-x------ 1 root root 64 Feb 15 17:12 8 -> /var/lib/sss/mc/group
lrwx------ 1 root root 64 Feb 15 17:12 7 -> socket:[383586484]
lrwx------ 1 root root 64 Feb 15 17:12 6 -> socket:[383586478]
lrwx------ 1 root root 64 Feb 15 17:12 5 -> socket:[383586458]
lrwx------ 1 root root 64 Feb 15 17:12 4 -> socket:[383586457]
lrwx------ 1 root root 64 Feb 15 17:12 3 -> socket:[383585929]
lrwx------ 1 root root 64 Feb 15 17:12 2 -> /dev/null
lrwx------ 1 root root 64 Feb 15 17:12 14 -> /dev/ptmx
lrwx------ 1 root root 64 Feb 15 17:12 13 -> /dev/ptmx
lrwx------ 1 root root 64 Feb 15 17:12 11 -> /dev/ptmx
l-wx------ 1 root root 64 Feb 15 17:12 10 -> pipe:[383586491]
lrwx------ 1 root root 64 Feb 15 17:12 1 -> /dev/null
lrwx------ 1 root root 64 Feb 15 17:12 0 -> /dev/null
In the select call, I miss fd #11 or fd #13 in there
Comparing this with another call
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(3, "\365\354\354C\10|\336-\4\342\327B0P\275&\213)\367\32\24\333)#\364\355V\3\237\337\33\204"..., 16384) = 52
select(14, [3 9 13], [11], NULL, {900, 0}) = 1 (out [11], left {899, 999997})
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(11, "a", 1) = 1
ioctl(11, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
select(14, [3 9 13], [], NULL, {900, 0} <<<
What it is going on with fd #13 in the other call?
Is it possible the C program is doing something to lock the file descriptor of ssh process?
It is possible the hand terminal sends a combination of ctrl keys which they 'hangs' the standard output?
I ran out of ideas. Can anybody drive me to the right direction?
February, the 25th. 2016
I have some more information about this:
select(14, [3 9 13], [], NULL, {900, 0}) = 1 (in [13], left {899, 998835}) <<- sshd realizes about data in fd #13 from C application
read(13, "\33[1;23H1\33[1;24H", 16384) = 15 <<- sshd check data from th fd#13
select(14, [3 9 13], [3], NULL, {900, 0}) = 1 (out [3], left {899, 999998}) <<- sshd sends data to fd#3 (socket)
write(3, "\301\236W\250\333\260\r\204\316o]:*1K\203\242\204\257Vb,V\347l\242\352K\341,,\307d\273\277\202.l\32F\2471\257DJt3\36\303\5\256\21K6\27\212\253\326|l\33\270\262S", 64) = 64 (1) <<- sshd encrypts data to be sent
select(14, [3 9 13], [3], NULL, {900, 0}) = 1 (out [3], left {899, 999998}) <<-- sshd sends data thru the socket
select(14, [3 9 13], [], NULL, {900, 0}) = 1 (in [13], left {899, 998569}) <<- sshd realizes about data in fd #13 from C application
read(13, "\7\33[1;16H \33[6;6H_______\33[7;1H -INFORME CANT. RECOGIDA-\33[7;26H", 16384) = 67 <<- sshd check data from th fd#13
select(14, [3 9], [], NULL, {900, 0}) = 1 (in [3], left {892, 12016}) <<- sshd sends data to fd#3 (socket) but... where is fd#13 where sshd has to read it from?
read it from?
the terminal receives "\7\33[1;16H " but the rest of the string " \33[6;6H_______\33[7;1H -INFORME CANT. RECOGIDA-\33[7;26H"
is not received
why?
when the application writes into its output file descriptor (fd#1), sshd receives it (fd#13)..but is fd#1 cleared up at this moment? Or is sshd reading from the file descriptor char by char while it is sendind/encripting data?