5

A multi-CPU server is running several processes. One process has a thread that should always be in a spinning state, using 100% of the CPU it has been assigned. My current method (besides asking the developer...) is to run strace on the process, which waits for information to arrive at its open file descriptor and checks it continuously using recvfrom(2); when there are no packets to read from the network socket, the call returns -1 with errno set to EAGAIN.

I'm not comfortable stack tracing production set-ups, and it's an unwieldy way of determining this information at best. I was poking about proc(5) and thought that the value of the flags field in /proc/[pid]/fdinfo might be useful to check whether that process is using a socket that called open(2) with the O_NONBLOCK flag.

I'm struggling to reverse engineer this value at the moment. I know it represents the bitwise OR of the file access mode and the file status flags. So I think I can check the source headers for the values of the constants open(2) uses on that particular kernel and then bitwise OR them until I find a value that matches what's in fdinfo. That seems rather clunky. If anybody can validate the above method (I can't yet) or provide a more elegant solution, I'd be much obliged.

I also know fcntl(2) can set a file descriptor to a non-blocking state, but I am treating that as equivalent to open(2) for the moment.

inetplumber
  • Wouldn't the top command show the process using 100% of one core of your CPU? – Sirex Mar 31 '14 at 21:53
  • Why does it matter to be able to prove that the program is misbehaving in some other way than `strace`? You already know it's broken, go send it back to the developers. – Michael Hampton Mar 31 '14 at 21:55
  • Using top is valid, but not definitive: a process can use 100% of a CPU without running in non-blocking mode. A human could easily tell by watching top, but a script check just polling top would occasionally report false positives. @MichaelHampton - non-blocking is the desired functionality and not a bug in this case – inetplumber Mar 31 '14 at 22:00
  • The `s` prefix in `strace` is not "stack". – Ben Voigt Apr 01 '14 at 06:30

1 Answer

11

Yes, this is a valid way to check that the socket is non-blocking.

The flags field in /proc/<pid>/fdinfo/<fd> is printed in octal, and O_NONBLOCK is 04000 on most Linux architectures (x86 included), so a non-blocking socket has that bit set in its flags value.

You can validate this behaviour with Python.

Python 2.7.5 (default, Feb 19 2014, 13:47:28) 
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import *
>>> import os
>>> from os import O_NONBLOCK
>>> s = socket(AF_INET, SOCK_STREAM)
>>> s.setblocking(0)
>>> print open("/proc/self/fdinfo/{0}".format(s.fileno())).read(4096)
pos:    0
flags:  04002

>>> if 04002 & O_NONBLOCK:
...   print "yes"
... else:
...   print "no"
... 
yes
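
The same check can be scripted against an arbitrary process. The following is a minimal sketch in Python 3 (the transcript above is Python 2); the helper names `fd_flags` and `is_nonblocking` are made up for illustration, and it assumes the script runs on the same machine as the target, so that `os.O_NONBLOCK` matches that kernel's value.

```python
import os
import socket


def fd_flags(pid, fd):
    """Return the 'flags:' value from /proc/<pid>/fdinfo/<fd> as an integer."""
    with open("/proc/{0}/fdinfo/{1}".format(pid, fd)) as f:
        for line in f:
            if line.startswith("flags:"):
                # The kernel prints this field in octal.
                return int(line.split()[1], 8)
    raise ValueError("no flags line in fdinfo")


def is_nonblocking(pid, fd):
    """True if O_NONBLOCK is set on the given descriptor of the given process."""
    # Assumption: os.O_NONBLOCK on this interpreter matches the target's kernel.
    return bool(fd_flags(pid, fd) & os.O_NONBLOCK)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(is_nonblocking("self", s.fileno()))  # socket is still blocking here
    s.setblocking(False)
    print(is_nonblocking("self", s.fileno()))  # O_NONBLOCK is now set
    s.close()
```

Reading another user's /proc/<pid>/fdinfo entries normally needs the same privileges as inspecting that process by other means, so a monitoring script would have to run as the process owner or as root.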

So, now that you know how, I must point out that your developer is doing it wrong. If non-blocking sockets are something they want to use, that's fine. However, they should set up an epoll(2) instance on the socket and block on the poll instead.
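
As a rough illustration of that suggestion, here is a sketch in Python 3 that keeps the socket non-blocking but sleeps in epoll until data arrives. The function name `wait_readable` and the socketpair demo are assumptions for the example, not the developer's code; a real program would create the epoll object once and reuse it rather than build one per call.

```python
import select
import socket


def wait_readable(sock, timeout=None):
    """Sleep in epoll until sock is readable. Return False if the timeout expires."""
    ep = select.epoll()
    try:
        ep.register(sock.fileno(), select.EPOLLIN)
        # The thread sleeps in the kernel here instead of spinning on EAGAIN.
        events = ep.poll(-1 if timeout is None else timeout)
        return any(mask & select.EPOLLIN for _fd, mask in events)
    finally:
        ep.close()


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.setblocking(False)
    print(wait_readable(a, timeout=0.1))  # nothing has been sent yet
    b.send(b"ping")
    if wait_readable(a, timeout=1.0):
        print(a.recv(4096))
    a.close()
    b.close()
```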

The program gains nothing from read(2) on a non-blocking socket that produces EAGAIN. As a matter of fact, the result is worse, because nearly all system calls are a preemption point where the kernel can context switch you anyway.

This developer is wasting power and CPU cycles that other threads could use, and is not actually gaining the benefits they think they are from doing it this way.

If the developer wants to be 'cache-line' friendly, they can pin the tasks to a particular CPU and be done with it.
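
For reference, pinning can be done from inside the program with the scheduler affinity calls. This is only a sketch in Python 3 on Linux: `pin_to_cpu` is a hypothetical helper, and choosing the lowest-numbered allowed CPU is an arbitrary choice for the demo.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict pid (0 means the caller) to one CPU and return the resulting set."""
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)  # CPUs the caller may currently run on
    target = min(allowed)              # arbitrary pick for the demo
    print(pin_to_cpu(target))
```

The same effect is available from outside the process with taskset(1), which avoids changing the program at all.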

Matthew Ife