What is the meaning of the value 'tx_timeouts' of ethtool?

1

Running ethtool (version 6) gives, for example, the following output:

$ ethtool -S eth0
NIC statistics:
     early_rx: 0
     tx_buf_mapped: 0
     tx_timeouts: 142
     rx_lost_in_ring: 0

What is the meaning of the value for tx_timeouts? What does the number 142 count?

Alex

Posted 2014-08-07T05:51:56.187

Reputation: 337

Answers

2

From O'Reilly's Linux Device Drivers, section 'Transmission Timeouts':

Most drivers that deal with real hardware have to be prepared for that hardware to fail to respond occasionally. Interfaces can forget what they are doing, or the system can lose an interrupt. This sort of problem is common with some devices designed to run on personal computers.

Many drivers handle this problem by setting timers; if the operation has not completed by the time the timer expires, something is wrong. The network system, as it happens, is essentially a complicated assembly of state machines controlled by a mass of timers. As such, the networking code is in a good position to detect transmission timeouts automatically.

Thus, network drivers need not worry about detecting such problems themselves. Instead, they need only set a timeout period, which goes in the watchdog_timeo field of the net_device structure. This period, which is in jiffies, should be long enough to account for normal transmission delays (such as collisions caused by congestion on the network media).

If the current system time exceeds the device's trans_start time by at least the timeout period, the networking layer will eventually call the driver's tx_timeout method. That method's job is to do whatever is needed to clear up the problem and to ensure the proper completion of any transmissions that were already in progress. It is important, in particular, that the driver not lose track of any socket buffers that have been entrusted to it by the networking code.

So it seems the tx_timeout method is there to make sure the system doesn't lock up when something goes wrong in the hardware. I have no idea why your counter is not 0, but it might have something to do with the NIC driver.
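To make the quoted mechanism concrete, here is a toy model in Python. It is only a sketch, not kernel code: the class ToyNetDevice, the function watchdog_tick, and the queue_stopped flag are hypothetical names invented for illustration. The field names watchdog_timeo and trans_start and the method name tx_timeout are borrowed from the quoted text. The sketch also assumes that the check only fires while a transmission is actually pending.

```python
from dataclasses import dataclass


@dataclass
class ToyNetDevice:
    """Hypothetical stand-in for the kernel's net_device (illustration only)."""
    watchdog_timeo: int           # timeout period, in ticks ("jiffies")
    trans_start: int = 0          # tick at which the last transmission started
    queue_stopped: bool = False   # assumption: True while waiting on the hardware
    tx_timeouts: int = 0          # counter a driver might report via ethtool -S

    def tx_timeout(self) -> None:
        """Stand-in for the driver's tx_timeout method: count, then recover."""
        self.tx_timeouts += 1
        self.queue_stopped = False  # pretend the hardware was reset successfully


def watchdog_tick(dev: ToyNetDevice, now: int) -> bool:
    """Call dev.tx_timeout() if a pending transmission has exceeded the timeout."""
    if dev.queue_stopped and now - dev.trans_start >= dev.watchdog_timeo:
        dev.tx_timeout()
        return True
    return False
```

In this model, a device with watchdog_timeo=5 that started transmitting at tick 100 and never completed would have its tx_timeout method called on the first check at or after tick 105, and its tx_timeouts counter would go from 0 to 1.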

mtak

Posted 2014-08-07T05:51:56.187

Reputation: 11 805

2

tx_timeouts is, strictly speaking, the number of times the device driver's routine for handling timeouts (tx_timeout) has been called.

A transmission timeout occurs whenever the transmission hardware fails to respond. This happens in real life because, for instance, an interrupt is lost, or because your NIC has forgotten what it was doing. It is by no means a rare occurrence, especially on PCs.

Device drivers are designed to deal with these occurrences by means of a timer, which marks the time within which transmission should occur. If it doesn't, control is transferred to the tx_timeout routine, which takes appropriate action to resolve the problem and to complete the transmission job. It also records the timeout in the driver's statistics and restores the device to a healthy state, so that both the current job and the processing of the queue can resume.

The number of events you have recorded is small and by no means worrisome. Should the problem persist, you may wish to look for additional error messages in dmesg and the various log files. As it stands, this single number is not sufficient for a proper diagnosis.
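To check whether the counter is still growing, it can help to read it programmatically and compare readings over time. The following Python sketch parses the text printed by ethtool -S. The function names parse_ethtool_stats and read_ethtool_stats are made up for this example, and the parser assumes the "name: value" layout shown in the question, which may differ between drivers.

```python
import subprocess


def parse_ethtool_stats(text: str) -> dict:
    """Turn 'name: value' lines from `ethtool -S` output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def read_ethtool_stats(iface: str) -> dict:
    """Run `ethtool -S <iface>` and parse its output (requires ethtool)."""
    result = subprocess.run(
        ["ethtool", "-S", iface],
        capture_output=True, text=True, check=True,
    )
    return parse_ethtool_stats(result.stdout)
```

Calling read_ethtool_stats("eth0") twice, some minutes apart, and comparing the two tx_timeouts values shows whether timeouts are still occurring or whether the 142 events all happened in the past.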

MariusMatutiae

Posted 2014-08-07T05:51:56.187

Reputation: 41 321