
OS: CentOS 5.5 64-bit
Software: PostgreSQL
Hardware: Sun X4200; dual-core AMD Opteron 1GHz (x4); 8GB RAM; LSI Logic RAID controller + 2x146GB 10k drives.

Running net-snmp and using Traverse to monitor.

I'm seeing a constant 2,000+ system interrupts per second. Traverse flags this as "critical" (default config). Is this number truly something to be concerned about?

Here are the interrupt sources with the highest counts:

[~]# cat /proc/interrupts   
           CPU0       CPU1       CPU2       CPU3         
 14:        136   54655160    2332995     722234    IO-APIC-edge  ide0  
 66:        618  329180300   20802132     172490   IO-APIC-level  ohci_hcd:usb2  
 74:       4949   16107320    2295957     846017   IO-APIC-level  ioc0  
 82:         22  662837259        233  129090405   IO-APIC-level  eth0  
 90:        723  505860358          0   18967685   IO-APIC-level  eth2  
NMI:     187529     250006     100435     166795   
LOC: 2140313519 2140313343 2140313287 2140313203   
ERR:          0  
MIS:          0  

An additional question about the above output: Why do ide0 and usb2 show constant accrual of interrupts, even though there is no USB device connected, and the IDE device (CDROM) is not in use? This question is for my own curiosity.

80skeys

2 Answers


LOC interrupts running at 1000 Hz are normal with those kernels: that kernel version has no dynamic tick support, so the timer interrupt fires constantly. The other interrupt counts are probably normal too if there is high network and disk load on the system.

The most suspicious is the ohci_hcd:usb2 interrupt. Maybe some USB device is (or was) misbehaving, or it was just heavily used, in which case the count is normal.
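The counters in /proc/interrupts are cumulative since boot, so to see which source actually drives the per-second figure you need the difference between two snapshots. A minimal Python sketch of that, assuming the column layout shown in the question (the function names `parse_interrupts`, `rates` and `sample` are made up for illustration, and this is not how Traverse or net-snmp compute their number):

```python
import time


def parse_interrupts(text):
    """Return {label: total count summed over all CPUs} for /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep:
            continue  # not an "IRQ: counts..." row
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # rows such as ERR/MIS carry fewer counters than CPUs
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        # Key is the IRQ label plus whatever text follows the counters.
        label = (irq.strip() + " " + description).strip()
        totals[label] = sum(counts)
    return totals


def rates(before, after, seconds):
    """Interrupts per second for every label present in both snapshots."""
    return {k: (after[k] - before[k]) / seconds for k in after if k in before}


def sample(interval=10):
    """Take two snapshots `interval` seconds apart and print the busiest sources."""
    def snapshot():
        with open("/proc/interrupts") as f:
            return parse_interrupts(f.read())

    first = snapshot()
    time.sleep(interval)
    second = snapshot()
    ranked = sorted(rates(first, second, interval).items(), key=lambda kv: -kv[1])
    for label, per_second in ranked[:10]:
        print(f"{per_second:10.1f}/s  {label}")
```

Calling `sample()` on the box during normal load shows how much of the total comes from the timer (LOC), the NICs and the disk controller, and whether usb2 is still accruing.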

Sergey Vlasov
  • There is high network and disk I/O, so thanks for your explanation.... Regarding the USB, there is no USB device connected. Very rarely, I will connect a USB mouse and keyboard at the console, then disconnect immediately after I'm done using them. Last time about 1 month ago. – 80skeys Dec 14 '10 at 19:43
  • Sergey: Is there a more "reasonable" threshold I can use in my monitoring software? What number of interrupts/sec should be considered for a "Warning" value? I know this depends on a lot of things, so maybe just a ballpark value so I'm not randomly picking numbers from a hat... – 80skeys Dec 16 '10 at 15:40

I see them too, so I'm thinking not:

http://www.teaparty.net/munin/net/teaparty.net-irqstats.html

(Hardware was completely replaced last June, hence the sudden rise)

This serverfault article and this offsite article that it references are also thought-provoking.

MadHatter
  • I notice yours shows 0 nonmaskable interrupts (NMI), while my system shows NMI accruing at a rate of about one or two every five to ten seconds. According to Wikipedia (http://en.wikipedia.org/wiki/Non-maskable_interrupt) NMI usually indicates hardware failure. Does anyone have experience with the NMI indicator? – 80skeys Dec 14 '10 at 19:13
  • This may just be the NMI watchdog, which complains and prints a backtrace if a CPU is stuck with disabled interrupts for a long time (which can happen due to a bug or hardware misbehavior). – Sergey Vlasov Dec 14 '10 at 19:30
  • So on a multi-CPU machine, not a big deal? – 80skeys Dec 14 '10 at 19:38