
While analyzing a third-party application's throughput, we collected a WPR trace on two systems. One has Symantec's Intrusion Detection System (IDS) enabled; the other does not.

Observations on the system with the Intrusion Detection System enabled:

  • Many NDIS.SYS - ndisInterruptDpc function calls take longer than 100µs.
    Over a period of 10s, 1,966 fragments out of 17,273 (roughly 11%) took longer than 100µs.
    Microsoft's recommendation is that a DPC should not run longer than 100 microseconds.

  • The Microsoft-Windows-TCPIP provider shows everything is using TcpipSendSlowPath.
    I tried reading up on slow/fast paths but don't really know what to make of this. Everything I can find is about routers or dedicated hardware, and I doubt ETW can get any data from them (unless it's passed back in the request/response somewhere?).

  • The Stacks view shows the packets going through IDSvia64.sys, which is Symantec's driver for packet inspection. Because of this, there is a lot more allocating and freeing of pool memory.

  • psping bandwidth tests are a factor of 10 slower.

With that being said: the perceived performance of the third-party application on both servers is comparable. The only tangible difference to date is that the psping tests are much worse on the system with IDS. A conundrum to me...

Questions

  • How do I find the culprit for taking too much time in DPCs? I suspect IDS is involved, but I cannot link the handling of DPCs to IDS (or anything else).
  • I would love to know what the slow path actually means, what performance impact it might have, and how to resolve it if possible/needed.

Edit

Following the link provided by @Brian, this is what our DPC durations look like on the system with IDS enabled (green) and on a system without IDS (blue):

  • for a duration of 10s
  • performing a task known to take some time

I'm pretty convinced that IDS is responsible for the longer DPC durations. If anyone can give some pointers on the second part of the question, the large number of TcpipSendSlowPath events, there might be something worth investigating there too.

DPC Duration
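
Because the two traces contain very different numbers of DPCs, a comparison is easier to read when it is normalized per system. A minimal Python sketch, assuming the DPC durations for each system have been exported from the trace as plain lists of microsecond values; the function name `summarize` and the sample numbers are made up for illustration and are not taken from our traces:

```python
from statistics import median, quantiles


def summarize(durations_us, threshold_us=100.0):
    """Summarize DPC durations (in microseconds) for one system."""
    total = len(durations_us)
    over = sum(1 for d in durations_us if d > threshold_us)
    return {
        "count": total,
        "median_us": median(durations_us),
        # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile
        "p95_us": quantiles(durations_us, n=20)[-1],
        # share of DPCs above the threshold, independent of how many DPCs ran
        "share_over": over / total,
    }


# Illustrative values only, NOT measurements from the traces discussed here.
with_ids = [40, 60, 90, 120, 150, 210, 80, 95]
without_ids = [30, 35, 45, 50, 70, 110]

for name, data in (("with IDS", with_ids), ("without IDS", without_ids)):
    print(name, summarize(data))
```

Comparing `share_over` and the percentiles rather than raw counts keeps the comparison meaningful even when one system handled far more DPCs during the capture.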

Note that the number of DPCs on the system with IDS is much larger.

  • The easiest explanation is that the system was being used by others at the time (it's hard to find a slot with minimal activity).
  • I can imagine the NIC settings playing a role in this, but I wouldn't know where to look (depending on buffer size, throughput, ..., more ISRs and DPCs get triggered).
  • others (?) ...
  • If you turn off IDS and things improve, then that would be a strong indication. Have you read: https://blogs.technet.microsoft.com/craigf/2014/02/03/a-backup-server-flooded-by-dpcs/ – Brian Jun 03 '18 at 11:32
  • @Brian - It's kind of a chicken-and-egg problem. We are not going to be allowed to disable IDS on that production system without solid reasoning. The only perceivable performance difference between that production system and a test system without IDS is a psping bandwidth test. There's no perceivable impact on the application that's being used. I have a strong feeling that there is one, but we haven't been able to trace that exact difference yet. The blog entry you've posted looks like a match *(apart from the CPU spikes and stalls)*. Thank you for this find *(perhaps post an answer?)* – Lieven Keersmaekers Jun 04 '18 at 06:14
  • Inbound TCP packets take the 'slow path' through the TCP receive stack whenever the TCP segment is neither an ACK nor the next expected segment in an existing stream. This is not something to be concerned about; it just means that TCP is behaving normally for your traffic. It would be interesting if this differs between the two systems, and doing a packet trace would allow you to see why the slow path is being taken. For instance, is the IDS driver adding some TCP flags that were not expected? – smithian Jun 06 '18 at 14:40
  • @smithian - thanks, I'll try to get a representative packet trace on both systems. – Lieven Keersmaekers Jun 06 '18 at 16:26
  • @smithian - unfortunately, a packet trace did not reveal any substantial difference between the two systems, so I conclude that the slow path route is off the table. – Lieven Keersmaekers Jun 11 '18 at 08:28
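
To make the rule from @smithian's comment concrete, here is a toy Python model of it. It is a simplification in the spirit of classic TCP header prediction and not the actual logic in the Windows TCP/IP stack; the `Segment` type, its field names, and the choice to tolerate a PSH flag on in-order data are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical, minimal view of an inbound TCP segment."""
    seq: int               # sequence number of the first payload byte
    payload_len: int       # number of payload bytes
    flags: frozenset       # e.g. frozenset({"ACK"})


def takes_slow_path(seg: Segment, rcv_nxt: int) -> bool:
    """Return True unless the segment is a pure ACK or the next expected data."""
    # Pure ACK: no payload and no flag other than ACK.
    is_pure_ack = seg.payload_len == 0 and seg.flags == {"ACK"}
    # Next expected segment: payload starts exactly at rcv_nxt.
    # Assumption: only ACK and PSH are tolerated on the fast path.
    is_next_in_order = (
        seg.payload_len > 0
        and seg.seq == rcv_nxt
        and seg.flags <= {"ACK", "PSH"}
    )
    return not (is_pure_ack or is_next_in_order)


rcv_nxt = 1000  # next sequence number the receiver expects
in_order = Segment(seq=1000, payload_len=512, flags=frozenset({"ACK"}))
out_of_order = Segment(seq=2000, payload_len=512, flags=frozenset({"ACK"}))
print(takes_slow_path(in_order, rcv_nxt), takes_slow_path(out_of_order, rcv_nxt))
```

In this model, out-of-order data, connection setup and teardown (SYN, FIN, RST), or any unexpected flag pushes a segment onto the slow path, which is the kind of difference a packet trace on both systems would expose.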
