No internet while downloading torrents - seems DNS-related

2

I have a very peculiar problem with my internet connection while downloading torrents. Before you conclude that I should "reduce the number of half-open and peer connections", let me say I have already done that (10 half-open connections, 20 peers; it still doesn't work, and now I don't get any downloading going on anymore either).

I should also say that QoS shouldn't be necessary. Usually, in my experience with downloading torrents (on Linux, Windows and Mac), the internet connection was shared among all processes. Here it seems like the torrents are chewing up all the available bandwidth. (Shouldn't the kernel divide time among processes that request to send/receive packets?)

Finally, I should say that this problem started appearing after I updated to Slackware 64-bit 14.0 (from 13.37).

So, the actual problem seems to be that the DNS server stops responding once I start a download with KTorrent or rTorrent, and no web pages load anymore. The torrent will be downloading at a reasonable speed, but no websites will load, and "nslookup" and "dig" will tell me that the DNS server (which, by the way, is located on the same PC) could not be reached:

nslookup facebook.com
;; connection timed out; no servers could be reached

and

nass@stargaze:~$ dig !$
dig facebook.com
; <<>> DiG 9.9.1-P3 <<>> facebook.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 26154
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;facebook.com.                  IN      A

;; Query time: 1125 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Aug  2 01:14:46 2013
;; MSG SIZE  rcvd: 41

Restarting the DNS server (BIND) while the torrent is running will generally NOT fix things, although I have occasionally seen it work. Stopping the DNS server, deleting any *.jnl files that were generated and restarting seems to work, but again not always (I don't have a repeatable pattern for this case). I can't say I have found "a way" to get the internet back:

  • Usually, closing KTorrent and waiting for a few seconds would fix the internet on its own.
  • Other times, closing the KTorrent client and then restarting the DNS server would work faster than the previous case.
  • Sometimes repeated restarts would NOT get the DNS working again (but waiting for a few minutes would fix the problem).
  • Recently I have started stopping named, deleting the *.jnl files and restarting it, as sketched below. This has had 100% success in my (only 2) attempts so far.
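For reference, the sequence I run is roughly the following. This is a minimal sketch assuming the Slackware defaults, with the BIND rc script at /etc/rc.d/rc.bind and the journal files next to the zone files under /var/named; adjust the paths if your layout differs:

/etc/rc.d/rc.bind stop          # stop named
rm -f /var/named/*.jnl          # remove dynamic-update journal files (location depends on where the zone files live)
/etc/rc.d/rc.bind start         # start named again
rndc status                     # confirm the server is up and answering control requests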

The firewall log, /var/log/messages and named's logs don't register anything weird.

I have not used tcpdump, wireshark or netstat before, so I don't know whether I can use these tools to identify... something! Could anyone help with this?

Since this problem seems to be related primarily to the DNS server, I am appending my DNS configuration file and will explain my PC's setup a bit:

So: ADSL internet arrives at the modem (provided by the ISP, always on, even when I don't have internet). The modem is connected to this PC on eth1, and this PC is where I am downloading torrents. It is also my home network and file server (and my desktop when I am away; I connect using NX). It is running iptables, DNS and Squid servers (among others). From eth0 of this PC, the wifi access point and the intranet switch are fed. Squid runs in a transparent configuration, but it shouldn't interfere with torrent traffic, since that uses ports other than port 80; only web traffic is redirected, along the lines of the rule sketched below.
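The transparent-proxy redirect is just the usual single NAT rule, something like the following (3128 as the Squid listening port is an assumption; eth0 is the LAN-facing interface described above, and torrent traffic on other ports is untouched):

# redirect only LAN web traffic (TCP port 80) arriving on eth0 to the local Squid
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128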

So initially, I am attaching my named.conf, in an attempt to get feedback on it (perhaps there is some logically erroneous configuration that is not caught by the Webmin named configuration checker, with which I have repeatedly verified that named.conf is syntactically correct).

named.conf is here

If this is fine, is there some way I could start using tcpdump (or any other tool) under your guidance to collect information about what might be causing this?

Thank you very much for your help :)

EDIT: my /etc/resolv.conf looks like:

domain skails.home
nameserver 127.0.0.1

nass


Is there any reason you have forwarders? I would personally just use the "." zone (with recursion) to have your name server cache directly. I have a feeling you are ending up with negative caching on your DNS results and hence the "failing" (with forwarders). Maybe reduce your max-ncache-ttl setting if you are keeping forwarders? – Drav Sloan – 2013-08-01T23:40:53.807
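To make that suggestion concrete, it amounts to a named.conf fragment roughly like the one below. This is a generic sketch of a caching-only setup, not the actual named.conf attached to the question; the file name and the TTL value are only illustrative:

options {
    recursion yes;
    // no "forwarders" list: resolve from the root servers directly
    max-ncache-ttl 60;    // cap negative caching at 60 seconds (illustrative value)
};

zone "." IN {
    type hint;
    file "caching-example/named.root";    // root hints file; the path and name vary by distribution
};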

Confirm the contents of resolv.conf? – Andrew B – 2013-08-02T00:45:23.720

RE: tcpdump... tcpdump -i ethX -n -s 128 port domain – Andrew B – 2013-08-02T00:52:25.873

"Shouldn't the kernel be divide time among processes that request to send/receive packages?" No. In the inbound direction, which is most likely where the issue is, all the kernel can do is process the packets it receives. – David Schwartz – 2013-08-02T04:14:39.410

@AndrewB , resolv.conf is part of the question. – nass – 2013-08-02T10:00:08.953

@DravSloan Your question suggests I may have misunderstood some basic DNS concepts. Terms like delegation zone, forward zone etc. are still not very clear to me. Anyway, shouldn't I have forwarders? I mean, the DNS server should respond to all DNS requests for my intranet: if I ask for a PC within my intranet domain .skails.home (and a few other domains I am connected to through VPN), my DNS will resolve them, but if I request anything outside skails.home, I expect my DNS server to ask my ISP's DNS servers for the IP address. The forwarders you see are: my ISP's, then other Greek ones, and finally the OpenDNS and Google ones. – nass – 2013-08-02T10:13:45.230

Answers

2

(Shouldn't the kernel divide time among processes that request to send/receive packets?)

The typical situation when you have slow or no Internet with something like BitTorrent saturating your connection is that your upstream (which on most residential connections is much smaller than your downstream) is completely filled, so small packets such as TCP ACKs are delayed or dropped. Connections then time out on the remote end, and eventually on your end as well.

One thing I learned from studying QoS is that there is no such thing as QoS on incoming traffic, because you can't control what's being sent to you. You can only really QoS/divide/share outgoing traffic. You can see the current Linux QoS configuration with tc - but be warned, tc is very complicated.
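For example, to see which queueing disciplines (if any) are currently attached to the external interface, something like this works (eth1 is just the interface named in the question; substitute your own):

# show queueing disciplines and their statistics on the ADSL-facing interface
tc -s qdisc show dev eth1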

It's possible that a single connection could saturate your incoming bandwidth and crowd out incoming TCP ACKs, causing slowdowns, drops, etc. The number of concurrent connections doesn't really matter.

You probably need to cap the total upload bandwidth of your BitTorrent program to just under your maximum upstream, e.g. 8 kbit/s below what you know your upstream speed to be. You also might want to look into Wondershaper if you feel like going down the rabbit hole that is QoS on Linux.
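A crude interface-level alternative, without a full QoS setup, is a simple token-bucket filter on egress. The sketch below assumes eth1 is the ADSL-facing interface, and the 400kbit rate is only a placeholder for "slightly below your real upstream speed" (Wondershaper builds a more elaborate version of this with tc under the hood):

# cap all outgoing traffic on the ADSL-facing interface just below the real upstream rate
tc qdisc add dev eth1 root tbf rate 400kbit burst 32kbit latency 400ms

# remove the cap again when done experimenting
tc qdisc del dev eth1 root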

LawrenceC


Keep in mind that this is DNS...UDP carries the majority of the weight, unless we're dealing with oversized queries. (i.e. EDNS) – Andrew B – 2013-08-02T00:44:03.527

I have seen tc and it doesn't look nice at all. I would fiddle with it if I had had these problems before the Slackware upgrade, but since on most computers (and definitely in my setup) it used to work fine without QoS... – nass – 2013-08-02T13:19:24.663

3

Your clue is this line:

;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 26154

Assuming that resolv.conf only contains 127.0.0.1, this tells you that the caching server has decided that the upstream nameservers cannot be reached or are misconfigured. At that point, the server is going to give up on communicating for that domain, and the unresponsive upstream servers are added to the list of lame nameservers. This is different from negative caching, which only applies to NXDOMAIN responses.

It stands to reason that once facebook.com has been determined to be dead, the caching nameserver isn't going to bother trying to resolve it for a while. You now have to figure out why that's happening.
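One quick way to narrow this down the next time it happens is to query the local cache and one of the configured forwarders side by side (8.8.8.8 below is only an example; substitute a forwarder actually listed in named.conf):

# ask the local caching server (what the clients normally see)
dig @127.0.0.1 facebook.com +time=2 +tries=1

# ask one of the forwarders directly, bypassing named
dig @8.8.8.8 facebook.com +time=2 +tries=1

If the forwarder answers normally while 127.0.0.1 keeps returning SERVFAIL, the problem is in named's view of its forwarders rather than in raw connectivity.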

Let's assume that you're experiencing network congestion, and facebook.com is not in cache.

  • named is going to cycle through your list of forwarders until it finds a nameserver that will respond with anything other than REFUSED for that record. NXDOMAIN and SERVFAIL are responses that it will accept. Even if the other servers would have answered differently, all your server cares about is whether or not a record is in cache, and the first valid response that it gets.
  • Once it finds an answer, that will be cached. For better or worse.
  • Failure to get an answer from any of them will be considered a SERVFAIL as well.

For your particular test, the query and response would be small, and UDP doesn't have the session overhead associated with TCP. To get a response of SERVFAIL, one of the following happened:

  • The first valid reply you received was SERVFAIL for that domain.
  • All of the forwarders were unreachable; you failed to get a response from any of them.

The only way to know what's going on for sure would be to start a packet capture, then restart your nameserver and analyze the packets. One of your forwarders may be bad and returning SERVFAIL frequently, or your congestion is so potent that eight tiny DNS lookups against your entire forwarder list fail.
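Building on the tcpdump one-liner from the comments, a capture along these lines can be saved and inspected later in Wireshark (eth1 and the output file name are just examples):

# capture only DNS traffic on the external interface, keeping 128 bytes per packet,
# and write it to a file for later analysis
tcpdump -i eth1 -n -s 128 -w dns-capture.pcap port domain

# after reproducing the problem, read the capture back (or open the .pcap in Wireshark)
tcpdump -n -r dns-capture.pcap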

Andrew B


I'm all ears. Should I do a packet capture with tcpdump -i ethX -n -s 128 port domain? – nass – 2013-08-02T11:36:58.550

Apparently, over a normal ssh nass@domain.name -p 34567 session, when I have no connectivity problems, the command tcpdump -i eth1 -s 128 port 53 returns the following: http://pastebin.com/kFmF2R4k . So correct me if I'm not reading this properly, but most of the forwarders I have refuse to give me an answer (see line 2); they only seem to send me their authority record? Other forwarders reply with something I don't understand (see line 7). Finally, I do get an answer in the end (last line), but it is weird that most servers refuse to reply. OK, I'll start a torrent, check the behaviour and post back.

– nass – 2013-08-02T13:13:51.327

I have tried reducing the torrent limits (both UL and DL) and the situation is somewhat improved. I still lose connectivity though, and tcpdump returns http://pastebin.com/uiULAUPT . So no replies, really. I think I may have to run a speed test. The funny thing is that I am at work now, connecting to the home server through ssh and starting rtorrent there to do the tests, and I don't have any problem with my ssh connection to home, even when torrents are running at full speed.

– nass – 2013-08-02T14:50:42.043

Also, I am now at a location with a far slower internet connection, on another PC, downloading a torrent. How do you explain that, without any intervention (limiting UL/DL) on my part, I can download torrents AND still surf in the meantime? – nass – 2013-08-04T22:25:07.473