
We registered a domain name (say 'example.com'), and we want to see who is trying to resolve it and what kind of requests they send to the web server for that domain.

For this purpose, we set up a name server and collect the BIND logs to find out who is querying the name server to resolve 'example.com' (we don't care about cases where the name is resolved from data cached in recursive resolvers). We also collect the logs of the Apache web server. The problem is that we cannot tell which DNS request corresponds to which web server request.

To map these two logs together, I was thinking of creating a random subdomain and returning it as the CNAME of example.com for each DNS request, and then configuring Apache to redirect all those subdomains to the 'example.com' main page. That way, if a specific subdomain is requested by somebody, I know which DNS query it corresponds to.

Is this the right way of doing it? Is there any other way to do that?

I appreciate any thoughts or ideas.

Alex
  • Can you not correlate the requesting IP from the DNS query with the HTTP request logs? If you're logging the DNS checks, they should have an IP to follow that you could then cross-reference with the HTTP access-logs. – Andrew Mar 12 '18 at 23:19
  • What you propose would generate an extra DNS query that you could find in the logs, but there's not necessarily anything else there (because of caching, among other things). What are you actually trying to achieve, what is the end goal? – Håkan Lindqvist Mar 12 '18 at 23:20
  • @Andrew, there is a lot of noise there, as well as mismatched timestamps and many race conditions, so I would like to control it as much as I can. Moreover, the IP I see in the name server logs is usually the IP of the resolver, not of the actual client requesting that domain name (even when ECS is enabled, I only get the /24 of the client's IP) – Alex Mar 12 '18 at 23:22
  • @Andrew It's not usually the case that the IP will match. The DNS query will come from some recursive nameserver (eg ISP, public service, etc), while the HTTP request typically comes from the client machine itself. – Håkan Lindqvist Mar 12 '18 at 23:22
  • I have to agree with the ask from @HåkanLindqvist, what's the end-goal of this project? There might be other options/better possibilities to accomplish the goal – Andrew Mar 12 '18 at 23:26
  • @Hakan, I guess this is the best thing I can do as the owner of a domain name. You are right: after the first client asks a resolver, that specific subdomain would be cached, and then all other clients of that resolver would request the same subdomain. But I guess this is the best that can be done, right? Actually, I asked the question to hear about other possible ways of doing it. Plus, I'm not sure if I can dynamically change the CNAME per request. – Alex Mar 12 '18 at 23:28
  • The goal is correlating the log files of the name server and the web server (as best we can) – Alex Mar 12 '18 at 23:30
  • @Alex That sounds like your proposed solution, not the goal itself? – Håkan Lindqvist Mar 12 '18 at 23:32
  • **Why**? What is the **business need** you are trying to address? – MadHatter Mar 12 '18 at 23:32
  • As I said, to track users from the point they send the request to the name server to the point they connect to the server. Having full visibility of what is happening. – Alex Mar 12 '18 at 23:34
  • @Alex Explaining what you seek to accomplish by correlating this seems highly relevant to coming up with (alternative) solutions. As it stands we cannot judge which variations are better/worse or if some entirely different approach would solve the problem in a more straightforward way. – Håkan Lindqvist Mar 13 '18 at 00:29
  • I suppose the CNAME part of your proposal specifically doesn't make the correlation easier, but there are options in the same spirit (varying address of address record?) that would be visible in the http context. – Håkan Lindqvist Mar 13 '18 at 07:59

1 Answer


You cannot correlate on the IP address.

This is because your authoritative nameserver is queried by recursive nameservers, not by end clients directly. So you will get the IP address of the last recursive nameserver used by the client (recursive nameservers can be chained).

Even with the EDNS Client Subnet (ECS) option, you will get at best a block of IP addresses, not the true client IP.
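To illustrate why a block is all you get, here is a minimal sketch of the truncation. It assumes the /24 prefix length mentioned in the comments; the address `203.0.113.57` and the helper name `ecs_visible_prefix` are made up for the example.

```python
import ipaddress


def ecs_visible_prefix(client_ip: str, prefix_len: int = 24):
    """Return the network an authoritative server would see for this client.

    The /24 default is an assumption taken from the comments above;
    resolvers may send a different prefix length, or none at all.
    """
    # strict=False masks off the host bits instead of raising an error
    return ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)


# Hypothetical client address from the documentation range
net = ecs_visible_prefix("203.0.113.57")
print(net, net.num_addresses)  # 203.0.113.0/24 256
```

Every one of the 256 addresses in that block looks identical from the authoritative side, so the DNS log alone cannot single out one client.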

In the same way, on the HTTP side, your web server is not necessarily contacted by the client directly; the client can go through a proxy.

I already replied to your other question tied to this one with some hints: Can we update CNAME per request? See especially the last paragraph and all the studies done by Geoff Huston.

You thus have another way: just give everyone www.example.com, but insert some dynamic element in the content (a 1x1 pixel image, or a link to a CSS or JS file) with a unique token in the hostname part. This can be handled on your nameservers with a wildcard. Without any CNAME, you will then be able to easily correlate the accesses:

  • some client goes to http://www.example.com/
  • in the reply, the web server adds an <img src="https://eecahquai5thuu9ji0iepha.tracking.example.net"> and records that in its logfile (this ties the unique token to the current HTTP exchange)
  • configure the authoritative nameservers of tracking.example.net (I recommend using a separate zone just for this, as otherwise wildcards can be full of surprises) to have a wildcard record; you will then have this unique token in your nameserver logfiles, along with the associated DNS data (IP of the recursive resolver, etc.)
  • (and to be a proper netizen) do configure a web server replying on this address with a proper resource and content-type, even if it is only a 1x1 pixel transparent image.
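The steps above can be sketched roughly as follows. This is only an illustration, not a finished implementation: the function names are hypothetical, the zone name is taken from the example, and how you get the queried names and HTTP records out of your BIND and Apache logs is left to you (the `correlate` function just assumes you already have them keyed by token).

```python
import secrets
from typing import Optional

TRACKING_ZONE = "tracking.example.net"  # zone name from the example above


def new_token() -> str:
    """Return a random lowercase token usable as a single DNS label."""
    # 32 hex characters, well under the 63-byte limit for a DNS label
    return secrets.token_hex(16)


def tracking_img_tag(token: str) -> str:
    """Build the image tag the web server embeds (and logs) for one page view."""
    return f'<img src="https://{token}.{TRACKING_ZONE}/" width="1" height="1" alt="">'


def token_from_qname(qname: str) -> Optional[str]:
    """Extract the token from a queried name, or None if it is not ours.

    The name is lowercased first because resolvers may send mixed-case
    queries (the 0x20 trick mentioned in the comments).
    """
    name = qname.rstrip(".").lower()
    suffix = "." + TRACKING_ZONE
    if not name.endswith(suffix):
        return None
    label = name[: -len(suffix)]
    # Accept exactly one label in front of the tracking zone
    return label if label and "." not in label else None


def correlate(http_events: dict, dns_events: dict) -> dict:
    """Join both logs on the token; values are whatever each log recorded."""
    return {t: (http_events[t], dns_events[t])
            for t in http_events.keys() & dns_events.keys()}
```

The web server would call `new_token()` once per response and log the token next to the request; the DNS side would run `token_from_qname()` over each queried name seen under the wildcard, and `correlate()` then pairs the two records.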
Patrick Mevzek
  • Some nitpicking: 1) the example img src will be considered a relative filename (add http:// or https://?), 2) I also notice the mixed case in that example, this could be misleading (case insensitive and query may not be sent in the same case), 3) It doesn't really matter if it's a separate zone as long as an otherwise completely unused subdomain is used for the wildcard, 4) the use of a wildcard ties in with your last item, if they don't plan to serve content they could log queries that result in NXDOMAIN (but it would look messy, just as not having anything there in terms of serving content). – Håkan Lindqvist Mar 13 '18 at 06:49
  • Now, this is a pretty straightforward thing to do, but the results will be different from OPs proposed solution (tricky) in that every visitor (unless they blocked getting the image or something) will be in the log, not only the ones that triggered a normal query to their authoritative server. Is this better or worse? Hard to say without knowing what they are actually trying to achieve. This being easy to do is positive in terms of implementation, though. – Håkan Lindqvist Mar 13 '18 at 06:54
  • @HåkanLindqvist yes I forgot the scheme, sorry, I fixed it. Mixed case is ok, this is the DNS 0x20 trick, but I just took a random string, it can be all lowercase it does not change the whole algorithm. Separate zone or other subdomain is the same, yes. Yes, not servicing the content, even if semantically useless, would be wrong. If the token is unique per HTTP access, then there will be no cache involved and each user will need to query the authoritative nameserver. – Patrick Mevzek Mar 13 '18 at 13:14
  • I did not mean that the mixed case would be a technical problem, rather that it may confuse the less informed reader (it looks like there may have been some intention behind it) – Håkan Lindqvist Mar 13 '18 at 13:38
  • @HåkanLindqvist Ok, I lowercased it. – Patrick Mevzek Mar 13 '18 at 13:58