
This is a proposed architecture for anonymously submitting a request to Googlebot to re-crawl a web page. I came up with the solution given below. I am posting it here to learn what security loopholes the architecture has and what improvements would benefit it.

Here is the scenario. Suppose a user visits a URL and suspects that the page is cloaked, or for some other reason wants the bot to re-crawl the URL. Google provides the Fetch as Google tool for this. However, when we submit the URL to Google, Google learns our IP address. I want this submission to be anonymous. To avoid confusion: assume that I reach the page with JavaScript disabled.

So here are the steps:

  1. The user requests a random number from an authority, which gives the random number to the user and the same number to the Google server. For a duration from t to t', the same random number is given to all users, and Google stores that same number for that duration. After that, a new random number comes into use and the old one is not used again. (I wanted to minimize key management for the user, so I resorted to this approach.)
  2. Once the user has the random number, they XOR it with the URL and send the encrypted URL to a trusted mediator. The mediator stores the requests (encrypted URLs) of all users and hands them to the Google server every 5-10 minutes.
  3. A user can send only one URL every 15 minutes.
  4. As soon as the dialog with the mediator is over, the user closes the connection.
  5. The mediator sends all the encrypted URLs to the Google server.
  6. The Google server sees only the encrypted URLs, with the mediator as their source, so the privacy of the user is preserved.

(Figure: Proposed Architecture)
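To make the XOR step concrete, here is a minimal sketch of what a shared per-window random number does to two submitted URLs. The 16-byte key length, the repeating-key XOR, the helper name `xor_bytes`, and the example URLs are all assumptions for illustration; the steps above do not specify them.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with key, repeating the key if it is shorter than the data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical shared random number for the current window (t to t').
shared_key = secrets.token_bytes(16)

# Two users submit different URLs during the same window.
url_a = b"http://example.com/page-one"
url_b = b"http://example.com/page-two"
ct_a = xor_bytes(url_a, shared_key)
ct_b = xor_bytes(url_b, shared_key)

# Anyone who obtains the shared number (Google, the authority, any user
# who asked for it in this window) recovers the URL directly.
recovered = xor_bytes(ct_a, shared_key)

# XORing the two ciphertexts cancels the key and leaves url_a XOR url_b.
leak = bytes(x ^ y for x, y in zip(ct_a, ct_b))

# A known prefix such as "http://" reveals the matching key bytes.
known_prefix = b"http://"
key_prefix = bytes(c ^ p for c, p in zip(ct_a, known_prefix))
```

Under these assumptions, the number handed to every user in a window behaves like a key that all participants already hold, so it hides the URL only from parties that never request it.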

These are the assumptions I made:

  a. The mediator allows only one connection per user or client in every 15-minute window.
  b. On terminating the connection with the user or client, the mediator keeps no details about the user or client.
  c. The random number generator is a true random number generator.
  d. The mediator and the random number generator are both fault tolerant (in this context I mean robust to load).
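Assumptions a and b interact: to enforce one connection per 15-minute window, the mediator has to remember something about each client for at least that long. The following is a rough sketch of one way that could look, not a description of an existing system. The class name `WindowLimiter`, the salted-hash tag, and the use of an IP-like string as the client identifier are hypothetical choices.

```python
import hashlib
import secrets

WINDOW_SECONDS = 15 * 60  # the 15-minute window from assumption a


class WindowLimiter:
    """Allow each client at most one submission per window."""

    def __init__(self, window: float = WINDOW_SECONDS) -> None:
        self.window = window
        # Random salt so stored tags are not plain client identifiers.
        self.salt = secrets.token_bytes(16)
        self.last_seen = {}  # tag -> time of last accepted submission

    def _tag(self, client_id: str) -> bytes:
        return hashlib.sha256(self.salt + client_id.encode()).digest()

    def allow(self, client_id: str, now: float) -> bool:
        # Forget every client whose window has fully elapsed.
        self.last_seen = {
            tag: ts for tag, ts in self.last_seen.items()
            if now - ts < self.window
        }
        tag = self._tag(client_id)
        if tag in self.last_seen:
            return False
        self.last_seen[tag] = now
        return True


limiter = WindowLimiter()
first = limiter.allow("203.0.113.7", now=0.0)
too_soon = limiter.allow("203.0.113.7", now=60.0)
other_client = limiter.allow("198.51.100.9", now=60.0)
after_window = limiter.allow("203.0.113.7", now=900.0)
```

Even in this sketch the mediator holds a per-client record for up to 15 minutes after the connection closes, so "keeps no details" can only hold once the window has expired.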

What flaws does this design have? What improvements could be made?


EDIT: Although I have accepted the answer, I welcome comments, other answers, and improvements, so feel free to let me know about flaws or improvements.

  • I'm not sure what you're asking. `see link above` - did you forget to post the link? Can't the link be submitted via TOR if you wish to remain anonymous? What is the purpose of the random numbers and how can Google decrypt? – SilverlightFox Apr 24 '17 at 20:26
  • Edited. I was allowed to post only 1 link owing to my reputation. Sorry for that. – Paul Schimmer Apr 25 '17 at 00:41
  • @SilverlightFox I know TOR can do the trick, but I am thinking of a possibility that is generic, so that even a Chrome or Firefox user is able to give feedback – Paul Schimmer Apr 25 '17 at 00:45
  • Can I also add this question on SO? – Paul Schimmer Apr 25 '17 at 04:54
  • @PaulSchimmer: This would be off-topic on SO, unless you are writing code that does it and have a specific question about a problem you ran into while coding. – Ben Voigt Apr 25 '17 at 05:44
  • Alright, @BenVoigt – Paul Schimmer Apr 25 '17 at 07:49
  • I'm not quite sure what the XOR "encryption" here does at all. It does nothing to preserve the privacy of the user. The mediator can simply request the decryption key from the random number key authority and be able to decrypt all URLs submitted through it. You might as well just submit the actual URL to the mediator. – Lie Ryan Apr 25 '17 at 08:59

1 Answer


The simplest approach is to use any VPN or SOCKS proxy.

When you submit your recrawl request (or pretty much any request) via a VPN/proxy:

  1. The end service (Google) only sees the VPN/proxy's IP address, not your real IP
  2. The proxy/mediator can't see the data (URL) you submitted, because your connection to the end service (Google) is encrypted using public key cryptography (TLS)
  3. Your only concern is to ensure your browser doesn't send cookies (or similar tracking identifiers) when connecting to the end service (Google). This can be done by using a new browser profile.

The only "weakness" here is that this is a low-latency, real-time request. A sufficiently advanced attacker who can monitor all of the VPN/proxy server's internet link, but has not compromised the VPN/proxy server itself, could use traffic analysis to see that an outgoing request to the end service leaves shortly after your packets arrive at the intermediary server.

To protect against traffic analysis, you'll need a high-latency, store-and-forward system. The stereotypical store-and-forward protocol is email. The intermediary server can run an email server: you send an email to the intermediary, which stores it for a random amount of time and then forwards the mail/request to the end service when that time is up, possibly batching the requests. To keep the URL private from the intermediary's email server, you can GPG-encrypt the email to a GPG key held by the end service.
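As a rough illustration of the store-and-forward idea, the sketch below models an intermediary that holds each message for a random delay and releases whatever is due as a shuffled batch. The class name `StoreAndForwardMediator`, the 300-600 second delay range, and the explicit `now` timestamps are assumptions for the example. Messages are treated as opaque bytes that are presumed to be already encrypted to the end service's key; no encryption or email handling is shown.

```python
import random
from typing import List, Optional, Tuple


class StoreAndForwardMediator:
    """Hold opaque messages for a random delay, then release them in batches."""

    def __init__(self, min_delay: float = 300.0, max_delay: float = 600.0,
                 rng: Optional[random.Random] = None) -> None:
        self.min_delay = min_delay
        self.max_delay = max_delay
        # SystemRandom draws from the OS entropy source by default.
        self.rng = rng if rng is not None else random.SystemRandom()
        self.queue: List[Tuple[float, bytes]] = []

    def submit(self, sealed: bytes, now: float) -> None:
        # Pick an independent random release time for each message.
        release_at = now + self.rng.uniform(self.min_delay, self.max_delay)
        self.queue.append((release_at, sealed))

    def release(self, now: float) -> List[bytes]:
        # Collect everything that is due and keep the rest queued.
        due = [msg for t, msg in self.queue if t <= now]
        self.queue = [(t, msg) for t, msg in self.queue if t > now]
        # Shuffle so the batch order does not reflect arrival order.
        self.rng.shuffle(due)
        return due


mediator = StoreAndForwardMediator(rng=random.Random(0))
mediator.submit(b"sealed-request-1", now=0.0)
mediator.submit(b"sealed-request-2", now=10.0)
early = mediator.release(now=299.0)   # nothing can be due before 300 s
batch = mediator.release(now=610.0)   # both delays have elapsed by now
```

The random delay and shuffling are what break the timing link between an incoming message and an outgoing one. With very little traffic a batch may still contain a single message, so this only helps when several users submit in the same period.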

Lie Ryan
  • As per your point 3: "Your only ... This can be done by using a new browser profile". How can I be sure that a new browser profile doesn't send cookies? – Paul Schimmer Apr 25 '17 at 09:33
  • This also means I first need to set up a VPN or be behind a proxy in order to become an anonymous entity. @Lie Ryan, I agree that your solution is pretty robust and has only the weakness that you pointed out. But my point is that anonymous reporting must be possible even for a layman, who may not be on a VPN or behind any proxy – Paul Schimmer Apr 25 '17 at 09:39
  • @PaulSchimmer: you can't do that without breaking half the internet. But if you use a new browser profile and a new IP (by using a proxy), the cookie can't be (easily) associated with the cookie associated with your regular account where you might be logged into your Google Account or other such ad tracking cookie with a much longer history of being associated with you. – Lie Ryan Apr 25 '17 at 09:40
  • Anyway, thanks for the answer. It would be of great help if you could also point out the flaws in the mechanism that I suggested. – Paul Schimmer Apr 25 '17 at 09:42
  • Okay got your point – Paul Schimmer Apr 25 '17 at 09:43