
A customer found a couple of generated links pointing to private data on their website. After looking at the links and searching Yahoo for a specific part of the URL, I found dozens of other private links, belonging to all our other clients (all different, privately owned domains). This is really bad!

The links look like this:

https://domain.com/app/pagename.ext?aes_ctr_encrypted_data_encoded_in_base64

(The encrypted part is more than 100 chars.)

We send these links to clients by email only. We never publish these addresses on any web page or do anything else with them.

How in the world can these private addresses be indexed? Is it because Yahoo crawls private emails (seems really improbable)? Is it because these emails were somehow leaked on the web and Yahoo crawled them?

Also I was thinking, is it possible that clients copied/pasted the URL directly in Yahoo search, and then Yahoo searched for that and kept it?

Arminius
  • 43,922
  • 13
  • 140
  • 136

1 Answer


Is it because Yahoo crawls private emails (seems really improbable)?

That's highly unlikely. For example, Yahoo surely wouldn't index a password reset link that is mailed to your private Yahoo inbox.

Is it because these emails were somehow leaked on the web and Yahoo crawled them?

That's a plausible explanation. It's sometimes hard to work out exactly how a search engine discovered a piece of content, but the content must have appeared somewhere on the public Internet or otherwise been made accessible to the search engine. Another possibility that comes to mind is that the site's framework automatically lists all content in a sitemap (e.g. at yoursite.example/sitemap.xml), which is something WordPress often does. Also, are you sure the content isn't visible via a directory listing or a publicly accessible database dump?

Also I was thinking, is it possible that clients copied/pasted the URL directly in Yahoo search, and then Yahoo searched for that and kept it?

I don't know whether Yahoo automatically indexes the URLs you search for. That certainly sounds risky from a security perspective. But I find it unlikely that several of your customers would put their URLs in the Yahoo search bar.

If you have request logs available, you could check which documents the Yahoo crawler was accessing in the past (look for a user-agent containing "Yahoo! Slurp").
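The log check suggested above could be sketched roughly as follows. This assumes the web server writes the common "combined" access log format (as Apache and nginx do by default); the function name `crawler_hits` and the `/app/` path prefix are illustrative choices, not something taken from the site in question.

```python
import re

# Combined Log Format (assumed):
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def crawler_hits(lines, agent_marker="Yahoo! Slurp", path_prefix="/app/"):
    """Return (time, path, referer) for each request made by the crawler.

    Lines that do not parse as combined-format entries are skipped.
    """
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        if agent_marker in match["agent"] and match["path"].startswith(path_prefix):
            hits.append((match["time"], match["path"], match["referer"]))
    return hits


# Two made-up log lines for illustration: one crawler request, one browser request.
SAMPLE_LOG = [
    '203.0.113.7 - - [05/May/2017:10:00:00 +0000] '
    '"GET /app/pagename.ext?Zm9vYmFy HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '198.51.100.4 - - [05/May/2017:10:01:00 +0000] '
    '"GET /app/pagename.ext?YmF6 HTTP/1.1" 200 640 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for when, path, referer in crawler_hits(SAMPLE_LOG):
    print(when, path, referer)
```

On a real server you would pass the open log file instead of `SAMPLE_LOG`. The earliest crawler request for each private URL is the interesting one, since its timestamp narrows down when the link became discoverable.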

Countermeasures

To prevent search engines from indexing sensitive content, you can add a Disallow entry to your robots.txt file with a * wildcard, like this:

Disallow: /app/secretcontent.ext?token=*

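But note that not all search engines respect Disallow directives, and robots.txt is one of the first places attackers look when gathering information. A disallowed URL can also still be listed, without its content, if the search engine finds a link to it elsewhere.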
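To check which URLs a Disallow pattern actually covers, here is a small sketch of the path matching described in RFC 9309 (prefix match, `*` for any run of characters, a trailing `$` to anchor the end). The helper name `rule_matches` is made up for this example, and individual crawlers may implement matching differently.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern covers the given URL path.

    Rules are prefix matches; '*' matches any sequence of characters and a
    trailing '$' requires the path to end where the pattern ends.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and turn each '*' into '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


# The rule from the answer covers URLs that carry a 'token=' parameter ...
print(rule_matches("/app/secretcontent.ext?token=*", "/app/secretcontent.ext?token=abc"))
# ... but not URLs shaped like the one in the question, which has no parameter name.
print(rule_matches("/app/secretcontent.ext?token=*", "/app/pagename.ext?Zm9vYmFy"))
# A rule ending at the '?' would cover that shape.
print(rule_matches("/app/pagename.ext?", "/app/pagename.ext?Zm9vYmFy"))
```

Because rules are prefix matches, the trailing `*` in the example rule is harmless but not required under these matching rules.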

More generally, you might want to make links to sensitive content expire quickly (or even turn them into one-shot links that can only be accessed once, if appropriate).
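A minimal sketch of expiring, one-shot links, under several assumptions: it uses an HMAC-signed token rather than the AES-CTR scheme mentioned in the question, the names `make_token` and `redeem_token` and the key are hypothetical, and the set of used tokens lives in memory only (a real deployment would need a shared, persistent store).

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-random-server-side-secret"  # hypothetical key
_used_tokens = set()  # in-memory for illustration only


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def make_token(doc_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Create a token of the form doc_id.expiry.nonce.signature."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    nonce = secrets.token_urlsafe(8)  # makes every issued link unique
    payload = f"{doc_id}.{expires}.{nonce}"
    return f"{payload}.{_sign(payload)}"


def redeem_token(token: str, now: Optional[float] = None, one_shot: bool = True) -> Optional[str]:
    """Return the document id if the token is valid, otherwise None."""
    try:
        doc_id, expires_str, nonce, signature = token.rsplit(".", 3)
        expires = int(expires_str)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{doc_id}.{expires_str}.{nonce}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None  # tampered or forged
    current = time.time() if now is None else now
    if current > expires:
        return None  # link has expired
    if one_shot:
        if token in _used_tokens:
            return None  # link was already used once
        _used_tokens.add(token)
    return doc_id
```

With a short lifetime, a link that does end up in a search index stops working soon afterwards. Note that a one-shot link can be consumed by mail scanners or link previewers before the client clicks it, so that variant may not suit every workflow.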

Another approach could be sending users the link to yoursite.example/secretcontent and the token separately. The site would then present a form where the user has to enter the token. This form would submit the token via POST so that the token is never visible in the URL and hence can't be indexed.
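The form-based approach could look roughly like this, using only Python's standard library. The token value, the `DOCUMENTS` mapping, and the handler and function names are all placeholders for illustration; a real site would use its own framework and storage, and would serve the form over HTTPS.

```python
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Hypothetical mapping from token (sent to the user separately) to document.
DOCUMENTS = {"s3cr3t-token-sent-separately": "Contents of the private document"}

FORM_HTML = b"""<!doctype html>
<form method="post" action="/secretcontent">
  <label>Access token: <input type="password" name="token" autocomplete="off"></label>
  <button type="submit">Open</button>
</form>"""


def find_document(form_body: bytes):
    """Look up the document for the token in a urlencoded POST body."""
    fields = parse_qs(form_body.decode("utf-8", errors="replace"))
    token = fields.get("token", [""])[0]
    for known_token, document in DOCUMENTS.items():
        # Constant-time comparison of the submitted and stored token.
        if hmac.compare_digest(token.encode(), known_token.encode()):
            return document
    return None


class SecretContentHandler(BaseHTTPRequestHandler):
    def _send(self, status: int, body: bytes, content_type: str) -> None:
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # The URL the user receives carries no secret; it only shows the form.
        self._send(200, FORM_HTML, "text/html; charset=utf-8")

    def do_POST(self):
        # The token travels in the request body, never in the URL.
        length = int(self.headers.get("Content-Length", 0))
        document = find_document(self.rfile.read(length))
        if document is None:
            self._send(403, b"Invalid token", "text/plain; charset=utf-8")
        else:
            self._send(200, document.encode(), "text/plain; charset=utf-8")


def serve(port: int = 8000) -> None:
    """Run the sketch locally; call serve() to try it out."""
    HTTPServer(("127.0.0.1", port), SecretContentHandler).serve_forever()
```

Since the only URL involved is the form page, a crawler that finds it sees just the empty form, and the token does not end up in request logs, browser history, or Referer headers.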

Arminius
  • I know about robots.txt, although I think it will only prevent website crawling, not indexing. Also, the possible email leak seems improbable, because some of the documents available are from only a week ago. I'm really lost here: I know how to fix it, but I really wonder how it could have happened in the first place. Anyway, thanks for the information, I really appreciate it! – charlotmartine May 06 '17 at 10:25
  • @charlotmartine The `robots.txt` can influence indexing. I think to investigate this further you'd have to have someone look at the actual site. Otherwise it's only guessing. – Arminius May 06 '17 at 17:26