3

If I have a URL that is used for receiving messages and I create it like so: http://www.mydomain.com/somelonghash123456etcetc, and this URL allows other services to POST messages to it, is it possible for a search engine robot to find it? I don't want to put it in my robots.txt, because that would already expose it to anybody who reads the robots file.

Of course I will add other authentication in the app, but keeping anyone from discovering that URL is the first step.
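
For reference, the hash part is generated roughly like this (a simplified sketch using Python's standard secrets module; the domain is just the example from above):

```python
# Simplified sketch: generate an unguessable URL path segment.
# Uses Python's standard "secrets" module; values are illustrative only.
import secrets

token = secrets.token_urlsafe(32)  # 32 random bytes -> ~43 URL-safe characters
endpoint = "http://www.mydomain.com/" + token
print(endpoint)
```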

Any common methods?

lamp_scaler
  • 577
  • 1
  • 5
  • 18

3 Answers

5

If your authentication is working, there is no reason to hide the URL. I'd rather focus on that.

Miademora
  • 146
  • 3
4

It is not possible for search engines to find it if it is not linked anywhere, since following links is the only way search engines discover pages. To make sure a search engine does not find it via robots.txt while still keeping it secret, use a double hash:

http://example.com/asdfghjk/12345678

Your robots.txt would disallow anything below asdfghjk:

User-agent: *
Disallow: /asdfghjk/

But anyone who does not know the full path still cannot derive the second part of the URL from that entry.
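
To illustrate, here is a minimal sketch of how the receiving application might validate that second path segment (assuming a Flask app; the SECRET value and handler names are hypothetical):

```python
# Minimal sketch: accept POSTs only on the full secret path.
# Assumes Flask; SECRET is a hypothetical placeholder value.
import hmac
from flask import Flask, request, abort

app = Flask(__name__)
SECRET = "12345678"  # hypothetical; in practice a long random string

@app.route("/asdfghjk/<token>", methods=["POST"])
def receive_message(token):
    # Constant-time comparison avoids leaking the secret via timing
    if not hmac.compare_digest(token.encode(), SECRET.encode()):
        abort(404)  # pretend the URL does not exist
    message = request.get_data(as_text=True)
    # ... store or process the message here ...
    return "OK", 200
```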

Lars
  • 484
  • 5
  • 19
  • 3
    Strictly speaking, robots.txt doesn't **disallow** anything. It's merely a *suggestion* to web crawlers as to what they should and should not index, and more than a few of them ignore those suggestions. Using robots.txt and "hiding" URLs in the way you suggest does not actually **secure** anything. – Rob Moir Oct 07 '11 at 10:15
  • 2
    You are correct. However, there are use cases in which "security by obscurity" is wanted in addition to, not instead of, real security, and from the question I deduce that this is the case here (e.g. application keys for fetching an RSS feed without authentication but still with some privacy). Also, search engines only find something by seeing a link or by deducing certain URLs, so a 32-character string should provide sufficient protection against being found. – Lars Oct 07 '11 at 10:18
0

You could put something identifiable in the path, such as /secretpage[hash]123456etcetc, then grep the Apache logs with an automated script that blocks access if one IP tries to access a secretpage URL too often, along these lines:
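
A minimal sketch of such a script (the log path and threshold are assumptions; hook the output into whatever blocking mechanism you use):

```python
# Minimal sketch: scan an Apache access log for hits on the "secretpage"
# marker and flag IPs that probe it too often.
from collections import Counter

LOG_FILE = "/var/log/apache2/access.log"  # assumed log location
THRESHOLD = 20  # hypothetical: max hits per scan before an IP is flagged

hits = Counter()
with open(LOG_FILE) as log:
    for line in log:
        # Common/combined log format: the client IP is the first field
        if "/secretpage" in line:
            ip = line.split(" ", 1)[0]
            hits[ip] += 1

for ip, count in hits.items():
    if count > THRESHOLD:
        # Hand the IP to your blocking mechanism, e.g. a firewall rule
        print(f"block candidate: {ip} ({count} hits)")
```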

In reality though, Miademora is correct. Get your authentication secure (two-factor?) and ditch the security by obscurity.

Sirex
  • 5,447
  • 2
  • 32
  • 54