122

I think it's fundamental for security testers to gather information about how a web application works and, ideally, what language it's written in.

I know that URL extensions, HTTP headers, session cookies, HTML comments and style sheets may reveal some information, but identifying the language this way is still hard and never certain.

So I was wondering: is there a way to determine which technology and framework are behind a website?

storm
  • 16
    Try www.builtwith.com – SnakeDoc Mar 11 '16 at 15:44
  • 32
    My tomcat server returns "CERN httpd" just to mess with people – Neil McGuigan Mar 11 '16 at 19:57
  • 2
    My first guess would be HTML – Hagen von Eitzen Mar 12 '16 at 06:39
  • 22
    @HagenvonEitzen If HTML had been a programming language it would have been named HTPL rather than HTML. – kasperd Mar 12 '16 at 09:44
  • 10
    `I think that it's fundamental for security testers to gather information about how a web application works and what language it's written in.` I think that, if even a security tester can't figure out what language the site is built in, that makes it more secure because then no one will know which exploits to try. (Yes, there are occasionally valid use cases for security through obscurity.) – Mason Wheeler Mar 12 '16 at 14:06
  • 6
    @MasonWheeler: figuring out what language the site is built in will only determine which exploits *not* to try. That won't make the site more secure. – Benoit Esnard Mar 12 '16 at 19:17
  • @BenoitEsnard well, if an attacker uses it to determine which exploits *not* to try, then it would be a security improvement if a site successfully misleads the attacker into thinking it's something different and thus the attacker skips trying the "proper" exploits. – Peteris Mar 14 '16 at 11:31
  • I used to be satisfied with just checking for .php or .aspx to tell whether a website runs on PHP or on ASP.NET Web Forms. Nowadays, with URL routing and MVC frameworks, it is quite hard for me to tell them apart. :p Thanks for the question. – Anuj Tripathi Mar 14 '16 at 14:28

6 Answers

155

There's no way to be 100% sure if you don't have access to the server, so it's about guessing. Here are some clues:

  • File extensions: login.php is most likely a PHP script.
  • HTTP headers: they may leak the language running on the server, along with details such as the version. For example, X-Powered-By: PHP/7.0.0 suggests that the page was rendered by PHP 7.0.0.
  • HTTP Parameter Pollution: server stacks handle duplicated parameters differently, so sending the same parameter twice can refine a guess about which server is running. For example, PHP on Apache keeps the last occurrence, JSP on Tomcat keeps the first, and ASP.NET on IIS joins all occurrences with a comma.
  • Language limits: maximum POST data size, maximum number of variables in GET and POST data, etc. This may be useful if the webmaster kept the default values.
  • Specific input: for example, PHP had some easter eggs.
  • Errors: triggering errors may also leak the language. Warning: Division by zero in /var/www/html/index.php on line 3 is PHP, for example.
  • File uploads: libraries may add metadata if the file is modified server-side. For example, most sites resize users' avatars, and checking the resized image for EXIF data may reveal a string such as CREATOR: gd-jpeg v1.0 (using IJG JPEG v90), default quality, which points to the GD library and may help to guess which language is used.
  • Default filenames: Check if / and /index.php are the same page.
  • Exploits: reading a backup file, or executing arbitrary code on the server.
  • Open source: the website may have been open-sourced, with its code available somewhere on the Internet.
  • About page: the webmaster may have thanked the language community in a "FAQ" or "About" page.
  • Jobs page: the development team may be recruiting, and they may have detailed the technologies they're using.
  • Social Engineering: ask the webmaster!
  • Public profiles: if you know who is working on the website (check LinkedIn and /humans.txt), you can check their public repos or their skills on online profiles (GitHub, LinkedIn, Twitter, ...).
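A couple of the clues above (HTTP headers and error messages) are easy to check mechanically. Here is a minimal sketch, assuming you have already fetched the response headers and body yourself. The function name `collect_hints` and the two signature lists are made up for illustration and are nowhere near complete.

```python
import re

# Illustrative signatures only (assumption: a real list would be much longer).
HEADER_HINTS = [
    ("x-powered-by", re.compile(r"php", re.I), "PHP"),
    ("x-powered-by", re.compile(r"asp\.net", re.I), "ASP.NET"),
    ("x-powered-by", re.compile(r"express", re.I), "Node.js (Express)"),
]
ERROR_HINTS = [
    # Matches PHP-style warnings such as the "Division by zero" example above.
    (re.compile(r"Warning: .+ in .+\.php on line \d+"), "PHP"),
]


def collect_hints(headers, body=""):
    """Return (technology, evidence) pairs found in the headers and body."""
    hints = []
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    for header, pattern, label in HEADER_HINTS:
        if pattern.search(lowered.get(header, "")):
            hints.append((label, f"{header}: {lowered[header]}"))
    for pattern, label in ERROR_HINTS:
        match = pattern.search(body)
        if match:
            hints.append((label, match.group(0)))
    return hints
```

Every hit is only a hint: the server can send whatever headers and error text it likes, so treat the result as one source among several.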

You may also want to know if the website is built with a framework or a CMS, since this will give information about the language used:

  • URLs: some directories and pages are specific to a particular CMS. For example, if some resources are located in the /wp-content/ directory, WordPress is probably in use.
  • Session cookies: name and format.
  • CSRF tokens: name and format.
  • Rendered HTML: for example: meta tags order, comments.

Note that all information coming from the server may be altered to trick you. You should always try to use multiple sources to validate your guess.

Anders
Benoit Esnard
  • 4
    You forgot to mention some Java examples: Java applications generally use a JSESSIONID cookie for session management. The login URL can betray the underlying technology too, for instance Spring's default URL. Those examples are for Java, but the same surely holds for other platforms – Walfrat Mar 11 '16 at 15:25
  • 25
    Just a note: just because the http headers *say* they're powered by php, doesn't mean the site actually is. Although this example is more about the server platform, I know of a guy who would make his nginx server return Server: Microsoft-IIS/5.0 with every request so he could trick attackers into using the wrong attacks against the server. "It's too easy!" ~ *the attacker*. You're right about that! (This just goes to show that you can't trust headers) – d0nut Mar 11 '16 at 15:40
  • I liked the Parameter Pollution technique .. I'm sure that there are many more ways though – storm Mar 11 '16 at 16:14
  • @Walfrat: I've just detailed the CMS / framework part! – Benoit Esnard Mar 11 '16 at 17:19
  • 1
    @AhmedJerbi: I've added more techniques. – Benoit Esnard Mar 11 '16 at 17:19
  • @Benoit: thank you .. Many docs to read for the weekend :-) – storm Mar 11 '16 at 18:18
  • 3
    Another good one is checking the source to see if there are tell-tale signs of the use of some templating engine specific to a language. – mowwwalker Mar 11 '16 at 20:08
  • 9
    You forgot one of the simplest - looking at the jobs page. :) – Xiong Chiamiov Mar 11 '16 at 20:40
  • 1
    Nitpick: the first 9 will really only tell you what language was used to *deploy* the site, not to *build* it. E.g., if you determine that the site was deployed on a JVM, that doesn't tell you much, there are over 400 languages with implementations for the JVM, the site may have been built in Scala, Groovy, Clojure (which also has implementations for the CLI and ECMAScript), Fantom (ditto), Ruby (JRuby), Python (Jython), PHP (IBM P8, Quercus), ECMAScript (Mozilla Rhino, Oracle Nashorn, dyn.js). The same applies to the CLI (IronPython, IronRuby, IronJS, …). There are also many compilers that … – Jörg W Mittag Mar 12 '16 at 10:16
  • … target PHP: haXe, Hack, Wasabi, … – Jörg W Mittag Mar 12 '16 at 10:16
  • @mowwwalker: i've added that sign under the "rendered HTML" part. I'm not sure if you were thinking about another sign though, so let me know if I missed something! – Benoit Esnard Mar 12 '16 at 19:19
  • How about humans.txt? – Gustavo Rodrigues Mar 12 '16 at 21:34
  • Or maybe I'm trolling you. /cgi-bin/postcomment.exe turns out to be a ksh script. – Joshua Mar 13 '16 at 20:33
  • 3
    If there's a hidden field named "__VIEWSTATE", and/or if the buttons say "href=javascript:__doPostBack" it's likely asp.net. Off the top of my head I can't think of comparable "signatures" in other platforms, but, etc. – Jay Mar 14 '16 at 04:07
19

To guess the programming language, you can follow the three-step approach detailed below:

STEP 1 - Search for evidence on the site itself

Manually...

  • Look at the bottom of the site's pages for phrases like:

    -> "Powered by XXX"
    -> "Proudly Powered by XXX"
    -> "Running on XXX"
    -> ...

  • Check on the site whether its team will attend any conference where they might talk about the website from a technical point of view

...or with the help of a tool

  • Read the HTML code downloaded by your browser

  • Fire up the Network Tab in developer toolbar and study the exchanges made between the browser and the server.

  • Search for some known hidden page:

    wget --spider --server-response http://the-site.com/private/admin

    If you get a 200, the site may be running on publicly available software (free, paid, etc.).
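If you would rather script the hidden-page check, here is a rough Python equivalent using only the standard library. It is a sketch, not a scanner: `probe_status` is a name invented for this example, and the `opener` parameter exists only so the function can be exercised without touching the network.

```python
import urllib.error
import urllib.request


def probe_status(url, opener=urllib.request.urlopen, timeout=5):
    """Send a HEAD request and return the HTTP status code.

    Returns None when the host cannot be reached at all.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still useful.
        return err.code
    except urllib.error.URLError:
        return None
```

Calling `probe_status("http://the-site.com/private/admin")` returning 200 means the page exists. A 403 can be just as telling as a 200, since it suggests the path is known to the server. Only probe sites you are authorised to test.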

STEP 2 - Search for evidence on the web

Ask search engines for front-end errors

You can look for some errors produced by the website.

  • Some keywords to type in a search engine:

    • Error 500 site:the-site.com
    • Exception site:the-site.com
    • ...
    • <whatever> site:the-site.com
      => Simply replace "<whatever>" with a known error message produced by one of the various web technologies.

Ask search engines for back-end errors

You can even guess the technologies used in the backend:

  • ORA-12170 site:the-site.com
    => If you find something, the site may be using Oracle in its backend.
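The front-end and back-end searches follow the same pattern, so the queries can be generated from a list of error strings. A tiny sketch; `build_queries` is a hypothetical helper, and the error strings passed in are just the ones already used above.

```python
def build_queries(domain, messages):
    """Build one "<message> site:<domain>" search query per error message."""
    return [f"{message} site:{domain}" for message in messages]


queries = build_queries("the-site.com", ["Error 500", "Exception", "ORA-12170"])
for query in queries:
    print(query)
```

You still have to paste the queries into a search engine yourself and read the results; the helper only saves typing.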

Ask search engines for website competitors

  • Find what technology is popular in the website industry

  • Find what technology competitors are using

  • Find comparisons of the site with other competitors.
    Those comparisons may talk about technologies in use

Technology survey sites

These sites can provide a lot of information about the site you are targeting. They may have already done part of the job for you.

STEP 3 - Analyze your results

The evidence you found in step 1 may be wrong because the site owner can alter it. Look for contradictions between the pieces of evidence and eliminate the contradictory ones.

Merge the evidence from step 2 across the various sources with your own. Again, eliminate contradictory evidence.

Summarize all your findings in a table like the one below.

+------------+---------+-----------------+-----+----------+-------+---------------+
| EVIDENCE   | ON SITE | SEARCH ENGINE 1 | ... | SOURCE n | SCORE | PCT (%)       |
+------------+---------+-----------------+-----+----------+-------+---------------+
| PHP 7      |    X    |        X        | ... |    X     |   3   | 300/n         |
+------------+---------+-----------------+-----+----------+-------+---------------+
| WordPress  |         |        X        | ... |    X     |   2   | 200/n         |
+------------+---------+-----------------+-----+----------+-------+---------------+
| ...        |         |                 |     |          |       |               |
+------------+---------+-----------------+-----+----------+-------+---------------+
| EVIDENCE m |         |                 |     |          |       | (100*SCORE)/n |
+------------+---------+-----------------+-----+----------+-------+---------------+

Finally, you will be able to say "I'm XX% confident that this site runs on YY (EVIDENCE i)".
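The scoring itself is simple arithmetic: SCORE is the number of sources that support a piece of evidence, and PCT is (100*SCORE)/n for n sources. A minimal sketch, where `score_evidence` and the source names are invented for the example:

```python
def score_evidence(observations, sources):
    """Compute (SCORE, PCT) for each piece of evidence.

    observations maps an evidence label to the set of sources supporting it;
    sources is the full list of n sources that were consulted.
    """
    n = len(sources)
    if n == 0:
        raise ValueError("at least one source is required")
    table = {}
    for evidence, seen_in in observations.items():
        score = sum(1 for source in sources if source in seen_in)
        table[evidence] = (score, 100 * score / n)
    return table


sources = ["on site", "search engine 1", "survey site"]
observations = {
    "PHP 7": {"on site", "search engine 1", "survey site"},
    "WordPress": {"search engine 1", "survey site"},
}
for evidence, (score, pct) in score_evidence(observations, sources).items():
    print(f"{evidence}: score {score}, {pct:.0f}%")
```

Keep in mind that the percentage is an agreement score between sources, not a probability: a carefully faked site could score 100% on every source.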

Stephan
  • This looks like a useful step by step guide, but it's probably a bad idea to present the arbitrary confidence score as a percentage. Even if a server gets a perfect score it could very well be a carefully assembled honeypot, so you shouldn't say you are a 100% confident that it isn't. – August Janse Jun 03 '19 at 09:01
  • @AugustJanse How should the arbitrary confidence score be presented? – Stephan Jun 03 '19 at 20:58
  • Something like "I conclude that this site runs on YY with a confidence score of XX" perhaps? The problem is that the percentage looks a bit too much like a probability. – August Janse Jun 04 '19 at 06:42
17

It's simple: install the Wappalyzer extension, available for both Chrome and Firefox.

It reports the programming language, server, analytics tools, CMS and frameworks that a website is built on.

Give it a try, you will love it.

Benoit Esnard
Manish Kumar
  • 2
    That seems good .. but is it reliable and accurate? – storm Mar 11 '16 at 16:10
  • Yes, it's very accurate. I've been using it for the last 4 years, even on websites I developed myself. It's always accurate. – Manish Kumar Mar 11 '16 at 18:03
  • 11
    I don't think it can be considered accurate. We purposely fake our sent headers to return IIS. Have a wp-admin.php even though we don't use Wordpress. And several other honey pots. Our site is actually a Node.js application that returns static content. – Bacon Brad Mar 11 '16 at 18:09
  • I just downloaded it, and it is as accurate as it can be. Obviously it can't tell whether the headers are being spoofed. – pllee Mar 11 '16 at 21:38
  • 2
    @Ahmed it works by [scanning](https://wappalyzer.com/suggestions) the HTML, headers, URL and JavaScript variables on a page. It's only as good as the rulesets used for detection of course, but I've found it to be right almost always. (But, of course, any web page can be set up to pretend to be running something it isn't.) – user2428118 Mar 12 '16 at 12:00
  • 12
    Social Engineering: ask how to identify the software used to serve web pages on StackExchange and wait for people to tell what their site runs on. Thank you, @BradMetcalf... – Arc Mar 13 '16 at 09:36
8

Besides the Wappalyzer browser extension, there are several sites that detect what technologies power a given website:

Dan Dascalescu
2

The answer is that you can never be assured. 99.9% of the time, the techniques in the highly upvoted answers will find the "tells" of the framework behind the site, but it's never a certainty.

Basically, your browser receives the end result of the code's processing (HTML, CSS and JavaScript). Between you and the code itself sits a web server (nginx, Apache, etc.) and potentially a load balancer and a CDN. Because you're not interacting with the code directly, there is no way to be certain.

If a website is serving content from wp-uploads/, it's a safe bet that it's running WordPress, but it's not a certainty. Perhaps the site used to run WordPress, and when it was migrated to something else the wp-uploads/ path was kept to avoid breaking links and bookmarks.

Nath
-2

Sometimes you can know, sometimes you cannot.

If the HTML is generated on the client-side, then you can easily tell which language by looking at the source in your web browser. These languages include: ruby on rails, javascript, java, etc. On the client-side the source is open to the user, and it must be honest about which technology it is.

If the HTML is generated on the server side, you may not know which programming language generated it. These languages include PHP, C++, and many others. On the server side, for as many ways as you can think of to guess which language it is, there are just as many ways for the technology to hide itself.

Suppose you are a web administrator who wants to hide the server-side technology. Pick one of the techniques listed in another answer for attempting to identify the language, for example the *.php extension of a file. Now, configure your web server to execute C code from a file with a *.php extension. Your users will have no way to view the source (and since both languages are Turing complete, either can produce the same output), but they will be misled into thinking you are running PHP.

Why would someone want to obfuscate the server-side choice of technology? Because CGI languages have various vulnerabilities that are easier to target if the end-users know which of those languages you are using. Misleading the users about which server-side technologies you are using is a very reasonable security measure.

  • 3
    I didn't downvote, but this answer neglects the numerous techniques available for determining the server-side language and tech. –  Mar 13 '16 at 05:11
  • 2
    For starters, Ruby on Rails and Java are perfectly capable of generating HTML entirely on the server side. – Scott Hillson Mar 18 '16 at 03:33