I am trying to figure out how OWASP ZAP discovered a directory on a practice VM. I entered the host IP and hit Attack, and the spider discovered this directory (pChart2.3.1).

I have searched every wordlist used by DirBuster and none of them contains this string, so DirBuster would not find it. I also tried using the spider in Burp, but it came up blank. (The index page simply says "It Works" with nothing additional in the source.) ZAP seems to have sent a request for robots.txt, then for sitemap.xml, and then hit the directory. The thing is, neither robots.txt nor sitemap.xml exists.

My question in a nutshell: does OWASP ZAP use a specific wordlist to search for directories, and if so, where is it? Alternatively, has it managed to get sitemap.xml even though the server says it doesn't exist?

I am taking a practical exam soon that doesn't allow the use of ZAP, and would like to be able to replicate its findings in another tool, or even just understand what it has done!

Anders
user3046771
  • ZAP is a spider, like you said. Spiders hop from links, references, anchors, includes and such to establish a tree (or web, hence spider). Then it traverses this tree to perform vulnerability analysis. – Yorick de Wid Aug 31 '16 at 11:15
  • My bad, it did have part of the url hidden in the source code. Thanks for your answer, made me go back and recheck! – user3046771 Aug 31 '16 at 11:57

1 Answer

For future reference:

ZAP works as a spider. Spiders crawl through data and find connection points between nodes. Web spiders follow links, sources, and anchors in HTML, JS and CSS. Every time a connection is found, it is added to the source tree, creating a hierarchical data structure known as a search tree. To give a simplified example:

\-root
  |-page
  |-page
  | |-css
  | \-js
  |   |-js
  |   |-font
  |   \-js
  |     \-img
  \-xhr
    |-js
    |-img
    |-doc
    | \-xml
    \-json
      \-html
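To make the "connection points" concrete, here is a minimal sketch of how a spider might pull candidate URLs out of a page, including paths that appear only in an HTML comment. It is not ZAP's actual parser: the `LinkExtractor` class, the `extract_links` helper, the comment regex, and the `192.0.2.10` host are all assumptions made for illustration, and the thread does not say where exactly in the source the path was hidden.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects URLs from href/src/action attributes and from HTML comments."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        # Visible references: anchors, scripts, images, stylesheets, forms.
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.links.add(urljoin(self.base_url, value))

    def handle_comment(self, data):
        # Illustrative heuristic (an assumption, not ZAP's rule):
        # treat anything in a comment that looks like an absolute path
        # as a candidate URL.
        for path in re.findall(r"/[\w.\-/]+", data):
            self.links.add(urljoin(self.base_url, path))


def extract_links(base_url, html):
    """Return the set of absolute URLs referenced by the given HTML."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page: nothing visible except "It works", but a path in a comment.
page = '<html><body><h1>It works</h1><!-- see /pChart2.3.1/ --></body></html>'
found = extract_links("http://192.0.2.10/", page)
```

A wordlist-based tool never sees this kind of reference, because it guesses names instead of reading the response body; that difference would explain a directory that a spider finds but a brute-forcer does not.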

At some point the tree is traversed, and ZAP requests each of these nodes individually, allowing it to send additional form data, GET requests, headers and so on. These trees can become massive, which explains why ZAP can take a long time to complete. One can limit or expand the tree by supplying a maximum depth or by including other (sub)domains.
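The effect of a maximum depth can be sketched with a small breadth-first crawl. This is an illustration of the general technique, not ZAP's implementation: the `crawl` function, the `get_links` callback, and the in-memory `SITE` map (which stands in for real HTTP requests) are all hypothetical.

```python
from collections import deque


def crawl(start, get_links, max_depth=2):
    """Breadth-first crawl; returns {url: depth} for every node reached."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue  # nodes at the depth limit are recorded but not expanded
        for link in get_links(url):
            if link not in seen:  # skip nodes already in the tree
                seen[link] = depth + 1
                queue.append(link)
    return seen


# Hypothetical site map standing in for fetched pages and their references.
SITE = {
    "/": ["/page1", "/page2"],
    "/page2": ["/style.css", "/app.js"],
    "/app.js": ["/lib.js", "/font.woff"],
    "/lib.js": ["/logo.png"],
}


def site_links(url):
    """Return the links found on a page (empty for leaf nodes)."""
    return SITE.get(url, [])


shallow = crawl("/", site_links, max_depth=2)
deep = crawl("/", site_links, max_depth=4)
```

With `max_depth=2` the crawl stops at `/app.js` and never learns about `/lib.js` or `/logo.png`; raising the limit to 4 reaches them, at the cost of more requests.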

The same technique is used by search engines to index web content.

Yorick de Wid