When I was writing "Using robots.txt to locate your targets", I felt the need to develop a tool to automate the task of auditing the robots.txt files of web servers.

Now, I am really proud to introduce my first tool, called Parsero. I hope you enjoy it...

Introduction

One of the things you need to do when you are auditing a website is to look at the robots.txt file, for example: http://www.behindthefirewalls.com/robots.txt. Web administrators write this file to tell crawlers like Google, Bing, Yahoo... what content they are allowed to index and which directories must not be indexed.
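To give you an idea of the format, a robots.txt file is just plain text made up of User-agent and Disallow directives. The paths below are only illustrative, not taken from any real server:

User-agent: *
Disallow: /admin/
Disallow: /private/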

But... why would administrators want to hide some web directories from the crawlers?

Sometimes they want to hide the web portal login, management directories, private info, sensitive data, pages with vulnerabilities, documents, etc... If they hide these directories from the crawlers, then they can't be found through Google hacking or by simply searching in the search engines...

Why do you need Parsero?

We've said that the administrators tell the crawlers which directories or files hosted on the web server are not allowed to be indexed. They achieve this by writing as many "Disallow: /URL_Path" entries as they want in the robots.txt file, pointing to those directories. Sometimes the paths typed in these Disallow entries are directly accessible by users (without using a search engine), just by visiting the URL plus the path, and sometimes they are not available to be visited by anybody at all... Because it is really common for administrators to write a lot of Disallow entries, some available and some not, you can use Parsero to check the HTTP status code of each Disallow entry and find out automatically whether these directories are available or not (a minimal sketch of this idea follows the status-code list below).

When we execute Parsero, we can see the HTTP status code of each Disallow entry. For example, the codes below:

  • 200 OK - The request has succeeded.
  • 403 Forbidden - The server understood the request, but is refusing to fulfill it.
  • 404 Not Found - The server hasn't found anything matching the Request-URI.
  • 302 Found - The requested resource resides temporarily under a different URI.
  • ...
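To make the idea more concrete, here is a minimal sketch of what Parsero automates. It is not Parsero's actual source code, and the target http://www.example.com is just the placeholder host used in the examples later in this post. It downloads robots.txt with urllib3, extracts every Disallow path, requests each one and prints its HTTP status code, flagging the 200 OK responses as directly accessible.

import urllib3

target = "http://www.example.com"  # placeholder host, as in the examples below
http = urllib3.PoolManager()

# Download robots.txt and keep the paths of the "Disallow:" entries
robots = http.request("GET", target + "/robots.txt").data.decode("utf-8", errors="ignore")
paths = []
for line in robots.splitlines():
    if line.lower().startswith("disallow:"):
        path = line.split(":", 1)[1].strip()
        if path:
            paths.append(path)

# Request every disallowed path and print its HTTP status code
for path in paths:
    response = http.request("GET", target + path, redirect=False)
    note = "<- directly accessible" if response.status == 200 else ""
    print(response.status, target + path, note)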

Installation

Parsero needs at least Python 3 and can be executed on any operating system which supports this development language. It also needs urllib3.
sudo apt-get install python3
sudo apt-get install python3-pip
sudo pip-3.3 install urllib3
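If you want to confirm that urllib3 is available to your Python 3 interpreter, an optional quick check is:

python3 -c "import urllib3; print(urllib3.__version__)"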
When you have installed this software, just download the project from:

https://github.com/behindthefirewalls/Parsero

https://github.com/behindthefirewalls/Parsero/archive/master.zip

On Linux you can use the command below.

git clone https://github.com/behindthefirewalls/Parsero.git


When you download Parsero, you will see a folder with three files.


Before starting, you need to check that your default Python version is 3 or later. If you have already installed Python 3 but it is not your default version, you can run the script using the command "python3 parsero.py" instead of "python parsero.py".
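If you are not sure which version each command launches, you can check it with, for example:

python --version
python3 --version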


If you don't type any argument, you will see the help below.


Example 1

In the picture below you can see the robots.txt file of a web server in one of my environments. If you are a web security auditor, you should check all the Disallow entries in order to try to get some valuable information. The security auditor will want to know which directories or files hosted on the web server the administrators don't want published on the search engines.


You can do this task automatically using Parsero with the command:
python parsero.py -u www.example.com 

Notice in the picture below that the green links are the ones which are available on the web server. You don't need to waste your time checking the other links; just click on the green ones.


If we visit www.example.com/server-status/, we can see the Apache logs, which are public but hidden from the crawlers...

Example 2

In the picture below you can see another robots.txt file. The picture has been cut because this server has a lot of Disallow entries. Can you imagine checking all of them manually?

 
If you use Parsero, you will audit the whole robots.txt file in just a few seconds...


... and discover, for example, the portal login for this site.

The future of Parsero

I am working on new features for this tool, which will be delivered in the coming months... I would be really grateful if you decided to give me your feedback about this tool.

I want to give thanks to cor3dump3d for his support and help!!! He has saved me a lot of time by sharing his knowledge of Python with me!!