As you already know, Parsero is a free script written in Python that automatically audits the robots.txt file of a web server. In just a few seconds, it gives you a lot of valuable information that you need when auditing a website.

This tool is available for download here:

https://github.com/behindthefirewalls/Parsero

And here you can read about what Parsero could already do:

http://www.behindthefirewalls.com/2013/12/parsero-tool-to-audit-robotstxt.html

How to install Parsero v0.6

Parsero is really easy to install. You can install it, for example, in Kali Linux. You only need to run the commands below.
apt-get install python3
apt-get install python3-pip
pip-3.2 install urllib3
pip-3.2 install beautifulsoup4
git clone https://github.com/behindthefirewalls/Parsero.git

What's new?

If you look at the Parsero help, you will see two new features:

  • "-o" :   To only show the available Disallow entries.
  • "-sb" :  To search Bing for indexed Disallow entries.

Showing only the available Disallows

In the picture below you will see the difference between using the "-o" option and not using it.

If the robots.txt file has only a few entries, I recommend that you don't use the "-o" option, because seeing all the results helps you figure out what type of content the administrator wanted to hide. But if the file is bigger, you have a lot of information to analyze, and it is easier to perform the audit by getting only the links that are actually available to visit.
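To make the idea behind "-o" concrete, here is a minimal sketch in Python. It is not Parsero's actual code: the function names (parse_disallows, check_status, only_available) are hypothetical, and I am assuming that "available" means the server answers with an HTTP 200 status code. The sketch also ignores wildcard patterns in Disallow paths.

```python
import urllib.error
import urllib.request


def parse_disallows(robots_text):
    """Return the unique paths listed in Disallow lines, in file order."""
    paths = []
    for line in robots_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            path = value.strip()
            # An empty Disallow value blocks nothing, so skip it.
            if path and path not in paths:
                paths.append(path)
    return paths


def check_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request failed.

    urlopen follows redirects, so the code returned is the final one.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx answers still carry a status code.
        return err.code
    except OSError:
        # DNS failures, refused connections, timeouts, and so on.
        return None


def only_available(checked):
    """Keep only the (url, status) pairs answered with HTTP 200 (assumption)."""
    return [(url, status) for url, status in checked if status == 200]
```

You would call check_status once for every path returned by parse_disallows, collect the (url, status) pairs, and pass them through only_available to get the shorter list.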




Searching the Disallow entries in Bing

The fact that the administrator wrote a robots.txt file to try to hide part of the content from crawlers doesn't mean that search engines haven't indexed these Disallow entries.

For example, in the picture below, Parsero finds content indexed by Bing that should not have been indexed. Parsero will show you the first 10 Bing results for the indexed Disallow entries.

If you CTRL+click on the links, your browser will open:

  • White links: the search page in Bing.
  • Green links: directly to the result found in Bing (the content is not always available and sometimes you will get an HTTP 404 error).
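
If you want to reproduce this kind of Bing lookup by hand, the sketch below builds a search URL for a single Disallow entry. This is only an illustration: the bing_site_query function is hypothetical, and I am assuming a query that uses Bing's "site:" operator. It is not necessarily the exact query that Parsero sends.

```python
from urllib.parse import urlencode


def bing_site_query(host, path):
    """Build a Bing search URL restricted to one Disallow path.

    Assumption: the lookup is a plain "site:host/path" query.
    """
    # Join host and path with exactly one slash between them.
    target = host.rstrip("/") + "/" + path.lstrip("/")
    # urlencode percent-encodes the ":" and "/" characters in the query.
    return "https://www.bing.com/search?" + urlencode({"q": "site:" + target})
```

Opening the returned URL in a browser shows the same kind of search page that the white links point to.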