The robots.txt file is a simple text file that tells search engine crawlers such as Googlebot which areas of a domain may be crawled and which may not. In addition, a reference to the XML sitemap can be included in the robots.txt file. Before a search engine bot starts crawling, it first looks in the root directory for the robots.txt file and reads the directives given there. For this to work, the text file must be saved in the root directory of the domain under the exact name robots.txt.
The robots.txt file can be created with any text editor. It consists of blocks with two parts: first a line specifying the user agent to which the instructions apply, followed by one or more “Disallow” lines listing the URLs to be excluded from crawling. You should always check the correctness of the robots.txt file before uploading it to the root directory of the website. Even a small error can cause the bot to disregard the directives and crawl pages that should not appear in the search engine index.
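As an illustration, a minimal robots.txt combining both parts with a sitemap reference could look like the following sketch. The domain, the /admin/ directory, and the sitemap path are placeholders, not recommendations:

```
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
```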
This free tool from OnPage.org enables you to test your robots.txt file. You only need to enter the corresponding URL and select the respective user agent. Upon clicking “Start test”, the tool checks whether crawling of the given URL is allowed or not. You can also use OnPage.org FREE to test many other factors on your website and analyze and optimize up to 100 URLs. Simply click here to get your FREE account »
User-agent: *
Disallow:
This code allows all crawlers, including Googlebot, to crawl all pages (the asterisk stands for any user agent, and an empty “Disallow” excludes nothing). To prevent bots from crawling the entire website, you should instead add the following to the robots.txt file:
User-agent: *
Disallow: /
Example: If you want to prevent the /info/ directory from being crawled by Googlebot, you should enter the following command in the robots.txt file:
User-agent: Googlebot
Disallow: /info/
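If you want to verify such rules programmatically rather than with an online tool, Python’s standard-library urllib.robotparser can parse robots.txt rules and answer crawl questions. A minimal sketch, assuming the example rules above and a hypothetical domain example.com:

```python
from urllib import robotparser

# The rules from the example above: block Googlebot from /info/.
rules = """\
User-agent: Googlebot
Disallow: /info/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot may not crawl anything under /info/, but other paths are fine.
print(parser.can_fetch("Googlebot", "https://www.example.com/info/page.html"))
print(parser.can_fetch("Googlebot", "https://www.example.com/contact.html"))
```

Because the record names only Googlebot, other user agents are unaffected and may still crawl /info/ unless a separate block restricts them.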