Robots.txt Test

About Robots.txt Test

Check if your website is using a robots.txt file. When search engine robots crawl a website, they typically first access a site's robots.txt file. Robots.txt tells Googlebot and other crawlers what is and is not allowed to be crawled on your site.

In order to pass this test you must create and properly install a robots.txt file.

You can write the file with any program that produces plain text, or generate it with an online tool (Google Webmaster Tools, since renamed Google Search Console, has offered this feature).

Remember to use all lower case for the filename: robots.txt, not ROBOTS.TXT.

A simple robots.txt file looks like this:

        User-agent: *
        Disallow: /cgi-bin/
        Disallow: /images/
        Disallow: /pages/thankyou.html

This blocks all compliant search engine robots from crawling the "cgi-bin" and "images" directories and the page "http://www.yoursite.com/pages/thankyou.html".
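As a rough check of what these rules do, the sample file can be fed to Python's standard-library urllib.robotparser module. This is only a sketch: the helper name is_allowed and the test URLs are assumptions made for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The sample robots.txt from above, held in a string instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /pages/thankyou.html
"""


def is_allowed(url, user_agent="*"):
    """Return True if the sample rules let user_agent crawl url.

    is_allowed is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.yoursite.com/cgi-bin/form.pl",
        "http://www.yoursite.com/images/logo.png",
        "http://www.yoursite.com/pages/thankyou.html",
        "http://www.yoursite.com/pages/about.html",
    ):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

Under these rules only the last URL is reported as allowed, because none of the Disallow prefixes match it.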

TIPS:

You need a separate Disallow line for every URL prefix you want to exclude.
Do not put blank lines inside a record, because blank lines are used to separate one record from the next.
Notice that the Disallow lines are preceded by the line User-agent: *. The User-agent line specifies which robot the record applies to. Well-known crawlers include Googlebot (Google), Googlebot-Image (Google Image Search), Baiduspider (Baidu), and Bingbot (Bing).
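One important thing to know if you are creating your own robots.txt file is that the original robots.txt standard allows the wildcard (*) in the User-agent line (meaning "any robot") but not in the Disallow line. Major crawlers such as Googlebot and Bingbot do accept * (any sequence of characters) and $ (end of URL) in Disallow paths, but other crawlers may not, so plain URL prefixes are the most portable choice.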
Regular expressions are not supported in either the User-agent or Disallow lines.
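The tips about records and User-agent lines can be sketched with Python's standard-library urllib.robotparser, which matches Disallow values as plain URL prefixes. The two-record file below is a hypothetical example, not taken from any real site, and can_crawl is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Two records separated by a single blank line (hypothetical rules).
# The first applies only to Googlebot-Image, the second to every other robot.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Disallow: /cgi-bin/
"""


def can_crawl(user_agent, url):
    """Return True if the rules above let user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    image_url = "http://www.yoursite.com/images/logo.png"
    script_url = "http://www.yoursite.com/cgi-bin/form.pl"
    for agent in ("Googlebot-Image", "Bingbot"):
        for url in (image_url, script_url):
            print(agent, url, can_crawl(agent, url))
```

In this sketch a robot that has its own record follows only that record, so Googlebot-Image is kept out of /images/ but is not bound by the /cgi-bin/ rule in the catch-all record.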

Once you have your robots.txt file, upload it to the top-level directory of your web server, so that it is served from the root of your site (for example, http://www.yoursite.com/robots.txt). After that, make sure the file's permissions allow visitors (such as search engine crawlers) to read it.
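One way to confirm the upload worked is to request the file the same way a crawler would. The sketch below uses only Python's standard library; the function names robots_txt_url and robots_txt_is_readable are assumptions for illustration, and www.yoursite.com is the placeholder host used earlier on this page.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_txt_url(page_url):
    """Build the top-level robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def robots_txt_is_readable(page_url, timeout=10):
    """Return True if the site's robots.txt answers with HTTP 200.

    A missing file (404) or a permissions problem (403) makes urlopen
    raise an error, which is reported here as False.
    """
    try:
        with urlopen(robots_txt_url(page_url), timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts;
        # ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    page = "http://www.yoursite.com/pages/thankyou.html"  # placeholder host
    print(robots_txt_url(page))
    print(robots_txt_is_readable(page))
```

If the second call reports False for your own site, the file is probably in the wrong directory or is not readable by the web server.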