How To Fix Your Robots.txt File
There are quite a few problems with the default robots.txt file that ships with Drupal. For starters, if you use the robots.txt testing utility in Google Webmaster Tools, you will find that many paths that look like they are being blocked will actually be crawled. This happens because Drupal does not require the trailing slash (/) after the path to show you the content, while robots.txt rules are matched as prefixes of the URL path. A rule that ends in a slash therefore matches only URLs that contain that slash: Googlebot will avoid the page with the slash and crawl the page without it.
For example, /admin/ is listed as disallowed. As expected, the testing utility shows that http://www.yourDrupalsite.com/admin/ is disallowed. However, put in http://www.yourDrupalsite.com/admin (omitting the ending slash) and you will see that it is allowed. That is a problem, but there is an easy fix.
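The same behavior can be reproduced locally, without Google's testing utility. The sketch below is not part of Drupal or of Google's tools. It uses Python's standard urllib.robotparser module, which applies prefix matching to simple rules like these, and a cut-down rule set containing only the /admin/ line. The helper name is_allowed is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Cut-down stand-in for the default file; only the /admin/ rule matters here.
RULES = """\
User-agent: *
Disallow: /admin/
"""


def is_allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


with_slash = is_allowed(RULES, "http://www.yourDrupalsite.com/admin/")
without_slash = is_allowed(RULES, "http://www.yourDrupalsite.com/admin")

print("/admin/ may be crawled:", with_slash)
print("/admin  may be crawled:", without_slash)
```

Under these assumptions the URL with the slash should come back as blocked and the one without it as crawlable, which matches what the testing utility reports. The steps below close that gap.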
1. Make a backup of the robots.txt file.
2. Open the robots.txt file for editing.
3. Find the Paths (clean URLs) section and the Paths (no clean URLs) section. Note that both sections appear whether you’ve turned on clean URLs or not. Drupal covers you either way. They look like this:
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
4. Copy and paste the two sections so that you have four sections: two # Paths (clean URLs) sections and two # Paths (no clean URLs) sections.
5. Add ‘fixed!’ to the comments of the new sections so that you can tell them apart.
6. Delete the trailing / from the end of each Disallow line in the fixed! sections, so that, for example, Disallow: /admin/ becomes Disallow: /admin.
7. Save your robots.txt file, uploading it if necessary, replacing the existing file.
8. Go to http://www.yourDrupalsite.com/robots.txt and double-check that your changes are in effect. You may need to refresh your browser to see the changes.
Now your robots.txt file is working as you would expect!