Since I recently got into this whole blogging thing, and I’m someone who tends to exhaustively research anything I’m interested in (I guess that’s why I like my job), I wanted to share a few WordPress tips I’ve worked out that may help others. One of them is figuring out exactly which robots.txt file to use for your WordPress site. The goal is not so much SEO (search engine optimization) as making sure search engines like Google index the right content and skip the wrong stuff. I’ll break this down into fairly basic terms for people who are new to the process. There are plenty of blog posts on the subject; this is my own spin on it. The key is not to block too much: block only things that are meaningless to readers (like script files).
The root folder of your site can contain a text file named robots.txt. It holds rules you write to tell search engines which files and folders they may crawl and which are off-limits. Google has a bad rap for ignoring robots.txt files, but I believe that reputation comes from confusion about how Google interprets the file. While experimenting with Google’s robots.txt analysis tool, I found something that I think many neophytes are missing.
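If you’d like to experiment without touching a live site, Python’s standard library ships a basic robots.txt parser, urllib.robotparser. The sketch below feeds it a few hypothetical rules (example.com and all of the paths are placeholders, not my actual file) and asks whether a crawler may fetch each URL. This parser only does plain prefix matching, so treat it as a check of the basic allow/deny idea rather than a faithful copy of how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a WordPress install living under /blog.
RULES = """\
User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network fetch is needed.
parser.parse(RULES.splitlines())

for url in (
    "https://example.com/blog/wp-admin/options.php",
    "https://example.com/blog/wp-includes/js/jquery/jquery.js",
    "https://example.com/blog/2010/05/hello-world/",
):
    # "Googlebot" has no section of its own here, so the "*" group applies.
    # Each Disallow path is a prefix: a URL whose path starts with it is blocked.
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{url} -> {verdict}")
```

The two wp- folders come back blocked and the post URL comes back allowed, because only the first two paths start with a disallowed prefix.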
First, a general primer. Below are the first few lines from my robots.txt file.
User-agent: *
# disallow all files in these directories
Disallow: /blog/wp-*
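That last rule relies on the `*` wildcard. The original robots.txt convention matched paths by plain prefix only, but Google’s crawler also understands `*` (any run of characters) and a trailing `$` (end of the URL). Here is a minimal Python sketch of that Google-style matching, assuming Disallow rules only: the helper names `rule_to_regex` and `is_disallowed` and the sample paths are hypothetical, and it ignores Allow lines and Google’s longest-match precedence between Allow and Disallow. It also shows that, because rules are prefix matches, `/blog/wp-*` and `/blog/wp-` block exactly the same paths.

```python
from __future__ import annotations

import re


def rule_to_regex(rule: str) -> re.Pattern[str]:
    """Translate one Disallow path into a regex, Google-style (sketch).

    '*' matches any run of characters, a trailing '$' anchors the end of
    the path, and everything else is matched literally. re.match() anchors
    at the start of the string, so an unanchored rule is a prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """True if any non-empty Disallow rule matches the path.

    An empty 'Disallow:' line blocks nothing, so it is skipped.
    """
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules if rule)


paths = [
    "/blog/wp-admin/options.php",
    "/blog/wp-content/themes/default/style.css",
    "/blog/wp-login.php",
    "/blog/2010/05/hello-world/",
]
for path in paths:
    with_star = is_disallowed(path, ["/blog/wp-*"])
    without_star = is_disallowed(path, ["/blog/wp-"])
    print(f"{path}: blocked={with_star}, same without '*': {with_star == without_star}")
```

Every path under a wp- folder, plus loose files like wp-login.php, comes back blocked, while the post URL stays crawlable.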