Robots.txt is a standard text file placed in the public root directory of a website to tell web robots (typically search engine crawlers) which pages they may crawl.
Robots.txt is easily overlooked when building a website. It has no visual effect on the site and works by instructing web crawlers that are ‘invisible to the eye’. Yet it is an important part of building a website, because it can help focus crawlers’ attention on the pages that matter most to your website or business.
A robots.txt file is a plain text file (the kind you can create in Notepad) containing a set of statements, or directives. Crawlers read these statements to learn which pages you would like them to visit and which to leave alone. Steering crawlers away from unimportant pages helps them spend their time discovering the content you do want to appear in search results.
Once search engine crawlers have found the pages they are allowed to visit, they analyse each page and decide whether to add it to the index of the search engine they work for. Crawlers are often called spiders because they travel across the World Wide Web from link to link, much as a spider moves across its web. They also follow your internal links to move from one page to another, which is why internal linking matters: you want visiting crawlers to be able to reach, and index, as many of your important pages as possible.
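To illustrate how internal links and robots.txt work together, here is a minimal sketch of a polite crawler built on Python's standard urllib.robotparser. The site is a hypothetical in-memory link graph, and the paths, rules and crawl function are made up for illustration; a real crawler would download each page and extract its links instead.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an illustrative site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
"""

# Hypothetical internal-link graph: each page maps to the pages it links to.
LINKS = {
    "/": ["/services/", "/blog/", "/checkout/"],
    "/services/": ["/services/seo/", "/"],
    "/services/seo/": ["/blog/"],
    "/blog/": ["/blog/robots-txt/"],
    "/blog/robots-txt/": [],
    "/checkout/": ["/checkout/payment/"],
    "/checkout/payment/": [],
    "/orphan-page/": [],  # nothing links here, so a crawler never finds it
}


def crawl(start, links, robots_txt, agent="ExampleBot"):
    """Breadth-first crawl that follows links but skips disallowed pages."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    seen, visited = {start}, []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if not parser.can_fetch(agent, page):
            continue  # robots.txt says: do not visit this page
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


print(crawl("/", LINKS, ROBOTS_TXT))
# ['/', '/services/', '/blog/', '/services/seo/', '/blog/robots-txt/']
```

The crawler never reaches /checkout/payment/, because the only link to it sits on a disallowed page, and never finds /orphan-page/, because no page links to it.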
There are many statements you can use in a robots.txt file, each with its own function. Below are some of the most commonly used.
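A robots.txt file can be complete with just two statements: a User-agent line naming the crawler, followed by a Disallow line naming the page that crawler should not visit.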
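Written out, a file of this kind is just the two lines inside the string below, here assuming the page lives at the hypothetical path /example-page/. Python's standard urllib.robotparser module is used only to check how a crawler would read the rules; OtherBot is an illustrative crawler name.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content: one group of rules aimed at Googlebot.
# "/example-page/" is a hypothetical path used for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /example-page/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot must skip the example page but may crawl everything else.
print(parser.can_fetch("Googlebot", "/example-page/"))  # False
print(parser.can_fetch("Googlebot", "/about/"))         # True
# Crawlers not named in any group are unaffected by these rules.
print(parser.can_fetch("OtherBot", "/example-page/"))   # True
```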
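In the statement above, we are telling Googlebot not to crawl (or visit) the example page. Googlebot reads that statement and skips the page, so the page’s content is not fetched for Google’s index. Be aware that blocking crawling does not guarantee a URL stays out of search results: if other pages link to it, Google can still list the bare URL. To keep a page out of results altogether, use a noindex meta tag and leave the page crawlable so Google can see that tag.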
You can do a lot more than tell web crawlers which pages to look at. A robots.txt file is also a good place to declare other parts of your website, including:
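One widely supported declaration is the location of your XML sitemap, given on a Sitemap line. It is not tied to any User-agent group, so it can sit anywhere in the file. The sketch below assumes a sitemap at the placeholder URL https://www.example.com/sitemap.xml and reads it back with the site_maps() method of Python's urllib.robotparser (available from Python 3.8).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that also declares a sitemap location.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the declared sitemap URLs (or None if there are none).
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
print(parser.can_fetch("AnyBot", "/admin/settings"))  # False
```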
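You can use wildcards to match many URLs with a single statement. An asterisk (*) matches any sequence of characters, and a dollar sign ($) marks the end of a URL, so Disallow: /*.pdf$ blocks every URL that ends in .pdf.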
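Python's urllib.robotparser only does plain prefix matching and does not understand these wildcards, so the sketch below is a small hand-written matcher (pattern_matches is a hypothetical helper, not a library function) for the two wildcard characters documented by Google and in RFC 9309: * for any run of characters and $ for the end of the URL.

```python
import re


def pattern_matches(pattern, path):
    """Return True if a robots.txt path pattern matches a URL path.

    '*' matches any run of characters (including none), and a trailing
    '$' anchors the pattern to the end of the URL. Without '$' the
    pattern only has to match the start of the path (prefix matching).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes the regex '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


# Disallow: /*.pdf$  -> blocks any URL that ends in .pdf
print(pattern_matches("/*.pdf$", "/files/report.pdf"))      # True
print(pattern_matches("/*.pdf$", "/files/report.pdf?v=2"))  # False
# Disallow: /*?  -> blocks any URL containing a query string
print(pattern_matches("/*?", "/shop?sort=price"))           # True
print(pattern_matches("/*?", "/shop/"))                     # False
```

Without a trailing $, a pattern still behaves as a prefix, which is why /*? matches a URL with a question mark anywhere after the first slash.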
To find out more about how these could be useful in your robots.txt file, check out Google’s robots.txt documentation.
Seeing a correct robots.txt file can make it easier to write your own. Below is a working example of a robots.txt file, taken from our very own robots.txt file here at Bravr Digital Marketing:
You can see that we have allowed Googlebot to access our design elements, such as CSS files. This is so that Google can understand our template when its bot renders the website.
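The same idea can be sketched with hypothetical paths: block a folder in general while explicitly allowing the stylesheets inside it. Google applies the most specific (longest) matching rule, whereas Python's urllib.robotparser, used here to check the result, applies the first rule that matches in file order; putting the Allow line first, as below, gives the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep Googlebot out of /assets/ except the CSS folder.
# The Allow line comes first so first-match and longest-match agree.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /assets/css/
Disallow: /assets/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "/assets/css/main.css"))   # True
print(parser.can_fetch("Googlebot", "/assets/private/a.zip"))  # False
```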
A robots.txt file is publicly accessible, and crawlers look for it in the same place on every website: the root of the domain, for example www.example.com/robots.txt. On many hosting setups this root is the public_html directory, which is where all of your public website files live.
If you were to place the file in a different directory, such as www.example.com/directory/robots.txt, crawlers would not find it and it would have no effect.
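Crawlers work out where robots.txt lives from the scheme and host of any page they want to visit, discarding the rest of the path. The small sketch below uses Python's urllib.parse (robots_url is a hypothetical helper name) to show why a file in a subdirectory is never read, and why each subdomain needs its own robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the only URL where crawlers look for robots.txt for a page.

    The scheme and host (plus port, if any) are kept and the whole path
    is replaced with /robots.txt, so files in subdirectories are ignored.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/directory/page.html"))
# https://www.example.com/robots.txt
print(robots_url("https://blog.example.com/post?id=7"))
# https://blog.example.com/robots.txt  (a separate file for the subdomain)
```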
If you can’t find your robots.txt file, chances are you don’t have one. We recommend contacting your website developers to get one set up. Alternatively, drop us an email here at Bravr; we will be happy to make sure your robots.txt is present and working!
Be sure to check your robots.txt before going live, because declaring the wrong statement can harm how your website appears in search. Some things to look out for include:
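A few common mistakes can be caught with a short pre-flight check before uploading. The sketch below (lint_robots is a hypothetical helper, and its list of recognised fields is an assumption) flags lines without a colon, misspelt field names that crawlers would silently ignore, and a Disallow: / that blocks the whole site. A Disallow: / can be intentional on a staging site, so treat a warning as a prompt for a second look.

```python
# Field names this sketch treats as valid (an assumption; some crawlers
# support extra, non-standard fields).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return warnings for a few common robots.txt mistakes (a rough sketch)."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing ':' in {raw!r}")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() not in KNOWN_FIELDS:
            # Crawlers skip lines they do not recognise, without any error.
            warnings.append(f"line {number}: unknown field {field!r}")
        elif field.lower() == "disallow" and value == "/":
            warnings.append(f"line {number}: 'Disallow: /' blocks the whole site")
    return warnings


SAMPLE = """\
User-agent: *
Dissallow: /private/
Disallow: /
"""
for warning in lint_robots(SAMPLE):
    print(warning)
# line 2: unknown field 'Dissallow'
# line 3: 'Disallow: /' blocks the whole site
```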
Robots.txt is just one part of Search Engine Optimisation. To find out a bit more about SEO, check out our dedicated SEO pages.
To create a robots.txt file, open a plain text editor; Notepad is probably the easiest to use. Write each statement on its own line, because crawlers read the file line by line. If you would like to leave a note in your robots.txt, start it with a # character: crawlers ignore everything from the # to the end of that line, for example:
# Crawlers ignore this line because it starts with a hash
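A quick way to confirm that comments do not change the rules is to parse a file containing them. In this sketch (rules and paths are illustrative), Python's urllib.robotparser skips the full-line comment and strips the inline comment after the Disallow path, as the robots.txt standard specifies.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# This whole line is a comment and is ignored
User-agent: *
Disallow: /drafts/  # everything after the hash is ignored too
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("AnyBot", "/drafts/post-1"))  # False: the rule still applies
print(parser.can_fetch("AnyBot", "/blog/"))          # True
```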
Once you have written your statements, check them carefully to make sure you are allowing and disallowing the correct directories and files. Accidentally blocking Googlebot from your important pages would be disastrous.
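One way to make this check systematic is to list the pages that must stay crawlable and test each against the draft file before uploading. The sketch below uses Python's urllib.robotparser with a hypothetical draft and page list (blocked_pages is a hypothetical helper); because robotparser uses simple first-match, prefix-only rules, treat it as a quick check and confirm the live file in Google Search Console's robots.txt report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft file and list of pages that must stay crawlable.
DRAFT_ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /services
"""
IMPORTANT_PAGES = ["/", "/services/seo/", "/contact/"]


def blocked_pages(robots_txt, pages, agent="Googlebot"):
    """Return the pages that the given crawler would NOT be allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [page for page in pages if not parser.can_fetch(agent, page)]


print(blocked_pages(DRAFT_ROBOTS_TXT, IMPORTANT_PAGES))  # ['/services/seo/']
```

Here the draft's Disallow: /services rule also blocks /services/seo/, because rules match by prefix, so the check flags it before the file goes live.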
Once you are happy with your robots.txt file, save a copy locally as a backup. Then upload it via FTP to your website’s root directory (typically public_html), making sure the file is named exactly robots.txt.
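The save-then-upload routine can be scripted with Python's standard library. In this sketch, save_backup and upload are hypothetical helper names, public_html is assumed to be the root folder, and the server details are placeholders; ftplib.FTP_TLS is shown in the usage comment because plain FTP sends your password unencrypted.

```python
import os

# Assumed workflow: write a local backup, then upload it to the web root.
# "public_html" is a common root folder name, but your host may differ.


def save_backup(text, folder):
    """Write robots.txt into `folder` as UTF-8 with Unix line endings."""
    path = os.path.join(folder, "robots.txt")
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(text)
    return path


def upload(ftp, local_path, remote_dir="public_html"):
    """Upload robots.txt over an already logged-in ftplib.FTP/FTP_TLS session."""
    ftp.cwd(remote_dir)
    with open(local_path, "rb") as fh:
        ftp.storbinary("STOR robots.txt", fh)


# Example usage (host and credentials are placeholders):
# import ftplib
# with ftplib.FTP_TLS("ftp.example.com") as ftp:
#     ftp.login("username", "password")
#     ftp.prot_p()  # encrypt the file transfer as well as the login
#     upload(ftp, save_backup("User-agent: *\nDisallow:\n", "."))
```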
Once you have done that, you are all set! Well-behaved crawlers such as Googlebot will now follow your rules when they visit your website. Bear in mind that robots.txt is a request rather than a barrier, so it should never be relied on to hide private content.