Robots.txt and Response Status Codes
Robots.txt files. They’re a pretty important element to consider as part of any SEO work. They provide the ability to block or allow crawlers to access a site, which can have an impact on your overall SEO performance. But what actually is a robots.txt file? Glad you asked! In this article we will talk about what a robots.txt file is, the SEO benefits you can get by optimising this file correctly, and some top rules you must follow.
What is a robots.txt file?
A robots.txt file is a plain-text file that implements the Robots Exclusion Protocol and sits at the root of a site's domain. In essence, it tells search engines which areas of the site they can and cannot access. It gives search crawlers, such as Googlebot or Bingbot, instructions about what should and shouldn’t be crawled. Generally, it can be used to manage potential crawl budget challenges as well as duplicate content issues. These instructions are advisory, though: crawlers can ignore them, so don’t rely on robots.txt to hide private sections of your site.
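To make this concrete, here is a minimal sketch in Python of the check a well-behaved crawler performs before requesting a page. It uses the standard library's `urllib.robotparser`; the robots.txt rules, the paths and the `is_allowed` helper are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /search
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A product page is not covered by any Disallow rule, so it may be crawled.
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://www.domain.com/products/shoes"))

# Anything under /checkout/ is disallowed for every crawler.
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://www.domain.com/checkout/basket"))
```

Note that the crawler itself runs this check. Nothing on the server enforces it, which is why robots.txt should not be treated as access control.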
What’s the benefit for my SEO?
Blocking certain areas of your site which are not of interest for SEO will help manage potential crawl budget wastage and index bloat. Managing crawl budget helps ensure that all of your SEO-important pages are crawled and prioritised above any pages that aren't relevant to SEO, and preventing index bloat helps avoid unnecessary pages being indexed by Google. It can also be used to block search engines from crawling duplicate content within your site that otherwise could cause ranking issues. Plus, it can be used to reference the XML sitemap, a file that lists a website’s important pages, to help ensure those pages are crawled as a priority. So, pretty handy!
Our top rules for crawling your site are…
- Ensure that your robots.txt file is present and placed at the root of the domain, e.g. www.domain.com/robots.txt
- Ensure that you name the specific crawlers your rules apply to, unless the rules should apply to all crawlers
- Reference the XML sitemap within your robots.txt file
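The rules above can be sketched as a small Python example. The robots.txt content, the paths, the sitemap URL and the `robots_url` helper are all hypothetical; the parsing uses the standard library's `urllib.robotparser`, and `site_maps()` assumes Python 3.8 or later.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one group for a named crawler, one group for
# every other crawler, and a reference to the XML sitemap.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /internal-search/

User-agent: *
Disallow: /internal-search/
Disallow: /staging/

Sitemap: https://www.domain.com/sitemap.xml
"""


def robots_url(page_url: str) -> str:
    """Build the robots.txt location for whichever site a page belongs to."""
    parts = urlsplit(page_url)
    return parts._replace(path="/robots.txt", query="", fragment="").geturl()


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The file always lives at the root, whatever page you start from.
print(robots_url("https://www.domain.com/blog/some-post?page=2"))

# Googlebot matches its own group, which does not mention /staging/.
print(parser.can_fetch("Googlebot", "https://www.domain.com/staging/page"))

# Any other crawler falls back to the "*" group, which blocks /staging/.
print(parser.can_fetch("Bingbot", "https://www.domain.com/staging/page"))

# The sitemap reference is exposed separately from the crawl rules.
print(parser.site_maps())
```

One thing this sketch highlights: a crawler with its own named group follows that group only, so any rule you want it to obey has to be repeated there rather than left in the catch-all group.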
To optimise crawlability, it is important to understand the syntax of a robots.txt file. It is also key to understand status codes, and how bots and search engines interpret them for SEO.
So, what are status codes?
Status codes are three-digit response codes that a server returns in answer to a request made by a browser or a search engine crawler. There are many types of HTTP status codes, but a few have a bigger impact on SEO than others:
200 OK Response Code – This tells the user that the request was received successfully and returns the requested content to the user.
301 Status Code – This occurs when a page or a resource has been permanently moved to a new location. It is important to implement these when migrating sites: the 301 status indicates that a permanent move has taken place, so the SEO value of the original page is generally carried across to its new location.
302 Status Code – This indicates that a page has been temporarily moved to a new location, such as when a sale page is retired for a short time on an e-commerce site. As the move is only temporary, SEO signals are not moved. However, if a 302 redirect is left in place for a long period of time, Google may decide to treat it as a permanent redirect. We wouldn’t recommend using a 302 in place of a 301 for a permanent move, as the migration and passing of value will take much longer!
404 Status Code – This code tells the user that the page or resource they requested could not be found. 404s are generally bad for user experience, so we recommend removing internal links to 404 pages – they can also use up potential crawl budget! 404s can also be reached via external links pointing to a page which no longer exists on the site. These are worth paying particularly close attention to, as a link to a 404 page does not pass on any SEO value, so make sure to monitor these and redirect them appropriately.
500 Status Code – A 500 response status code indicates to the user that a server has received the request, but could not process it. As with 404s, these generally provide a bad user experience.
503 Status Code – This code occurs when a server is temporarily unavailable. This can happen for a number of reasons, such as the server being too busy or being under maintenance. As this is a temporary response code, it’s unlikely your page will lose rankings as a result, but if it occurs over a prolonged period of time, Google may remove the affected content from its index.
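As a rough illustration, the codes above can be turned into a small lookup for reading crawl reports. This is a sketch in Python using the standard library's `http.HTTPStatus`; the notes simply paraphrase the descriptions in this article rather than any official mapping, and the `describe` and `links_to_fix` helpers, along with the sample paths, are hypothetical.

```python
from http import HTTPStatus

# Short SEO notes paraphrasing this article; not an official mapping.
SEO_NOTES = {
    HTTPStatus.OK: "page served successfully",
    HTTPStatus.MOVED_PERMANENTLY: "permanent move, use for migrations",
    HTTPStatus.FOUND: "temporary move, SEO signals stay on the old URL",
    HTTPStatus.NOT_FOUND: "missing page, fix internal links or redirect",
    HTTPStatus.INTERNAL_SERVER_ERROR: "server could not process the request",
    HTTPStatus.SERVICE_UNAVAILABLE: "temporarily unavailable, avoid prolonged downtime",
}


def describe(code: int) -> str:
    """Return the code, its standard phrase and the SEO note for it."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        return f"{code}: not a registered HTTP status code"
    note = SEO_NOTES.get(status, "not covered in this article")
    return f"{status.value} {status.phrase}: {note}"


def links_to_fix(links, statuses):
    """Return the (source, target) internal links whose target returned a 404."""
    return [(src, dst) for src, dst in links if statuses.get(dst) == HTTPStatus.NOT_FOUND]


# Hypothetical crawl results: URL path -> status code returned.
statuses = {"/": 200, "/sale": 302, "/old-page": 404}
# Hypothetical internal links found during the crawl: (source, target).
links = [("/", "/sale"), ("/", "/old-page"), ("/sale", "/")]

for path, code in statuses.items():
    print(path, "->", describe(code))

print(links_to_fix(links, statuses))
```

In practice the status codes and link lists would come from a crawler export rather than hard-coded dictionaries, but the same lookup applies.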
So, there you have it. Understanding robots.txt files and interpreting status codes are fundamental to your site's performance and the success of any SEO campaign. If you are experiencing any of the above and are unsure what action you need to take, give us a shout at email@example.com.
Written by: Davide Tien, Senior SEO Analyst
Category:What we think