Robots.txt is a text file that lets the webmaster or administrator of a website tell search engine robots (also called crawlers) which content they are allowed to analyze. It is intended exclusively for indexing robots; it does not prevent an Internet user from accessing a page or a directory.
The origin of the robots.txt file
The file is credited to Martijn Koster, who worked for WebCrawler in 1994. At the time, the goal was to regulate robot crawling, an activity that could cause a number of problems, such as triggering scripts and crashing servers.
What is the link between robots.txt and SEO?
A website cannot rank in search engines unless their robots crawl its content. Through this file you give them instructions: in essence, you tell them to ignore content that you think would add no value to the results of Google, Bing or Yahoo.
Does the creation of robots.txt guarantee better SEO?
In 2017, Google commented on this subject. Ease of crawling is not a relevance criterion in its algorithm, so the effect on SEO is not mechanical. That said, a site that is crawled more “effectively” obviously has more chances of having its best content analyzed and therefore returned in the SERPs.
What content should be prohibited from an SEO perspective?
First, static pages that you are currently reworking may be content that you would rather bots did not crawl.
There is also information regarded as confidential: resources that are not sensitive but are intended primarily for internal staff (documentation, white papers, specifications, etc.). Next come duplicate pages, which often make up a significant share of sites built on WordPress and other CMSs. Finally, there are the result pages of the internal search engine, which can give you useful ideas for organic SEO but are not necessarily of interest to search engine users.
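As an illustration, the kinds of content listed above could translate into a handful of Disallow rules. The sketch below generates such a file and checks it with Python's standard `urllib.robotparser` module. The paths (`/drafts/`, `/internal-docs/`, `/search/`), the `build_robots_txt` helper and the example.com URLs are hypothetical and only stand in for whatever your own site uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical paths, chosen only for illustration.
EXCLUDED_PATHS = [
    "/drafts/",         # pages currently being reworked
    "/internal-docs/",  # documentation meant for internal staff
    "/search/",         # internal search result pages
]

def build_robots_txt(paths):
    """Return a robots.txt body asking all robots to skip the given paths."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"

robots_txt = build_robots_txt(EXCLUDED_PATHS)

# Parse the generated rules and test two sample URLs against them.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(robots_txt)
print(parser.can_fetch("*", "https://www.example.com/search/shoes"))  # False
print(parser.can_fetch("*", "https://www.example.com/blog/article"))  # True
```

Keep in mind that these rules only ask robots not to crawl the paths; as noted at the start of the article, they do not stop an Internet user from opening them.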
What other SEO rules should I know?
The Google Search Console test tool to check your robots.txt file
How to use, place, and update the robots.txt file?
How can we create or read the robots.txt?
The file can easily be created and edited with a simple text editor, such as Notepad or Atom.
Where to put the robots.txt file?
The robots.txt file must be located at the root of the site. To put it there, simply upload it to the root directory on your FTP server.
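Because the file always sits at the root, its address can be derived from any page of the site. Here is a minimal sketch using Python's standard `urllib.parse`; the function name `robots_txt_url` and the example.com addresses are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://www.example.com/blog/2020/post.html?ref=home"))
# https://www.example.com/robots.txt
```

Note that the host is kept as-is, so a subdomain such as shop.example.com gets its own robots.txt address.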
How to update it?
In Search Console, every webmaster can update the robots.txt file. Under the “Crawl” tab, there is a category named “robots.txt Tester”. There you can, for example, test whether a given page is blocked. Click “Submit” and follow Google’s instructions to update your file; Google will take the change into account fairly quickly.
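A similar check can be run locally before uploading anything. The sketch below relies on Python's standard `urllib.robotparser`; the helper name `blocked_urls`, the rules and the URLs are hypothetical. The standard-library parser is not Google's own, so treat its answers as a first sanity check rather than a guarantee of how Googlebot will behave.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs from `urls` that robots_txt forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

# Hypothetical rules and URLs, for illustration only.
rules = """\
User-agent: *
Disallow: /private/
"""

pages = [
    "https://www.example.com/",
    "https://www.example.com/private/report.html",
]

print(blocked_urls(rules, "Googlebot", pages))
# ['https://www.example.com/private/report.html']
```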
Robots.txt: what you should definitely not do
- changing the URL of robots.txt (so that it is no longer at the root)
- a robots.txt URL that returns an error (404, 500 …)
- a robots.txt overwritten by the preproduction version (which contains a “Disallow: /” directive that blocks the entire site)
- a blank line in a directive block
- bad encoding of the file (it must be in UTF-8)
- wrong order of directive blocks
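Some of these mistakes can be caught automatically before the file goes live. The following is a rough sketch, not a complete validator: it only looks at three items from the list (encoding, a blank line inside a block, and a site-wide “Disallow: /”). The function name `check_robots_txt` is hypothetical, and the blank-line check is a heuristic that flags any Allow or Disallow rule directly following a blank line.

```python
def check_robots_txt(raw):
    """Return a list of warnings for a few common robots.txt mistakes.

    `raw` is the file content as bytes, exactly as it would be served.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]

    warnings = []
    after_blank = False
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            after_blank = True
            continue
        content = line.split("#", 1)[0].strip()  # drop trailing comments
        if not content:                          # comment-only line
            continue
        field, _, value = content.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field in ("allow", "disallow"):
            if after_blank:
                warnings.append(
                    f"line {number}: rule follows a blank line and may be "
                    "detached from its User-agent block"
                )
            if field == "disallow" and value == "/":
                warnings.append(f"line {number}: 'Disallow: /' blocks the entire site")
        after_blank = False
    return warnings

sample = b"User-agent: *\n\nDisallow: /\n"
for warning in check_robots_txt(sample):
    print(warning)
```

The sample triggers two warnings, both on line 3. A check like this could run as part of a deployment script so that a preproduction file is never pushed to the live site unnoticed.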
The robots.txt commands
Allow indexing of all pages on a site
User-agent: *
Disallow:
Nothing follows “Disallow:”, which means nothing is blocked: robots are allowed to crawl every page.
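Block indexing of all pages
User-agent: *
Disallow: /
A single slash after “Disallow:” matches every URL, so the whole site is blocked.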
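The only difference between allowing everything and blocking everything is a single slash, which is easy to get wrong. This short sketch compares the two variants with Python's standard `urllib.robotparser`; the helper `can_crawl` and the example.com URL are assumptions made for the demonstration.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt, url, user_agent="*"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # a single slash: everything is blocked

url = "https://www.example.com/any/page.html"
print(can_crawl(ALLOW_ALL, url))  # True
print(can_crawl(BLOCK_ALL, url))  # False
```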
Block indexing of a specific folder
User-agent: *
Disallow: /folder/
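A folder rule matches by path prefix, so it covers everything below the folder but not similarly named paths. The sketch below shows this with Python's standard `urllib.robotparser`; the paths and the example.com host are invented for the example.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /folder/"])

paths = [
    "/folder/page.html",      # inside the folder
    "/folder/sub/image.png",  # inside a subfolder
    "/folder-archive/",       # different folder with a similar name
    "/index.html",            # elsewhere on the site
]

# Map each path to True (crawlable) or False (blocked).
results = {
    path: parser.can_fetch("*", "https://www.example.com" + path)
    for path in paths
}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Only the first two paths are reported as blocked, because they are the only ones that start with `/folder/`.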
Block GoogleBot in indexing a folder, except for a specific page in that folder
User-agent: Googlebot
Disallow: /folder/
Allow: /folder/nompage.html
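The exception can also be checked locally, with one caveat. Google documents that the most specific (longest) matching rule wins, so the order of Allow and Disallow inside the block does not matter for Googlebot. Python's standard `urllib.robotparser`, used in this sketch, applies the first matching rule instead, which is why the Allow line is listed first here. The example.com host and the `OtherBot` name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Allow comes first because urllib.robotparser stops at the first
# matching rule; the rules themselves are the same as in the article.
rules = """\
User-agent: Googlebot
Allow: /folder/nompage.html
Disallow: /folder/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

base = "https://www.example.com"
print(parser.can_fetch("Googlebot", base + "/folder/nompage.html"))  # True
print(parser.can_fetch("Googlebot", base + "/folder/other.html"))    # False
print(parser.can_fetch("OtherBot", base + "/folder/other.html"))     # True
```

The last line shows that the block applies only to Googlebot: a robot with no matching block and no `User-agent: *` fallback is not restricted.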