Google has announced that, from September 1, 2019, it will no longer honor the “noindex” directive in the robots.txt file. Website administrators who relied solely on this directive to keep pages out of Google’s index need to switch to a supported method as soon as possible.

In an article published on its official webmasters blog, Google specified that noindex rules in robots.txt will no longer be taken into account by its crawlers. This means that, from September 1, 2019, pages previously excluded from the Google index through this method can once again be found and indexed by the search engine.

What is robots.txt?

Robots.txt is a plain text file placed in the root of the site that tells search engine crawlers which pages they may access and which they may not. Although these rules work well in general, there are situations in which search engines do not follow the instructions in the robots.txt file and end up indexing certain pages anyway.
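A minimal robots.txt typically looks like the example below (the paths shown are illustrative, not taken from any particular site):

    User-agent: *
    Disallow: /admin/
    Disallow: /checkout/

The “User-agent” line states which crawler the rules apply to, and each “Disallow” line names a path that crawler should not request.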

Why doesn’t Google respect the instructions in the robots.txt file?

Until the beginning of September, Google respected the noindex directive in robots.txt, even though it was never an official part of the protocol. Even with official directives such as Disallow, search engines are not supposed to index the blocked content, and yet they sometimes do.

One cause of this behavior is the presence of external links pointing to the blocked page: if other sites link to it, Google can discover and index the URL even without crawling its content.

How can we prevent Google from seeing certain pages?

Although for many online businesses the main goal is to have as many pages of the site as possible indexed by search engines, every site also has pages that should not appear in the results.

Whether we are talking about login URLs or order completion URLs, any site administrator will want to hide certain pages from search engine crawlers. The easiest and most common solution has been the robots.txt file. Used properly, it can bring the desired results, but, as mentioned above, if a page that is supposed to stay hidden receives external links, Google may ignore the robots.txt rules and index it anyway.
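The rule that is being retired looked like the example below: an unofficial “Noindex” line placed directly in robots.txt (the path is illustrative):

    User-agent: *
    Noindex: /checkout/

From September 1, 2019, Googlebot simply ignores such lines, so the page may be crawled and indexed as if the rule were not there.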

As Google suggests, the safest way to remove pages from the index is the robots meta tag. While the robots.txt method can still leave some supposedly hidden pages in Google’s database, the “noindex, nofollow” values in the robots meta tag will keep those pages out of the index for good.
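As a short illustration, the tag is placed in the head section of the page that should stay out of the index:

    <meta name="robots" content="noindex, nofollow">

For non-HTML files such as PDFs, the same effect can be obtained with the X-Robots-Tag HTTP response header (for example, X-Robots-Tag: noindex). Note that Google must be able to crawl the page in order to see the tag, so the URL should not also be blocked by a Disallow rule in robots.txt.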

Conclusion

In conclusion, even Google is not perfect, and every website administrator must make sure their site keeps up with the changes Google introduces. In the case of this robots.txt directive change, our recommendation is to review the file now, before pages that should stay hidden end up indexed.