When you host a website, search engine spiders (also known as bots or crawlers) will visit it for indexing purposes. Search engines index websites, update that indexed data regularly, and use it to build their search results. Normally, these bots will index every web page a website serves. Can you control them? In other words, if you do not want certain pages of your website indexed, can you prevent it? Yes, you can, with the help of a robots.txt file. What is the need for a robots.txt file? Is it mandatory for your website? How will it affect your website's search engine visibility? I will answer these questions in the sections that follow.
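To make the idea concrete, here is a minimal sketch of what a robots.txt file looks like. It is a plain text file served from the root of your site (e.g. at /robots.txt); the path /private/ below is a hypothetical example, not a required name:

```
# Rules for all crawlers ("*" matches every user agent)
User-agent: *

# Ask crawlers not to fetch anything under /private/ (hypothetical path)
Disallow: /private/
```

A `User-agent` line names which crawler the rules apply to, and each `Disallow` line lists a path prefix that crawler is asked to skip.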
There was a time when search engines would penalize webmasters for duplicate content within the same website. While some administrators duplicated content deliberately (with the intention of boosting their page ranking), others had no real choice. For the sake of illustration, suppose you want to offer a printer-friendly version of a web page. You would have to publish two pages with the same content, with only one of them formatted for printing.
In the example above, the robots.txt file lets you instruct search engine bots to skip indexing the printable page. Bear in mind that this is just one instance that illustrates the usefulness of the robots.txt file. Plenty of websites contain content they would rather keep out of search results, and it is smart practice to keep such pages away from search engine bots; once a page is indexed, it is difficult to get it removed. Keep in mind, however, that robots.txt is only a request honored by well-behaved crawlers; it does not password-protect or hide confidential data from anyone who visits the URL directly. Webmasters routinely use the same robots.txt file to choose which pages of their site search engines should index.
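Continuing the printable-page example, suppose the printer-friendly copies live under a /print/ directory (a hypothetical layout; your site's URL structure may differ). A sketch of the corresponding robots.txt rule:

```
# All crawlers
User-agent: *

# Skip the printer-friendly duplicates (hypothetical path)
Disallow: /print/

# Everything else remains crawlable by default
```

With this in place, compliant crawlers index only the regular version of each page, so the printable copies never compete with them as duplicate content.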