Robots.txt File Explained

The robots.txt file is an important file stored with your website's files. In some hosting file managers it may not be shown by default. To view it in cPanel, log in, open File Manager, tick the Show Hidden Files box, and select the document root of the domain whose file you want to view. Then right-click the robots.txt file and select View.

The robots.txt file is part of the Robots Exclusion Protocol and tells robots which files they may crawl. Robots such as Googlebot find the links available on your website and follow them, including the internal links between your own pages and the external links to other websites. Google then indexes the pages it reaches so that users can find them when they search.

Every website contains files and links that you do not want search engines to follow and index, for example your installation files, libraries and logs. These files are of no use to anyone in search results and could cause a security problem as well. Paths like these are added as Disallow rules, which look like this in your robots.txt file: Disallow: /installation/. Robots that honor the file will then skip the installation files when they crawl your site.
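
The effect of a Disallow rule can be checked with a small script. The following is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the www.example.com URLs and the "ExampleBot" name are assumptions made for illustration; only the Disallow: /installation/ rule comes from the text above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "User-agent: *" applies the rule to all robots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /installation/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A path under /installation/ is blocked; other paths stay crawlable.
blocked = parser.can_fetch("ExampleBot", "https://www.example.com/installation/index.php")
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
print(blocked, allowed)
```

This only shows how a well-behaved robot would interpret the rule. It does not stop a robot that chooses to ignore robots.txt.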

Files that you do want indexed, including images, which also count toward your SEO, are allowed with a rule that looks like this in your robots.txt file: Allow: /*.jpg*. The robots.txt file must be placed in the site root, not in a subfolder of the main domain, or it will not work correctly. You can also list the location of your XML sitemap in the robots.txt file with a Sitemap line, so that robots can find the sitemap when they crawl your website.
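
To illustrate how a wildcard rule such as Allow: /*.jpg* might be matched, here is a rough sketch. It assumes the common convention that * matches any run of characters and a trailing $ anchors the end of the path. Python's built-in urllib.robotparser does not interpret these wildcards, so the rule_matches helper below is a hypothetical function written for this example. The sitemap URL is also an assumption, and site_maps() requires Python 3.8 or later.

```python
import re
from urllib.robotparser import RobotFileParser


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the start of a URL path.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where '*' appeared.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


jpg_allowed = rule_matches("/*.jpg*", "/images/logo.jpg")
png_allowed = rule_matches("/*.jpg*", "/images/logo.png")

# Hypothetical robots.txt that also lists a sitemap.
ROBOTS_TXT = """\
User-agent: *
Allow: /*.jpg*
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
sitemaps = parser.site_maps()  # list of Sitemap URLs, or None if there are none
print(jpg_allowed, png_allowed, sitemaps)
```

Real search engines apply their own matching and precedence rules, so treat this as an approximation rather than a description of how any particular robot behaves.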

So how does robots.txt work? Robots crawl the web by following links from one website to the next, which adds up to billions of links worldwide. When a crawler arrives at your website, or any other website, it looks for the robots.txt file first and reads it before crawling the rest of the site. The file tells the crawler which files it may crawl on your site. When the crawler then follows a link to the next website, it reads that site's robots.txt in turn.
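
The order of operations described above, finding robots.txt first and then deciding which links to visit, can be sketched as follows. This is a simplified illustration and not how any real search engine is implemented. The function names, the "ExampleBot" user agent and the example URLs are all assumptions, and the robots.txt text is passed in directly so the sketch runs without network access.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL in the site root of the given page's host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def crawlable(links, robots_txt, user_agent="ExampleBot"):
    """Keep only the links that the given robots.txt text permits."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [link for link in links if parser.can_fetch(user_agent, link)]


# Step 1: the crawler works out where robots.txt lives for the site.
robots_location = robots_url_for("https://www.example.com/blog/post.html?page=2")

# Step 2: it reads the rules (hard-coded here) before visiting any other page.
rules = "User-agent: *\nDisallow: /logs/\n"
links = [
    "https://www.example.com/index.html",
    "https://www.example.com/logs/access.log",
]
to_visit = crawlable(links, rules)
print(robots_location, to_visit)
```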

A robots.txt file is usually set up automatically, in a standard configuration, when you install a content management system such as Joomla or WordPress. Depending on your needs, you can edit the file to allow or disallow whatever you do or do not want indexed. If you have subdomains, each one needs its own robots.txt. The file is also very useful when you have duplicated content on your website: you can allow one copy to be indexed and disallow the duplicate.
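
Because each subdomain has its own robots.txt, a rule on one host says nothing about another. The sketch below shows this with two hypothetical hosts, where /print/ stands in for a duplicated copy of the main articles. The host names, paths and the ROBOTS_BY_HOST table are assumptions for illustration; a real robot would download each file from the root of the host in question.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, one per host.
ROBOTS_BY_HOST = {
    # Main site: block the duplicated printer-friendly copies.
    "www.example.com": "User-agent: *\nDisallow: /print/\n",
    # Subdomain: its own, separate rules.
    "shop.example.com": "User-agent: *\nDisallow: /cart/\n",
}


def is_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Check a URL against the robots.txt of its own host only."""
    host = urlsplit(url).hostname or ""
    parser = RobotFileParser()
    # A host with no entry is treated as having an empty robots.txt.
    parser.parse(ROBOTS_BY_HOST.get(host, "").splitlines())
    return parser.can_fetch(user_agent, url)


original = is_allowed("https://www.example.com/article-1")
duplicate = is_allowed("https://www.example.com/print/article-1")
other_host = is_allowed("https://shop.example.com/print/article-1")
print(original, duplicate, other_host)
```

The last check passes because the /print/ rule belongs to www.example.com and is never consulted for shop.example.com.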

