Googlebot

Written by Fernando Maciá

What is Googlebot?

In SEO, Googlebot is the name of Google’s robot, which crawls the accessible pages of the web to read their content, classify it, and store it in Google’s index. All search engines have spiders or bots that crawl accessible web pages; in Google’s case, Googlebot is the crawler that performs this function.

Googlebot not only crawls and indexes websites; it can also extract information from files such as PDF, XLS, DOC, etc.

Googlebot has evolved to the point that it can also access and read some JavaScript and CSS files, which is why it is now recommended not to hide these resources from it.

How Googlebot works

The robot requires an enormous amount of resources, as it must continuously crawl millions of web pages. It does so through algorithmic crawling, that is, heuristic logic built into its software, which determines which sites to crawl, how many pages to explore on each site, how often to crawl each site, and even how much time to spend researching and discovering new web pages.

To do this, the bot downloads copies of the pages it crawls, and it does so at enormous speed. Crawling can take place from several different locations, since Googlebot is distributed across many computers to optimize its performance and to access web pages from different points.

Despite the vast number of crawls it performs at enormous speed, its goal is always to crawl as many pages as possible without saturating the server that hosts them or exhausting the server’s bandwidth.

The algorithm, influenced by more than 200 factors, determines how often each page should be crawled. As Googlebot crawls, it stores the pages through indexing, so as to know their content and later offer them to users when they search on Google.

Advantages offered by Googlebot

Its main advantage is that, once we get it to crawl our page, it will index and store it so that it can offer it as a search result to users when it judges the page relevant to them, giving us the much-desired visibility in the biggest search engine on the planet, Google.

For this reason, we must give Googlebot easy access to all the content on our website that we want indexed and displayed to users. That means avoiding forms of programming that are not accessible to the spider, such as frames (<frame>, <iframe>) and Flash technology, and correctly implementing those that can limit or hinder indexing, such as AJAX, JavaScript, etc.

If there is content on our website that we do not want Google to record, whether for privacy reasons or because we have no interest in publishing it, we must totally or partially block Googlebot from indexing it. This can be done through the robots.txt file, the robots meta tag, the X-Robots-Tag HTTP header, or by adding restrictions such as passwords or IP filtering.
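As an illustration, and assuming hypothetical /private/ and /admin/ directories, a minimal robots.txt could look like this:

```text
# Ask Googlebot not to crawl a private directory
User-agent: Googlebot
Disallow: /private/

# Ask all other crawlers to stay out of the admin area
User-agent: *
Disallow: /admin/
```

Note that robots.txt controls crawling rather than indexing: a blocked URL that is linked from elsewhere may still appear in Google’s results, so to reliably keep a page out of the index a noindex directive or password protection is preferable.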

With these meta tags you can also give Googlebot directives to index the content or not, to follow the site’s links or not, etc.
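For example, a robots meta tag placed in a page’s <head>, or the equivalent X-Robots-Tag HTTP response header, could combine these directives as follows:

```text
<!-- In the HTML: do not index this page, but do follow its links -->
<meta name="robots" content="noindex, follow">

<!-- As an HTTP header, useful for non-HTML files such as PDFs -->
X-Robots-Tag: noindex
```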

Disadvantages of Googlebot

You may encounter situations in which Googlebot is a problem or an inconvenience. For example, if you do not want it to access parts of your website, we have already explained that you should tell it so through the mechanisms mentioned above. But sometimes, even though we have given it that directive, it skips it and ends up indexing content we did not want indexed.

It is also possible that our server is limited and the crawl frequency becomes a problem, or, conversely, that we would like the time between Googlebot’s visits to be shorter. In these cases, we can ask through GSC (Google Search Console) for the crawl rate to be increased or decreased.

If the Google spider does not visit our website very often, we should also consider that Google may not find it relevant. In that case we must improve its indexability, content, linking, popularity and the other factors that make a site relevant in the eyes of Google’s algorithm.

Googlebot family

Over time, Google has expanded its family of crawlers and, although Googlebot is still Google’s main user agent, other bots have emerged from it:

  • Googlebot News.
  • Googlebot Images.
  • Googlebot Video.
  • Googlebot Mobile.
  • Google Mobile AdSense.
  • Google AdSense.
  • Google AdsBot.
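As a hypothetical illustration, a simple first-pass check of a request’s User-Agent header against tokens like these could look as follows (the token list is illustrative, not exhaustive, and user agents can be spoofed, so Google recommends confirming a crawler’s identity with a reverse DNS lookup on its IP):

```python
# Hypothetical sketch: first-pass detection of Google's crawlers by
# User-Agent substring. The token list is illustrative, not exhaustive.
GOOGLE_BOT_TOKENS = (
    "Googlebot",             # main crawler (also Googlebot-News/-Image/-Video)
    "AdsBot-Google",         # Google AdsBot
    "Mediapartners-Google",  # Google AdSense crawler
)

def claims_google_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string claims to be a Google crawler.

    User agents can be spoofed, so a positive match should be confirmed
    with a reverse DNS lookup on the requesting IP (the host should
    resolve under googlebot.com or google.com).
    """
    return any(token in user_agent for token in GOOGLE_BOT_TOKENS)

ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(claims_google_crawler(ua))  # True
```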

Additional references

  • What is SEO.
  • Google Trackers.
  • Google’s robot.

Fernando Maciá
Founder and CEO of Human Level. Expert SEO consultant with more than 20 years of experience. He has been a professor at numerous universities and business schools, and director of the Master in Professional SEO and SEM and the Advanced SEO Course at KSchool. Author of a dozen books on SEO and digital marketing.
