Written by Ramón Saquete
Table of contents
- 1 CSR issues during the initial load of a page
- 2 CSR problems with regard to navigation to the next page
- 3 Blocking the indexing of partial responses through AJAX
- 4 Conclusion
Sometimes, the development company uses CSR (Client-Side Rendering) and doesn’t offer us the option of using a universal framework. This CSR-based web development will get us into trouble, to a greater or lesser degree, depending on the crawler and its ranking algorithms. In this post, we are going to analyse what these problems are with Google’s crawler and how to solve them.
CSR issues during the initial load of a page
Issues as a result of slow rendering
Google’s indexing process goes through the following steps:
- Crawling: Googlebot requests a URL from the server.
- First wave of indexing: the content painted on the server is indexed immediately, and new links to crawl are extracted.
- Second wave of indexing: once the HTML is painted client-side, the remaining content is indexed and new links to crawl are extracted.
In the tests we’ve conducted, when rendering the HTML took more than 19 seconds, nothing got indexed. Although this is a long time, it can be exceeded in some cases, especially if we use AJAX intensively, because in these cases Google’s renderer, just like any renderer, has to wait for the following steps to complete (a minimal code sketch follows the list):
- The HTML is downloaded and processed to request the linked files and create the DOM.
- The CSS is downloaded and processed to request the linked files and create the CSSOM.
- The JavaScript is downloaded, compiled and executed, since it’s the JavaScript that launches the AJAX request.
- The AJAX request is moved to a request queue, together with the other requested files, where it waits to be answered.
- The AJAX request is launched, and it has to travel through the network to the server.
- The server answers the request, the response travels back through the network, and only then can the JavaScript process it and paint the page’s content.
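As a minimal sketch of what a pure CSR page forces the renderer to go through (the `/api/products` endpoint and the `#app` container are made-up names for the example), all of the steps above have to finish before any main content exists:

```typescript
// Minimal CSR sketch: the main content only exists after the HTML,
// CSS and JS are downloaded, the JS runs, and the AJAX round trip
// completes. Endpoint and container names are assumptions.
async function renderProducts(): Promise<void> {
  const container = document.querySelector("#app");
  if (!container) return;

  // The AJAX request is queued, travels to the server, and the
  // response travels back before anything can be painted.
  const response = await fetch("/api/products");
  const products: { name: string; url: string }[] = await response.json();

  // Only at this point is the main content painted client-side.
  container.innerHTML = products
    .map((p) => `<a href="${p.url}">${p.name}</a>`)
    .join("");
}

document.addEventListener("DOMContentLoaded", () => {
  void renderProducts();
});
```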
The request and download times of the process we just described depend on the network and server load at that moment. Moreover, Googlebot only uses HTTP/1.1, which is slower than HTTP/2 because requests are dealt with one after the other, rather than all at the same time. Since HTTP/2 can only be used when both the client and the server support it, Googlebot will fall back to HTTP/1.1 even if our server allows HTTP/2. To summarise, this means Googlebot waits for each request to finish before launching the next one, and it may not try to parallelise certain requests by opening several connections, as browsers do (although we don’t know exactly how it does it). Therefore, we are in a situation where we could exceed the 19 seconds estimated earlier.
On the other hand, due to these CSR performance issues, we will get a worse score for the FCP (First Contentful Paint) metric in PageSpeed, and worse WPO (Web Performance Optimisation) in general, and as a consequence, worse rankings. In short, a pure CSR approach damages indexing and rankings, because generating the HTML is more costly for both Googlebot and browsers.
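If we want to check the FCP value being scored, a quick way is the browser’s standard Performance API; a minimal sketch:

```typescript
// Log the First Contentful Paint using the Performance API. With
// pure CSR this timestamp moves later, because the first paint has
// to wait for JS execution and AJAX round trips.
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.name === "first-contentful-paint") {
      console.log(`FCP: ${entry.startTime.toFixed(0)} ms`);
      observer.disconnect();
    }
  }
});
observer.observe({ type: "paint", buffered: true });
```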
We must also keep in mind that if the application requests the user’s permission to do something, and the rendering of the main content depends on it, that content will never get painted, because Googlebot denies by default any permission it’s asked for.
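For example (a hypothetical geolocation-gated page, with made-up endpoint and container names), this pattern guarantees Googlebot sees an empty page unless a fallback is painted:

```typescript
// Anti-pattern: the main content is only painted if the user grants
// the geolocation permission. Googlebot denies all permissions, so
// without the fallback it would index an empty page.
// The endpoint and the #app container are assumptions.
navigator.geolocation.getCurrentPosition(
  async (position) => {
    const { latitude, longitude } = position.coords;
    const res = await fetch(`/api/nearby?lat=${latitude}&lng=${longitude}`);
    document.querySelector("#app")!.innerHTML = await res.text();
  },
  () => {
    // Fallback: paint generic, indexable main content on denial.
    document.querySelector("#app")!.innerHTML =
      "<h1>Our stores</h1><p>Browse the full list of locations.</p>";
  }
);
```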
Another thing to remember is that, in the HTML generated by the previous tool, all metadata (including the canonical URL) will be ignored by Googlebot, as it only takes this information into account when it’s painted on the server.
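To make the problem concrete, this is the kind of client-side metadata injection that, per the above, Googlebot will ignore (the URL is a made-up example):

```typescript
// Anti-pattern: injecting the canonical URL from JavaScript. Because
// it isn't present in the server-painted HTML, Googlebot ignores it;
// the canonical must be rendered on the server. Example URL is made up.
const canonical = document.createElement("link");
canonical.rel = "canonical";
canonical.href = "https://www.example.com/products/trainers";
document.head.appendChild(canonical);
```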
CSR problems with regard to navigation to the next page
Now, let’s see what happens when we use a link to navigate once we’re already on the website and the HTML is painted client-side.
The main problem is that links don’t have a valid URL returning a 200 OK in their href attribute, so Googlebot cannot follow them.
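A sketch of the difference, with hypothetical route and class names: the first link is invisible to the crawler, while the second keeps a crawlable URL and still handles navigation in JavaScript:

```typescript
// Anti-pattern: no crawlable URL, Googlebot won't follow it.
//   <a onclick="goTo('products')">Products</a>
// Correct: a real URL in href, with the click intercepted for CSR.
//   <a href="/products" class="js-link">Products</a>

// Hypothetical client-side renderer for a given path.
function renderRoute(path: string): void {
  console.log(`client-side render of ${path}`);
}

document.querySelectorAll<HTMLAnchorElement>("a.js-link").forEach((link) => {
  link.addEventListener("click", (event) => {
    event.preventDefault(); // stop the full page load
    history.pushState({}, "", link.href); // keep a real, crawlable URL
    renderRoute(link.pathname);
  });
});
```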
What happens with fragments now that Google can index AJAX?
Fragments are the part of a URL that can appear at the end, preceded by a hash (#). For example: https://www.example.com/page#fragment.
Back when Google couldn’t index AJAX, if a URL changed its content through AJAX based on the fragment part, we knew it was only going to index the URL and the content without taking the fragment into account. So what happens to pages with fragments now that Google can index AJAX? The behaviour is exactly the same: if we link to a page with a fragment, and it changes its content when accessed through the fragment, Google will index the content ignoring the fragment, and the popularity will go to the fragment-less URL, because Google trusts that the fragment will be used as an anchor (to jump to a part of the page), and not to change the content, which is how it should be used.
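As a sketch of the pattern this paragraph warns against (the API route and the `#app` container are made-up names), two URLs differing only in the fragment serve different content, but Google indexes them as one:

```typescript
// Anti-pattern: using the fragment to change the main content.
// /catalogue#shoes and /catalogue#bags paint different content,
// but Google indexes both as /catalogue. Names are assumptions.
window.addEventListener("hashchange", async () => {
  const category = window.location.hash.slice(1); // e.g. "shoes"
  const res = await fetch(`/api/category/${category}`);
  document.querySelector("#app")!.innerHTML = await res.text();
});
```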
However, Google does currently index URLs with a hashbang (#!). This can be implemented by simply adding the exclamation mark or bang, and Google will make it work, to maintain backwards compatibility with an obsolete specification for making AJAX indexable. This practice, however, is not recommended, because now it should be implemented with the History API, and besides, Google could suddenly stop indexing hashbang URLs at any time.
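A minimal sketch of the History API alternative, with hypothetical function names: every state gets a real path instead of a fragment, so each one is a crawlable URL:

```typescript
// Navigate with real paths instead of hashbangs: each state has a
// crawlable URL, and the back/forward buttons keep working.
function navigate(path: string): void {
  history.pushState({ path }, "", path); // e.g. "/catalogue/shoes"
  paintContent(path);
}

// Repaint when the user presses back or forward.
window.addEventListener("popstate", (event) => {
  const state = event.state as { path?: string } | null;
  paintContent(state?.path ?? window.location.pathname);
});

// Hypothetical: fetches and paints the content for a given path.
function paintContent(path: string): void {
  console.log(`painting ${path}`);
}
```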
Blocking the indexing of partial responses through AJAX
When an AJAX request is sent to the URLs of a REST or GraphQL API, what’s returned is a JSON or a piece of a page that we don’t want indexed. Therefore, we should block the indexing of the URLs these requests are directed to.
In the past, we could block them using robots.txt, but ever since Googlebot’s renderer came into existence, we cannot block any resource that’s used to paint the HTML.
Currently, Google is a little bit smarter and doesn’t usually attempt to index JSON responses, but if we want to make sure they don’t get indexed, the universal solution applicable to all search engines is to make all the URLs used with AJAX accept only requests made through the POST method, because crawlers don’t use it. When a GET request reaches the server, it should return a 404 error. From a programming standpoint, this doesn’t force us to remove parameters from the URL’s query string.
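A minimal sketch of that rule, assuming an Express server (the /api/search route and the response shape are made-up examples): the endpoint only answers POST, and any GET to the API, such as a crawler’s request, gets a 404:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// AJAX endpoint: only reachable through POST, which crawlers don't use.
// The route and the response shape are assumptions for the example.
app.post("/api/search", (req, res) => {
  const { query } = req.body as { query?: string };
  res.json({ query, results: [] });
});

// Any GET to the API returns a 404, so the JSON never gets indexed.
app.get("/api/*", (_req, res) => {
  res.status(404).send("Not Found");
});

app.listen(3000);
```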