Written by Ramón Saquete
With the arrival of AJAX (nowadays implemented with the Fetch API), SPA (Single-Page Application) frameworks came into existence. These frameworks use Fetch requests to generate part of the HTML in the client, which prevents crawlers from correctly indexing all the content, or even crawling it, because this type of website can be implemented without crawlable links: either using URLs with fragments (with #) or using no URLs in the links at all, so they only work if the user clicks on them.
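To illustrate the kind of navigation crawlers struggle with, here is a minimal client-side routing sketch; the element IDs and the route table are hypothetical. One link relies on a fragment URL and the other has no URL at all, so its content only exists after a click:

```typescript
// Minimal client-side routing sketch (hypothetical IDs and routes).
const routes: Record<string, string> = {
  products: '<h1>Products</h1>',
  contact: '<h1>Contact</h1>',
};

function render(route: string): void {
  const view = document.getElementById('view');
  if (view) {
    view.innerHTML = routes[route] ?? '<h1>Not found</h1>';
  }
}

// Fragment-based navigation: only the part after "#" changes in the URL.
window.addEventListener('hashchange', () => {
  render(location.hash.replace('#', ''));
});

// Navigation without any URL: the content changes only on click,
// so a crawler following links never discovers it.
document.getElementById('contact-button')?.addEventListener('click', () => {
  render('contact');
});
```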
Alongside “SPA”, the term Multi-Page Application (MPA) appeared to refer to classic frameworks that generate all the HTML on the server, which is exactly what crawlers need to index and crawl every page without obstacles.
There are many SPA frameworks: Google’s Angular (previously AngularJS), Facebook’s React, and countless other open-source ones, such as Vue, Meteor, Lazo, Rendr, Aurelia, Backbone, Knockout, Mercury, etc. Initially these frameworks could only run on the client, but we will see later that the best solution is for them not to run only there.
As mentioned earlier, SPA frameworks are based on the Fetch API: they load in the browser as a shell containing the parts that do not change during navigation, plus a series of HTML templates that are filled with the responses to Fetch requests sent to the server. We have to distinguish between the first request to the website and the navigation through its links once it has loaded, because they work differently:
- First request: the shell is sent over the network. Then, with one or several Fetch requests (which also travel over the network), the browser obtains the data needed to generate the HTML of the page’s main content. As a consequence, the first load is slower, because this back-and-forth of requests is what takes the longest. For that reason, it is faster to send all the HTML generated on the server in a single request during the first load, as MPA frameworks do (a sketch of this shell-plus-Fetch pattern follows below).
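As a rough illustration of that first load, here is a minimal sketch of the shell pattern; the endpoint, data shape and element ID are hypothetical. The shell arrives first, and the main content only appears after an extra Fetch round trip:

```typescript
// Hypothetical sketch: the shell HTML is already in the page;
// the main content is requested separately and rendered in the client.
interface Product {
  name: string;
  price: number;
}

async function loadMainContent(): Promise<void> {
  // Extra network round trip after the shell has loaded.
  const response = await fetch('/api/products'); // hypothetical endpoint
  const products: Product[] = await response.json();

  // Fill a simple template with the fetched data.
  const items = products
    .map((p) => `<li>${p.name}: ${p.price} €</li>`)
    .join('');
  const container = document.getElementById('product-list');
  if (container) {
    container.innerHTML = `<ul>${items}</ul>`;
  }
}

loadMainContent();
```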
Detecting crawlers and serving them HTML prerendered on the server isn’t a good technique, because it presents the following issues:
- It amounts to cloaking, and as a result our website will only be indexed by the crawlers we have filtered for (this user-agent filtering is sketched after this list).
- If our HTML hasn’t been cached, the crawler will perceive that the loading time is very slow.
- If we want the crawler to perceive a faster loading time, we have to generate a cache with the HTML of all the URLs, which implies having a cache invalidation policy. This may not be viable, for reasons such as the following:
- The information has to be updated continuously.
- The time it takes to generate the full cache is unacceptable.
- We don’t have enough space on the server to store all the cached pages.
- We don’t have the processing capacity to generate the cache and to maintain the page online at the same time.
- We have to keep in mind that cache invalidation is a very complex problem: the cache has to be updated whenever something changes in the database, but it is not easy to remove exactly the affected data from the cache. Because a cache is not a database, but something much simpler and faster, we cannot easily select what we want to regenerate, so we end up following strategies that either delete more than necessary or leave inconsistent data. Depending on the individual case, these issues can rule out this solution.
- And finally, tools that act as a browser on the server cost money (Prerender.io, SEO4Ajax, Brombone, …).
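For reference, the approach criticised above usually looks something like this sketch, assuming an Express server; the bot list and the prerenderWithHeadlessBrowser helper are hypothetical stand-ins for a service such as Prerender.io:

```typescript
import express from 'express';

// Placeholder for a headless-browser/prerender service; a real
// implementation would render the SPA and return the resulting HTML.
async function prerenderWithHeadlessBrowser(url: string): Promise<string> {
  return `<html><body><!-- prerendered HTML for ${url} --></body></html>`;
}

const app = express();

// Rough user-agent filter: only these crawlers get server-rendered HTML,
// which is exactly the cloaking problem described above.
const BOT_PATTERN = /googlebot|bingbot|yandex|duckduckbot/i;

app.get('*', async (req, res) => {
  const userAgent = req.headers['user-agent'] ?? '';
  if (BOT_PATTERN.test(userAgent)) {
    // Crawlers receive HTML generated on the server (slow if not cached).
    const html = await prerenderWithHeadlessBrowser(req.originalUrl);
    res.send(html);
  } else {
    // Regular users receive the SPA shell and render everything client-side.
    res.sendFile('shell.html', { root: 'public' });
  }
});

app.listen(3000);
```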
- Only client-side.
This way we have pages that work fast both on the first load and throughout subsequent navigation, and crawlers have no indexing problems, because the full page is always generated on the server for them, without resorting to cloaking.
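As a rough illustration of this server-side (universal) rendering, here is a minimal sketch assuming React with Express; the App component and its markup are hypothetical, and most SPA frameworks offer an equivalent mechanism:

```typescript
import express from 'express';
import React from 'react';
import { renderToString } from 'react-dom/server';

// Hypothetical page component; in a real project this would be the same
// component tree that the client-side bundle hydrates in the browser.
function App({ url }: { url: string }): React.ReactElement {
  return React.createElement('h1', null, `Content for ${url}`);
}

const app = express();

app.get('*', (req, res) => {
  // The full HTML is generated on the server for every visitor,
  // crawlers included, so no cloaking is involved.
  const markup = renderToString(
    React.createElement(App, { url: req.originalUrl })
  );
  res.send(`<!DOCTYPE html>
<html>
  <body>
    <div id="root">${markup}</div>
    <!-- The client-side bundle takes over (hydrates) from here. -->
    <script src="/client.js"></script>
  </body>
</html>`);
});

app.listen(3000);
```

After this first server-rendered response, the same framework code runs in the browser and takes over the navigation, which is what keeps subsequent page changes fast.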