Angular and React indexability with Universal JavaScript

Written by Ramón Saquete

With the arrival of AJAX (nowadays usually implemented with the Fetch API), SPA (Single-Page Application) frameworks came into existence. These frameworks use Fetch requests to generate part of the HTML in the client, which prevents crawlers from correctly indexing all the content, or even crawling it, because this type of website can be implemented without crawlable links, using URLs with fragments (with #) or no URLs at all in the links (which then only work if the user clicks on them).
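
A minimal, hypothetical sketch of the kind of markup and client-side code that causes the problem: the “links” either use a fragment URL or no URL at all, and the content only exists after a Fetch request that runs in the browser (the /api/pages/… endpoint and the data-page attribute are made up for illustration):

```typescript
// Hypothetical SPA navigation: neither "link" exposes a crawlable URL.
// <a href="#products">Products</a>            <- fragment-only URL
// <span data-page="products">Products</span>  <- no URL at all

document.querySelectorAll<HTMLElement>('[data-page]').forEach((el) => {
  el.addEventListener('click', async () => {
    // The content only exists after this client-side Fetch request,
    // so a crawler that does not execute JavaScript never sees it.
    const res = await fetch(`/api/pages/${el.dataset.page}`);
    const data: { html: string } = await res.json();
    document.querySelector('#content')!.innerHTML = data.html;
  });
});
```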

Alongside “SPA”, the term Multiple-Page Application (MPA) came along to refer to classic frameworks, which generate all the HTML on the server, which is exactly what crawlers need in order to crawl and index all pages without any obstacles.

There are many SPA-type frameworks: Google’s Angular (previously AngularJS), Facebook’s React, and a huge number of other open-source ones, such as Vue, Meteor, Lazo, Rendr, Aurelia, Backbone, Knockout, Mercury, etc. Initially these frameworks could only be executed in the client, but later we’ll see that the best solution is not to run them there exclusively.

How does a SPA framework work without Universal JavaScript?

As I’ve mentioned earlier, SPA frameworks are based on the use of the Fetch API: the browser loads a shell containing the parts that do not change during navigation, plus a series of HTML templates, which are filled in with the responses to Fetch requests sent to the server. We have to distinguish between the first request to the website and the navigation through its links once it has loaded, because they work differently:

  • First request: the shell is sent over the network. Then, with one or several Fetch requests (which also travel over the network), we obtain the data needed to generate the HTML of the page’s main content. As a consequence, the first load is slower, because this back-and-forth of requests is what takes the longest. For that reason, it is faster to send all the HTML generated on the server in a single request during the first load, as MPA frameworks do.
  • Loading of subsequent pages after the first request: SPA frameworks offer a much more fluid and faster experience, because they don’t need to regenerate the HTML in full. Moreover, it is possible to show loading indicators or progress bars between page views, which creates a sensation of even greater speed. However, there’s an additional problem: because part of the HTML is generated with JavaScript in the client, after a Fetch request also made with JavaScript, crawlers cannot index these pages, and in the worst case the user reaches pages for which the crawler doesn’t even have a URL it could index (see the sketch after this list).
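
To make the first-load cost concrete, here is a minimal sketch of what a client-only SPA does: the server returns an almost empty shell, and the browser then needs extra round trips before any content can be painted (the /api/data endpoint and the renderRoute function are illustrative assumptions, not part of any particular framework):

```typescript
// Hypothetical shell returned by the server on the first request:
// <html><body><div id="app"></div><script src="app.js"></script></body></html>
//
// app.js then has to make additional round trips before any content appears:
async function renderRoute(path: string): Promise<void> {
  // Extra network round trip, made after the shell has already loaded
  const res = await fetch(`/api/data?path=${encodeURIComponent(path)}`);
  const data: { title: string; body: string } = await res.json();
  // Client-side template filling: the HTML is only assembled here, in the browser
  document.querySelector('#app')!.innerHTML =
    `<h1>${data.title}</h1><article>${data.body}</article>`;
}

// First load: shell -> JS download -> Fetch -> render (slower than one HTML response)
renderRoute(location.pathname);
```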

How could you try to make a SPA framework indexable before Universal JavaScript?

Initially, the idea was to use a tool on the server that acts as a browser. This server-side browser would kick in upon receiving a crawler’s request. It would work in the following way: first, we detect that the request comes from a crawler by filtering on its user agent (for example, “Googlebot”), and we pass it to a browser running inside the server itself. This browser requests the URL from the web service, also inside the same server, and after receiving the response it executes the JavaScript, which makes the Fetch requests and generates the HTML in full, so that it can be returned to the crawler and stored in a cache. This way, subsequent requests from the crawler for this URL are answered from the cache, which is faster.
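
A hedged sketch of this approach, assuming an Express server: requests are filtered by user agent, answered from an in-memory cache when possible, and otherwise passed to a browser running on the server (prerenderWithHeadlessBrowser is a made-up stand-in for a real prerendering tool; the bot pattern and cache are deliberately simplistic):

```typescript
import express from 'express';

const app = express();
const cache = new Map<string, string>();         // simplistic in-memory HTML cache
const BOT_PATTERN = /googlebot|bingbot|yandex/i; // which crawlers we prerender for

// Made-up stand-in for a real headless browser / prerendering tool
// (Prerender.io, Puppeteer, ...): it would load the URL, execute the
// client-side JavaScript and return the resulting HTML.
async function prerenderWithHeadlessBrowser(url: string): Promise<string> {
  throw new Error(`not implemented: would render ${url} with JavaScript executed`);
}

app.use(async (req, res, next) => {
  const userAgent = req.headers['user-agent'] ?? '';
  if (!BOT_PATTERN.test(userAgent)) {
    next();                        // normal users get the client-side SPA shell
    return;
  }
  const cached = cache.get(req.url);
  if (cached) {
    res.send(cached);              // serve the stored snapshot: fast for the crawler
    return;
  }
  const html = await prerenderWithHeadlessBrowser(`http://localhost:3000${req.url}`);
  cache.set(req.url, html);        // keep it for the crawler's next visit
  res.send(html);
});

app.listen(3000);
```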

For this to work well, the links launching Fetch requests should have user-friendly URLs (obsolete techniques like URLs with hashbangs “#!” should not be used), and when the user clicks on a link, the developer should update the URL shown in the browser with JavaScript, using the History API. This way, we ensure that the user can share the URL and bookmark it, and that same URL returns the full page when a crawler requests it from the server.
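
For example, a click handler along these lines keeps friendly URLs in the address bar with the History API while still rendering client-side (the data-spa-link attribute is hypothetical, and renderRoute stands for the client-side renderer sketched earlier):

```typescript
// Intercept clicks on friendly URLs and update the address bar with the
// History API instead of reloading, so the URL can be shared and bookmarked.
document.querySelectorAll<HTMLAnchorElement>('a[data-spa-link]').forEach((link) => {
  link.addEventListener('click', (event) => {
    event.preventDefault();                    // stop the full page reload
    history.pushState({}, '', link.href);      // show the friendly URL
    renderRoute(new URL(link.href).pathname);  // hypothetical client-side render
  });
});

// Handle the browser's back/forward buttons as well
window.addEventListener('popstate', () => renderRoute(location.pathname));

// Hypothetical client-side renderer from the earlier sketch
declare function renderRoute(path: string): void;
```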

This isn’t a good technique, because it presents the following issues:

  • We are engaging in cloaking, and as a result our website will only be indexed by the crawlers we’ve filtered for.
  • If our HTML hasn’t been cached, the crawler will perceive that the loading time is very slow.
  • If we want the crawler to perceive a faster loading time, we have to generate a cache with the HTML of all the URLs, which implies having a cache invalidation policy. This isn’t viable, for the following reasons:
    • The information needs to be updated continuously.
    • The time it takes to generate the full cache is unacceptable.
    • We don’t have enough space in the server to store all the pages of the cache.
    • We don’t have the processing capacity to generate the cache and to maintain the page online at the same time.
    • We have to keep in mind that cache invalidation is a genuinely hard problem: the cache has to be updated whenever something changes in the database, but it isn’t easy to remove only the stale entries. Because the cache is not a database, but something much simpler and faster, we cannot precisely select what to regenerate, so in practice we follow strategies that either delete more than necessary or leave inconsistent data (see the sketch after this list). Depending on each individual case, these issues can prevent us from choosing this solution.
  • And finally, tools that act as a browser on the server cost money (Prerender.io, SEO4Ajax, Brombone, …).
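
As an illustration of why invalidation tends to delete more than necessary, here is a deliberately naive sketch: the cache is keyed by URL, not by the database rows each page depends on, so when one product changes we can only flush broad groups of keys (the URL patterns are invented for the example):

```typescript
// Hypothetical page cache keyed by URL. When one product changes we cannot tell
// exactly which cached pages used it (listings, search results, related items...),
// so a typical strategy flushes everything under broad prefixes - more than needed.
const pageCache = new Map<string, string>();

function invalidateProduct(productId: string): void {
  for (const url of pageCache.keys()) {
    // Over-deletion: every category and search page goes, even those that never
    // showed this product, because the cache cannot be queried like a database.
    if (
      url.startsWith(`/product/${productId}`) ||
      url.startsWith('/category/') ||
      url.startsWith('/search')
    ) {
      pageCache.delete(url);
    }
  }
}
```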

How to make a SPA framework indexable with Universal JavaScript?

Universal JavaScript can be executed both server-side and client-side.

The idea of Universal JavaScript, or isomorphic JavaScript as it was initially known, came about with Facebook’s SPA framework, React. It consists of using a universal API which, underneath, relies on the browser’s JavaScript APIs on the client or on Node.js’s APIs on the server, depending on where it is executed. This way, when we write code in JavaScript against this API, we can run it both client-side and server-side. If we add this to a SPA framework, which was originally conceived to work only in the client, we get a universal framework that can work both client-side and server-side in the following way:
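
A minimal sketch of this idea, assuming a Node.js version with a global fetch (or a polyfill): the same getJson helper (a made-up name) runs unchanged in the browser and on the server, branching only on which environment it detects:

```typescript
// Universal helper: the same function runs in the browser and in Node.js,
// delegating to whichever platform it finds itself in. The endpoint and the
// localhost URL are illustrative, not part of any framework.
export async function getJson<T>(path: string): Promise<T> {
  if (typeof window !== 'undefined') {
    // Client: relative URL, resolved against the current page by the browser
    const res = await fetch(path);
    return res.json() as Promise<T>;
  }
  // Server (Node.js): call the data service directly, so the browser
  // never has to make this round trip during the first load.
  const res = await fetch(`http://localhost:3000${path}`);
  return res.json() as Promise<T>;
}
```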

First, we need to take into account that we can distinguish three types of JavaScript code in our website’s development, depending on where each one is going to be executed:

  • Only client-side.
  • Only server-side, although this JavaScript can be replaced by any server-side language, such as PHP.
  • Client-side and server-side (Universal JavaScript).

If we also use JavaScript in the block of code that is only executed server-side, we’ll be using this language in all three cases, and so we’ll have what is called a full-stack framework.
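
Purely as an illustration of the three blocks, the split could look something like this (all file names and functions are hypothetical):

```typescript
// shared/render-page.ts - Universal JavaScript: runs in both environments,
// because it only builds a string and touches no browser or Node.js API.
export function renderPage(data: { title: string; body: string }): string {
  return `<h1>${data.title}</h1><article>${data.body}</article>`;
}

// client/main.ts - client-only code: touches the DOM, so it can never run in Node.js.
//   document.querySelector('#app')!.innerHTML = renderPage(data);

// server/main.ts - server-only code: talks to the database and builds the response;
//   it never ships to the browser and could just as well be written in PHP.
//   res.send(`<div id="app">${renderPage(data)}</div>`);
```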

When we use universal JavaScript, part of our code is reused in the client and in the server

Just as when we didn’t have Universal JavaScript, the behaviour is going to be different for the first request and the ones that follow:

  • First request: regardless of whether the request comes from a crawler or a user, the full HTML is generated on the server by the Universal JavaScript block, which launches Fetch requests to the server-only JavaScript. This is similar to the scenario without Universal JavaScript, except that the Fetch requests go from the server to itself rather than from the client, avoiding the initial flurry of requests travelling over the network.
  • Loading of subsequent pages after the first request: if it’s a user and they click on a link, the client-only JavaScript intercepts the click and hands the request over to the Universal JavaScript (the same block as in the previous point). This makes a Fetch request to the server-side JavaScript with the requested URL and, once the data has been retrieved from the server, displays the new page to the user. In this case the Fetch request does go from the client to the server, but it avoids reloading the page in full (see the sketch after this list).
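
Putting both points together, a framework-agnostic sketch could look like this, assuming Express on the server and the hypothetical getJson and renderPage helpers from the previous sketches; the same renderPage is reused by the server for the first request and by the client for later navigations:

```typescript
import express from 'express';
import { getJson } from './shared/get-json';       // hypothetical universal fetch helper
import { renderPage } from './shared/render-page';  // hypothetical universal renderer

// --- Server (Node.js): the first request returns the complete HTML,
// so crawlers and users both receive real content in a single response.
const app = express();
app.get('*', async (req, res) => {
  const data = await getJson<{ title: string; body: string }>(`/api/data?path=${req.path}`);
  res.send(`<!doctype html><html><body>
    <div id="app">${renderPage(data)}</div>
    <script src="/client.js"></script>
  </body></html>`);
});
app.listen(3000);

// --- Client (client.js): subsequent navigations intercept the click, fetch
// only the data, and reuse the very same renderPage() without a full reload.
// document.body.addEventListener('click', async (event) => {
//   const link = (event.target as HTMLElement).closest('a[data-spa-link]');
//   if (!link) return;
//   event.preventDefault();
//   history.pushState({}, '', (link as HTMLAnchorElement).href);
//   const data = await getJson<{ title: string; body: string }>(
//     `/api/data?path=${location.pathname}`);
//   document.querySelector('#app')!.innerHTML = renderPage(data);
// });
```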

This way we have pages that load fast both on the first request and throughout the subsequent navigation, and crawlers have no indexing problems, because the full page is always generated on the server for them, without resorting to cloaking.

Conclusion

If a development company offers you a website built with Angular, React or any other SPA framework, make sure they know about Universal JavaScript and that they have a project that is being indexed correctly, because they might not know about it, or not know how to use it. It’s not unusual for them to be using an older framework version that doesn’t support Universal JavaScript. Angular, for example, initially provided it as an independent add-on called Angular Universal, which was later integrated into the framework. If, on the other hand, they are familiar with it, your website won’t have any indexability issues.

It’s another matter whether they know JavaScript, the frameworks, and all the issues these websites involve well enough to write maintainable code whose errors can be easily tested and fixed. A good sign that they know what they’re dealing with is whether they use additional frameworks, beyond the ones mentioned here, to manage application state, such as Redux or NgRx, because without this kind of library that task can result in code with low maintainability.

Author: Ramón Saquete
Web developer at the Human Level Communications online marketing agency. He’s an expert in WPO, PHP development and MySQL databases.
