Shadow DOM indexability

Written by Ramón Saquete

Shadow DOM is difficult to index because it can only be created with JavaScript: at the time of writing, there is no way to declare it in the HTML. However, it is a very useful technology that makes developers’ work easier, which is why it is increasingly common in newly developed websites. In this post we will see what shadow DOM is, what indexing issues it raises, and how we can try to make it indexable.

Do not let yourself be swept away by the DOM's dark side. Shadow DOM is not indexable without JavaScript, but the light DOM is.

Before we get down to business, let’s remember what web components are: they encompass several technologies, one of which is shadow DOM.

Web components

Shadow DOM is one of the technologies used by web components, which we talked about some years ago. Since then, they have evolved from web components v0 (compatible only with Chrome, and requiring polyfills or frameworks in other browsers) to web components v1, which are supported by all modern browsers.

Web components add nothing new for the user, but they are an excellent way to isolate parts of a website’s code, making it easier for several developers to work in parallel on the front end. They prevent one developer’s code from affecting another’s through encapsulation: the markup and CSS inside the shadow DOM remain isolated from the rest of the page (page styles and DOM queries such as querySelector do not reach into it), inside a new HTML tag, or custom element, defined with whichever name the developer chooses. That is why this technology is increasingly present in newly developed websites, as the chart below shows. It displays the usage in Chrome of custom elements, one of the foundation technologies of web components:

Web components - custom elements usage

We must not confuse these components with those of a framework like Vue, Angular or React. Framework components render directly to HTML in the light DOM, and they can do so server-side using the framework’s universal (server-side rendering) version, which is why they don’t pose as many indexability issues as real web components do.

Difference between light DOM, shadow DOM and composed DOM

In terms of indexability, light DOM is the part of the component that is visible in the HTML code without executing JavaScript, and is therefore indexable. Shadow DOM, on the other hand, is the part of the implementation that remains hidden and is created by the developer using JavaScript. As we already know, Google doesn’t always have enough render budget to run the JavaScript of the pages it indexes, which is why Google recommends keeping the content to be indexed inside the light DOM whenever possible.

Composed DOM is a combination of the two, because inside the shadow DOM we can define slots, to which we can assign chunks of the light DOM.

Let’s see it with an example: let’s imagine we have a web component (they can be easily recognised because they use an HTML tag with a hyphenated name, in this case “my-component”). This is how the HTML code of this component would look, without running JavaScript:

<my-component>
    <span slot="title">Title inside the light DOM but with the H3 inside the shadow DOM</span>
    <p>Text inside the light DOM</p>
</my-component>

We also assume that the component’s shadow DOM is built from the following template, which contains a slot named “title” and an unnamed (default) slot that receives any content of the component without a slot name assigned:

<slot></slot>
<h3><slot name="title"></slot></h3>
<p>Text inside the shadow DOM</p>

If we run JavaScript in the previous component, the browser will generate the following code, which is the shadow DOM combined with the light DOM, i.e. the composed DOM:

<p>Text inside the light DOM</p>
<h3><span slot="title">Title inside the light DOM but with the H3 inside the shadow DOM</span></h3>
<p>Text inside the shadow DOM</p>

In this case, if Google doesn’t run JavaScript, it won’t know that the title is an h3 header, it won’t see the paragraph that says “Text inside the shadow DOM”, and it will read the text in a different order. This can damage the rankings of the page using this component: Google can’t give enough weight to the header, and part of the content and its order are lost.
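To make it clearer why this content is invisible without JavaScript, the component of the example could be defined along the following lines. This is only a minimal sketch, not the code of any particular site: the `createMyComponentClass` helper and its `scope` parameter are hypothetical names introduced here so the snippet can also be loaded outside a browser, whereas `attachShadow` and `customElements.define` are the standard browser APIs.

```javascript
// Shadow DOM template of the example: a default (unnamed) slot, a named
// "title" slot wrapped in an h3, and a paragraph that exists only in the
// shadow DOM.
const MY_COMPONENT_TEMPLATE = `
  <slot></slot>
  <h3><slot name="title"></slot></h3>
  <p>Text inside the shadow DOM</p>
`;

// Hypothetical helper (an assumption of this sketch): builds the element
// class against a given global scope.
function createMyComponentClass(scope = globalThis) {
  return class MyComponent extends scope.HTMLElement {
    constructor() {
      super();
      // attachShadow() is a JavaScript call: until it runs, the shadow DOM
      // does not exist, so a crawler that skips JavaScript never sees it.
      const shadowRoot = this.attachShadow({ mode: 'open' });
      shadowRoot.innerHTML = MY_COMPONENT_TEMPLATE;
    }
  };
}

// Register the element only where the Custom Elements API is available.
if (typeof customElements !== 'undefined' && !customElements.get('my-component')) {
  customElements.define('my-component', createMyComponentClass());
}
```

Nothing in `MY_COMPONENT_TEMPLATE` is part of the HTML sent by the server, which is why the h3 and the second paragraph only appear once the script has run.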

How can we know which part of a web component is inside the shadow DOM?

If we disable JavaScript on a page, we won’t see anything placed inside the shadow DOM, nor anything else that requires JavaScript. But if we want to know exactly which parts of a component are inside the shadow DOM, we can use the “Inspect element” tool with JavaScript enabled: the browser will show us the shadow DOM and the light DOM separately, conveniently colouring the shadow DOM. Example:

light and shadow dom

In this Google Chrome screenshot, which shows the component of the previous example in the “Inspect element” tool, the shadow DOM appears inside the #shadow-root node, which marks the root of the shadow DOM. Moreover, if we click inside a slot, the tool highlights the HTML assigned to it in the light DOM.
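The same check can be approximated from the browser console. The sketch below assumes the components use open shadow roots; `findShadowHosts` and `describeSlots` are hypothetical helper names, while `shadowRoot`, `querySelectorAll` and `assignedNodes()` are standard DOM APIs.

```javascript
// Returns the elements under `root` that host an open shadow root.
// Closed shadow roots cannot be detected this way, because for them
// element.shadowRoot is null.
function findShadowHosts(root) {
  return Array.from(root.querySelectorAll('*')).filter((element) =>
    Boolean(element.shadowRoot)
  );
}

// For each slot in a host's shadow DOM, reports which light DOM nodes
// the browser has assigned to it.
function describeSlots(host) {
  return Array.from(host.shadowRoot.querySelectorAll('slot')).map((slot) => ({
    name: slot.getAttribute('name') || '(default)',
    assigned: slot.assignedNodes().map((node) => node.nodeName),
  }));
}
```

With the page loaded and JavaScript enabled, `findShadowHosts(document)` lists the elements that have shadow DOM, and `describeSlots(host)` shows which light DOM nodes end up in each slot of one of them.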

How can we make shadow DOM indexable?

At present, the only way to make shadow DOM indexable is to use a technique called “rehydrating DOM”. It consists of computing the composed DOM of the component on the server, exactly as the browser would, and replacing the component with the resulting HTML. This way, everything is already generated in the HTML created on the server before it reaches the client. A JavaScript library implementing this technique is skatejs. Nevertheless, depending on the web component, this solution could lead to issues.
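To give an idea of what computing the composed DOM on the server involves, here is a deliberately simplified sketch. It is not how skatejs or any other library does it: `composeDom` is a hypothetical function that works on strings with a regular expression, it only handles slots written as `<slot>` or `<slot name="...">`, and it ignores nested slots and repeated slot names.

```javascript
// Simplified model of slot assignment, for illustration only.
// lightChildren: the component's light DOM children, each with the value
// of its slot attribute (null when it has none) and its HTML.
// shadowTemplate: the component's shadow DOM markup as a string.
function composeDom(lightChildren, shadowTemplate) {
  const slotPattern = /<slot(?:\s+name="([^"]*)")?\s*>([\s\S]*?)<\/slot>/g;
  return shadowTemplate.replace(slotPattern, (match, slotName, fallback) => {
    const wanted = slotName || null; // an unnamed slot is the default slot
    const assigned = lightChildren.filter(
      (child) => (child.slot || null) === wanted
    );
    // A slot with no assigned nodes keeps its own fallback content.
    return assigned.length > 0
      ? assigned.map((child) => child.html).join('')
      : fallback;
  });
}

// Light DOM and shadow DOM of the my-component example.
const lightChildren = [
  {
    slot: 'title',
    html: '<span slot="title">Title inside the light DOM but with the H3 inside the shadow DOM</span>',
  },
  { slot: null, html: '<p>Text inside the light DOM</p>' },
];

const shadowTemplate = [
  '<slot></slot>',
  '<h3><slot name="title"></slot></h3>',
  '<p>Text inside the shadow DOM</p>',
].join('\n');

// HTML that the server could send in place of the bare component.
const composedHtml = composeDom(lightChildren, shadowTemplate);
console.log(composedHtml);
```

The result has the light DOM paragraph first, then the title inside the h3 (the assigned span keeps its own tag), then the shadow DOM paragraph. A real implementation also has to deal with styles, scripts and re-attaching behaviour on the client, which is where the issues mentioned above tend to appear.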

Another option is, as Google suggests, to move all the content that can affect rankings to the light DOM. But this can only be done in specific cases. For example, if the component is a button carrying out an action, such as sharing a page on a social media platform, it doesn’t even need light DOM. If, on the other hand, the component formats a Q&A block, it can be more problematic, especially if we want to keep the headers and other semantic HTML tags inside the light DOM while also preventing the page styles from affecting them.
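For the Q&A case, one possible compromise is a component whose shadow DOM contains only styles and slots, so that every header and paragraph stays in the page’s HTML. The sketch below is an assumption about how such a component might look: the `qa-block` tag name, the `question` slot and the `createQaBlockClass` helper are all hypothetical.

```javascript
// Shadow DOM that holds only presentation: scoped styles plus slots.
// It contains no headers or text, so nothing indexable is hidden in it.
const QA_BLOCK_TEMPLATE = `
  <style>
    :host { display: block; border: 1px solid #ccc; padding: 1em; }
    ::slotted(h3) { margin-top: 0; }
  </style>
  <slot name="question"></slot>
  <slot></slot>
`;

// Hypothetical helper (an assumption of this sketch): builds the element
// class against a given global scope.
function createQaBlockClass(scope = globalThis) {
  return class QaBlock extends scope.HTMLElement {
    constructor() {
      super();
      this.attachShadow({ mode: 'open' }).innerHTML = QA_BLOCK_TEMPLATE;
    }
  };
}

// Register the element only where the Custom Elements API is available.
if (typeof customElements !== 'undefined' && !customElements.get('qa-block')) {
  customElements.define('qa-block', createQaBlockClass());
}

// HTML sent by the server, readable without running JavaScript:
//
// <qa-block>
//   <h3 slot="question">What is shadow DOM?</h3>
//   <p>The answer, written in the light DOM.</p>
// </qa-block>
```

The trade-off is the one described above: the h3 and the paragraph are indexable, but because they live in the light DOM, the page’s own styles still apply to them.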

Conclusions

Web components are very useful for custom development, and if their use extends to CMSs, we will be able to better isolate the code of the various plug-ins and prevent incompatibilities. However, to ensure the indexability of these pages, we should either not use web components, keep all the important content inside the light DOM of the component, or force developers to use the complex solution of rehydrating DOM.

None of these approaches is good, because we lose our ability to encapsulate code, which is the primary reason for the use of web components. We expect this specification to evolve in the future, and allow us to declare shadow DOM inside the HTML in an explicit and indexable way.

Author: Ramón Saquete
Web developer at Human Level Communications online marketing agency. He's an expert in WPO, PHP development and MySQL databases.
