The HTML5 outline algorithm

Many say that the outline algorithm is worthless, since no browser currently implements it, it may have no effect on SEO, and there is no indication that this will change. Even so, it is well worth knowing, because it contains the fundamentals you need to write HTML5 code correctly, which in turn improves semantic structure, SEO and usability in screen readers.

In HTML4, the hierarchy of headings on a page is given by the h1, h2, h3, h4, h5 and h6 tags. HTML5 is different: the level of each heading in the hierarchy is determined mainly not by the number that accompanies each “h”, but by the heading's position within the document and the tags that surround it. The idea originated in the discarded XHTML2 specification, served as a basis for the development of HTML5, and has a couple of important advantages. First, we get an unlimited number of heading levels, not just six. Second, we can create components whose headings, regardless of whether they use h1, h2, h3, etc., always occupy the level of the hierarchy that corresponds to them within the page. The latter is especially important when reusing code, where headings can be of any level, and it applies to widgets, CMS plugins and Web Components, which I will write about in the near future.
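As a minimal sketch of the reuse argument, consider the hypothetical widget below (the class name and texts are made up for illustration). Its headings always start at h1, yet under the outline algorithm they would take whatever level corresponds to the place where the widget is inserted:

```html
<!-- Hypothetical reusable widget: its own headings always start at h1 -->
<article class="weather-widget">
  <h1>Weather</h1>
  <section>
    <!-- One level below "Weather", whatever level "Weather" itself ends up at -->
    <h1>Today</h1>
    <p>Sunny</p>
  </section>
</article>
```

Dropped into a page whose body already has a main heading, “Weather” would sit at the second level and “Today” at the third, without touching the widget's markup.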

Before we dive into the algorithm, let’s take a look at some basic HTML5 concepts:

Root section elements

These are elements that can be considered to contain independent documents. These elements are:

- <body>: you already know it, it contains the HTML document
- <blockquote>: content from another source
- <fieldset>: group of form controls
- <figure>: used for illustrations, diagrams, code, etc.
- <td>: table cell

Content section elements

These are elements that define sections within the current root section element or within other content sections. Each section can contain a header and footer element. Below is a very brief explanation of these elements:

- <article>: is a part of the document that can be understood and distributed independently without having to read other parts of the document. They are usually blog articles, products, etc.
- <section>: is a part of the document that is related to other parts of the document. It is used to form a grouping of parts of the same subject.
- <aside>: is a part of the document with unrelated or transversal information regarding the main topic. Normally used for sidebars.
- <nav>: is a part of the document with main navigation links, leading to other pages or parts of the same page.
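To make the four elements concrete, here is a small sketch of a page that uses each of them once. The texts and link targets are assumptions chosen for illustration, not part of any real site:

```html
<body>
  <h1>My blog</h1>
  <!-- nav: main navigation links -->
  <nav>
    <h2>Navigation</h2>
    <a href="/">Home</a>
    <a href="/about">About</a>
  </nav>
  <!-- article: can be understood and distributed on its own -->
  <article>
    <h2>Post title</h2>
    <!-- section: groups related parts of the same subject -->
    <section>
      <h3>Introduction</h3>
      <p>Text</p>
    </section>
  </article>
  <!-- aside: information tangential to the main topic -->
  <aside>
    <h2>Related posts</h2>
  </aside>
</body>
```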

Be careful with the <main> tag. Although it may seem so, it is NOT a content section element: it is used to frame the main content of the page, which is usually the part of the content that changes when we navigate from one page to another.
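A short sketch of what this means in practice, using made-up texts: wrapping content in main does not open a new section, so a heading inside it is placed in the hierarchy exactly as if main were not there.

```html
<body>
  <h1>Site title</h1>
  <main>
    <!-- main adds no level of its own: this heading becomes a subsection
         of "Site title" only because of its number (h2) -->
    <h2>Page content</h2>
    <p>Text</p>
  </main>
</body>
```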

With this, we now have the basis for understanding how the “outline algorithm”, which we could loosely translate as the heading algorithm, works.

Outline algorithm

Everything I have explained so far, and everything that follows, can also be found in much greater detail, even with implementations of the algorithm, in the WHATWG HTML specification, the “Living Standard”.

<body>
 <h1>A</h1>
 <p>B</p>
 <h2>C</h2>
 <p>D</p>
 <h2>E</h2>
 <p>F</p>
 <h3>G</h3>
 <p>H</p>
</body>

Since this is a document without a clear semantic structure, the outline algorithm has to rely on the numbers of the “h” tags to guess what sections the document has. The top level holds <h1>A</h1> and paragraph B. Inside it, the algorithm creates three implicit sections: two at the second level of the document, one with <h2>C</h2> <p>D</p> and another with <h2>E</h2> <p>F</p>, and a third within the latter, with <h3>G</h3> <p>H</p>. Structure:

- 1. <h1>A</h1>
    1. <h2>C</h2>
    2. <h2>E</h2>
      1. <h3>G</h3>

We can achieve the same heading structure using content section elements:

<body>
 <h1>A</h1>
 <p>B</p>
 <section>
   <h1>C</h1>
   <p>D</p>
 </section>
 <section>
   <h1>E</h1>
   <p>F</p>
   <section>
    <h1>G</h1>
    <p>H</p>
   </section>
 </section>
</body>

In this way the algorithm knows which level each header occupies depending on the number of parent section elements it has, ignoring the number of each “h”.

When writing HTML5 code, whenever we write a title with an “h” tag, it is a good practice to enclose what we are writing inside a section element.

To make the code compatible with both HTML4 and HTML5, we can give each “h” tag the number actually given by the algorithm in HTML5:

<body>
 <h1>A</h1>
 <p>B</p>
 <section>
   <h2>C</h2>
   <p>D</p>
 </section>
 <section>
   <h2>E</h2>
   <p>F</p>
   <section>
    <h3>G</h3>
    <p>H</p>
   </section>
 </section>
</body>

This last example would be the most correct. If a section element is left untitled here, the browser may give it a generic title such as “Section” or “Sidebar”, so it is also advisable to title section elements.
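As an illustration with made-up content, compare an aside that has no heading with one that does. The first would show up as an untitled entry in the outline, while the second gets a proper title:

```html
<body>
  <h1>A</h1>
  <!-- No heading: this section would appear as untitled in the outline -->
  <aside>
    <p>Some related links</p>
  </aside>
  <!-- With a heading, the same kind of section gets its own titled entry -->
  <aside>
    <h2>Related links</h2>
    <p>Some related links</p>
  </aside>
</body>
```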

Let’s see more examples:

<body>
<h1>A</h1>
<p>B</p>
<h1>C</h1>
<p>D</p>
<h1>E</h1>
<p>F</p>
</body>

In this case the main document is considered to be divided into three parts at the same level. This is something that Google already allowed before the advent of HTML5. Structure:

- 1. A
  2. C
  3. E

To avoid making mistakes, it is also important to keep the following rule in mind:

Headers never rise above other sections.

Let’s look at an example to understand it. What would you say is the main heading, or h1, of the following document?

<body>
<section>
 <h1>A</h1>
 <p>B</p>
</section>
<h1>C</h1>
<p>D</p>
</body>

Did you answer that the document has no main heading? Probably not, but if so, then you’ve got it right, since the structure looks like this:

- 1. (untitled)
    1. A
  2. C

What happens is that when the main title is placed after a section, it cannot rise above it, creating a second part of the document with the title that we thought was going to be the main one. Let’s look at another example:

<!DOCTYPE HTML>
<title>page title</title>
...

<h1>document title</h1>

<main>
 <article>
  <header>
   <nav>
    <a href="?t=-1d">link 1</a>;
    <a href="?t=-7d">link 2</a>;
    <a href="?t=-1m">link 3</a>
   </nav>
  </header>
  <h2>conflicting title</h2>

  <p>text</p>
 </article>
</main>
...

In this case the article element is left without a title, because its heading appears after a nav section. The structure would be as follows:

- 1. document title
    1. (Untitled article)
      1. (Untitled navigation section)
    2. conflicting title
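One possible fix, sketched here on the assumption that the h2 is meant to be the article's title, is to move the heading so that it comes before the nav section. The rest of the page stays as it was:

```html
<main>
  <article>
    <header>
      <!-- The heading now precedes the nav section, so it titles the article -->
      <h2>conflicting title</h2>
      <nav>
        <a href="?t=-1d">link 1</a>;
        <a href="?t=-7d">link 2</a>;
        <a href="?t=-1m">link 3</a>
      </nav>
    </header>
    <p>text</p>
  </article>
</main>
```

With this order the article should be titled “conflicting title”, and only the navigation section nested inside it remains untitled.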

Finally, please note that the root section elements create independent documents, e.g:

 <blockquote>  <h2>title</h2>  </blockquote>

In this case the h2 would actually be an h1 outside the hierarchy of the document in which it is located.
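A slightly fuller sketch, with made-up surrounding content, shows the effect on the host document: the heading inside the blockquote does not take part in the page's hierarchy, so the outline of the body contains only “Document title” and “Next section”.

```html
<body>
  <h1>Document title</h1>
  <p>Text</p>
  <blockquote>
    <!-- Belongs to the quoted content's own outline, not to the page's -->
    <h2>title</h2>
    <p>Quoted text</p>
  </blockquote>
  <!-- Subsection of "Document title", unaffected by the quote above -->
  <h2>Next section</h2>
  <p>More text</p>
</body>
```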

I hope this article has given you some clarification on how to write HTML5 code.
