Errors to avoid when applying structured data markup


Written by Ramón Saquete

Structured data allow us to tell Googlebot what type of information a website contains, so that it can interpret it better and, for certain data types, use it to enrich search results and to appear in answers given by voice assistants. Nevertheless, when implemented incorrectly, structured data can lead to the page being penalised after a manual review by Google.

Reading Google’s documentation on the manual actions it takes to penalise websites, we find multiple examples of structured data misuse or abuse. These mistakes are almost always made on purpose, as part of a Black Hat SEO strategy to appear in the SERPs with rich snippets. Making them unintentionally is highly uncommon, so even if we claim that something was an accident, we will be penalised all the same if the site undergoes a manual review.

We should not be afraid of being penalised because of structured data, though. As we’ve said on other occasions, when used correctly they improve rankings and visibility in search, and they allow us to appear in voice search results. To make the most of structured data without being penalised, we simply have to use them correctly and without cheating.

In the following paragraphs, we are going to look at the types of errors we should avoid if we want Google to interpret our data correctly and don’t want to receive a penalty. Basically, these errors fall into the following categories:

  1. Syntax errors, of two types:
    1. Those made with regard to the language format.
    2. Those made with regard to the grammar specified by Schema.org and Google.
  2. Semantic errors.

These errors can happen regardless of the markup language.

Syntax errors

Syntax errors happen when we write in a formal language, such as a programming or markup language, without following the rules of its grammar. In the case of structured data, we have to follow two sets of grammar rules: those of the format we use to write them (e.g. JSON-LD), and those set by the Schema.org vocabulary.

Syntax errors related to formatting

As we’ve already mentioned, these are errors made when writing structured data in its format, whether JSON-LD or microdata, and they make it impossible to parse the data at all.

Let’s look at an example of JSON-LD with several typical syntax errors. One of them occurs because we didn’t take into account that the “license” attribute can be empty when this code is generated:

{
   "@context":"https://schema.org/",
   "@type":"BlogPosting",
   license: 
   headline:"example",
}

If we don’t have a value for “license”, we should either remove the attribute or give it an empty string, as follows, to avoid syntax errors and obtain valid JSON:

{
   "@context":"https://schema.org/",
   "@type":"BlogPosting",
   "license": "",
   "headline":"example"
}

The other errors corrected in the second example, in case you didn’t notice, are the missing quotation marks around the attribute names and the trailing comma before the closing curly brace, which has been removed. Other required attributes are also missing, but those are not format syntax errors; they belong to the vocabulary syntax error category instead.
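A quick way to see the difference between the two snippets is to run them through a strict JSON parser. The following is a minimal Python sketch using only the standard library’s `json` module; `is_well_formed_json` is a hypothetical helper of our own, and it only checks the JSON format, not the Schema.org vocabulary.

```python
import json

# The two JSON-LD snippets from the text, copied verbatim.
BROKEN = """{
   "@context":"https://schema.org/",
   "@type":"BlogPosting",
   license: 
   headline:"example",
}"""

FIXED = """{
   "@context":"https://schema.org/",
   "@type":"BlogPosting",
   "license": "",
   "headline":"example"
}"""


def is_well_formed_json(text: str) -> tuple[bool, str]:
    """Return (True, "") if text parses as JSON, else (False, parser message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg describes the problem; lineno/colno locate it in the text.
        return False, f"{err.msg} (line {err.lineno}, column {err.colno})"
    return True, ""


for name, snippet in (("broken", BROKEN), ("fixed", FIXED)):
    ok, message = is_well_formed_json(snippet)
    print(name, "OK" if ok else f"format syntax error: {message}")
```

Keep in mind that a parser stops at the first error it finds: after quoting `license`, running the check again would report the missing value, and only after that the trailing comma.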

To avoid these errors, we should follow the JSON grammar defined in RFC 7159 together with the W3C JSON-LD specification when we use JSON-LD, or the WHATWG microdata specification when we use microdata.

When a document follows the grammar rules of its format, it is said to be well-formed, or correctly formatted (an expression commonly found in XML validation tools, e.g. when validating a sitemap).

Syntax errors related to Schema.org and Google

These are errors that break the grammar rules of the Schema.org vocabulary or of Google’s specification.

Some examples of this kind of syntax error:

  • Assigning to an attribute a value of an incorrect or non-existent data type. For example, the author attribute can only be of type Organization or Person, so assigning it a value of type Event breaks the grammar rules.
  • Adding an attribute to a data type that doesn’t have it, as a result of misreading the spec. For example, adding the addressLocality attribute directly to the LocalBusiness type:
{
  "@context": "http://schema.org",
  "@type": "LocalBusiness",
  "addressLocality": "Madrid"
}

This is an error because, according to the spec, LocalBusiness has an address attribute of type PostalAddress, and addressLocality belongs to PostalAddress, not to LocalBusiness. The correct markup nests it inside address:

{
  "@context": "http://schema.org",
  "@type": "LocalBusiness",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Madrid"
  }
}
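The two rules behind these examples (an attribute can only be used on the types that define it, and its value must be of an expected type) can be sketched as a small recursive check. This is a minimal Python sketch, not a real validator: `ALLOWED_PROPERTIES` and `EXPECTED_TYPES` are hypothetical, hand-written subsets of Schema.org, and a real tool would load the full vocabulary, including every inherited attribute and list values.

```python
# Hypothetical, hand-written subset of Schema.org: the attributes
# (Schema.org calls them properties) this sketch accepts on each type.
ALLOWED_PROPERTIES = {
    "LocalBusiness": {"name", "address", "telephone", "url"},
    "PostalAddress": {"streetAddress", "addressLocality", "addressRegion", "postalCode"},
    "BlogPosting": {"headline", "author", "license", "datePublished"},
    "Person": {"name", "url"},
    "Organization": {"name", "url", "address"},
    "Event": {"name", "startDate", "location"},
}

# Hypothetical subset of the value types Schema.org expects for some attributes.
EXPECTED_TYPES = {
    "address": {"PostalAddress", "Text"},
    "author": {"Person", "Organization"},
}


def vocabulary_errors(node: dict, path: str = "$") -> list[str]:
    """Return vocabulary errors found in one parsed JSON-LD object (recursively)."""
    item_type = node.get("@type")
    allowed = ALLOWED_PROPERTIES.get(item_type)
    if allowed is None:
        return [f"{path}: type {item_type!r} is unknown to this sketch"]
    errors = []
    for prop, value in node.items():
        if prop.startswith("@"):
            continue  # JSON-LD keywords such as @context and @type
        if prop not in allowed:
            errors.append(f"{path}: attribute {prop!r} is not defined for {item_type}")
        elif isinstance(value, dict):
            expected = EXPECTED_TYPES.get(prop)
            if expected is not None and value.get("@type") not in expected:
                errors.append(
                    f"{path}.{prop}: expected {sorted(expected)}, got {value.get('@type')!r}"
                )
            else:
                # The nested object is of an acceptable type: check its own attributes.
                errors.extend(vocabulary_errors(value, f"{path}.{prop}"))
    return errors


wrong = {"@context": "http://schema.org", "@type": "LocalBusiness", "addressLocality": "Madrid"}
right = {
    "@context": "http://schema.org",
    "@type": "LocalBusiness",
    "address": {"@type": "PostalAddress", "addressLocality": "Madrid"},
}
print(vocabulary_errors(wrong))  # one error: addressLocality on LocalBusiness
print(vocabulary_errors(right))  # []
```

The same function also catches the first bullet’s case: an author whose `@type` is Event is reported as an unexpected value type.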

Syntax errors resulting from not following the vocabulary specification can be further divided into two types: those that break the general Schema.org specification and those that break Google’s specification. We have already explained how to interpret Schema.org to create structured data, and that Google’s spec adds restrictions on top of Schema.org’s, such as which attributes are required.
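Google’s extra restrictions can be pictured as a second table layered on top of the Schema.org one. The Python sketch below only checks for missing required attributes; the `GOOGLE_REQUIRED` table and the `missing_required` function are hypothetical illustrations, and the actual list for each rich result type has to be taken from Google’s current documentation, since those requirements change over time.

```python
# Hypothetical table of attributes Google requires for some rich result
# types (illustrative only; check Google's documentation for the real list).
GOOGLE_REQUIRED = {
    "Recipe": {"name", "image"},
    "Event": {"name", "startDate", "location"},
}


def missing_required(node: dict) -> list[str]:
    """Return the required attributes (per the table above) missing from node."""
    # Types absent from the table have no extra requirements in this sketch.
    required = GOOGLE_REQUIRED.get(node.get("@type"), set())
    # An empty string is valid JSON, but here it does not count as a value.
    present = {key for key, value in node.items() if value not in ("", None, [], {})}
    return sorted(required - present)


print(missing_required({"@type": "Recipe", "name": "Paella"}))  # ['image']
```

Separating this table from the Schema.org one mirrors the distinction in the text: markup can be valid Schema.org and still fail Google’s requirements.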

When structured data follow the syntax rules of both their format and their vocabulary, they are said to be valid, which also means they are correctly formatted.

How to identify the syntax error type using validation tools

The parsers used by structured data validation tools do not say whether a syntax error is in the format or in the vocabulary, but each tool has its own characteristic way of reporting a format syntax error. If the tool returns any other type of error, it concerns the grammar of the vocabulary:

Rich results testing tool: when there is a format syntax error, the tool returns a parsing error.

Structured data testing tool: when there is a format syntax error, the tool reports it as an “Uncategorised error”.

This distinction matters because the vocabulary of a structured data block can only be checked once its format is syntactically correct. So if there is even one format syntax error, Google won’t read anything from that structured data. If there are no format errors but there are several vocabulary errors, Google may still interpret some of the data, even though the tool highlights the errors in red.
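The order described here can be expressed as a two-stage pipeline in which the vocabulary is only examined once the text parses. This Python sketch uses the standard `json` module for the first stage; `check_vocabulary` and `LOCAL_BUSINESS_ATTRIBUTES` are hypothetical, deliberately tiny stand-ins for a real vocabulary validator, and `classify` is our own name.

```python
import json

# Hypothetical vocabulary rule used only in this sketch: the attributes
# accepted on LocalBusiness. A real validator would use the full vocabulary.
LOCAL_BUSINESS_ATTRIBUTES = {"name", "address", "telephone", "url"}


def check_vocabulary(item: dict) -> list[str]:
    """Stage 2 stand-in: flag attributes that are not in the hypothetical list."""
    if item.get("@type") != "LocalBusiness":
        return []
    return [f"unexpected attribute {key!r}" for key in item
            if not key.startswith("@") and key not in LOCAL_BUSINESS_ATTRIBUTES]


def classify(jsonld_text: str) -> tuple[str, list[str]]:
    """Return ("format" | "vocabulary" | "ok", messages) for one JSON-LD block."""
    try:
        data = json.loads(jsonld_text)
    except json.JSONDecodeError as err:
        # Stage 1 failed: nothing in the block can be read, so stop here.
        return "format", [err.msg]
    # Stage 2 only runs on well-formed JSON; a block may hold one object or a list.
    items = data if isinstance(data, list) else [data]
    problems = [msg for item in items if isinstance(item, dict)
                for msg in check_vocabulary(item)]
    return ("vocabulary", problems) if problems else ("ok", [])


print(classify('{"@type": "LocalBusiness", "addressLocality": "Madrid"}'))
```

Returning early on a parse failure is the design point: vocabulary messages are only meaningful once the format is correct.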

Semantic errors

Semantic errors are those that give the data a different meaning from the one they actually have. A clear, deliberately exaggerated example would be using the Book data type on a recipe. Syntactically it can be done, since both Book and Recipe inherit the same CreativeWork attributes we can fill in, but the meaning we would be giving the content would be wrong.

It’s also common to find cases where there is no error as such, but the meaning could be more precise: for example, assigning the Article data type to a blog post instead of BlogPosting, a more specific type derived from Article. However, this is not grounds for a penalty.
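The type inheritance these two paragraphs rely on can be sketched with a small parent table. In this Python sketch, `PARENT` is a hand-copied fragment of the Schema.org type hierarchy, simplified so each type has a single parent (some Schema.org types have several), and the function names are our own. It shows why Book markup is syntactically possible on a recipe (both descend from CreativeWork) and that BlogPosting is a more specific descendant of Article.

```python
from typing import Optional

# Hand-copied fragment of the Schema.org type hierarchy (child -> parent).
# Simplification: some Schema.org types have more than one parent.
PARENT = {
    "CreativeWork": "Thing",
    "Book": "CreativeWork",
    "HowTo": "CreativeWork",
    "Recipe": "HowTo",
    "Article": "CreativeWork",
    "SocialMediaPosting": "Article",
    "BlogPosting": "SocialMediaPosting",
}


def ancestors(schema_type: str) -> list[str]:
    """Chain from schema_type up to the root, most specific type first."""
    chain = [schema_type]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_subtype(child: str, parent: str) -> bool:
    return parent in ancestors(child)


def closest_common_type(a: str, b: str) -> Optional[str]:
    """Most specific type both a and b inherit from, or None if unrelated here."""
    b_chain = set(ancestors(b))
    return next((t for t in ancestors(a) if t in b_chain), None)


print(closest_common_type("Book", "Recipe"))  # CreativeWork
print(is_subtype("BlogPosting", "Article"))   # True
```

The hierarchy only tells us which attributes are available; whether Book or Recipe describes the page is a semantic question that no such check can answer.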

No automatic tool will tell us whether a document is semantically correct. That is why Google conducts manual reviews, and it is also the very reason structured data exist: if machines could deduce the meaning of data and the relationships between them without getting it wrong, we wouldn’t need to mark them up.

Semantic errors are the ones that can lead to a penalty after a manual review, whereas syntax errors only generate warnings in Google Search Console, with no penalty other than the structured data being incorrect.

Google’s structured data documentation describes the semantic meaning of each data type in much more detail than Schema.org does. It also includes examples that help avoid misinterpretations which could lead to a penalty.

This is the type of error Black Hat SEOs usually commit on purpose. Lately, for example, it has been happening a lot with FAQPage and HowTo. The appearance of these rich snippets in the SERPs and in voice search has led to abusive use of this markup: it is applied to pages that provide neither instructions nor answers, by artificially adding pieces of content to page types whose main content has nothing to do with them. The sole purpose is to take up more space in the results, without considering that this will dilute the page’s relevance for what it really wants to rank for, lowering its rankings and increasing its bounce rate.
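The legitimate use is the opposite: FAQPage markup generated only from question and answer pairs that are actually displayed as the page’s main content. The following Python sketch builds that JSON-LD from such a list; `faq_page_jsonld` and its input are hypothetical, and the nesting (FAQPage, mainEntity, Question with acceptedAnswer of type Answer) follows the Schema.org types that Google documents for FAQ markup.

```python
import json


def faq_page_jsonld(visible_pairs: list[tuple[str, str]]) -> str:
    """Build FAQPage JSON-LD from (question, answer) pairs shown on the page.

    The caller is responsible for passing only Q&As that really are the
    page's visible main content; this function cannot check that.
    """
    if not visible_pairs:
        # No visible questions means no FAQPage markup at all.
        raise ValueError("no visible questions: do not emit FAQPage markup")
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in visible_pairs
        ],
    }
    return json.dumps(data, ensure_ascii=False, indent=2)


print(faq_page_jsonld([("What is JSON-LD?", "A JSON-based format for linked data.")]))
```

Raising an error on an empty list is a deliberate choice: it is better to emit no markup than FAQ markup that does not match the page.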

Other errors

Marking up information that isn’t present on the page, or that is hidden from the user, is a common error among the most daring Black Hat SEOs. It could also be considered a semantic error, and it is grounds for a penalty. For example, this often happens with review rich snippets, which are used to invent ratings that never appear on the page, or are hidden, with the sole purpose of showing a star rating in the search results.
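A rough self-check against this is to confirm that every value we mark up can also be found in the page’s visible text. The Python sketch below uses the standard library’s `html.parser` to collect text outside `<script>` and `<style>` elements and reports marked-up values that are missing. It is a heuristic of our own, not a Google tool, and the names are hypothetical: it does not evaluate CSS, so text hidden with styles still counts as visible, and values such as image URLs or differently formatted numbers (“4.5” vs “4,5”) would be flagged and need filtering.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes that are outside <script> and <style> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_skipped = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._in_skipped += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._in_skipped:
            self._in_skipped -= 1

    def handle_data(self, data):
        # The JSON-LD itself lives in a <script>, so it is skipped here.
        if not self._in_skipped:
            self.chunks.append(data)


def string_values(node):
    """Yield every string or number value in parsed JSON-LD, skipping @ keys."""
    if isinstance(node, dict):
        for key, value in node.items():
            if not key.startswith("@"):
                yield from string_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from string_values(item)
    elif isinstance(node, (str, int, float)) and not isinstance(node, bool):
        yield str(node)


def values_missing_from_page(html: str, jsonld: dict) -> list[str]:
    """Marked-up values that do not appear in the page's visible text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    visible = " ".join(" ".join(parser.chunks).split())  # normalise whitespace
    return [v for v in string_values(jsonld) if " ".join(v.split()) not in visible]
```

For example, markup claiming a `ratingValue` of “4.9” on a page that only shows “4.5” would be reported as missing.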

Recently, Google has limited the number of data types that review rich snippets can be applied to, and it doesn’t allow them for self-serving reviews (reviews that a business or organisation publishes about itself on its own site) on the LocalBusiness and Organization types, due to the frequent misuse this markup has suffered.

Google’s guidelines also include other common-sense requirements, such as not applying structured data to illegal or otherwise unacceptable content: plagiarised content, scams, fraud, or any other kind of ethically deplorable content.

Conclusion

Even when structured data don’t have a visible effect on the search results, they are very useful for rankings, because they help us tell the search engine’s bot what our content means. They place us in better positions for the queries we are relevant for, which reduces the bounce rate and, in turn, improves rankings.

The visual changes provided by rich snippets are also useful for boosting CTR in the SERPs, but this is the only effect most SEOs see. Black Hat SEOs sometimes abuse it, to the point of completely distorting the meaning of the content, which is counterproductive and can lead to a penalty.

The best structured data strategy, as with anything SEO-related, is to always keep our white hat on, and try to implement them in the best possible way.

Author: Ramón Saquete
Web developer at Human Level Communications online marketing agency. He's an expert in WPO, PHP development and MySQL databases.
