Mistakes to avoid when applying structured data markup

Ramón Saquete

Written by Ramón Saquete

Structured data allows specify to the Google robot the type of information contained in a website so that it can to interpret it better and, with certain types of data, enrich how search results are displayed and generate printouts in voice assistants. However, if the incorrectly applied, the page could be penalized in a manual Google review..

In the documentation on the manual actions that Google performs to penalize a site, we find multiple examples of misuse or abuse of structured data. These failures are always done intentionally as a Black Hat SEO strategy, in order to appear in the SERPS with rich snippets. It is very rare to make one of these mistakes unintentionally, so even if we claim that we have done it unintentionally, we will still be penalized in a manual review.

We should not be afraid of being penalized for structured dataas we have already mentioned on other occasions, well used, they improve positioning and visibility in searches, in addition to appearances in voice searches. To take advantage of structured data without being penalized, we must always apply them correctly and without deceit..

Next, let’s see what types of errors we should avoid, both if we want Google to interpret the data correctly and if we don’t want to be penalized. Basically, we have these types of errors:

  1. Syntactic errors. They can be of two types:
    1. Commitments on language formatting.
    2. Assignments on the grammar specified by schema.org and Google vocabularies.
  2. Semantic errors.

These are the same mistakes that can be made with any markup language.

Syntactic errors

Syntactic errors occur when a language is written and the rules of language grammar are not followed. In the case of structured data we have to follow two types of grammatical rules: those of the format in which we are going to write it, for example JSON-LD, and the grammatical rules of the language defined by the schema.org vocabulary.

Syntactic errors on formatting

As we have already mentioned, these are errors made when writing the format of structured data, whether JSON-LD or microdata, so that the data type cannot be interpreted.

Let’s see an example where we generate a JSON-LD with several typical syntax errors. One of them is caused by not taking into account that the “license” attribute may be empty when generating this code:

{
   "@context":"https://schema.org/",
   "@type":"BlogPosting",
   license: 
   headline:"ejemplo",
}

If we do not have a value for “license”, we should remove this attribute or add an empty string, as follows, to avoid the syntax error and make it a correct JSON:

{
   "@context":"https://schema.org/",
   "@type":"BlogPosting",
   "license": "",
   "headline":"ejemplo"
}

The other syntax errors corrected in the example, in case you have not seen them, are the quotation marks in the attributes and the elimination of the last comma before the closing braces. Required attributes are also missing, but these are not errors due to not following the syntax of the format, but due to not following the syntax of the vocabulary.

To avoid making mistakes in the formation of structured data, we must follow the JSON grammar, defined in RFC 7159, the JSON-LD specification in the W3C and the specification of microdata in the WhatWG.

When a markup language follows the grammatical rules of its format correctly, it is said to be well-formed (in XML validationtools, such as a sitemap, it is common to find this expression).

Syntactic errors about schema.org and Google

They are those committed by not following the grammatical rules of the schema.org or Google specification.

Examples of syntactic errors of this type are:

  • Assign as the value of an attribute, a data type that is incorrect or does not exist. For example, if the author attribute can only be of type Organization or Person, we cannot break the rules of the grammar by assigning it data of type Event.
  • Adding an attribute to a data type that does not have one, due to errors in following the specification. For example, adding the addressLocality attribute directly to the LocalBusiness type:
    {
      "@context": "http://schema.org",
      "@type": "LocalBusiness",
      "addressLocality": "Madrid"
    }
    

    This is an error because the specification indicates that LocalBusiness is composed of an address attribute of type PostalAddress, to which the addressLocality attribute corresponds , and not LocalBusiness:

    {
      "@context": "http://schema.org",
      "@type": "LocalBusiness",
      "address": {
        "@type": "PostalAddress",
        "addressLocality": "Madrid"
      }
    }
    

Syntactic errors due to not following the vocabulary specification can be subdivided into two types: those committed by not following the general schema.org specification and those committed by not following the Google specification. We have already explained how shcema.org should be interpreted to create structured data and that in the Google specification, we will find additional restrictions to the schema.org specification, such as which attributes are required.

When a markup language follows the syntactic rules of its format and those of the vocabulary, it is said to be valid, which also implies that it is well-formed.

Differentiate the type of syntactic error with validation tools.

The interpreters of structured data validation tools do not tell us directly whether they have found syntactic errors in formatting or vocabulary, but they do tell us that have their own way of expressing that they have found a syntactic error of the format, if the tool gives another type of error, it is an error made on the grammar of the vocabulary:

Syntax error in rich results tool
If there is a single syntax error in the formation of the code, the rich result test tool will give us “Analysis error”.
Syntax error in structured data testing tool
If there is a single syntax error in the formation of the code, the “Unclassified errors” section will appear in the structured data testing tool.

This is important because for the vocabulary of a structured data type to be correct, its format must first be syntactically correct. So if we have a single syntactic error in the format, Google will not read anything of the structured data type y, if we don’t have them, but we have several errors in the vocabulary, it may interpret some things even if you point out the errors in red.

Semantic errors

These are those committed by giving the data a meaning other than the one it has. A clear and exaggerated example would be to use theBook data type for a recipe. Syntactically we can do it and, besides, the Book type and the Recipe type inherit the same attributes from CreativeWork that we could fill, but the meaning we would be giving it would not be the correct one.

It is common to find cases where, although there is no error, the meaning can be improved. For example, assigning the data type Article to a Blog post instead of the derived type BlogPosting. But this is not a reason for penalty.

No automatic tool will tell us if the document is semantically correct. This is the reason why Google performs manual reviews and the reason why structured data exists, because if the machine were able to deduce the meaning of the data and the relationships between them without making a mistake, it would not be necessary to mark them.

Semantic errors are the ones that could generate a penalty in a manual review, while syntactic errors will only generate warnings from Google Search Console, without any type of penalty, beyond not having the data structured correctly.

In Google’s structured data specification, we will find the semantic meaning of each type of data more detailed than in schema.org. The search engine documentation also includes examples to avoid confusion that could lead to a penalty.

This is the kind of mistake that Black Hat SEOs tend to make on purpose. For example, it happens a lot lately with the FAQPage and How to data types. By starting to generate rich snippets in SERPS and voice searches, have begun to be abused. They are inserted into non-Q&A and non-instructional pages, artificially adding chunks of content of these types to pages where they do not constitute their main content, just to take up more space in the results, without taking into account that it will detract from the relevance of what really needs to be positioned on the page.and increasing the bounce rate.

Other errors

Adding structured data about information that is not on the page or that is hidden from the user is a common type of error made by the most daring Black Hat SEOs, which could also be considered a semantic error and is a reason for penalization. This is often the case, for example, with the rich snippet of reviews with stars for rating, with which the ratings are often invented and do not even appear on the page or are hidden, for the sole purpose of appearing with the stars in the results.

Google has recently limited the number of structured data types to which it can be applied and does not allow its use on self-managed comments on LocalBusiness and Organization types, due to the abuse of this structured data type.

Among Google’s guidelines, we can find other logical requirements, such as that no structured data on illegal content such as: plagiarized content, scams, hoaxes or any kind of ethically deplorable content should be applied.

Conclusion

In situations where structured data does not have a visual effect on the results, are very useful for positioning, as they help to specify the meaning of the contents to the robot.positioning them better for what we are relevant to, decreasing the bounce rate, which in turn improves positioning.

The visual changes of the enriched fragmentsare also useful for increase CTR in SERPSHowever, most SEOs only see this effect and, sometimes, Black Hat SEOs abuse them, to the point of distorting the meaning of the content of the pages, which is counterproductive and punishable.

The best strategy with structured data, as with any aspect of SEO, is to keep our white hats on and try to implement it in the best possible way.

  •  | 
  • Published on
Ramón Saquete
Ramón Saquete
Web developer and technical SEO consultant at Human Level. Graduated in Computer Engineering and Technical Engineering in Computer Systems. He is also a Technician in Computer Applications Development and later obtained the Pedagogical Aptitude Certification. Expert in WPO and indexability.

What do you think? Leave a comment

Just in case, your email will not be shown ;)

Related Posts

en