How to interpret schema.org to create structured data


Written by Ramón Saquete

Understanding schema.org’s documentation is not easy. To follow it, we need to understand the abstractions derived from the RDF data model, which is, for the most part, comparable to the object-oriented data model. That subject is usually studied alongside software design, object-oriented programming and object databases. Once these concepts are clear, we can create a data structure that better fits both our website’s information and schema.org’s data model. We are then no longer limited to what appears in the examples, and we don’t lose our ability to get Google to understand the semantic information the data provides.

Structured data allows us to explain to Googlebot the meaning of the information on our website. With it, Google can display rich snippets, add information to its Knowledge Graph, interpret information better to show more relevant results, and even answer questions directly in position 0. All of this should bring in higher-quality traffic.

Structured data also makes things easier for scrapers, which aim to extract content from our website and add it to their own databases. If our information is valuable to others, we should protect it by blocking “bad” bots in our server configuration.

We’ve already covered the basic concepts of structured data and schema.org on our blog, and I recommend reading that first if you don’t know what I’m talking about here. But if you’re already familiar with all of it, let’s dive deeper into the subject.

Basic concepts of schema.org’s data model

Let’s start with the concepts of class, property and instance of a class. They are essential to understanding how structured data is modelled, because schema.org is a vocabulary made up of a set of elements, and these concepts are what define them.

Class

Structured data is used to define things, and these things can be of any class or type, for example: people, places, products, actions like creating or searching, creative works like a blog or a book, events like concerts or matches, and intangible things like offers or services. These classes of things are simply called classes, although you’ll also see them called ‘types’ or ‘entities’.

Property or attribute

To describe each class, we use properties, which can be assigned any value we want. For example, if we have a class “Product”, it could have a property “name”, which could be assigned the value “cup”, and another property “color”, which could be assigned the value “red”.

Each class has its own properties, and what schema.org tells us is which properties each class we want to define has. That is, the class “Product” has the properties “name”, “color” and many others a product can have. A different class, such as “Event”, has properties like “startDate” and “endDate”, which would make no sense for a product.

Object or instance of a class

When we take a class and assign values to the properties we have information for, we are creating an instance of that class. For example, if we assign data to a specific product, such as the red cup from the previous example, we would say that this cup is an instance, or object, of the class “Product”.

Thus, classes are like templates telling us which data we need to fill in, and filling them in gives us the objects, or instances, we want.
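To make the template/instance distinction concrete, here is a minimal Python sketch that assumes we model a schema.org class as a dataclass whose fields are a couple of its properties. The `Product` dataclass and its `to_jsonld` method are hypothetical illustrations, not part of any schema.org library.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Product:
    """Hypothetical template mirroring two properties of schema.org's Product class."""
    name: Optional[str] = None
    color: Optional[str] = None

    def to_jsonld(self) -> dict:
        # Only output the properties we actually have values for.
        data = {k: v for k, v in asdict(self).items() if v is not None}
        return {"@context": "https://schema.org/", "@type": "Product", **data}


# The class is the template; red_cup is one instance (object) of it.
red_cup = Product(name="cup", color="red")
print(red_cup.to_jsonld())
# {'@context': 'https://schema.org/', '@type': 'Product', 'name': 'cup', 'color': 'red'}
```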

Object composition relationships

At this point, allow me to introduce some terminology that belongs more to computer science and object-oriented data modelling than to the RDF model, as it helps simplify the explanation.

We’ve already seen that a property can be assigned values, but these values can be of different types, which we’ll classify as primitive or composite. A primitive type means the property takes a single literal value of a specific type. The easiest way to grasp this is with the following list of all the primitive data types schema.org allows:

  • Boolean type value: the property can be true or false. For example, the CreativeWork class has a property isAccessibleForFree, to which we can assign the value true or false, depending on whether it’s free or it isn’t.
  • Date type value: the property is assigned a date in ISO 8601 format. For example, the Product class has the property releaseDate, to which we can assign the value 2018-01-30, which corresponds to 30 January 2018.
  • Date and time type value: the property gets assigned a date and time in ISO 8601 format.
  • Number type value: can be an integer or a floating point number.
  • Text type value: this is an arbitrary text, for example “cup”, which I used in a previous example for the “name” property. This text can also be a URL.
  • Time type value: a time of day in the format hh:mm:ss[Z|(+|-)hh:mm].
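As a rough sketch, values can be checked against these primitive types in Python with the standard `datetime` parsers. The helper `check_primitive` is hypothetical, and `fromisoformat` only accepts a subset of ISO 8601 (a wider one from Python 3.11 on), so treat it as an approximation rather than a full validator.

```python
from datetime import date, datetime, time

# Standard-library ISO 8601 parsers; they accept only a subset of ISO 8601
# before Python 3.11 and a broader one afterwards.
ISO_PARSERS = {"Date": date.fromisoformat,
               "DateTime": datetime.fromisoformat,
               "Time": time.fromisoformat}


def check_primitive(schema_type: str, value) -> bool:
    """Rough check that a Python value fits a schema.org primitive type."""
    if schema_type == "Boolean":
        return isinstance(value, bool)
    if schema_type == "Number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if schema_type == "Text":
        return isinstance(value, str)  # URLs are Text too
    parser = ISO_PARSERS.get(schema_type)
    if parser is None or not isinstance(value, str):
        return False  # unknown type name, or nothing to parse
    try:
        parser(value)
        return True
    except ValueError:
        return False


print(check_primitive("Date", "2018-01-30"))  # True
print(check_primitive("Date", "30/01/2018"))  # False
print(check_primitive("Boolean", True))       # True
```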

Properties with composite value types are those to which we can assign one or several objects, of one or several classes, so inside the property we can define further properties. For example, the Product class has the property review, whose value is an instance of the Review class. In that instance, in turn, we can fill in the “author” property with an instance of the Person class, where we can finally fill in the “name” property with the name of a real person, using the Text primitive data type. We can say that we have a product object composed of a review object, which in turn is composed of an author. This is what the object-oriented model calls a composition relationship.

In this screenshot we can see several Product class properties that take primitive data types, and the review property, which takes a composite data type.
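The product, review and author composition can be sketched as nested Python dictionaries in JSON-LD shape. The reviewer name “Jane Doe” and the helper `composition_path` are hypothetical, used only to show that each composite value is itself an object with its own properties.

```python
import json

# Hypothetical values, used only to illustrate the nesting.
person = {"@type": "Person", "name": "Jane Doe"}   # innermost object
review = {"@type": "Review", "author": person}     # Review composed of a Person
product = {
    "@context": "https://schema.org/",
    "@type": "Product",
    "name": "cup",        # primitive (Text) value
    "review": review,     # composite value: an instance of Review
}


def composition_path(node: dict, *props: str):
    """Follow a chain of properties through the nested objects."""
    for prop in props:
        node = node[prop]
    return node


print(json.dumps(product, indent=2))
print(composition_path(product, "review", "author", "name"))  # Jane Doe
```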

When we create an instance of a product, we could follow every possible composition relationship and end up filling in almost every class defined on schema.org. This, however, wouldn’t be correct: the idea of structured data is not to add all the information possible, but to describe to Google the information that is displayed to the user. For that reason, it isn’t advisable to add a JSON-LD block with all the information we have just to define a product.

Some properties accept an instance of any one of several classes. In this screenshot we can see the recipient property, whose value can be an object of type Audience, ContactPoint, Organization or Person; the property itself is used on classes such as AuthorizeAction and CommunicateAction.

Sometimes we come across properties shown in blue instead of red. These are properties that haven’t been approved yet, so they are more likely to cause problems when it comes to getting Google to interpret them correctly.

Schema.org tells us the data type or types each property can take, although it doesn’t specify certain things Google takes into account:

  • It doesn’t make clear when the same property can be repeated. When it can, each repetition may hold an instance of a different class.
  • It doesn’t specify when a property is mandatory or recommended by Google.
  • It doesn’t say when a composite data type can be replaced with a Text value that Google will still read without trouble. And yet this is almost always possible, even though the definition doesn’t mention it.

It’s perfectly natural that schema.org doesn’t specify these aspects, as it’s a vocabulary defined for various search engines, and each can add its own restrictions on this model.

Google’s own documentation fills in some of what schema.org leaves out, so we should check it to implement structured data correctly for Google.

Therefore, to know whether the structured data we write based on schema.org follows Google’s guidelines, we have no choice but to test it once we’re done, using Google’s Structured Data Testing Tool. And if we want to create structured data for rich snippets, we can also use the Rich Results Test.


Other search engines have their own tools too, like Yandex’s structured data validator, and Bing’s structured data validator.

Class inheritance relationships

Some properties are shared by several classes, and to avoid defining the same property again in each one, schema.org uses a concept called class inheritance. A class can inherit from another, establishing a parent-child relationship in which the child class inherits all the properties of the parent class. A parent class can have several child classes, all sharing its properties.

In schema.org, the prime example of a parent class is Thing, which defines common properties such as name, description, url and image. Since every schema.org class (apart from the primitive data types) inherits from Thing, its properties can be used in any object we instantiate. Example:

When we look at the definition of a class on schema.org, we can see at the top the classes it inherits from (in this example, Thing > Person). Then it shows the class’s own properties (Properties from Person), followed by the inherited properties (Properties from Thing). A class may also have no properties of its own; in that case it only serves to specify which type is being defined.

Similarly, a class can inherit from another class that in turn inherits from a third, and so on, so it ends up inheriting properties from several classes, as shown in the example below:

Inheritance schema.org
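A minimal sketch of how inherited properties are gathered, assuming a small hand-copied excerpt of the hierarchy with partial property lists. It only follows one parent per class; schema.org also has classes with more than one parent (LocalBusiness inherits from both Organization and Place), which this sketch does not model. `PARENT`, `OWN_PROPERTIES` and `all_properties` are hypothetical names.

```python
from typing import Dict, List, Optional

# Hand-copied excerpt of the schema.org hierarchy (one parent per class).
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "CreativeWork": "Thing",
    "Book": "CreativeWork",
    "Person": "Thing",
}

# Partial property lists, chosen only for illustration.
OWN_PROPERTIES: Dict[str, List[str]] = {
    "Thing": ["name", "description", "url", "image"],
    "CreativeWork": ["author", "publisher", "isAccessibleForFree"],
    "Book": ["isbn", "bookFormat"],
    "Person": ["givenName", "familyName", "jobTitle"],
}


def all_properties(cls: str) -> Dict[str, List[str]]:
    """Group properties the way schema.org pages do: own ones first, then inherited."""
    grouped: Dict[str, List[str]] = {}
    current: Optional[str] = cls
    while current is not None:
        grouped[f"Properties from {current}"] = OWN_PROPERTIES[current]
        current = PARENT[current]  # climb one level towards Thing
    return grouped


for heading, props in all_properties("Book").items():
    print(heading, props)
```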

If we scroll down a class’s definition, we can also see its child classes. Check out the screenshot below for the child classes of Product:

Child classes for Product

The full class hierarchy is also published on schema.org. It’s a useful resource, as it helps us decide which class is the most suitable for the information appearing on a website.

Polymorphism

Structured data allows dynamic data types, or polymorphic objects (from the Greek for “many shapes”). Polymorphism means that a property expecting a composite data type can take an object of any of that type’s child classes, or even of a parent class. For example, the SocialMediaPosting class has the property sharedContent, whose type is CreativeWork. CreativeWork, in turn, inherits from Thing and has various child classes, such as Book. Thus, we could safely assign an object of type Thing or Book to the sharedContent property (even though its specification says CreativeWork), and it wouldn’t be incorrect.

This characteristic is rarely taken into account when modelling data, even though it opens up a wide range of possibilities. It allows us to define our data in more detail (using child classes) or in less detail (using parent classes) when we don’t know exactly what something is.
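The polymorphism check can be sketched in Python, again assuming a tiny, hand-copied excerpt of the hierarchy with one parent per class. `fits_range` applies the strict rule (a child class may stand in for its parent); `is_more_generic` flags the looser case described above, where a parent such as Thing is used instead. Both helpers are hypothetical.

```python
from typing import Dict, List, Optional

# Hand-copied excerpt of the hierarchy (one parent per class).
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "CreativeWork": "Thing",
    "Book": "CreativeWork",
    "Person": "Thing",
}


def ancestors(cls: str) -> List[str]:
    """cls itself followed by each of its ancestors, up to Thing."""
    chain: List[str] = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def fits_range(value_type: str, expected: str) -> bool:
    """Strict rule: the value's class is the expected class or one of its children."""
    return expected in ancestors(value_type)


def is_more_generic(value_type: str, expected: str) -> bool:
    """Looser case: the value's class is a parent of the expected class (e.g. Thing)."""
    return value_type != expected and value_type in ancestors(expected)


# sharedContent expects a CreativeWork:
print(fits_range("Book", "CreativeWork"))        # True, Book is a child class
print(fits_range("Person", "CreativeWork"))      # False, unrelated branch
print(is_more_generic("Thing", "CreativeWork"))  # True, Thing is a parent class
```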

Enum types

Enum types are those whose values come from a fixed list. For example, the bookFormat property of the Book class takes BookFormatType values. If we look at its definition, we can see that it isn’t really a class but an enumeration: it inherits from Enumeration and, like most enumerations, its name ends in Type. Its description lists the enumeration values we can assign to it:

Enums schema.org

To assign an enum value to a property, we must enter the URL representing the value we want. For example: “bookFormat=http://schema.org/Paperback”.

Some enumerations don’t end in Type. These take values defined in GoodRelations, a vocabulary other than schema.org; BusinessFunction is one example. They work just like any other enumeration.

We can also make up a text value and assign it, but then the search engine won’t understand the meaning of said value.
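A minimal Python sketch of building an enumeration value as a URL, assuming the BookFormatType members listed below match the current schema.org definition. `enum_value` is a hypothetical helper that rejects made-up values instead of letting them through as meaningless text.

```python
import json
from typing import Set

# Members of schema.org's BookFormatType enumeration (check the live definition).
BOOK_FORMATS: Set[str] = {"AudiobookFormat", "EBook", "GraphicNovel", "Hardcover", "Paperback"}


def enum_value(member: str, allowed: Set[str], base: str = "http://schema.org/") -> str:
    """Return the URL that represents an enumeration member."""
    if member not in allowed:
        # A made-up value would only be plain text with no meaning for the search engine.
        raise ValueError(f"{member!r} is not a member of this enumeration")
    return base + member


book = {"@type": "Book", "bookFormat": enum_value("Paperback", BOOK_FORMATS)}
print(json.dumps(book))  # {"@type": "Book", "bookFormat": "http://schema.org/Paperback"}
```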

Going from the conceptual model to code

Once we’ve chosen the class we’re going to use, the properties we’re going to fill in, and the data type for each of them, we have to convert this data model into one of the structured data formats Google understands. Let’s look at an example, assuming the following data model for a Book object shared on a social media platform, which we’ll also complete with the book’s author:

https://schema.org/SocialMediaPosting  <= Social Media post type of data.
     sharedContent = http://schema.org/Book <= although sharedContent's data type is CreativeWork, we'll use a derived type through polymorphism.
     name = Marketing Online 2.0
     isbn = 978-8441532649
     bookFormat = http://schema.org/Paperback <= given that bookFormat uses a class, which in reality is an enum, we directly assign the corresponding value to it.
     publisher = Anaya Multimedia <= according to the specification, the publisher must belong to the Person or Organization class. Let's see what happens if we ignore the specs and directly assign a text to it.
     author = https://schema.org/Person <= here, the specification tells us we can use an organisation or a person, so we're going to enter a person. If necessary, we can add several authors, some of them people and others organisations.
          givenName = Fernando 
          familyName = Maciá
          jobTitle = CEO
          brand = https://schema.org/Organization
               name = Human Level

There are many ways to express the conceptual data model. In this particular case, I’ve chosen a simple textual representation, but I could also represent it in UML.

If Google has this information, it can answer, among other things, the question: Who is the author of the book Marketing Online 2.0? But to do this, we need to convert it to code, in JSON-LD, Microdata or RDFa. First, we’ll see how to do this with microdata, and then with JSON-LD, which is the format Google recommends.

To transform it into microdata, we just need to remember that each class instance is declared with “itemscope” and “itemtype=[URL of the class on schema.org]”, and each property with “itemprop=[property name]”. For composition relationships, we use “itemscope”, “itemtype” and “itemprop” on the same element, because we’re declaring the class of the value of a specific property. Below you’ll find the previous model written in microdata, so you can compare them:

<div itemscope itemtype="https://schema.org/SocialMediaPosting">
     <div itemprop="sharedContent" itemscope itemtype="http://schema.org/Book">
          <p>Book: <span itemprop="name">Marketing Online 2.0</span></p>
          <p>ISBN: <span itemprop="isbn">978-8441532649</span></p>
          <p>
          Format: Paperback <link itemprop="bookFormat" href="http://schema.org/Paperback" />
          </p>
          <!-- a property that isn't shown to the user as-is is declared with a hidden tag: link with href for URL values, meta with content for Text values -->
          <p>Publisher: <span itemprop="publisher">Anaya Multimedia</span></p>
          <div itemprop="author" itemscope itemtype="https://schema.org/Person">
               <p><span itemprop="givenName">Fernando</span>
               <span itemprop="familyName">Maciá</span>
               </p>
               <p>
                    <span itemprop="jobTitle">CEO</span> at
                    <span itemprop="brand" itemscope itemtype="https://schema.org/Organization">
                         <span itemprop="name">Human Level</span>
                    </span>
               </p>
          </div>
     </div>
</div>

If we run this code through Google’s Structured Data Testing Tool, we won’t get any errors, only a couple of warnings about recommended properties, which means Google would interpret it correctly.

Structured data test in Google: results

Another thing to note in this example is the publisher property: according to schema.org, its value should be of class Organization or Person. However, when we assign a plain text value to it directly, the tool doesn’t return an error; it assigns the Thing class to the value, using polymorphism without us realising it. So Google doesn’t know whether the publisher is a person or an organisation, but it does know it’s the publisher. This is worth knowing whenever we aren’t sure which class to give a property’s value.
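Because the microdata rules are mechanical (itemscope and itemtype for every instance, itemprop for every property, all three together on composite properties), they can be sketched as a small recursive Python function. `to_microdata` is hypothetical: it renders every non-object value as a visible span, so it does not handle repeated values or the link and meta cases mentioned in the code comment.

```python
from html import escape
from typing import Optional

SCHEMA = "https://schema.org/"


def to_microdata(node: dict, prop: Optional[str] = None, depth: int = 0) -> str:
    """Render a nested dict as microdata: itemscope + itemtype per instance,
    itemprop per property, and all three together on composite properties."""
    pad = "  " * depth
    attrs = f'itemscope itemtype="{SCHEMA}{node["@type"]}"'
    if prop is not None:
        attrs = f'itemprop="{prop}" ' + attrs  # composite property
    lines = [f"{pad}<div {attrs}>"]
    for key, value in node.items():
        if key == "@type":
            continue
        if isinstance(value, dict):
            lines.append(to_microdata(value, key, depth + 1))  # nested instance
        else:
            lines.append(f'{pad}  <span itemprop="{key}">{escape(str(value))}</span>')
    lines.append(f"{pad}</div>")
    return "\n".join(lines)


author = {"@type": "Person", "givenName": "Fernando", "familyName": "Maciá"}
book = {"@type": "Book", "name": "Marketing Online 2.0", "author": author}
print(to_microdata(book))
```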

With JSON-LD the code is simpler. Below is the same example, but written with JSON-LD:

<script type="application/ld+json">
{
     "@context":"https://schema.org/",
     "@type":"SocialMediaPosting",
     "sharedContent":{
          "@type":"Book",
          "name":"Marketing Online 2.0",
          "isbn":"978-8441532649",
          "bookFormat": "http://schema.org/Paperback",
          "publisher":"Anaya Multimedia", 
          "author":{ 
               "@type":"Person",
               "givenName":"Fernando",
               "familyName":"Maciá",
               "jobTitle":"CEO",
               "brand":{
                 "@type": "Organization",
                 "name": "Human Level"
               }
          }
     }
}
</script>

Here we declare the vocabulary’s namespace at the beginning with “https://schema.org/”, so we don’t have to repeat it throughout the code. “http://schema.org/” would also be valid, even if the page is served over HTTPS, since this isn’t a link but a way of telling the search engine that all the classes we use belong to schema.org. In RDF, namespaces exist so that two classes with the same name from different vocabularies don’t get mixed up, but here, for now, a single namespace is enough.
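In practice, JSON-LD is often generated from server-side data. Here is a minimal Python sketch, assuming the data is already in a dictionary; `jsonld_script` is a hypothetical helper. Note that “@context” appears once, at the top level, while nested objects only need “@type”.

```python
import json

posting = {
    "@context": "https://schema.org/",
    "@type": "SocialMediaPosting",
    "sharedContent": {
        "@type": "Book",
        "name": "Marketing Online 2.0",
        "isbn": "978-8441532649",
        "author": {"@type": "Person", "givenName": "Fernando", "familyName": "Maciá"},
    },
}


def jsonld_script(data: dict) -> str:
    """Wrap a JSON-LD object in the script tag search engines read."""
    # ensure_ascii=False keeps "Maciá" readable instead of \u escapes.
    body = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so no value can close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(jsonld_script(posting))
```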

If a property has several values, in microdata we simply repeat the property, whereas in JSON-LD we use square brackets. Let’s imagine we want to define a book with several publishers and several authors. This is how it would look with microdata, followed by its JSON-LD equivalent:

<div itemscope itemtype="http://schema.org/Book">
     <p>Publisher: <span itemprop="publisher">publisher1</span>,
     <span itemprop="publisher">publisher2</span></p>
     <div itemprop="author" itemscope itemtype="https://schema.org/Person">
          <p><span itemprop="name">author1</span></p>
     </div>
     <div itemprop="author" itemscope itemtype="https://schema.org/Person">
          <p><span itemprop="name">author2</span></p>
     </div>
</div>

<script type="application/ld+json">
{
     "@context":"https://schema.org/",
     "@type":"Book",
     "publisher": ["publisher1", "publisher2"],
     "author":[
     { 
          "@type":"Person",
          "name":"author1"
     },
     {
          "@type":"Person",
          "name":"author2"
     }]
}
</script>
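Since a JSON-LD property can hold either a single value or an array, code that reads or merges structured data should handle both shapes. A minimal Python sketch with a hypothetical `as_list` helper:

```python
def as_list(node: dict, prop: str) -> list:
    """JSON-LD allows a single value or an array; always return a list."""
    value = node.get(prop)
    if value is None:
        return []  # property not present
    return value if isinstance(value, list) else [value]


book = {
    "@context": "https://schema.org/",
    "@type": "Book",
    "publisher": ["publisher1", "publisher2"],      # array of values
    "author": {"@type": "Person", "name": "author1"},  # single value
}

print(as_list(book, "publisher"))                    # ['publisher1', 'publisher2']
print([a["name"] for a in as_list(book, "author")])  # ['author1']
```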

Conclusions

Structured data (or semi-structured data, as it should really be called) is difficult to model, but it would be even more difficult with other vocabularies defined in RDF or OWL. I’m talking about semantic web vocabularies such as FOAF, SIOC, SKOS or Dublin Core, which have more complicated relationships than those we’ve seen in schema.org, the vocabulary proposed by the search engines. By applying schema.org correctly, things are much easier, and with Google’s tools we can make sure its bot interprets the meaning of our data correctly.


Author: Ramón Saquete
Web developer at Human Level Communications online marketing agency. He's an expert in WPO, PHP development and MySQL databases.
