The Future of the Semantic Web, Web 3.0 or Linked Data

The Semantic Web, also called Web 3.0 or Linked Data, is a technological revolution that is changing the way we understand the Web and forcing everything to evolve: search engines, browsers, development and SEO.

The Semantic Web is essentially based on a new type of link that serves to connect data or concepts, rather than documents, as is the case with current links. These new links also indicate what kind of relationship exists between the concepts, so that we can know what information we are going to find before navigating the link.

If we draw the concepts as nodes and the relationships as arcs between them, we obtain a data structure that computer engineers know quite well: the graph.
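
As a minimal sketch of that idea, the Python snippet below stores a few links as (subject, relationship, object) triples and builds the graph from them. The short node names are illustrative shorthand chosen for this example, not real identifiers.

```python
from collections import defaultdict

# Each triple is (subject node, relationship label, object node).
# The short names are illustrative shorthand, not real URIs.
triples = [
    ("Way_of_the_Dragon", "starring", "Chuck_Norris"),
    ("Way_of_the_Dragon", "starring", "Bruce_Lee"),
    ("Enter_the_Dragon", "starring", "Bruce_Lee"),
]


def build_graph(triples):
    """Return an adjacency map: node -> list of (relationship, neighbour)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)

# Follow every labelled arc that leaves one node.
for relationship, neighbour in graph["Way_of_the_Dragon"]:
    print(f"Way_of_the_Dragon --{relationship}--> {neighbour}")
```

Because every arc carries a label, a program can decide which arcs are worth following before it navigates them.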

Tim Berners-Lee, director of the W3C and inventor of the Web, intends this graph to be queried as if the Web were a global database, to which algorithms can be applied to acquire new knowledge and which can serve as a basis for creating artificial intelligences capable of understanding human language. This idea is still utopian. Linked Data technologies have many problems to solve before they can meet these and other objectives, such as making the technology easy for everyone to use.

However, despite these problems, the Semantic Web has been quite successful. In 2007, the W3C launched the Linked Open Data project with the aim of spreading the Semantic Web. Every year, the project's website has published a graph showing the links created between all the semantic websites. If we look at its evolution, we can see that the growth rate has been exponential. In the center of the graph we have DBpedia, the semantic version of Wikipedia, and very close to it Freebase, which is currently owned by Google and has been the basis for its Knowledge Graph. This “knowledge graph” is used to complement searches with semantic information.

The new Web 3.0 links are called RDF links. RDF (Resource Description Framework) is the language in which these links are written. It can be serialized in various formats (XML, N3 and Turtle) or embedded within HTML as RDFa (RDF in attributes). As if this were not complex enough, there are also RDF detractors, such as Google, who prefer Microdata, and others who advocate the Hypernotation format. As I have already said, this is one of the big problems that the Semantic Web currently has: there are too many technologies and they are too complex for the average user. For example, if an engineer tries to explain to a user that Microdata only allows the creation of tree-like structures, and that a tree is a more restrictive kind of graph, the user will probably give up trying to understand the technology. Therefore, to create the new RDF links, users will need tools that hide the complexity of this task.
However, for a better understanding, I will try to explain, with practical cases, how these RDF links are created. Let’s start with a simple example:
<rdf:RDF>
  <rdf:Description rdf:about="http://dbpedia.org/resource/Way_of_the_Dragon">
    <dbpprop:starring rdf:resource="http://dbpedia.org/resource/Chuck_Norris"/>
  </rdf:Description>
</rdf:RDF>

RDF links are triples which, as the name suggests, consist of three elements: subject, predicate and object. In the example, the movie “Way of the Dragon” is the subject we are going to describe, and it appears in the rdf:about attribute of the rdf:Description element. The predicate is dbpprop:starring, which indicates the meaning of the relationship, i.e., who stars in that movie. The object appears in the predicate's rdf:resource attribute: Chuck Norris. In this way we have two concepts that are semantically related and unambiguously identified by means of URIs.
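
To make the subject, predicate and object concrete, here is a rough Python sketch that reads the triple out of the example using only the standard library. It assumes the snippet is completed with namespace declarations, which a standard XML parser requires, and the URI chosen for the dbpprop prefix is an assumption made for this illustration. A real project would normally use a dedicated RDF library, since this function only understands the simple pattern shown here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The same link as in the example, with namespace declarations added.
# The URI bound to the dbpprop prefix is an assumption for this sketch.
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dbpprop="http://dbpedia.org/property/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Way_of_the_Dragon">
    <dbpprop:starring rdf:resource="http://dbpedia.org/resource/Chuck_Norris"/>
  </rdf:Description>
</rdf:RDF>
"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples from simple RDF/XML."""
    root = ET.fromstring(rdf_xml.strip())
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree names tags "{namespace}local"; joining both
            # parts gives the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # Without rdf:resource the object is a plain text value.
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for subject, predicate, obj in extract_triples(RDF_XML):
    print(subject, predicate, obj, sep="\n  ")
```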

It is important to keep in mind that URIs identify concepts. For example, the URI http://dbpedia.org/resource/Chuck_Norris identifies Chuck Norris himself, not the document containing information about him.

In addition, URIs must be dereferenceable. This means that if the URI refers to Chuck Norris, then dereferencing it, for example by typing it into the browser, should return what it refers to, i.e., Chuck himself should appear on our monitor with a spinning kick. Since that is not possible, the server answers with a 303 redirect and uses content negotiation to send us to the page http://dbpedia.org/page/Chuck_Norris, which contains an HTML document with information about him. If a mobile application or a semantic search engine spider had requested the URI instead, the 303 redirect would have sent it to a different page, http://dbpedia.org/data/Chuck_Norris, which contains an RDF document with a set of triples about Chuck Norris. Let’s take a look at an excerpt from the latter document:
<rdf:RDF>
  <rdf:Description rdf:about="http://dbpedia.org/resource/Chuck_Norris">
    <rdf:type rdf:resource="http://dbpedia.org/ontology/Person"/>
    <rdf:type rdf:resource="http://umbel.org/umbel/rc/Actor"/>
    <owl:sameAs rdf:resource="http://es.dbpedia.org/resource/Chuck_Norris"/>
    <rdfs:comment xml:lang="en">Carlos Ray "Chuck" Norris is an American martial artist and actor.</rdfs:comment>
    <foaf:homepage rdf:resource="http://www.chucknorris.com/"/>
  </rdf:Description>
</rdf:RDF>
As we can see, once an rdf:Description block is opened for a subject, we can include several predicates in it. In the example we have two rdf:type predicates indicating the type of the resource. In this case, Chuck is a person and an actor, which may seem obvious to humans but is very helpful to an artificial intelligence. The predicate owl:sameAs points to a different URI that identifies the same subject and may describe it from another point of view or, as in this case, in another language. This is followed by the predicate rdfs:comment, which does not link to another concept but relates the subject directly to a text value, so there is no link. Finally, foaf:homepage gives the URL of his personal page.

As can be seen in the example, RDF links can be of many different types. They can be inbound, outbound, reciprocal, subject-defining, predicate-defining, or indicating that a subject, predicate or object is the same as the one defined in another URI.
The URI as an identifier of a real-world concept is one of the most difficult ideas of the Semantic Web to understand. In fact, many of the RDF links we can find do not apply it correctly. To dereference such a URI, there are two techniques: 303 redirects and fragments. If a URI appears with a fragment (the part with #), it is referring to the concept and if it does not, to the document. In the document Cool URIs for the Semantic Web, the W3C tries to explain these two techniques and the combination of both in a user-friendly way.
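
The two techniques can be sketched in a few lines of Python. The function below makes no network request. It only simulates the decision, under the assumption that a server maps /resource/ URIs to /page/ (HTML) or /data/ (RDF) documents in the way the DBpedia addresses mentioned earlier suggest. The example.org URI and the function name are hypothetical.

```python
from urllib.parse import urldefrag


def document_for(uri, accept="text/html"):
    """Simulate dereferencing a concept URI.

    Returns (HTTP status, URL of the document describing the concept).
    No request is sent; the mapping rules are assumptions for this sketch.
    """
    base, fragment = urldefrag(uri)
    if fragment:
        # Fragment technique: the client drops the part after '#'
        # and simply fetches the document.
        return 200, base
    if "/resource/" in uri:
        # 303 technique: the server replies "See Other" and chooses
        # a document according to the Accept header.
        kind = "data" if "application/rdf+xml" in accept else "page"
        return 303, uri.replace("/resource/", f"/{kind}/", 1)
    # Anything else is treated as an ordinary document URL.
    return 200, uri


chuck = "http://dbpedia.org/resource/Chuck_Norris"
print(document_for(chuck))                         # a browser asking for HTML
print(document_for(chuck, "application/rdf+xml"))  # a semantic client
print(document_for("http://example.org/people#alice"))
```

In both cases the concept and the document that describes it end up with different addresses, which is the whole point of the exercise.
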
But what is really difficult to understand, to the point of requiring months or years of study if no other data model has been studied before, are the vocabularies. These are created with the SKOS, RDFS and OWL metalanguages, which are used to define thesauri and ontologies. Vocabularies are a way to establish a global format, which is essential for making information interoperable. In this way, no API is required for a website to share information that another website wants to incorporate. Instead, thanks to a common, global and accessible format, the information can be shared with minimal development effort, since there is no API to implement or learn. However, the complexity that vocabularies bring to the Semantic Web is enormous. If we want to publish an RDF link, we must look for a vocabulary that fits the meaning we want to give it. If none of the currently defined vocabularies works, we will have to invent our own, as the BBC Music, BBC Programmes and New York Times websites have done.

An important detail that we have not yet discussed is the namespace declarations of the vocabularies. The namespace is what appears before the colon in each label, and it serves to prevent the definitions of two vocabularies from clashing. For example, og:image and twitter:image would have the same name if it were not for the namespaces of the Open Graph and Twitter Cards vocabularies. Namespaces can be declared at the beginning of the document with the xmlns (XML namespace) attribute or, if they are very widely used, they can sometimes be omitted. Example:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
…
</rdf:RDF>

In the example we are declaring the namespaces rdf, rdfs and foaf.
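
What a declaration buys us can be shown with a small Python sketch: each prefix is shorthand for a namespace URI, and a label such as foaf:homepage expands to a full, globally unique URI. The expand helper is a hypothetical name introduced for this illustration.

```python
# Prefix -> namespace URI, as declared in the example above.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(label, namespaces=NAMESPACES):
    """Turn a 'prefix:name' label into the full URI it stands for."""
    prefix, _, name = label.partition(":")
    if prefix not in namespaces:
        raise KeyError(f"undeclared prefix: {prefix}")
    return namespaces[prefix] + name


print(expand("foaf:homepage"))
print(expand("rdf:type"))
```

Two labels that share a local name can never collide once expanded, because their namespace URIs differ.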

One of the ways to publish an RDF file is by adding the following line to the header of an HTML document:
<link rel="alternate" type="application/rdf+xml" href="archivo.rdf" />
This HTML document will contain the same information as the RDF file.
Another option, as I have already mentioned, is to embed it within the same HTML in RDFa format.
In addition, using an RDF data store, we can offer a SPARQL endpoint. SPARQL (pronounced “sparkle”) is a query language similar to SQL that allows a wide variety of queries, such as: give me all the movies of the actors who have appeared with Chuck Norris.
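
To give a feel for that query, the Python sketch below includes a possible SPARQL formulation as a string, which is not executed here, and then emulates the same join over a tiny in-memory set of triples. The use of the dbo:starring property, the shorthand names and the three-film sample are assumptions for illustration. A real endpoint would run the SPARQL text against its full data store.

```python
# A possible SPARQL formulation (not executed in this sketch).
SPARQL_SKETCH = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?movie WHERE {
  ?shared dbo:starring dbr:Chuck_Norris .
  ?shared dbo:starring ?actor .
  ?movie  dbo:starring ?actor .
  FILTER (?actor != dbr:Chuck_Norris)
}
"""

STARRING = "starring"

# Tiny illustrative sample using shorthand names instead of full URIs.
TRIPLES = [
    ("Way_of_the_Dragon", STARRING, "Chuck_Norris"),
    ("Way_of_the_Dragon", STARRING, "Bruce_Lee"),
    ("Enter_the_Dragon", STARRING, "Bruce_Lee"),
    ("Fist_of_Fury", STARRING, "Bruce_Lee"),
]


def movies_of_costars(triples, actor):
    """Movies starring anyone who shares at least one movie with `actor`."""
    # Step 1: movies in which the actor appears.
    with_actor = {s for s, p, o in triples if p == STARRING and o == actor}
    # Step 2: the other actors in those movies.
    costars = {
        o for s, p, o in triples
        if p == STARRING and s in with_actor and o != actor
    }
    # Step 3: every movie starring one of those co-stars.
    return sorted({s for s, p, o in triples if p == STARRING and o in costars})


print(movies_of_costars(TRIPLES, "Chuck_Norris"))
```
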
Currently, Google uses semantic technologies to display rich snippets and to incorporate information into its Knowledge Graph. The social networks Facebook and Twitter, for their part, use them so that shared pages appear with personalized or additional information, using the Open Graph and Twitter Cards vocabularies.
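
As a rough illustration of how a social network might read those vocabularies, the Python sketch below collects og: and twitter: properties from the meta tags of a page using only the standard library. The sample HTML, the example.org address and the class name are hypothetical, and real crawlers are certainly more elaborate.

```python
from html.parser import HTMLParser


class SocialMetaParser(HTMLParser):
    """Collect Open Graph and Twitter Cards properties from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Open Graph uses the "property" attribute, Twitter Cards "name".
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if key and key.startswith(("og:", "twitter:")) and content is not None:
            self.properties[key] = content


# Hypothetical page header used only for this example.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Way of the Dragon" />
<meta property="og:image" content="http://example.org/poster.jpg" />
<meta name="twitter:card" content="summary" />
<meta name="description" content="Not part of either vocabulary" />
</head><body></body></html>
"""

parser = SocialMetaParser()
parser.feed(SAMPLE_HTML)
for key, value in parser.properties.items():
    print(f"{key} = {value}")
```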

At the moment, semantic technologies are being used in a very limited way. But as they become established and some prevail over others, as tools appear that shield users from the underlying complexity of the technology, and as browsers and search engines evolve, we will see more and more original ways of exploiting the possibilities of the Web of data.