{"id":52381,"date":"2014-02-10T09:31:06","date_gmt":"2014-02-10T08:31:06","guid":{"rendered":"https:\/\/www.humanlevel.com\/sin-categorizar\/como-funciona-un-buscador-como-google-indexacion.html"},"modified":"2014-02-10T08:17:00","modified_gmt":"2014-02-10T07:17:00","slug":"how-does-a-search-engine-like-google-index","status":"publish","type":"post","link":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index","title":{"rendered":"How does a search engine like Google work? &#8211; Part I: Indexing"},"content":{"rendered":"<p><img loading=\"lazy\" class=\"alignright size-full wp-image-13300\" src=\"https:\/\/www.humanlevel.com\/wp-content\/uploads\/buscador.jpg\" alt=\"search engine\" width=\"200\" height=\"211\">Search engines or, as they are technically called, <a href=\"https:\/\/es.wikipedia.org\/wiki\/B%C3%BAsqueda_y_recuperaci%C3%B3n_de_informaci%C3%B3n\"><strong>Information Retrieval Systems<\/strong><\/a>, are one of the great achievements of computer science and, more specifically, of the field of <strong>Artificial Intelligence<\/strong>. Today, no one can doubt its importance as an engine of the world economy. For many, its inner workings are an almost magical mystery. In this series of articles I will unveil some of that magic.<\/p>\n<p>If you are a Web developer, after reading the series of articles I am going to develop, you will know how to implement a search engine in a Web, beyond the typical and inefficient search engine that only looks for matches in the database and does not provide <strong>results ordered by relevance<\/strong>. 
If you are an <a href=\"https:\/\/www.humanlevel.com\/en\/blog\/seo\/the-10-skills-you-need-to-be-an-seo-specialist\">SEO consultant<\/a>, you will have a better understanding of the inner workings of search engines.<\/p>\n<p>In order to reach as many readers as possible, I will try to express myself in the least technical way possible. However, since this is an advanced subject, in some cases I will use mathematical formulas. In these cases, those who are not comfortable with them can jump directly to the explanation, while those of you who are skilled in this area will gain a much deeper understanding of what I am explaining, since a formula can be worth a thousand words.<\/p>\n<p><img loading=\"lazy\" class=\"size-full wp-image-13299 alignleft\" src=\"https:\/\/www.humanlevel.com\/wp-content\/uploads\/\" alt=\"artificial intelligence\" width=\"200\" height=\"194\">Let&#8217;s start by setting the scene. Within Computer Science, the field of <strong>Artificial Intelligence<\/strong> is divided into many branches, such as Pattern Recognition, Robotics, Computer Vision and Natural Language Processing. Here we are going to focus on one of the applications of <strong>Natural Language Processing<\/strong>: <strong>Information Retrieval<\/strong> Systems, which from now on I will refer to by the acronym <strong>IRS<\/strong>. This application, like others in this branch such as Machine Translation, Question Answering Systems, Dialogue Systems or Information Extraction Systems, is an area in which researchers need to specialize, since each one is in turn divided into many quite complex sub-problems. 
On the other hand, there is the area called <strong>Web Intelligence<\/strong>, which consists of applying Artificial Intelligence techniques to the Web.<\/p>\n<p>What is an IRS?<\/p>\n<blockquote><p>IRSs are tools that, given a query, select from a huge collection of documents those that respond to that query and <strong>sort them by <a href=\"https:\/\/www.humanlevel.com\/en\/digital-marketing-dictionary\/content-relevance\">relevance<\/a><\/strong>.<\/p><\/blockquote>\n<p>If you look for scientific articles about <strong>Information Retrieval<\/strong> you will find much more information than what I am going to give here, although usually with less didactic explanations. Google has several articles on IR, written by its researchers, published at:<br \/>\n<a href=\"https:\/\/research.google\/pubs\/\">https:\/\/research.google.com\/pubs\/InformationRetrievalandtheWeb.html<\/a><\/p>\n<p>To perform the colossal feat I have described over millions of documents and in a few seconds, the following tasks are carried out (simplifying a lot and leaving aside many technical aspects):<\/p>\n<ol>\n<li><strong>Indexing<\/strong>:\n<ul>\n<li>The documents are first pre-processed to analyze them and apply transformations that later improve the search.<\/li>\n<li>The weights of the terms extracted in the previous step are then calculated and efficiently stored in an index.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Query processing<\/strong>:\n<ul>\n<li>The query is analyzed to represent it internally in a way that allows it to be compared with the documents.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Comparison between the query and the document collection<\/strong>\n<ul>\n<li>A value is obtained that indicates the relevance of each document for the query and that will allow them to be sorted.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>In this article I will explain only the Indexing phase and in the following article the other two 
phases.<\/p>\n<h2>Indexing<\/h2>\n<h3>Document pre-processing<\/h3>\n<p>As I have already mentioned, the document must first be analyzed. To perform this task, the HTML tags are removed, extracting their meaning first, in order to know the importance that these tags contribute to the words they contain.<br \/>\nIn the process, the number of <strong>occurrences of each word<\/strong> is counted, since, based on this value, the importance of the word in the document will be calculated, since <strong>if it appears more times, it is more likely that the document is more relevant<\/strong> for that word.<br \/>\nAt the same time, the <strong>position of the word in the document<\/strong> is usually calculated, since if the words we are looking for are closer together in the document, it is more likely that it is a relevant document for the query. So having the position stored, we can calculate the distance between those query words in the document, to take it into account.<br \/>\nSome IRS <strong>instead of indexing words index their root<\/strong>, in order to consider derivations of the same word (e.g. verb tenses) as repetitions of the same term. The calculation of the root is usually done by some heuristic algorithm that tries to guess the correct root. The most widely used in English is <strong>Martin Porter&#8217;s Stemmer,<\/strong> which has its Spanish adaptation in the <a href=\"https:\/\/snowballstem.org\/demo.html\" target=\"_blank\" rel=\"noopener\">Snowball<\/a> software.<br \/>\nThe strategy of indexing the root of words has the problem that synonymous words are considered distinct words, so other IRSs <strong>group the frequency of occurrence of words by meaning<\/strong>. 
To do this, it is necessary to solve one of the most difficult and most studied problems in natural language processing, <strong>WSD (Word Sense Disambiguation)<\/strong>, which tries to determine the meaning of words from their context.<br \/>\nAnother strategy is to <strong>index sets of two or more words instead of single<\/strong> words, to capture the user&#8217;s search intent much more precisely. As this requires a much larger index, only the most frequent word sets are taken into account in these cases.<br \/>\nWhat does Google do? No one knows for certain: it may use all of the above strategies, or it may use none of them and rely on n-grams instead. What we can tell from the search results is that it <strong>takes into account synonyms and the closeness of the query keywords<\/strong>, features that can be achieved in many different ways.<br \/>\nLooking at the results, there is also one thing we can tell that <strong>Google does not do<\/strong>, which is to <strong>eliminate <em>stop words<\/em><\/strong>. <em>Stop words<\/em> are words that have a very high frequency of occurrence in documents, such as articles and prepositions. Since these words provide little relevant information about the document and significantly increase the size of the index, many IRSs discard them. Google does not remove them because they <strong>can sometimes provide information to the<\/strong> query; for example, &#8220;country&#8221; and &#8220;the country&#8221; are searches with different intent.<\/p>\n<h3>Calculation of the weight of words or terms<\/h3>\n<p>In IRSs, the <strong>vector space model<\/strong> is normally used to represent documents and queries, because it is the one that obtains the best results. This means that we will represent documents and user queries numerically in an n-dimensional vector space, with as many dimensions as there are words. 
If for the moment you don&#8217;t understand the previous sentence, don&#8217;t worry, in the next post I promise to explain it in more detail. What matters now is that we are going to <strong>calculate the weight or importance that each word has when returning results<\/strong> sorted by relevance. <strong>Gerard Salton&#8217;s formula, called TFIDF (Term Frequency &#8211; Inverse Document Frequency)<\/strong>, is normally used for this task. This formula is calculated by multiplying the frequency of the term in the document by the inverse frequency of the term in the document collection, which I will explain below.<br \/>\nIt is logical to think that  <strong>if a word appears more times in a document, it will make that document more relevant<\/strong>  in a query containing that word, so the first element of the formula is the frequency of the term in the document, which we can use as we obtained it in the previous step or normalize it by dividing by the frequency of the term that appeared most often in the document. In this way the term frequencies of all documents will always be between 0 and 1. 
This will prevent long documents from taking precedence over short documents, although in practice long documents will always have slightly more weight.<br \/>\nFormula:<\/p>\n<figure id=\"attachment_13284\" aria-describedby=\"caption-attachment-13284\" style=\"width: 252px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"size-full wp-image-13284 \" src=\"https:\/\/www.humanlevel.com\/wp-content\/uploads\/\" alt=\"term frequency\" width=\"262\" height=\"48\"><figcaption id=\"caption-attachment-13284\" class=\"wp-caption-text\">f(t,d) is the frequency of the term t in document d.<br \/>The denominator is the maximum of the frequencies of all terms w belonging to document d.<\/figcaption><\/figure>\n<p>For anyone interested in digging deeper, this is not one of the best ways to normalize for fairer results; there are much better non-linear normalizations, such as <strong>Amit Singhal&#8217;s pivoted normalization<\/strong>.<\/p>\n<p><strong>If the word appears in many documents in the collection, that word will be less decisive in returning relevant results<\/strong>, and if it appears in few documents, it will make those documents more relevant. This is the idea behind the <strong>inverse frequency of the word in the document collection<\/strong>, which is calculated by dividing the number of documents in the collection by the number of documents containing the term. 
A logarithm is applied to the result so that high values do not grow too much, which normalizes them a little.<\/p>\n<p>Formula:<\/p>\n<figure id=\"attachment_13283\" aria-describedby=\"caption-attachment-13283\" style=\"width: 259px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"size-full wp-image-13283\" src=\"https:\/\/www.humanlevel.com\/wp-content\/uploads\/\" alt=\"inverse-document-frequency\" width=\"269\" height=\"47\"><figcaption id=\"caption-attachment-13283\" class=\"wp-caption-text\">D is the collection of documents; its size is divided by the number of documents d that contain the term t. The bars around each set indicate its cardinality (number of elements), not the norm of a vector. In practice, 1 is usually added to the denominator to avoid division by zero errors.<\/figcaption><\/figure>\n<p>Finally, we can calculate the <strong>TFIDF<\/strong>, which gives us the weight of the word. This is the importance of the term in deciding the search results, given by the importance of the word t in document d multiplied by the importance of the word in the collection D:<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-13285\" src=\"https:\/\/www.humanlevel.com\/wp-content\/uploads\/\" alt=\"tfidf\" width=\"280\" height=\"21\"><\/p>\n<p>Now we have all the information needed to build <strong>the inverted index<\/strong>, which consists of a <strong>list of words and, for each one, a list of identifiers of the documents where it appears together with its weight in each (TFIDF)<\/strong>.<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-13286\" src=\"https:\/\/www.humanlevel.com\/wp-content\/uploads\/indice.png\" alt=\"index\" width=\"559\" height=\"156\"><\/p>\n<p>In this way, when a query word arrives, <strong>we will be able to quickly know which documents it is in<\/strong> and what weight the word has in them, so that we can devise a formula to order the documents by 
relevance. This formula and several other things, I will finish explaining them next month in my next post, I hope you do not fail to read it and comment on it.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Search engines or, as they are technically called, Information Retrieval Systems, are one of the great&#8230;<\/p>\n","protected":false},"author":14,"featured_media":44810,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[459],"tags":[511,527,357],"class_list":["post-52381","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","tag-geo","tag-google","tag-indexability"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How does a search engine like Google work? - Part I: Indexing<\/title>\n<meta name=\"description\" content=\"Discover how the algorithms used by search engines or Information Retrieval Systems work.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How does a search engine like Google work? 
- Part I: Indexing\" \/>\n<meta property=\"og:description\" content=\"Discover how the algorithms used by search engines or Information Retrieval Systems work.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index\" \/>\n<meta property=\"og:site_name\" content=\"Human Level\" \/>\n<meta property=\"article:published_time\" content=\"2014-02-10T08:31:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.humanlevel.com\/wp-content\/uploads\/google-buscador1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"600\" \/>\n\t<meta property=\"og:image:height\" content=\"600\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Ram\u00f3n Saquete\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@daiatron\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Ram\u00f3n Saquete\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":[\"Article\",\"BlogPosting\"],\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index\"},\"author\":{\"name\":\"Ram\u00f3n Saquete\",\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en#\\\/schema\\\/person\\\/11ad888926867985985a0210476bae94\"},\"headline\":\"How does a search engine like Google work? 
&#8211; Part I: Indexing\",\"datePublished\":\"2014-02-10T08:31:06+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index\"},\"wordCount\":1726,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.humanlevel.com\\\/wp-content\\\/uploads\\\/google-buscador1.jpg\",\"keywords\":[\"GEO\",\"Google\",\"Indexability\"],\"articleSection\":[\"AI\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index\",\"url\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index\",\"name\":\"How does a search engine like Google work? 
- Part I: Indexing\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.humanlevel.com\\\/wp-content\\\/uploads\\\/google-buscador1.jpg\",\"datePublished\":\"2014-02-10T08:31:06+00:00\",\"description\":\"Discover how the algorithms used by search engines or Information Retrieval Systems work.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index#primaryimage\",\"url\":\"https:\\\/\\\/www.humanlevel.com\\\/wp-content\\\/uploads\\\/google-buscador1.jpg\",\"contentUrl\":\"https:\\\/\\\/www.humanlevel.com\\\/wp-content\\\/uploads\\\/google-buscador1.jpg\",\"width\":600,\"height\":600},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\\\/artificial-intelligence\\\/how-does-a-search-engine-like-google-index#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Inicio\",\"item\":\"https:\\\/\\\/www.humanlevel.com\\\/en\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Blog\",\"item\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/blog\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"AI\",\"item\":\"https:\\\/\\\/www.humanlevel.com\
\\/en\\\/blog\\\/artificial-intelligence\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"How does a search engine like Google work? &#8211; Part I: Indexing\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en#website\",\"url\":\"https:\\\/\\\/www.humanlevel.com\\\/en\",\"name\":\"Human Level\",\"description\":\"Web positioning and online marketing consultant Human Level\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.humanlevel.com\\\/en?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en#organization\",\"name\":\"Human Level\",\"url\":\"https:\\\/\\\/www.humanlevel.com\\\/en\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.humanlevel.com\\\/wp-content\\\/uploads\\\/logohl25x3.png\",\"contentUrl\":\"https:\\\/\\\/www.humanlevel.com\\\/wp-content\\\/uploads\\\/logohl25x3.png\",\"width\":600,\"height\":93,\"caption\":\"Human Level\"},\"image\":{\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.linkedin.com\\\/company\\\/human-level-communications\",\"https:\\\/\\\/www.youtube.com\\\/user\\\/humanlevelcommunica\",\"https:\\\/\\\/bsky.app\\\/profile\\\/humanlevel.bsky.social\",\"https:\\\/\\\/instagram.com\\\/humanlevel\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/en#\\\/schema\\\/person\\\/11ad888926867985985a0210476bae94\",\"name\":\"Ram\u00f3n 
Saquete\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.humanlevel.com\\\/wp-content\\\/uploads\\\/1x1-ramon-saquete-26-96x96.jpg\",\"url\":\"https:\\\/\\\/www.humanlevel.com\\\/wp-content\\\/uploads\\\/1x1-ramon-saquete-26-96x96.jpg\",\"contentUrl\":\"https:\\\/\\\/www.humanlevel.com\\\/wp-content\\\/uploads\\\/1x1-ramon-saquete-26-96x96.jpg\",\"caption\":\"Ram\u00f3n Saquete\"},\"description\":\"Web Developer and Technical SEO Consultant at Human Level. He holds degrees in Computer Engineering and Technical Engineering in Computer Systems. He also earned a Higher Vocational Degree in Computer Applications Development and later obtained the Certificate of Pedagogical Aptitude (CAP). He is an expert in WPO and indexability.\",\"sameAs\":[\"https:\\\/\\\/es.linkedin.com\\\/in\\\/ramonsaquete\",\"https:\\\/\\\/x.com\\\/daiatron\"],\"url\":\"https:\\\/\\\/www.humanlevel.com\\\/en\\\/author\\\/ramon\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How does a search engine like Google work? - Part I: Indexing","description":"Discover how the algorithms used by search engines or Information Retrieval Systems work.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index","og_locale":"en_US","og_type":"article","og_title":"How does a search engine like Google work? 
- Part I: Indexing","og_description":"Discover how the algorithms used by search engines or Information Retrieval Systems work.","og_url":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index","og_site_name":"Human Level","article_published_time":"2014-02-10T08:31:06+00:00","og_image":[{"width":600,"height":600,"url":"https:\/\/www.humanlevel.com\/wp-content\/uploads\/google-buscador1.jpg","type":"image\/jpeg"}],"author":"Ram\u00f3n Saquete","twitter_card":"summary_large_image","twitter_creator":"@daiatron","twitter_misc":{"Written by":"Ram\u00f3n Saquete","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["Article","BlogPosting"],"@id":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index#article","isPartOf":{"@id":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index"},"author":{"name":"Ram\u00f3n Saquete","@id":"https:\/\/www.humanlevel.com\/en#\/schema\/person\/11ad888926867985985a0210476bae94"},"headline":"How does a search engine like Google work? 
&#8211; Part I: Indexing","datePublished":"2014-02-10T08:31:06+00:00","mainEntityOfPage":{"@id":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index"},"wordCount":1726,"commentCount":0,"publisher":{"@id":"https:\/\/www.humanlevel.com\/en#organization"},"image":{"@id":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index#primaryimage"},"thumbnailUrl":"https:\/\/www.humanlevel.com\/wp-content\/uploads\/google-buscador1.jpg","keywords":["GEO","Google","Indexability"],"articleSection":["AI"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index","url":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index","name":"How does a search engine like Google work? 
- Part I: Indexing","isPartOf":{"@id":"https:\/\/www.humanlevel.com\/en#website"},"primaryImageOfPage":{"@id":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index#primaryimage"},"image":{"@id":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index#primaryimage"},"thumbnailUrl":"https:\/\/www.humanlevel.com\/wp-content\/uploads\/google-buscador1.jpg","datePublished":"2014-02-10T08:31:06+00:00","description":"Discover how the algorithms used by search engines or Information Retrieval Systems work.","breadcrumb":{"@id":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index#primaryimage","url":"https:\/\/www.humanlevel.com\/wp-content\/uploads\/google-buscador1.jpg","contentUrl":"https:\/\/www.humanlevel.com\/wp-content\/uploads\/google-buscador1.jpg","width":600,"height":600},{"@type":"BreadcrumbList","@id":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence\/how-does-a-search-engine-like-google-index#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Inicio","item":"https:\/\/www.humanlevel.com\/en"},{"@type":"ListItem","position":2,"name":"Blog","item":"https:\/\/www.humanlevel.com\/en\/blog"},{"@type":"ListItem","position":3,"name":"AI","item":"https:\/\/www.humanlevel.com\/en\/blog\/artificial-intelligence"},{"@type":"ListItem","position":4,"name":"How does a search engine like Google work? 
&#8211; Part I: Indexing"}]},{"@type":"WebSite","@id":"https:\/\/www.humanlevel.com\/en#website","url":"https:\/\/www.humanlevel.com\/en","name":"Human Level","description":"Web positioning and online marketing consultant Human Level","publisher":{"@id":"https:\/\/www.humanlevel.com\/en#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.humanlevel.com\/en?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.humanlevel.com\/en#organization","name":"Human Level","url":"https:\/\/www.humanlevel.com\/en","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.humanlevel.com\/en#\/schema\/logo\/image\/","url":"https:\/\/www.humanlevel.com\/wp-content\/uploads\/logohl25x3.png","contentUrl":"https:\/\/www.humanlevel.com\/wp-content\/uploads\/logohl25x3.png","width":600,"height":93,"caption":"Human Level"},"image":{"@id":"https:\/\/www.humanlevel.com\/en#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.linkedin.com\/company\/human-level-communications","https:\/\/www.youtube.com\/user\/humanlevelcommunica","https:\/\/bsky.app\/profile\/humanlevel.bsky.social","https:\/\/instagram.com\/humanlevel"]},{"@type":"Person","@id":"https:\/\/www.humanlevel.com\/en#\/schema\/person\/11ad888926867985985a0210476bae94","name":"Ram\u00f3n Saquete","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.humanlevel.com\/wp-content\/uploads\/1x1-ramon-saquete-26-96x96.jpg","url":"https:\/\/www.humanlevel.com\/wp-content\/uploads\/1x1-ramon-saquete-26-96x96.jpg","contentUrl":"https:\/\/www.humanlevel.com\/wp-content\/uploads\/1x1-ramon-saquete-26-96x96.jpg","caption":"Ram\u00f3n Saquete"},"description":"Web Developer and Technical SEO Consultant at Human Level. 
He holds degrees in Computer Engineering and Technical Engineering in Computer Systems. He also earned a Higher Vocational Degree in Computer Applications Development and later obtained the Certificate of Pedagogical Aptitude (CAP). He is an expert in WPO and indexability.","sameAs":["https:\/\/es.linkedin.com\/in\/ramonsaquete","https:\/\/x.com\/daiatron"],"url":"https:\/\/www.humanlevel.com\/en\/author\/ramon"}]}},"_links":{"self":[{"href":"https:\/\/www.humanlevel.com\/en\/wp-json\/wp\/v2\/posts\/52381","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.humanlevel.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.humanlevel.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.humanlevel.com\/en\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/www.humanlevel.com\/en\/wp-json\/wp\/v2\/comments?post=52381"}],"version-history":[{"count":4,"href":"https:\/\/www.humanlevel.com\/en\/wp-json\/wp\/v2\/posts\/52381\/revisions"}],"predecessor-version":[{"id":52870,"href":"https:\/\/www.humanlevel.com\/en\/wp-json\/wp\/v2\/posts\/52381\/revisions\/52870"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.humanlevel.com\/en\/wp-json\/wp\/v2\/media\/44810"}],"wp:attachment":[{"href":"https:\/\/www.humanlevel.com\/en\/wp-json\/wp\/v2\/media?parent=52381"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.humanlevel.com\/en\/wp-json\/wp\/v2\/categories?post=52381"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.humanlevel.com\/en\/wp-json\/wp\/v2\/tags?post=52381"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}