Fifteen key points to make search engines like me


Written by Fernando Maciá

Any website has a certain traffic potential in Internet search engines such as Live Search, Google or Yahoo! This potential is determined, on the one hand, by the positions the website's different contents can achieve for the related searches users make and, on the other, by the total number of contents or pages that make up the website. It seems obvious that the more content there is, the greater the traffic potential should be… or is it?

Search engines and web directories are, today, the main source of traffic for a portal. A website that ranks well for popular search terms and has a large amount of content can receive far more search engine traffic than a website with little content, focused on topics that generate little interest among Internet users, or poorly positioned for those searches.


In other articles on search engine optimization we have already mentioned that the positioning of a website in search engines depends essentially on two aspects: on-page relevance, that is, the relevance of the content of the page itself (essentially its texts, title and meta tags), on the one hand, and off-page relevance, or relevance in the form of links from other websites, characterized by the quantity and quality of those links, on the other. However, even before a search engine can calculate the relevance of a web page, there is a precondition the page must meet: it must be indexable. We call indexability a website's ease of being found by search engines, of having the totality of its content correctly crawled, and of being adequately matched to the search categories in which it should appear as a result. From that point on, the greater or lesser relevance of each page, calculated according to multiple parameters, will determine the final position it occupies in the results.

So, there are many things we can do to make our website more search engine friendly. On this occasion we will focus on fifteen points that we believe are key and that, if we make sure to comply with them, will put our content in the best position to be correctly crawled by search engine robots. This is not a sufficient condition for good rankings, but it is certainly a necessary one. Here we go.

1. Putting on the “search engine’s” glasses

One of the first steps to knowing what a search engine likes and dislikes is to see your website the way the robot that has to index it sees it. There are several spider simulators, or robot simulators, that you can use: online tools that show the information that can be crawled and used to calculate relevance. You will see how images, animations, multimedia content, Flash, etc. disappear and only the text remains. Indeed, search engines mainly take the text content of a page into account when calculating relevance.

For this purpose, the “cache” view that you can see in the results of some search engines is also very useful: it is the copy of your page that they have stored on their servers. Sometimes it is possible to isolate the available text and arrive at a view of your page similar to the one obtained with the spider simulator tools mentioned above.

If, in the cached version of your page or after using a spider simulator, you find that no text is visible, you have a problem. Your website is probably built with Flash, or all the text is embedded in images as part of the design. In both cases the solution is to alter the original programming of your website or to create an alternative HTML version that does contain text relevant to the search engines.
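If you cannot use one of those online tools, the idea behind them is easy to approximate. The following Python sketch (the URL is just a placeholder) downloads a page and discards everything except its text, which is roughly the raw material a robot has to work with:

```python
# Minimal sketch of what a "spider view" does: fetch a page and keep only its text.
# The URL is a placeholder; real spider simulators are more sophisticated than this.
from html.parser import HTMLParser
from urllib.request import urlopen


class TextOnly(HTMLParser):
    """Collects visible text, ignoring scripts, styles and markup."""

    def __init__(self):
        super().__init__()
        self.skip = False          # True while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = False

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())


html = urlopen("http://www.example.com/").read().decode("utf-8", errors="ignore")
parser = TextOnly()
parser.feed(html)
print("\n".join(parser.chunks))   # roughly the text a robot can index
```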

2. Every page needs an ID card

On search engine results pages, each result is identified by a value that must be unique: the URL of the page. This is the same string of text, numbers and symbols that, typed into the address field of the browser, will take you to that page. The value of this address is unique: it is like the page’s ID card. It identifies that content, and no other page on the Internet can have exactly the same one.

If you browse your website and find that the URL in your browser never changes, you have a problem: your website may have a lot of content, but search engines will not be able to file each page under a unique address. You can check whether this is the case by asking the search engines which pages of your site they know about, entering the command “site:www.sudominio.com” in the search field. When you press Enter, the search engine will return a list of indexed pages from your site; these are the pages that can appear in search results. If the URL did not change as you browsed, this list will probably contain very few pages. The cause may be that your website is built with Flash, AJAX or frames. In any of the three cases you will need to alter the programming so that each distinct page is identified by a different, unique URL. This is the only way to increase the chances that the different pages of your website will appear in the results of different searches.

3. The stones that help me cross the river: crawlable links

Links are to search engine robots what stepping stones in the middle of a river are to hikers: the hiker needs them to reach the other side, and the robots need links to reach the next page. Any of the means described in point 1 will help us see the crawlable links, the ones robots will follow to continue crawling content. In the cached version we will see them as blue underlined text, while in a spider simulator they will occupy a specific section of the analysis.

If the search for “site:www.midominio.com” in point 2 listed only a few pages of your site, it may also be because the links on your pages are not crawlable, so run a robot simulator on your page to check. If necessary, replace drop-down menus programmed with JavaScript or Flash with normal HTML links, or duplicate the most important links in a line of links in the footer. This will ensure that robots can jump from one page of your site to another and index all of them.
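As a purely illustrative sketch (the URLs, the openSection function and the class name are invented), the difference between a link a robot cannot follow and one it can, plus a simple footer link line, might look like this:

```html
<!-- Not crawlable: the destination only exists inside a JavaScript call -->
<a href="javascript:openSection('products')">Products</a>

<!-- Crawlable: a normal HTML link with the destination in the href attribute -->
<a href="/products/">Products</a>

<!-- A simple footer link line duplicating the most important sections -->
<p class="footer-links">
  <a href="/products/">Products</a> |
  <a href="/services/">Services</a> |
  <a href="/contact/">Contact</a>
</p>
```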

4. Beware of pop-up content: pop-up windows

It is still very common in e-commerce: we browse the sections, go to a product family, click on a product sheet and, voilà, it opens in a new, smaller window with no navigation controls. Product sheets are the most valuable information on any e-commerce website. By opening them in a new window in this way, we run the risk that the new window will be intercepted by the pop-up blockers built into many browsers.

On the other hand, and more importantly, we also prevent robots from reaching these valuable pages, since the links that open these pages are usually not crawlable. These are links programmed with JavaScript that can pose problems for search engines. If this is your case, the solution is to integrate the product sheets into the overall design of your website so that they are simply another page, without the need to open them in a new window.

5. Fear of the depths: information architecture

Search engine robots normally consider that the home page of a website is the most important page of the site, and that the level of importance decreases as the distance in clicks from it increases. The indexing process thus starts with the pages that occupy the first levels and has a harder time reaching the pages that have few inbound links or are in the deeper levels of the navigation.

It is therefore important to design information architectures with few levels of depth, which grow horizontally rather than vertically, and to set up alternative navigation paths that allow search engines to reach internal pages within a few clicks of the home page. We can do this with sections of related links, most-searched items, featured products, etc.

6. It impresses users and leaves search engines indifferent: Flash, Silverlight…

Despite its long presence on the Web, Flash technology still presents many problems for search engines. In general, most of the content created with these so-called Rich Media technologies is difficult for search engines to index and, depending on how the website is programmed, may mean that none of our content gets crawled.

For the time being, there is no option but to build an alternative HTML version that contains sufficient indexable content and that, for users, provides links to the Rich Media content, while meeting the search engines’ requirements for proper indexing.

7. Frames, better for works of art

At a time when bandwidth was a scarce commodity, the use of frames was fully justified. Pages were divided into fixed elements, such as the navigation, header and footer, and dynamic elements, such as the content area. The different sections were programmed as frames so that, once a website had loaded, only the part that varied – the page occupying the content frame – had to “travel”. These pages are easy to identify because they contain vertical or horizontal scroll bars that do not span the entire browser window.

A website programmed with frames (frames or iframes) presents many indexability problems: search engines often cannot crawl the content of the frames. The URL on such websites generally does not change as you navigate. And, even when frame pages are indexed, there is a risk that a user who clicks on one of them as a search result will land on an “orphaned” page that opens in the browser separated from its corresponding frame structure and, therefore, without navigation, header, footer, etc.

The increasing use of broadband makes the use of frames unjustifiable in most cases. Given the indexability problems they pose, it is recommended to transform a frame structure into individual pages that integrate all the elements.

8. Pages that play hide-and-seek: internal search engines

In many large portals, such as media or real estate portals, there is a much larger amount of content than can be linked to from the various home page menus or section headers. These portals resort to the use of internal search engines so that users can filter the contents and access the pages they are interested in. However, search engines cannot fill in search forms with different criteria to reach these contents.

This results in a very significant portion of this content never being indexed. The solution is to create content groupings that, through links, allow navigation to each piece of content under different criteria. Sometimes this navigation structure will resemble a directory, as in the case of a real estate portal, or a calendar, as in the case of a news site. In any case, the strategy to guarantee indexability involves creating alternative navigation routes made of links that search engines can crawl.

9. What is all this about page weight?

In the early years of search engines, it was recommended that pages should not be too heavy, i.e. that their file size should not be too large, to ensure that search engines would index the entire content of the page. Nowadays this recommendation makes less sense, as search engines have evolved beyond this kind of limitation.

However, it is still a good rule to keep file sizes as small as possible, free of junk code and as compliant with the W3C standards as possible. This will ensure that search engines crawl the page correctly and will also have several very beneficial side effects. First of all, a very long page is likely to be diffuse in its content: it will talk about several different things. Such pages rank worse than pages clearly focused on a single topic. Secondly, by reducing the file size we make the page faster to load and easier for users to navigate, which results in a more positive experience.

10. Let’s put things in order: domains, subdomains and subdirectories.

Okay, so your company is global, serving many markets and in many languages. How should you structure your website from an indexability point of view? Let’s look at some general recommendations:

Search engines reward websites from the same country as the searcher, so if you are active in several different countries, it could be worthwhile to acquire the country-code domain of each market you operate in: mydomain.co.uk, mydomain.fr, etc.

If you do not target different countries, but you do have content in different languages, it may be appropriate to group them in subdomains, such as: english.yourdomain.com, francais.yourdomain.com, etc.

If the only thing you are concerned about is structuring the sections of your site well, then the obvious choice is to use subdirectories: www.sudominio.com/seccion1, www.sudominio.com/seccion2, etc.

11. You have gone to the wrong window: redirects

Sometimes you will have acquired domains in other countries just to avoid problems with unfair competitors, or with a view to possible future expansion. What is the best way to send any traffic these domains generate to your main domain? Having them all point to the same IP address as the main domain? From the user’s point of view there may be no difference, but from the search engines’ perspective it is better to set up a permanent 301 redirect from each of these domains to the main one. This permanent redirect communicates, in a language search engines understand, that these domains have no content of their own for the moment and that the main domain is the one to which visits are redirected.

There are many HTTP header analyzers on the Web with which you can check how your domains respond. Your main domain should respond with a 200 OK message, while the redirected domains should respond with a 301 (moved permanently) message.
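If you prefer to run the check yourself rather than rely on an online analyzer, a minimal Python sketch along the following lines prints the status code each domain returns; the domain names are placeholders:

```python
# Minimal sketch: send a HEAD request and print the status code and, for
# redirects, the Location header. The domain names below are placeholders.
import http.client


def check(host, path="/"):
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("HEAD", path)
    resp = conn.getresponse()
    location = resp.getheader("Location")
    print(f"{host}{path} -> {resp.status} {resp.reason}"
          + (f" -> {location}" if location else ""))
    conn.close()


check("www.midominio.com")   # main domain: expect 200 OK
check("www.midominio.fr")    # secondary domain: expect 301 Moved Permanently
```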

12. You can also learn from mistakes: an effective 404 page.

In a dynamic website, with frequent updating of multiple contents, it is common that, sooner or later, a link ends up pointing to a non-existent page. Even if your website has some kind of control to detect the existence of broken links, it is always possible that a link on another website or in the search engines points to a page that one day you thought you no longer needed. In these cases, the servers usually return a generic error message with the code 404, indicating that the page does not exist.

This generic message can be customized so that the server returns a correctly laid-out page with the corporate design that informs the visitor that the requested content no longer exists. However, for both users and search engines, there are powerful reasons to add to this error message a small directory of links pointing to the site’s main content groups. Your users will read it as: “Okay, the page you were looking for no longer exists, but here is what we can offer you so you can continue your visit.” And search engine robots will have new “stepping stones” to keep jumping to new content to index on your site. In both cases, your website wins.
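How the custom error page is served depends on your hosting platform. As a minimal sketch, assuming an Apache server and a hypothetical page path, a single directive in the configuration or .htaccess file is enough:

```apache
# Hypothetical example: return our own designed page whenever a URL is not found.
# The path is an assumption; point it at wherever your custom 404 page lives.
ErrorDocument 404 /404.html
```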

13. If you get lost, maps are best: the site map.

Although we usually read books in a sequential way, from beginning to end, there is no doubt that the table of contents plays a fundamental role when it comes to relocating certain contents later on. The table of contents is, on the one hand, a large outline that summarizes and clearly shows all the contents of the book and, on the other hand, a way of jumping to certain specific content through the page number. Similarly, the sitemap allows us to see on a single page the complete outline of the website we are on, and allows us, through its links, to quickly “jump” to certain content without having to use the navigation menu. The site map is therefore very useful for users.

But it is also very interesting from an indexability point of view. The navigation menus only allow for a few – usually less than ten – options in the main menu. From these few options, by means of submenus, drop-down menus, etc., we can access the following contents. This increases the distance in clicks of certain content with respect to the home page, which, as we have already seen, makes indexing more difficult. The sitemap allows you to display, on a single page, a much larger number of links that are just one click away from the home page. This allows a better circulation of the popularity juice from the home page to the internal pages and makes the search engine robot’s circulation through your website much easier.

14. Safe from prying robots: the robots.txt file

Everything we have said so far is aimed at ensuring that search engines can index all the content of our website. But what can we do if we want precisely the opposite: for certain content not to be indexed? There is a special file, called robots.txt, in which we can easily specify which areas, subdirectories or files of our website should not be indexed by search engines.

It is important to properly program this file, especially in content management systems (CMS) that generate it in an automated way, since it may happen that areas that should be crawled are accidentally included as non-indexable.
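As a purely illustrative sketch (the directories shown are invented, not recommendations), a robots.txt file that blocks a private area and the internal search results while leaving the rest of the site open could look like this:

```
# Hypothetical robots.txt: the paths are examples only.
User-agent: *
Disallow: /intranet/
Disallow: /busqueda/

# Optionally, point search engines at the XML sitemap described in point 15.
Sitemap: http://www.midominio.com/misitemap.xml
```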

15. Pass me a list: the sitemap file

Finally, we should mention another special file: the sitemap file, usually an XML file that is invisible to users but that search engines consult to discover all the pages of your website that you want indexed. There are multiple tools on the Internet to generate the code for this file easily. Once it is generated and uploaded to the server, we can submit it to search engines through the Yahoo! or Google webmaster tools interfaces or, in the case of Live Search, by adding a simple line such as “Sitemap: http://www.midominio.com/misitemap.xml” to the robots.txt file.
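For reference, a minimal sitemap file following the sitemaps.org XML format looks like the sketch below; the URLs and dates are placeholders, and lastmod, changefreq and priority are optional fields:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal illustrative sitemap: URLs and dates are placeholders -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.midominio.com/</loc>
    <lastmod>2008-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.midominio.com/seccion1/</loc>
    <lastmod>2008-01-10</lastmod>
  </url>
</urlset>
```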

For large portals, the use of a sitemap file may be the most effective strategy to achieve high indexing levels.

With everything in sight

The goal of indexability is to ensure that a website takes advantage of its full traffic-generation potential. To do this, you must ensure that each and every one of its contents has had the opportunity to be indexed by search engines: that all the text has been crawled, that the search categories where it should appear have been correctly identified and that, as far as possible, its relevance is higher than that of the equivalent content on other websites with which it will compete on the search engines’ results pages.

Think of each page of your website as a hook waiting in the sea of the search engines: if you only have one page indexed, you only have one hook. If you have a few pages in the indexes, it is as if you had several hooks waiting for prey. If all the pages of your website are indexed, your website will be like a trawl net: you will be taking advantage of its full traffic-generation potential. Apply these fifteen points and your website will surely look much more like that trawl net full of potential customers.
