To put it simply, search engines are software programs that display a list of web pages to provide people with the information they are looking for. They match the keywords in a user’s search query against content from across the World Wide Web and display the results in order of relevance, quality and authority.
A short but sweet definition of ‘crawling’ in the context of search engines is discovering content. Crawlers are responsible for finding information on the internet. So basically, they scan sites and gather details about each individual page, such as images, titles and other linked pages. Different crawlers might look for different details, though, such as adverts and page layouts.
Crawlers are often referred to as search engine ‘spiders’. These spiders scan through thousands of pages per second on the internet. Each time a crawler visits a page, it adds the URL to an index, then uses the page’s links to decide where to go next, repeating the process of copying, indexing and following links to other pages. This continuous process builds up a giant index crammed full of web pages.
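In code, that crawl loop can be sketched roughly like this. It’s a simplified illustration, not how any real spider is built: `fetch_page` and `extract_links` are stand-ins for the real downloading and HTML-parsing work a crawler does.

```python
# A minimal sketch of the crawl loop: fetch a page, add it to the index,
# queue up its links, repeat. fetch_page and extract_links are supplied
# by the caller (stand-ins for real downloading and link extraction).
from collections import deque
from urllib.parse import urljoin

def crawl(seed_url, fetch_page, extract_links, max_pages=100):
    index = {}                    # URL -> page content (a toy "index")
    frontier = deque([seed_url])  # URLs waiting to be visited
    seen = {seed_url}             # URLs already discovered
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        page = fetch_page(url)    # download the page
        index[url] = page         # log the URL and its content
        for link in extract_links(page):
            absolute = urljoin(url, link)
            if absolute not in seen:   # never queue the same page twice
                seen.add(absolute)
                frontier.append(absolute)
    return index
```

The key detail is the `seen` set: without it, pages that link to each other would be crawled forever in a loop.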
It is worth noting, however, that you can prevent certain pages on your website (or your entire website, if you really wanted to) from being crawled. These pages then won’t be included in the search engine’s index.
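The usual way to do this is with a `robots.txt` file placed at the root of your site, which tells well-behaved crawlers where they are and aren’t welcome. A couple of illustrative rules (the crawler name `BadBot` is made up):

```text
# Block all crawlers from the /private/ section of the site
User-agent: *
Disallow: /private/

# Block one specific crawler from the entire site
User-agent: BadBot
Disallow: /
```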
The difference between crawling and indexing is that crawling is the process of collecting information from every accessible web page, whereas indexing is the process of logging all of that information into databases on servers.
So, basically, this is the part where the search engine stores all of the information it collects from your website. All of the servers that Google uses to store indexed web pages are kept and maintained in one of its 15 data centres across the world.
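To make all that stored information searchable, search engines typically build what’s known as an inverted index: a map from each word to the set of pages that contain it. A toy version, with made-up page names and text:

```python
# A toy inverted index: for each word, record which pages contain it.
# Page URLs and text below are invented for illustration.
from collections import defaultdict

def build_index(pages):
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)   # this word appears on this page
    return index

pages = {
    "example.com/tea":    "how to brew green tea",
    "example.com/coffee": "how to brew strong coffee",
}
index = build_index(pages)
```

Looking up a word like `"brew"` in `index` now returns every page containing it instantly, without re-reading any page text. That, in miniature, is why indexing exists.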
Google is thought to have over 2.5 million servers in total across its data centres. It is these server-stuffed strongholds that process every search you perform on Google. At the time of writing this blog, those data centres were processing approximately 65,000 searches per second. Find out how many searches Google is processing right now if you want to see some jaw-dropping numbers.
When you put the crawling and indexing processes into perspective, it is pretty impressive! Just think of it like this:
Say you consider yourself a bit of a bookworm and have a hundred or so books. Now, one day, your friend Dave tasks you with logging the author, genre, publisher and contents of every one of those books. Obviously, in the real world, you would say “Dave? Are you okay? Why on earth would I do that?”. But luckily, we’re talking about a virtual world here. So instead, you say, “That’s a great idea, Dave, I’ll get started right away”. But then he tells you that once you have finished logging all of your own books, you need to log every single book in the world. That’s roughly 135 million books. Are you imagining how big that job would be?
Good, because now imagine doing that job 15 times over (as there are now thought to be well over 2 billion websites). Then and only then would you have crawled every website in the world.
Retrieval & Ranking
You may not have noticed, especially if you prefer to use one over the other, but search results can vary between Google and other search engines such as Bing. This is because of the unique algorithm each search engine uses to perform its retrieval and ranking functions.
The retrieval process begins as soon as you click the button to perform a search. In a matter of milliseconds, the search engine sifts through all of the information it has indexed and establishes which web pages are relevant to your search. It does this by identifying the keywords you have used in your search query and compiling a list of web pages that also include those keywords.
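That keyword-matching step can be sketched as a simple set intersection over an inverted index. The data here is illustrative, and real engines do far more (stemming, synonyms, intent analysis), but the core idea looks like this:

```python
# Retrieval sketch: keep only the pages that contain every keyword
# in the query. The index below maps word -> set of pages (made up).
index = {
    "green":  {"example.com/tea"},
    "tea":    {"example.com/tea", "example.com/shop"},
    "coffee": {"example.com/coffee"},
}

def retrieve(index, query):
    result = None
    for word in query.lower().split():
        pages = index.get(word, set())
        # intersect with pages matching the keywords seen so far
        result = pages if result is None else result & pages
    return result or set()
```

Because the index is already built, answering a query is just a handful of set lookups, which is how results arrive in milliseconds.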
Ranking is where the search engine’s algorithm comes in. This determines the position, on the Search Engine Results Page (SERP), of every single web page that was deemed relevant. Since the Hummingbird update back in 2013, Google’s algorithm has been sophisticated enough to identify the actual intent behind a user’s search. Consequently, Google is able to rank sites more effectively.
There are A LOT of factors that will be considered during the ranking process. But, as I don’t want to go off on a tangent, I have listed just a few below to give you an idea:
- Keyword appearing in your domain;
- Keyword density;
- Keywords in your H1 and H2 tags;
- Page loading speed;
- Recency of content updates;
- Inbound and outbound links.
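Real ranking algorithms are closely guarded and vastly more complex, but the basic idea of combining weighted factors into a score can be sketched like this. Every weight and field name below is entirely made up for illustration:

```python
# A deliberately simplified ranking sketch: score each relevant page
# on a few factors like those listed above, then sort by score.
# All weights are invented; real engines use hundreds of signals.
def rank(pages):
    def score(page):
        s = 0.0
        if page["keyword_in_title"]:              # keyword in H1/title tag
            s += 3.0
        s += min(page["keyword_density"], 0.03) * 100  # capped: stuffing doesn't help
        s += 2.0 / max(page["load_seconds"], 0.1)      # faster pages score higher
        s += 0.5 * page["inbound_links"] ** 0.5        # links, with diminishing returns
        return s
    return sorted(pages, key=score, reverse=True)
```

Note the cap on keyword density: even in this toy model, cramming in more keywords stops paying off past a point, which mirrors how real engines penalise keyword stuffing.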
There are literally hundreds. So, like Farmer Hoggett said to that cute little pink fella from the film that everyone loves, Babe: “That’ll do, pig. That’ll do.”
Just to clarify, that’s not us calling you a pig. That would be really unnecessary and completely inappropriate. Anyhow, to recap that whole process…
Hopefully, this has given you some guidance as to how search engines work, step by step. Now that you know how they work, you can start competing with Google in no time! Or not, as the case may be.