Search Engines: Who, What, When, How and Why?

The exterior glass wall of Google's HQ in California, featuring the Google logo
Google HQ. Image: Ben Nuttall, some rights reserved

For the majority of internet users in 2018, search engines are so deeply ingrained in their internet experience that they cannot imagine navigating the web without them. But how did search engines come about? Who benefits from their popularity? And what problems do they pose for the average consumer?

What Are Search Engines?

When discussing search engines, the average internet user will likely think first of the industry giants: Google, the now-fading Yahoo, Bing if they’re a Microsoft loyalist, or Baidu if they often search in Chinese. But search engines also include much more specific services. While a search engine was originally “a system that indexes webpages”,[1] the term now covers search tools for all kinds of media, including:

With all those services falling under the umbrella, it becomes clear how much of a presence search engines have across the internet. Today, search engine compatibility is built directly into web browsers – indeed, Google Chrome, the world’s most popular internet browser, was designed by search engine giant Google specifically to optimise integration with Google’s other services. You can type directly into the search bar to search Google, or right-click highlighted text to run it as a search. If you come across a page in another language, Chrome will detect that and offer to translate it into your native language via Google Translate. Not only do we rely on search engines to find websites, but we also often rely on them to navigate through those sites.

The History of Search Engines

Prior to the widespread adoption of search engines, internet users relied heavily on user-collated lists of sites, and on direct links from one site to another.[2] Naturally, it was much harder to discover new content in this way. Users often stuck to sites they already knew, and newer sites and creators struggled unless they managed to gain the attention of one of these collation sites.

A screenshot of a text-only page in Netscape browser, with an alphabetical list of subjects each containing a hyperlink
The “Yanoff List”, a hand-curated list of websites, shortly before its abandonment in 1995. Image: Screenshot by Richard T. Griffiths

The first search tool on the internet was “Archie”,[3] released in 1990 and still live today, with much more than just its name coming from archival tradition. Archie indexed neither websites nor their content – instead, it was used to retrieve the titles of files listed for sharing on FTP (File Transfer Protocol) sites.[11] It isn’t surprising that this first attempt at a search engine would be based around file collation, as one of the driving forces of not only the internet, but computer science in general, was the need for informational organisation at institutions from libraries to governments to private archives.[5]

Archie, which existed online before the World Wide Web and graphical browsers, stands as the middle step between traditional document storage and internet browsing. Matthew Gray of MIT invented the World Wide Web Wanderer – the first bot built for the purpose of indexing web pages – in 1993, and used it to create his engine, Wandex.[6] That same year, JumpStation (with its own crawling bot) was born, becoming the first ever search engine to combine the three key components we associate with the software today: crawling, indexing and searching.[7]
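To make those three components concrete, here is a minimal sketch in Python. It is not how Wandex or any other real engine was written: the “web” is a hypothetical dictionary of made-up pages held in memory, and the function names are illustrative assumptions. It only shows how crawling, indexing and searching feed into one another.

```python
from collections import deque

# Hypothetical toy "web": URL -> (page text, outgoing links). Not real sites.
TOY_WEB = {
    "a.example": ("archie indexed ftp files", ["b.example", "c.example"]),
    "b.example": ("wandex indexed web pages", ["c.example"]),
    "c.example": ("crawlers follow links between pages", ["a.example"]),
    "d.example": ("an unlinked page about ftp", []),
}

def crawl(start, web):
    """Crawling: follow links breadth-first from a start page."""
    seen, queue = [], deque([start])
    while queue:
        url = queue.popleft()
        if url in seen or url not in web:
            continue
        seen.append(url)
        queue.extend(web[url][1])
    return seen

def build_index(urls, web):
    """Indexing: map each word to the set of crawled pages containing it."""
    index = {}
    for url in urls:
        for word in web[url][0].split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Searching: return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

pages = crawl("a.example", TOY_WEB)
index = build_index(pages, TOY_WEB)
print(search(index, "ftp"))
```

Note that the page “d.example” mentions FTP but never appears in the results: nothing links to it, so the crawler never finds it. That is the same discovery problem newer sites faced in the era of hand-curated lists.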

What Google brought to the table at its creation in 1996 was the revolutionary “PageRank” system. While other engines of the time sorted their results by the number of times the keywords appeared on the site, or by how prominently the keywords were featured, Google was the first search engine to attempt to evaluate its results and list them by “quality”.[8]

Google's original logo - featuring a faux 3D shadow effect - above two different search bars, one saying
Google’s homepage in 1997. Screenshot obtained through the Wayback Machine

PageRank calculated how many times a webpage had been linked to by other pages, and listed search results in order of the most-linked (in the published algorithm, a link from a highly ranked page also counts for more than a link from an obscure one). The logic was that a site only becomes frequently shared if those using it find it to be worth sharing.[9] This extra dimension proved to be wildly popular with web users, and is still in use today (alongside further search-listing factors which Google keeps a secret).
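The idea of ranking by links can be sketched in a few lines of Python. This is an illustration only, not Google’s code: the four-page link graph is made up, the function names are assumptions, and the damping factor of 0.85 is the value usually quoted for the published algorithm rather than anything taken from this essay’s sources. The first function simply counts inbound links; the second is a simplified version of the weighted calculation, in which each page repeatedly shares its score among the pages it links to.

```python
# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "home": ["news", "blog"],
    "news": ["blog"],
    "blog": ["home"],
    "shop": ["blog", "news"],
}

def rank_by_inbound_links(links):
    """Order pages by how many other pages link to them."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return sorted(counts, key=lambda page: (-counts[page], page))

def simplified_pagerank(links, damping=0.85, iterations=100):
    """Weighted version: a link from a high-scoring page is worth more."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, targets in links.items():
            # A page with no outgoing links spreads its score evenly.
            recipients = targets or pages
            for target in recipients:
                new_rank[target] += damping * rank[page] / len(recipients)
        rank = new_rank
    return rank

by_count = rank_by_inbound_links(LINKS)
scores = simplified_pagerank(LINKS)
by_score = sorted(scores, key=lambda page: -scores[page])
print(by_count)
print(by_score)
```

In this toy graph, “blog” comes first either way. The two methods disagree about second place: a plain count favours “news” (two inbound links), while the weighted version favours “home”, whose single inbound link comes from the top-ranked page.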

How Search Engines Work Today

With human moderators having failed to keep up with the tide of what David Shenk called the “information glut”,[10] search engines now rest heavily on the labour of bots. Thankfully, even the very first “crawlers” or “spiders” in Google’s prototype could keep over 300 connections open at once and index 100 pages a second,[4] making life much easier for the searching humans on the other end of the service. But this reliance on automated software has created its own set of problems. Google openly admits that it only very rarely manually edits its bot-sourced results, and then only in cases of known breaches of the law or of cyber-security. If a site is not reported, or a once safe and popular site becomes infected, it will remain accessible (and possibly highly ranked) through Google until its webmaster is alerted to the issue.

On a more granular note, Google’s crawlers will freely index sites that list sensitive personal data, including people-search sites like Pipl (which lists over 3 billion individuals). Google’s official Removal Policies state that it may remove credit card information and non-consensually shared pornography, but it usually will not remove details like phone numbers or home addresses. This leaves the listed individuals – sometimes unknowingly – vulnerable to everything from stalking to advertisement bombardment campaigns to threats of violence. And that’s just from the information Google allows, to say nothing of Pipl, who proudly advertise their ability to build a full network of information around a person from a single piece of data.

Who Benefits?

This essay alone is testament to how much Google has altered the way the average consumer uses the internet. While some of the references listed below came from user-collated lists (like the ARIN2610 class readings), the vast majority came from searches through Google, Google Scholar and the University of Sydney Library’s OPAC system. But while the average internet user has definitely benefited from an increase in ease of use through the advent of search engines, one should also be conscious of who benefits financially from them, and what price consumers pay for this increase in convenience.

Throughout this discussion, the prevalence of Google has become quite apparent. It is enormously popular not only as a general search engine, but also in the market for niche engines such as the aforementioned Google Images, and even more granular services like Google Books. It also effectively rents its search capabilities out to private websites, allowing site builders to use Google’s programming as a site-internal search, with BuiltWith counting over 700,000 websites paying for this functionality. Beyond dominating the industry, Google is well known enough for its brand to be used as a verb, and often enough for that verb to be added to the Oxford English Dictionary in 2006.

But how does Google make money?

Google LLC may feature a hardware store, but the majority of its revenue comes from advertising. In the second quarter of 2018, Google’s parent company, Alphabet, made nearly $30 billion from Google advertising alone.

A chart breaking down Alphabet's quarterly revenue into three categories: Google advertising; Google, cloud, apps, hardware etc.; and "Other bets"
Alphabet’s (Google’s parent company) quarterly revenue breakdown. Image: Ashley Rodriguez, chart available for public reuse

Google advertising goes beyond selling ad space on its results pages. Those ads, while marked with an “Ad” label, are designed to mimic the look and format of the engine-sourced results, in order to draw on consumers’ associations of high-ranked Google results with quality. Google also runs the Google Display Network, which allows advertisers to place ads in text, image and video formats on their choice of over 2 million websites. For all of these advertising services, Google allows advertisers a high level of customisation.

Advertisers can choose to:

This is a very granular level of control Google offers its customers, but the average consumer may be wondering how Google gets this information, how it could tell who to show ads to when a customer says they want to target “amateur bakers”. The answer lies in Google’s original and most-used service: its search engine.

Using “cookies” (data exchanged between a user’s browser and a site’s server every time the user accesses the site), as well as the user accounts required for services like Gmail and YouTube, Google keeps track not only of everything a user has searched (through the Google homepage and the Google-run in-site searches mentioned above), but also the URLs they have accessed through Google Chrome, the places they have visited (if they have location services turned on) and the apps that they have on their Google devices.[12]
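For readers curious about the mechanics, the sketch below shows in Python how a cookie lets a server tie separate searches to the same browser. It is a simplified illustration and makes no claim about how Google’s systems actually work: the cookie name “visitor_id”, the in-memory search log and the function name are all hypothetical. Only the cookie parsing relies on a real library, Python’s standard http.cookies module.

```python
import uuid
from http.cookies import SimpleCookie

# Hypothetical server-side store: visitor id -> list of search queries.
search_log = {}

def handle_search(query, cookie_header=""):
    """Simulate one search request arriving at a server.

    Returns the visitor id and the cookie the server would ask the
    browser to store (None if the browser already sent one).
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    set_cookie = None
    if "visitor_id" in cookie:
        # Returning browser: the cookie identifies it.
        visitor_id = cookie["visitor_id"].value
    else:
        # New browser: issue a random id and ask the browser to keep it.
        visitor_id = uuid.uuid4().hex
        reply = SimpleCookie()
        reply["visitor_id"] = visitor_id
        set_cookie = reply.output(header="").strip()
    search_log.setdefault(visitor_id, []).append(query)
    return visitor_id, set_cookie

# First visit: no cookie yet, so the server issues one.
first_id, issued = handle_search("sourdough starter")
# Later visit: the browser sends the cookie back with its request.
second_id, _ = handle_search("bread tins", cookie_header=issued)
print(search_log[first_id])
```

Because the browser returns the same identifier each time, the two searches end up in a single history. A history like that is the kind of signal that could place a user in an audience such as “amateur bakers”.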

This is what it costs the average user to take advantage of Google’s “free” public services: their data, on everything from their movements to their age to their favourite Mexican restaurant.

Richard Serra may have been discussing advertising on television back in 1979 when he said, “You are delivered to the advertiser, who is the customer. He consumes you.”[13] Nonetheless, the principles of advertising remain just the same. The average users of Google’s search engine are not, as they may imagine, its customers. Google’s customers are those purchasing its advertising space, and the average search engine user is merely the product.




