Taking the 'Ooh' out of Google: Getting site search right - Part 1
Over the course of the next couple of weeks I want to present an expanded version of the presentation I gave at the 2008 Euro IA Summit in Amsterdam at the end of September. There I was talking about "Taking the 'Ooh' out of Google" and getting site search right for news.
 
Over this series of posts I hope to show how you can use additional information about your content, information that Google can't get access to, to make your site search stand out and to provide an excellent service for those who use it. I'll be illustrating my points with examples from a wide range of European newspaper websites, drawn from all parts of our continent.
 
Mental 'Map' of the Internet
In the early 2000s, the BBC commissioned research to understand users' mental "map" of the Internet. The findings were that most people 'drew' the Internet as an inter-related set of sites, linked together via a central hub. Around the hub would be major Internet brands like Amazon, eBay, Napster and so forth. There would also be utilities like Hotmail and Internet Banking on the map, and personal things like the site of their favourite sports team.
And the hub at the centre? Google.
 
Actually, strictly speaking, since this was in 2001, the central hub might also have been Yahoo!, MSN or AOL amongst others.
 
In the intervening years the power of Google has waxed whilst the Yahoo!s of the world have waned, and by 2006 the average user's mental model of a 'map' of the Internet seemed to consist of just one page - the Google homepage.
 
By 2008, this homepage itself seems to have become defunct for a lot of users. Time and time again in user testing sessions I observe that where people used to type google.com into their browser bar, they now directly address Google through the search box in the top right-hand corner of Internet Explorer 7 or Firefox.
 
This dominance by Google presents news publishers with several challenges. They rely on Google to send them traffic, but they also have to allow Google to 'scrape' and index their valuable content. In some cases, as with Copiepresse in Belgium, publishers have successfully resorted to the courts to try to define the limits of what Google can or cannot store and present to users.
Why have site search at all?
All of which raises the question: if Google is so ingrained into the search habits of the bulk of users, why have site search at all?
Back in 1997 and 2001, when Jakob Nielsen was issuing his definitive guidelines for having a search box on every page, using the browser search box to access Google was not a mainstream behaviour. Yet these days I often see that if a user is on a page within foo.com, and you ask them to find all the information about 'bar' on foo.com, they go straight to the Google box in the top right-hand corner of their browser and type in something like "bar foo.com", rather than even looking for a site search input box.
A lot of companies and publishers spend a lot of money on having a search engine technology index their content, but is it still necessary? And if so, what makes it worthwhile?
Google's performance at indexing your content and presenting it to users has to be the benchmark against which you judge your own site search. If you can't produce something better, or something that offers alternative functionality, you might be better off setting up your own Google Custom Search Engine that just looks over your domain.
Seriously.
You'd save a massive amount on your IT infrastructure budget, and if you set up an AdSense account you can even claim a slice of the advertising revenue on your search engine.
 
Now, that may sound a little like simply 'giving up' and handing all of search over to Google - but when you frequently observe user behaviour around search, you realise you need to do a hard cost/benefit analysis of providing a site search service.
Next...
Having said all that, there are plenty of things that you can do to "take the 'Ooh!' out of Google", and tomorrow I'll be looking at what you can achieve by hooking your CMS up to a search engine, things that Google can't achieve by scraping HTML.

Looking forward to this series. Site search is phenomenally difficult to get right, which is why we all end up using Google.
I've been struggling to think of websites that I DO use the site search on. The few examples I can think of are either sites with a fast turnover of content (Twitter, BBC iPlayer), where Google just doesn't index them fast enough, or e-commerce sites (Amazon, LoveFilm, Argos), where I want to be able to sort and order the results by price & availability.
You mention that you should be able to use 'additional information' in your site search, that either isn't available to Google or that Google can't parse and extract from the page. This is a valid point, but it's also worth mentioning that Google has access to some information that you don't have: the behaviour of its millions of users. For instance, Google may be able to tell which are the popular pages people are searching for better than you can, as they can see which results people are clicking on, and how people are refining their searches. It's tricky to gather this information yourself, as it takes a fair amount of technology and also a pretty large userbase.
Google handling 70% of search queries, and all web-based startups depending on Google for site traffic, is not a good sign. There has to be an alternative.
I downloaded the PDF for the series and started reading, but I want to ask whether what was written in 2008 is different from what would be done in 2011. If it is, could you direct me to where I could learn the same information as it would be given today? Or if it's all the same as today, I will go back and read it. Thank you.
Most of it should still hold true today, Jennifer.