Fine Tuning Your Enterprise Search - part 3

 by Martin Belam, 3 December 2004

Fine Tune Your Content

If you are using software to search over an intranet, it is important to make sure that in your content production processes you are maximising the amount of assistance you are giving the software. Many years ago when I took my first programming steps with my trusty old ZX Spectrum I learnt the mantra "Garbage In, Garbage Out" to describe the input and output processes of computers. Nowhere is this more apt than in the field of search results. The quality of your results will always be dependent on the quality of information in your index.

One easy improvement to the usefulness of the results for your business is ensuring that every HTML page or document to be indexed has a unique title - and a unique title that can be displayed easily by your search technology.

Before the BBC as a business devised and promoted a metadata standard, it was common for pages on the public-facing bbc.co.uk site to use the HTML <TITLE> tag to convey a sense of crumbtrail navigation and hierarchical structure for a page. This was no bad idea, and worked very well with high-level index pages, leading to document titles like "BBC - Entertainment - Holby City". However, it worked less well when we ended up with long titles like:

"BBC - Leeds - Sport - Football - News - Leeds United - Headline of the story this page is about"

In this example it is the last, unique element that is most useful to the user in conveying exactly what a page is about. However, on a search results page it is this element which is most likely to be truncated for space reasons, and you can end up in a situation where all of the titles look the same. In some cases every page in a sub-section would have identical titles and descriptions, and we would end up displaying unhelpful result sets like this one from 2002.

Search results from BBC Wales illustrating the problem when all documents have the same titles and description

Not surprisingly, in user testing, when the BBC's audience were presented with results pages like these, the assumption was often that there was a problem with the search, and that the same ten pages were listed over and over again.
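To make the truncation effect concrete, here is a minimal Python sketch. It is an illustration rather than anything the BBC search actually ran: the 40-character display limit, the two headlines and the helper names `truncate` and `specific_first` are all assumptions made for the example. It shows that two crumbtrail-style titles become indistinguishable once cut to fit a results page, and that putting the most specific element first keeps them apart.

```python
def truncate(title, limit=40):
    """Cut a title to `limit` characters, ending in "..." if it was shortened."""
    if len(title) <= limit:
        return title
    return title[: limit - 3].rstrip() + "..."


def specific_first(title, separator=" - "):
    """Reverse a crumbtrail-style title so the most specific element leads.

    Assumes the separator never appears inside a headline itself.
    """
    return separator.join(reversed(title.split(separator)))


# Hypothetical headlines appended to the crumbtrail from the example above.
titles = [
    "BBC - Leeds - Sport - Football - News - Leeds United - Manager signs new contract",
    "BBC - Leeds - Sport - Football - News - Leeds United - Striker ruled out for a month",
]

print("Crumbtrail order, truncated:")
for title in titles:
    print("  " + truncate(title))

print("Most specific element first, truncated:")
for title in titles:
    print("  " + truncate(specific_first(title)))
```

In the first listing both titles are cut off before the headline begins, so they display identically. In the second, the headline survives the cut and the two results can be told apart at a glance.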

By putting content production processes in place within your business that ensure documents are given proper titles and descriptive metadata, you can improve this situation. This is a technique that is not so much about fine tuning the search technology you are using as about fine tuning the content you make available to it.
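One way such a process could be supported is with a simple audit that runs before pages are published or indexed. The sketch below is an assumption about how you might do this, not a description of any BBC tool: it uses only the Python standard library, and the names `TitleParser`, `extract_title` and `audit_titles`, along with the sample pages and paths, are hypothetical. It reports pages that have no title and groups of pages that share one.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside a page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> is matched too.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the page title with runs of whitespace collapsed, or ""."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title.split())


def audit_titles(pages):
    """Given {path: html}, return (paths with no title, {title: [paths]} for shared titles)."""
    by_title = defaultdict(list)
    for path, html in pages.items():
        by_title[extract_title(html)].append(path)
    missing = by_title.pop("", [])
    duplicates = {title: paths for title, paths in by_title.items() if len(paths) > 1}
    return missing, duplicates


# Hypothetical sample pages for illustration only.
pages = {
    "/wales/one.shtml": "<html><head><TITLE>BBC - Wales</TITLE></head></html>",
    "/wales/two.shtml": "<html><head><title>BBC - Wales</title></head></html>",
    "/wales/three.shtml": "<html><head></head><body>No title here</body></html>",
    "/wales/four.shtml": "<html><head><title>A unique headline</title></head></html>",
}

missing, duplicates = audit_titles(pages)
print("Pages without a title:", missing)
print("Titles shared by more than one page:", duplicates)
```

A report like this does not fix anything by itself, but it gives content producers a concrete list of pages to correct before the search engine ever sees them.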

Continue to find out how to fine tune dealing with enterprise specific jargon and acronyms.
