Metadata Schmetadata, Relevance and Reality
These are our contentions:
- Metadata matters.
- Metadata adds worth to data.
- Documents are data.
- Keywords are the essential data in documents.
- Keywords in context create knowledge.
- Documents have worth because they contain knowledge.
- Enterprise search finds keywords.
- Findable keywords yield documents.
- Findable keywords in context yield documents with knowledge.
- Knowledge in documents has worth.
- Metadata is not essential for enterprise search.
- We don’t need metadata.
What’s our point? Before answering that question, some context: this project is about implementing enterprise search within a large but not humongous non-profit organization. We’re talking about 170 paid employees, with easily an equal number of volunteers of one kind or another. So let’s say for purposes of context that we have 350+ real people using our networked infrastructure. We have two (count ‘em, two) IT guys. We’re not talking Fortune 500 here. We’re not even talking Fortune 500,000. That’s our world.
Working on this project, we have evaluated what we need from metadata as part of enterprise search implementation. Our conclusion? We don’t need metadata.
Or better said, we don’t need to add metadata for a Google Search Appliance (GSA) to accomplish what we want to accomplish with enterprise search. We could use metadata more — and there are several very impressive features in a GSA that can exploit external metadata and metadata biasing of search results — assuming the organization has the resources to organize and manage metadata. But as a practical matter, do we have the resources to go down that path and, ultimately, do we need it? No.
In fact, as part of this project, we have put a metadata model in place, a simple “labeling” or tagging system. It exploits our SharePoint server installation with a practical (if kludgy) way to add metadata to files saved to a shared document repository. For example, when a user saves a file to a directory in our structural taxonomy, say the Income Maintenance folder…

…a dialog box pops up with a prompt to add one or more optional “LSNC labels” to the file, associating the file with additional folders or categories in our taxonomy:

In the above example, an Excel spreadsheet with unemployment data is being saved to the “Unemployment Insurance” folder, a subfolder under the “Income Maintenance” top-level directory, but is also tagged as “Data-Statistics-GIS” and “Employment.” Even then, this kludge only works with Microsoft applications; SharePoint doesn’t cooperate as well with the other applications we rely on, such as WordPerfect and Adobe Acrobat.
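For readers who think in code, the labeling model can be sketched roughly as follows. This is only an illustration of the idea, not how SharePoint stores anything: the `LabeledFile` class, the `files_in_category` helper and the file name are all hypothetical. A file has one primary folder path in the taxonomy plus any number of optional labels pointing at other categories.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class LabeledFile:
    """Hypothetical model of a file saved with optional taxonomy labels."""
    name: str
    folder: Tuple[str, ...]                         # primary location in the taxonomy
    labels: Set[str] = field(default_factory=set)   # optional extra categories

    def categories(self) -> Set[str]:
        """Every taxonomy category this file is associated with."""
        return set(self.folder) | self.labels


def files_in_category(files: List[LabeledFile], category: str) -> List[str]:
    """Names of files whose folder path or labels include the category."""
    return [f.name for f in files if category in f.categories()]


# The spreadsheet from the example above (the file name is made up).
report = LabeledFile(
    name="unemployment-data.xls",
    folder=("Income Maintenance", "Unemployment Insurance"),
    labels={"Data-Statistics-GIS", "Employment"},
)

print(files_in_category([report], "Employment"))
print(files_in_category([report], "Income Maintenance"))
```

The point of the sketch is that a label simply makes one file show up under more than one category, whether or not the search engine ever looks at it.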
Regardless, is the addition of metadata to documents a good thing? Obviously, yes. Metadata matters. (Taxonomy matters, too… yet to what purpose?) Do you need to add metadata to documents for effective enterprise search, and specifically with a Google Search Appliance? Not really, not for what we are doing. Why not? Because search algorithms have improved to the point that added metadata is not needed to help the search.
The poster child for these gains in enterprise search algorithms is, not surprisingly, Google, whose GSA has matured considerably. Google is a verb. Microsoft (or SharePoint) is not. A principal reason is that Google broke out early from the search-engine pack years ago and raised the bar for the quality of search results. Google became what the average person now expects from search. That is why it is a verb. It is what most people do. They Google. Another reason is that Google simplifies search.
In the context of our project, at the scale and with the resources available to even a fairly large non-profit, what is practical or impractical in using metadata? And even if used, does it affect the quality of enterprise search results enough to warrant those additional costs in time and money?
So far, we don’t see it.

March 27th, 2009 at 4:36 am
An interesting article. But perhaps I missed an explanation of how you performed your evaluation. Did you assign tasks to your users and compare their effectiveness on the two systems? Did you ask them to express their subjective satisfaction with the system? Did you have some productivity measure external to the system, such as efficiency at completing projects?
It may be that a simple out-of-the-box ranked search approach, with no annotation, manual or automatic, of your documents, is exactly what your organization needs. But it’s very hard to generalize from your experience without understanding better what exactly you were evaluating.
March 27th, 2009 at 9:15 am
Daniel, your observations are spot-on about the evaluation of user need and experience. Our evaluations of whether our users can find what they are looking for are, admittedly, limited to basic user-experience surveys, as well as a considerable amount of first-hand observations of users conducting actual searches. But what we have done is not as thorough or precise as the basic frameworks you suggest. We readily agree that one cannot generalize in any definitive fashion about metadata from our experience. We are not inviting others to do so. We are, however, reporting on our experience and what we do think legal services and other non-profits can and should question, including whether the initial cost and ongoing investment in creating and managing metadata is a practical way to go.
What we are discovering is that our users are finding what they are looking for without any apparent need to add metadata to our targets, most of which either have no added metadata or could not take it even if we wanted to add it (for example, our targeted Google Sites content). What is left unstated in the post are other things we are exploiting in the Google Search Appliance, including an array of filters and collections for narrowing search results in a way our users understand very easily, selective use of KeyMatch, and a set of OneBox modules that are very effective in helping our users find some of the most common things they search for. In the context of our project, those options strike us as more efficient and less costly ways for us to “help” the search.
March 27th, 2009 at 8:45 pm
Brian, thanks for responding. I hope you don’t mind that I’ve been mirroring this discussion at my blog, The Noisy Channel:
http://thenoisychannel.com/2009/03/27/does-metadata-matter/
The discussion there is unmoderated, and I encourage you (or readers here) to chime in.