The Findability Project Taxonomy – Part One: The Theory
First, a recommendation. Get your hands on a copy of Information Architecture for the World Wide Web (also linked on the right, under “Biblio”) and read chapter 5 about “Organization Systems.”
Why? Well, let me put it to you this way.
We did a lot of homework and scoured a lot of books and, of course, talked to our GSA consultant about what is popularly (if imprecisely) referred to as “taxonomy.” You know, how should we organize all the “stuff” we want our users to be able to find? How hard is that?
As we canvassed widely for an answer to that basic, practical question, we discovered how easy it is to get totally befuddled and sidetracked: by any number of levels of abstraction (should you choose, for example, to wallow in the construction of controlled vocabularies); by all too “inside-baseball” discussions within the taxonomy community; or by yielding to the dark side and joining a formal organization for this sort of thing. Of course, there is also the emerging school of “social organization” of content referred to as folksonomy, more popularly known as tagging. And then there is the school of thought within some sectors of the search community that, after all is said and done, taxonomy may not be particularly useful for enterprise search design.
Needless to say, these initial forays into this subject prompted the thought bubble … “Just shoot me now.”
On this point, the GSA consultant was not as directly helpful as I thought he would be. The short story is that he was supportive of what we thought we needed, but at the end of the day he was essentially agnostic on this point, a view that mirrors Google’s online GSA resources. In discussing how to plan for a GSA implementation, Google says little more on this point than “analyze your business’s content and decide which directories and files you want indexed.” (In fairness to our GSA consultant, whose name, by the way, is Igor, you should be sure to read below for his helpful guidance on simplifying the taxonomy we adopted, and the reasons for doing so.)
Which raises the question: how should we do that?
There are online articles that are straightforward and helpful in grasping, at a rudimentary level, the basics of information architecture, one recent example being Better Living Through Taxonomies, at Digital Web Magazine. But based on our experience, I recommend you pass Go and head straight for Peter Morville and Louis Rosenfeld’s Information Architecture for the World Wide Web, a book that is part of the IA canon, and deservedly so. It is a superbly clear-headed, well-written overview of what information architecture is all about, and Chapter 5 on organization systems, specifically, is a model of how to explain a technical and complex subject like “taxonomy,” among other things, in plain, accessible language. And it will hit the mark on the main issues you need to think through to get “stuff” organized.
What are those practical issues? Indulge me a bit, since several of my observations here simply echo what I am recommending you read, but for LSNC we distilled our theoretical approach to taxonomy, or organizing our content, into these four basic precepts:
1. The directory structures need to be a hierarchical or “top-down” organization of simplified, familiar categories.
In the broadest sense of “organizing” things on a file server, and how that same “organization” is reflected in page menus or page navigation or dialog boxes, users need to know where they are and what the folders or subfolders mean. Lawyers, by training and practice, work in an especially pronounced hierarchical environment. (Can you say, “I, II-A, etc.”?) While the work environments of legal services programs are famously “anti-hierarchical,” the practical truth is that almost everyone in that environment organizes their work in some hierarchical fashion. (Certainly, there are exceptions.) Simply put, this is the most common way in which most people organize things, lawyers and non-lawyers alike.
2. Names for content folders, subfolders or categories need to be consistent with the shared vocabulary of your organization.
This may seem self-evident, but in practice may not be what users in your program do or are accustomed to. I actually took the time to look at the folder organization of about a dozen advocates in our Sacramento Office, and while there were predictable folder organizations (for example, organizing files by case or project or substantive area), much of the naming was ambiguous. While no doubt obvious to the advocate who created the directory or subdirectories, to others the same structure or organization may be too subjective, ambiguous or confusing to be useful to anyone other than the person who created it — and even possibly for him or her at some later time, when the subjective rationale for the organization has been long forgotten. So, when working out the naming conventions for folders and subfolders, it was important to focus on commonly understood, familiar shared vocabulary or terminology.
From the perspective of the GSA, the particular names, as such, of directory folders or subfolders are of no consequence. The GSA does not care what you call things, which explains the agnosticism of Google and our GSA consultant on this point. At the blunt-instrument level, all it cares about is the URL, the path to where the content resides. You deal with the Tower of Babel; that’s your problem. The GSA will ferret out the content wherever it resides, regardless.
As will be detailed in the next post on this subject, LSNC has adopted the most conventional names for its directories it could come up with, including … I pause, for the pain it causes me to say this … the LSC substantive problem code categories, which comprise roughly half of the directories on our shared document repository. If one were organizing legal services practice today, I am confident it would be organized differently than how LSC organizes it. Yet roughly 40 years in, LSC still uses an extraordinarily unsubtle and somewhat uninformed organization of legal services practice. Still, it is what it is, and it is what field programs must use, and it is what users within those organizations know and understand, after decades of use. For better or worse, it is the “shared vocabulary” of our organization, and its use offers consistency with how other information and data are handled, most notably client case data.
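For the tinkerers, here is a minimal sketch in Python of how one might check top-level folder names against a shared vocabulary. It is only an illustration, not something we ran at LSNC: the function name unfamiliar_folders and the category names in SHARED_VOCABULARY are placeholders I made up for the example, not our actual directory list.

```python
from pathlib import Path

# Hypothetical shared vocabulary: placeholder names for illustration only,
# not LSNC's actual directory list.
SHARED_VOCABULARY = {"Housing", "Family", "Consumer", "Employment"}


def unfamiliar_folders(root, vocabulary=SHARED_VOCABULARY):
    """Return the names of top-level folders under root that are not in the vocabulary."""
    root = Path(root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and entry.name not in vocabulary
    )
```

Pointed at a shared repository, anything this returns is a folder name that only its creator is likely to understand, and a candidate for renaming.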
3. “Lean toward a broad-and-shallow rather than narrow-and-deep hierarchy.”
That’s a quote from Morville and Rosenfeld’s book. And the observation is consistent with the advice our GSA consultant gave us. The consultant advised us not to go more than two levels down, and really pushed for only one level down. The rationale was two-fold. First, the more subfolders you have, the less likely users will locate or use content in those folders whenever they are navigating the directory structure, in whatever form it is viewed. From the user side, a deeper vertical hierarchy actually reduces findability.
Second, from the GSA side, deeper hierarchy does little or nothing to improve search results. While the search algorithms baked into the GSA exploit the URL path at the directory and subdirectory and sub-sub-directory to improve search results, having third or fourth or more levels does essentially nothing to improve those results. There’s no harm in doing so. It just doesn’t help you.
A counterpart to this issue is the importance of striking a balance. By going broad-and-shallow, one gets the practical advantage of being able to add content without the need for major restructuring. Assuming you have figured out a set of top-level directories that pretty much covers, in a broad sense, the content your users will want and need to search for, from there on out you can focus on adding content below that level, as warranted.
But if you go too broad, from the user side, things get more cumbersome and impractical. Think about it. Whether your users are advocates or office managers or volunteers, whatever, it is going to be more practical and useful if they can visually and cognitively grok the organization scheme. So it needs to be broad enough to cover the bases, but not so broad that it becomes incomprehensible.
Sure, we could have gone totally nuts with the taxonomy and, say, adopted the thousands-of-points-of-substantive-light offered by the well-intentioned but ill-fated National Subject Matter Index. (Don’t get me started.) We’re more practical. As detailed in the next article, LSNC is going with a simplified structure of 29 top-level directories, each going only one level deeper. Works for the users. And works for the GSA.
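If you want to see how far an existing repository strays from broad-and-shallow, a short script can flag the folders that sit too deep. What follows is a rough sketch in Python under my own assumptions: the function name folders_deeper_than is made up, and the default limit of two levels below the repository root (a top-level directory plus one level under it) is simply my reading of the structure described above.

```python
import os


def folders_deeper_than(root, max_depth=2):
    """Return folder paths, relative to root, that sit more than max_depth levels below it."""
    root = os.path.abspath(root)
    too_deep = []
    for dirpath, dirnames, _filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # The root itself is depth 0; each path component below it adds one level.
        depth = 0 if rel == "." else len(rel.split(os.sep))
        if depth > max_depth:
            too_deep.append(rel)
            # Stop descending: everything below is too deep as well,
            # and flagging the first offending folder is enough.
            dirnames[:] = []
    return sorted(too_deep)
```

An empty list means every folder is within the limit. Anything it does return is a candidate for flattening into its parent.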
4. It’s not all about taxonomy.
Having a basic, practical, commonly shared taxonomy or organization structure is essential to a project like this. LSNC content needs to be located somewhere to be targeted by the GSA, and those who add or contribute or remove that content need to be able to comprehend what is where. The practical side of what that all means will make more sense in later articles about the document protocols we have come up with for LSNC users to locate and add content, and about how to add metadata to that content.
But having a traditional taxonomy is not the whole picture. There are other types of content you may want to target that don’t fit the taxonomic model: targeted database content (case management systems come to mind, but are not the only example); external site content (such as select public website content to which your organization has access or permission); and alternate content sites that you would want to target but over which you don’t have the same level of control (a current example would be domain-hosted Google Sites, a subset of Google Apps, which you can “organize” in a superficial way but which at the level that matters to the Google Search Appliance, not so much).
What this means for LSNC is that we are targeting the GSA at more than just a nominal taxonomy on our shared document repository.

October 7th, 2008 at 2:05 am
I like the very pragmatic approach you have taken here. Did you come across my book, Organising Knowledge: Taxonomies, Knowledge and Organisation Effectiveness when you were doing your research, or was it one of those resources that seemed too technical for you? I’d be interested to know.
October 8th, 2008 at 11:49 am
Patrick, I can’t say your title showed up on our research radar as we sampled what was out there about taxonomies, so I can’t say whether it is too or not too technical for folks like us. From the description at Amazon.com, your title looks very interesting but from what I can tell, it is in the “expensive” textbook category? (I’m not saying it is not worth it at about $70, but Yipes!) I appreciate the appreciation of how we, as a non-profit, have had to approach taxonomy and other types of content organization in a very practical way. Fortunately, the nomenclature used within legal services programs like ours is fairly predictable, with easily identified categories, and not likely to be a source of contention. So far it has not been. The effective use of metadata, however, is much more of a challenge, given what we are doing with this project and our limited resources.
By the way, like your blog. Readable and learnable.