Posts tagged: findability

  • Friday
  • April 2
  • 2010

Pika and the Google Search Appliance make nice

For those who have followed The Findability Project, I am pleased to report we have surmounted the basic technical problems of targeting our Pika CMS with the Google Search Appliance.

The back story is one I have purposefully repeated whenever giving a presentation about the project, namely, that our Pika Plan A did not work. We encountered code anomalies in Pika that, among other things, cause it to auto-generate new case intakes and case records when it is crawled by the GSA. As a result, we were unable to use the GSA to crawl the Pika client case content dynamically generated as web pages. Plan A would have been the easiest, no-brainer way to go but we were not able to do so. So Plan B was to have the GSA target the Pika MySQL database directly. Status report: Mission accomplished.

There are GSA capacity issues for us, since our particular GSA’s one million “record” capacity means one million web pages or database records, inclusive, and these database records are not the same thing as the count for client case records. At any given time, we may have some 130,000 to nearly 200,000 client cases in our Pika system (and even more in archival data storage), but from a database perspective, these add up to multi-millions of “records,” e.g., various types of time records, case notes, contacts, and so on. Part of the challenge for us was to sort out which pieces of those millions of database records were the ones most needed and useful to our users.

The solution? Using a well-tailored query, we have the GSA do a selective crawl of the Pika MySQL database to return the most commonly sought and used Pika content: Case numbers, client names, office designations and case notes… tons of case notes. The basic technical explanation is this: the GSA performs a database query, returns the result as an XML feed, and indexes that feed. The user’s search terms are then queried against that index, and the matches are returned as viewable HTML.

What does the search result look like? A Google search result. The clickable link displays the case number, client name, LSNC office and primary advocate name, e.g., “90-10-123456 ~ John Client ~ Sacramento ~ Jane Advocate.” Below that it displays in-context text with the search terms highlighted in bold, essentially like a regular Google search result. Clicking the link dynamically displays the actual Pika case note shown in context. Assuming there are multiple possible matches for a particular Pika case record, there is a link to display all the “omitted results,” akin to how regular Google searches work, so the users can see all possible, not just probable matches. Clicking through the GSA search result link also takes the user directly to the actual Pika client case record.
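For the technically curious, here is a minimal sketch of the general idea: a selective query, an XML feed built from its rows, and a result title in the format shown above. It is not our actual GSA configuration or the GSA's own feed format. The table names, column names and XML element names are all hypothetical, and an in-memory SQLite database stands in for the Pika MySQL database so the example runs on its own.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: table and column names are illustrative only,
# not the real Pika schema.
SCHEMA = """
CREATE TABLE cases (case_id INTEGER PRIMARY KEY, number TEXT,
                    client_name TEXT, office TEXT, advocate TEXT);
CREATE TABLE case_notes (note_id INTEGER PRIMARY KEY, case_id INTEGER,
                         note TEXT);
"""

# The "well-tailored query": pull only the fields users search for most,
# rather than every record in the database.
SELECTIVE_QUERY = """
SELECT n.note_id, c.number, c.client_name, c.office, c.advocate, n.note
FROM case_notes AS n
JOIN cases AS c ON c.case_id = n.case_id
ORDER BY n.note_id
"""


def result_title(number, client, office, advocate):
    """Build the clickable title: number ~ client ~ office ~ advocate."""
    return " ~ ".join([number, client, office, advocate])


def rows_to_xml(rows):
    """Serialize query rows as a simple XML feed (element names assumed)."""
    root = ET.Element("records")
    for note_id, number, client, office, advocate, note in rows:
        rec = ET.SubElement(root, "record", id=str(note_id))
        ET.SubElement(rec, "title").text = result_title(
            number, client, office, advocate)
        ET.SubElement(rec, "content").text = note
    return ET.tostring(root, encoding="unicode")


def build_feed(conn):
    """Run the selective query and return the feed as an XML string."""
    return rows_to_xml(conn.execute(SELECTIVE_QUERY).fetchall())


def demo_connection():
    """Create a throwaway database with one made-up case and note."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cases VALUES (1, '90-10-123456', "
                 "'John Client', 'Sacramento', 'Jane Advocate')")
    conn.execute("INSERT INTO case_notes VALUES (1, 1, "
                 "'Client called about a hearing date.')")
    return conn


if __name__ == "__main__":
    print(build_feed(demo_connection()))
```

The point of the sketch is the shape of the pipeline: the query decides what counts against the record capacity, and everything downstream only ever sees the rows that query returns.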

That’s the name of that tune.

  • Tuesday
  • February 2
  • 2010

Legal research and the need to be “more like Google”

A few months back, there was a good amount of copy about Google Scholar features for searching federal and state court decisions: an impressive step up for using Google, at least at a consumer-user level, to find court decisions, but (puhleeeze) not as a tool for serious research of legal consequence. More recently the New York Times ran a feature article about changes afoot in Westlaw and Lexis, both of which “will undergo sweeping changes in a bid to make it easier and faster for lawyers to find the documents they need.” The opening salvo in this clash of the legal research titans occurred this week with the debut of WestlawNext. To hear Westlaw and Lexis talk about it, what they are in part reacting to is the perceived need to be “more like Google.”

Yes, but one’s understanding of that conclusion depends on how one defines or explains what it means to “Google” things. At the recent TIG conference, during the “findability” segment I presented, I made a point stressing the significance of Google as not being “Google” itself, as pervasive as it is in all our lives. Rather, the significance of Google is the dramatic paradigm shift that has occurred in how we search for and use information. Google is a primary agent of this paradigm shift but certainly not the only one. And the connections between specific search paradigms (universal search, vertical search, faceted search, and so on), the relative ease of locating or discovering information, and improvements in user-interface and usability design — all are converging to enhance the findability of what one is looking for.

That said, the impact of all these trends on specialized (re)search tools like Westlaw and Lexis is pretty obvious. If “Wexis” users are demanding their research tools become “more like Google,” what the users are saying is that those companies must make a paradigm shift, or they’ll go to a company that gets it.

  • Wednesday
  • January 27
  • 2010

Findability slides and video from 2010 TIG conference

I’m not sure what happened with the slides or recording of the Knowledge Management session at the recent 2010 TIG Conference. The session doesn’t show up in the LSC documentation of the event.

In any event, here’s a set of the slides for my “findability” segment, about search paradigms, findability as a concept and what we’ve done to implement enterprise search using a Google one-two punch: the Google Search Appliance in combo with the Google Apps platform. Also, here’s the brief Flash video of our portal front end and search result/filtering examples that I ran during the presentation but that displayed so poorly. The point of the video was to give the audience a real-world feel for how it all works. Again, my apologies for how badly the video displayed in that setting. Lesson learned.

  • Sunday
  • January 17
  • 2010

Coda re 2010 TIG Knowledge Management session

Last Wednesday at the 2010 LSC TIG Conference, Chicago-Kent’s Ron Staudt and I did a joint session, Knowledge Management – What It Is, Why It Matters, and (Google) Options For Making What You Know Findable. Ron, of course, was cogent, concise and charismatic and stayed within his presentation window and hit all his marks. Me? Regrettably, after all these years, I still haven’t figured out how to squeeze 10 pounds of cement into a 5 pound bag, and didn’t even get to several key points I had hoped to make about enterprise search and The Findability Project. To make matters worse on my end, at the beginning of my segment the Flash demo of how LSNC’s enterprise search front end works faltered badly since it displayed so poorly when projected. (More than one person mentioned to me afterwards that they were simply not able to see accurately what I was describing at the moment.) (Uh, it seemed like a good idea at the time.)

With those apologies out of the way, allow me to annotate a few points now to make up for at least a few things that I did not cover during the presentation:

The LSNC “portal,” “intranet” and “document repository”

I feel I successfully got across the point that there is a broader sense of “search” at play that is important to grok, as an organization works toward enterprise or so-called “universal” search. However, because I ran out my clock and didn’t have time to talk at length, I didn’t quite get to describing the varied content targets that LSNC has identified as valuable, useful and usable and therefore all that which we wanted to make readily, easily findable. In going over all that, in passing I mentioned that The Findability Project originally included a SharePoint component which is now being abandoned, in favor of our relying on components of the Google Apps platform, specifically, Google Sites.

The LSNC Shared Portal demo’d but not successfully displayed during the presentation is itself not part of Google Sites. The portal is itself a point-of-entry front end built on a WordPress PHP installation, and designed to complement our Pika 4.0 installation, which is also a PHP application. The portal is a point-of-entry but not a strictly controlled one, in the sense that users are not required to go through it to access either Pika or their Google Apps. But the portal is a custom user-interface that affords our users quick, efficient access to all the core web-based applications they need to do their work, plus a program calendar and a slew of LSNC-specific newsfeeds. And then there is the portal’s killer app: The enterprise search box, the findability trigger that searches all of the valued, useful, usable shared content. The enterprise search box initially gives you what I described in the session as “horizontal” search; at the (poorly displayed) search result page our users then have access to “vertical” filtering options.

And, as illustrated with the search for my personnel information and photo, our users can use the enterprise search box to do special data queries to get specially tailored search results. For example, when I did the demo search for “staff brian,” here’s what was basically happening: Triggered by the keyword “staff,” the Google Search Appliance (GSA) activated a OneBox module that queried our Pika CMS database and returned the query result as XML, which in turn was processed through XSLT and output for display as HTML.
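A rough sketch of that trigger-query-render flow, for illustration only: it is not the GSA's OneBox API or its XML schema. The staff table, its columns, the sample rows and the function names are all hypothetical, SQLite stands in for the Pika database, and a small Python function stands in for the XSLT step, since the Python standard library has no XSLT processor.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

TRIGGER = "staff"  # the keyword that activates this module


def parse_trigger(query):
    """Return the text after the trigger keyword, or None if not triggered."""
    parts = query.strip().split(None, 1)
    if len(parts) == 2 and parts[0].lower() == TRIGGER:
        return parts[1]
    return None


def lookup_staff(conn, name):
    """Query the (hypothetical) staff table and return the result as XML."""
    rows = conn.execute(
        "SELECT name, office, phone FROM staff "
        "WHERE name LIKE ? ORDER BY name",
        ("%" + name + "%",),
    ).fetchall()
    root = ET.Element("results")
    for full_name, office, phone in rows:
        ET.SubElement(root, "person", name=full_name,
                      office=office, phone=phone)
    return root


def render_html(root):
    """Stand-in for the XSLT step: turn the XML result into an HTML list."""
    items = []
    for person in root.findall("person"):
        fields = [person.get("name"), person.get("office"),
                  person.get("phone")]
        items.append("<li>" + ", ".join(escape(f) for f in fields) + "</li>")
    return "<ul>" + "".join(items) + "</ul>"


def onebox(conn, query):
    """Return tailored HTML for a triggered query, else None."""
    name = parse_trigger(query)
    if name is None:
        return None  # not a "staff ..." search; regular results apply
    return render_html(lookup_staff(conn, name))


def demo_connection():
    """Create a throwaway staff table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, office TEXT, phone TEXT)")
    conn.execute("INSERT INTO staff VALUES "
                 "('Brian Example', 'Sacramento', '555-0100')")
    conn.execute("INSERT INTO staff VALUES "
                 "('Jane Advocate', 'Sacramento', '555-0101')")
    return conn


if __name__ == "__main__":
    print(onebox(demo_connection(), "staff brian"))
```

A search that does not start with the trigger keyword simply falls through, which is why the special results sit alongside the regular search results rather than replacing them.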

The other private content areas I described are all now, or soon will be, part of our domain’s Google Sites. All of our organization’s “official” intranet content is now positioned at a Google Sites location, as is our new “shared document repository.” The GSA works very well with the Google Apps platform, and natively integrates with Google Analytics, among other things. Great stuff.

SharePoint issues

My observation at the beginning of my segment that LSNC was the first legal services field program to adopt the Google Apps platform and the first to abandon SharePoint was not intended to be provocative. It was intended to be transparent about what we are doing and why. Unfortunately, I never got around to explaining our organization’s views on SharePoint.

The short version is this: Given what we want and need to do with our shared work and collaboration space, we simply no longer see any advantages to using SharePoint. Zero. Zip. Nada. At launch of The Findability Project we viewed SharePoint as a key component for hosting and building and sharing content. And SharePoint is a great option for that. It is a very impressive product. But about six months into The Findability Project, Google unleashed Google Sites as part of the Google Apps platform, and for us it was a game changer. Google Apps is free (for non-profits, for the foreseeable future), we don’t have to host, maintain, secure, update or fix it, and Google continues to aggressively improve its features, along with everything else within Google Apps. And we are able to do pretty much everything we need to be able to do with it. True, SharePoint has an enormous mindshare within corporate America. And organizations do need to evaluate whether SharePoint has features or functionality that are unique or indispensable to it. For us, it has none.

Oh, and did I mention that the GSA works natively with Google Apps?

While not the reasons why we have bailed out on SharePoint, there are these views questioning what role SharePoint has in your future: Peter Campbell’s article, Why SharePoint Scares Me; and more contrariness from Dion Hinchcliffe, Sharepoint and Enterprise 2.0: The good, the bad, and the ugly.

More self-criticism: What we don’t like about our user interface

Perhaps I spent too much time trying to drive home the importance of usability as a concept and how it relates to findability. I am fascinated by usability concepts and, after years of practical experience now, sobered by the reality of how challenging it is to do well. We are very pleased with what we have accomplished with our portal (and related search result page and Pika CMS) designs shown in the slides. But I also had planned on taking a few minutes to highlight the remaining problems with our design, and “usability” thoughts about improving or fixing them. For example, we already plan on altering how we use tags as part of the portal page, and will soon be modifying the vertical filtering options on the enterprise search result page, to expand those options and make them more intuitive. I think we have done good. I think we can do better. And we will.

Knowledge management as poetry

I wasn’t entirely irresponsible about keeping within my allotted time. One thing I considered doing but dropped from my presentation to save time, was my giving a dramatic reading of the most famous poem ever about knowledge management. Yes, there is such a thing:

“The Unknown” by Donald Rumsfeld

As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don’t know
We don’t know.

[U.S. Department of Defense news briefing, February 12, 2002]

As far as I can tell, this was someone who never actually grasped basic concepts of findability.

But that’s me. What do I know.

  • Monday
  • October 26
  • 2009

The Findability Project site goes dark

The domain-specific Findability Project site will go dark the first week of December 2009. The formal aspects of the TIG-funded project were completed months ago. We have posted a few items since then, but we are now, in a purposeful way, winding down the public aspects of the project.

The project content will endure but in a different location, here at Webdogs 2.0, the LSNC technology blog where we have long archived all of our public web development projects. The Findability Project is the latest, no doubt not the last, to find its archival home at Webdogs 2.0. For now, we have simply duplicated the site over to a subdirectory there. Eventually all TFP content will be integrated natively into the Webdogs 2.0 site.

From time to time, we will continue this conversation about search, enterprise search, and making organizational content findable, and therefore authentically usable, over at Webdogs 2.0.

Watch the skies, people. Or at least your search patterns. We do.

TFP, out.

  • Tuesday
  • September 22
  • 2009

"A List Apart" search / usability trifecta

Search is nothing new but it is, paradoxically, the new new within some circles of web design and definitely a core element of any sensible usability construct for web sites and web applications. On that note, A List Apart, the New York Times of web design, today publishes a search cum usability trifecta hitting on several issues I will be alluding to during the upcoming TIG conference, including what to make of your metrics. All the articles are read-worthy:

  • Thursday
  • September 17
  • 2009

Revised: What the LSNC Shared Portal now looks like

We have now posted a further revised Jing video with audio providing a brief, 4-minute overview of the LSNC Shared Portal. This is the actual intro overview video we circulated internally to provide all staff with a basic visual and feature orientation, before our more extended, in-house live demos to be conducted next week.

It’s not so easy to do a public video demo of our new Pika 4.0 case management system design changes, because of confidentiality issues, but we will post select screenshots reasonably soon so you can get a visual idea of changes we have made to that application.

  • Wednesday
  • July 15
  • 2009

Summer hiatus

Many months of fish to fry ahead, including bearing down on LSNC’s customized rebuild of Pika 4.0 and working out a solution for integrating our Google Search Appliance. So, laying low until October 1. See you back here this Fall and in early 2010 at the next Austin TIG, when we serve up the whole enchilada.

  • Sunday
  • July 5
  • 2009

Getting Google-y with the enterprise

As a coda to the post yesterday about findability, the pervasiveness of the Google search paradigm, and what that means for the non-profit enterprise, I want to take a moment to put focus on a question during the session about an online post screenshot highlighted in one of the slides: “Why Enterprise Search Will Never Be Google-y.” I fear I did a poor job of answering the question about how it is that the author viewed Google enterprise search as different from other types of enterprise search. Mea culpa.

A couple of follow-up observations, to better respond:

As mentioned during the presentation, one point of the slide was to draw attention to The Noisy Channel, a very search geeky, characteristically Google-contrary, but always interesting, worthwhile blog helmed by Daniel Tunkelang, chief scientist at Endeca, a high-end direct competitor with Google in the enterprise market. Agree or not, there is a lot to learn about search from The Noisy Channel. It is one of my must-reads.

The title of Daniel Tunkelang’s highlighted post derives directly from Chris Sherman’s pithy, two-page online article with the same name, Why Enterprise Search Will Never Be Google-y (from the Enterprise Search Sourcebook 2008.) The gist of Daniel’s post and Chris’ article that prompted it is this: The “simple search” or “known item” search we all commonly associate with Google (the noun and the verb) short changes what enterprise search can or should be for those who use it. The tension between these two enterprise search models is why I highlighted these two paragraphs from Daniel’s post:

The upshot? There is no question that Google is raising the bar for simple search in the enterprise. I wouldn’t recommend that anyone try to compete with the GSA on its turf.

But information needs in the enterprise go far beyond known-item search. What enterprises want when they ask for “enterprise search” is not just a search box, but an interactive tool that helps them (or their customers) work through the process of articulating and fulfilling their information needs, for tasks as diverse as customer segmentation, knowledge management, and e-discovery.

The irony here is that, contrary to the entertainingly provocative “never will be Google-y” in the title, for some market segments enterprise search is already Google-y. In some respects, Daniel’s post and Chris’ article both actually make the case for, not against, the Google enterprise model, which is to say that for some segments of the enterprise market Google and its search appliance may very well be the way to go. Our experience is that it is a particularly viable way for a non-profit legal services program.

Why do I say that? Even assuming arguendo that Google Search Appliance (GSA) improvements “should be seen in the context of state of the art,” for many organizations this state-of-the-art is a rarified and unobtainable reality. One has to wonder, after costing out a solution with one of the three major market leaders in enterprise search (Autonomy, Endeca and FAST), whether a Google box doesn’t look pretty damn good and pretty damn doable, given what it does. As Daniel himself observes, “I wouldn’t recommend that anyone try to compete with the GSA on its turf.” Is that turf a real solution for some market segments? While Chris invokes a clever if overstated “oil and water” metaphor about the differences between web and enterprise search, he follows it by suggesting the exact opposite: Some enterprise search segments are well served by the Google paradigm, notably including “intranet search” –

Many organizations are encouraging employees to communicate internally via blogs, or to participate in community-based knowledge repositories such as internal wikis. This is one area where there is a genuine parallel between enterprise information systems and web content, and Google excels at understanding and surfacing this type of content.

Tell me about it.

  • Saturday
  • July 4
  • 2009

Findability and the Google search paradigm

Following up on an NTAP presentation I gave last Thursday, Findability and the Google Search Paradigm: Integrating Search as an Organizational Solution, here is a publicly viewable set of the presentation slides, which are in a Google Docs presentation format and include embedded links to a lot of the material I discussed during the presentation. You can find the New York Times article I mentioned about Twitter as an example of “crowd-sourcing” at David Pogue’s post, The Twitter Experiment.

I painted with a broad brush during the presentation. The goal of the presentation was to offer the legal services community a broader view, and an emerging view, of what it means to search, to search on the enterprise, and to suggest what it means to Google search on the enterprise. These are just the slides. While I gave a brief live demonstration of how our GSA installation actually functions when generating and filtering search results, you’ll have to come to the upcoming 2010 LSC Technology Initiative Grants Conference to get a more expansive demonstration and technical explanation of our implementation, including a solution (hopefully) to the problems we’ve had with Pika CMS integration into our enterprise search solution.

As is my bad habit, I went long and so the discussion at slide 72 about the real and imagined obstacles to implementing enterprise search in a non-profit environment got short shrift, and for that I apologize. I promise to do a better job with those issues at the TIG conference. In our experience getting our “stuff” organized, and hammering out practices and protocols, was a much larger time commitment on this project than the strictly technical stuff. And then there are the paralysis-against-progress problems that large organizations may experience since, in my view, they mistakenly think they have to have everything about taxonomy, vocabularies, folksonomies and metadata in place. For example, I have argued here, with our somewhat novel Google Search Appliance implementation in a non-profit environment, that we could do fine for now without relying significantly on metadata to make our project work. Others beg to differ.

In any event, I hope the presentation last Thursday was helpful. Let’s all talk again at TIG in January 2010.

  • Sunday
  • May 17
  • 2009

TFP Taxonomy – Part Four: Revisions to the project's structural taxonomy

We’ve made minor revisions to the project’s structural taxonomy described earlier. With only slight changes in wording, we’ve retained the same basic 29 top-level project directories but we have more significantly, although not dramatically, revised the second-level subdirectories so that they conform a bit better to how most of our advocates organize and think of substantive categories in our line of work. Here they are:

To recap, our original thinking was to keep the structural taxonomy sufficiently broad (horizontal), to be reasonably inclusive of the content categorizations in common use by a legal services program, and purposefully shallow in depth (vertical), to offer modest granulation so as to keep the structural organization and navigation simple and practical.

We struck a balance between using all ten of the very familiar LSC legal problem categories as top-level directory names, and adding additional categories to address obvious gaps. The ten LSC legal categorizations are definitely part of the shared, commonly understood “vocabulary” of the organization. But we added another 19 top-level directories that are consistent with the broader range of topics and tasks at play in our work environment. For example, we have “Housing,” yes, but LSNC does a huge amount of work in “Land Use” and related issues (e.g., housing element, inclusionary zoning, etc.), so we added that category and related sub-categories to the structure. (The existing LSC “Other Housing” subcategory just doesn’t cut it. Land use is not a catch-all category for us, if you get the drift.)

There are several categories in this revised structural taxonomy that reflect this shift in our thinking. A good example is under the LSC “Income Maintenance” category, where we retained the basic LSC sub-categories but added new ones for “Child Care,” “General Assistance” and “Refugee Cash Assistance.” We also tweaked the wording of many of the sub-categories to correspond more accurately to how users here refer to things, for example, by changing “Unemployment Compensation” to “Unemployment Insurance.” Another example is where we retained the LSC category for “Individual Rights,” but concluded that the LSC sub-categories are somewhat muddled, so we created a different if still simple subset. We also dropped some of the LSC sub-categories that have little or no anticipated use. (Really, you do a lot of “name changes” in your program?) We then simplified the directory and subdirectory names by eliminating the redundant references to “LSC Code,” eliminated the LSC problem code numbers, and dropped the cumbersome “Not_*” labeling also used with some LSC problem code names.

Basic housecleaning stuff.

  • Sunday
  • March 8
  • 2009

Google Analytics "Conversion University"

This is a cross post from our Webdogs 2.0 sister site, but may be of interest to those following this project. It’s like this:

Last week Google promoted its Conversion University, an “online course” in some 25 parts for grokking the basics of Google Analytics. Essentially, it is a series of topics, for each of which there are from 5 to 15 or so rapid-fire (typically 20-30 seconds) Adobe Presentations explaining how its various features and analytical tools work. For legal services and other non-profits, many if not most of the topics covered in this commercially oriented online course are about aspects of web analytics that are not particularly relevant. But for those not familiar with Google Analytics or those who have not used it for anything more than to track site visits and page views, these online presentations do offer insights about other specific tools within Google Analytics. Among other things, these brief presentations can help you better understand the data Google Analytics generates, how to filter it, how to report it, and how to understand the significance of things like “keyword searches” that bring people to your site and the significance of “site search” that tells you what your users look for once they find the site. Stuff even non-commercial sites would find helpful to know and understand.

Relevance of web analytics to findability is self-evident. That said, if you already have a close, personal relationship with Google Analytics, this is likely too basic. If not, this is a good way to start getting a groove with Google Analytics.

  • Sunday
  • March 8
  • 2009

A painless way to learn basics about Google Analytics

Last week Google promoted its Conversion University, an “online course” in some 25 parts for grokking the basics of Google Analytics. Essentially, it is a series of topics, for each of which there are from 5 to 15 or so rapid-fire (typically 20-30 seconds) Adobe Presentations explaining how its various features and analytical tools work.

For legal services and other non-profits, many if not most of the topics covered in this commercially oriented online course are about aspects of web analytics that are not particularly relevant. (After all, it is called “Conversion University” for a reason.) But for those not familiar with Google Analytics or those who have not used it for anything more than to track site visits and page views, these online presentations do offer insights about other specific tools within Google Analytics. Among other things, these brief presentations can help you better understand the data Google Analytics generates, how to filter it, how to report it, and how to understand the significance of things like “keyword searches” that bring people to your site and the significance of “site search” that tells you what your users look for once they find the site. Stuff even non-commercial sites would find helpful to know and understand.

  • Thursday
  • February 26
  • 2009

Comparing Google Sites and GSA search results with release 5.2 in place

All went well with the GSA version 5.2 update. The update itself is a humongous 1.53 GB ISO file that, once burned to a DVD disc and loaded, took about 6 hours to install. As recommended, we did a complete crawl refresh which, in our case, took another 72 hours. Other than this considerable but necessary time investment, we had no real problems with the update process.

As mentioned in an earlier post, the principal attraction of this most recent GSA update was the integration of Google Apps, which enables targeting of domain-hosted Google Docs and Google Sites. In that regard we are pleased to report no problema, as well.

In GSA version 5.2 the administrator now sees a menu option for “Google Apps Integration” with a single field for enabling or disabling one’s Google Apps domain as a GSA target:

With Google Apps targeted generally, then it is a matter of constructing URL patterns to include or exclude more specifically what you want targeted within your Google Apps. In our case, that meant our selection of specific Google Sites now serving as our organization’s intranet content platform. More specifically our search goal was to have the GSA index not just pages within those Google Sites but, as importantly, files uploaded to those Google Sites.
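To give a feel for the include/exclude logic, here is a deliberately simplified sketch. It uses plain prefix matching, which is a stand-in and not the GSA's own URL pattern syntax, and the domain, site names and paths are hypothetical placeholders rather than our actual Google Sites locations.

```python
# Hypothetical patterns: example.org and the site names are placeholders.
INCLUDE = [
    "https://sites.google.com/a/example.org/intranet/",
    "https://sites.google.com/a/example.org/documents/",
]
EXCLUDE = [
    "https://sites.google.com/a/example.org/intranet/drafts/",
]


def is_targeted(url, include=INCLUDE, exclude=EXCLUDE):
    """Return True if the URL should be crawled and indexed.

    An exclude match always wins; otherwise the URL must match at
    least one include prefix.
    """
    if any(url.startswith(prefix) for prefix in exclude):
        return False
    return any(url.startswith(prefix) for prefix in include)


if __name__ == "__main__":
    samples = [
        "https://sites.google.com/a/example.org/intranet/policies/leave.pdf",
        "https://sites.google.com/a/example.org/intranet/drafts/memo.doc",
        "https://sites.google.com/a/example.org/sandbox/test",
    ]
    for url in samples:
        print(is_targeted(url), url)
```

Because an uploaded file lives under the same path as the Site page it is attached to, an include pattern for a Site covers both its pages and its files in this sketch.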

There are differences in how search results display between those performed from within Google Sites and those from a GSA frontend. If a search is done from within Sites, it will find and return a search result for keywords or phrases within an uploaded file, but not display the context of the keywords or phrase. For example, using the search law school+"reimburse me" one gets this specific PDF search result from within Google Sites:

The same search done from our test GSA frontend that returns results from everything targeted by our GSA, yields the same search result while showing the keywords and phrase in context:

So, the basic differences in how search results display are these:

An internal Google Site search will find and return results based on keywords and/or phrase within a file uploaded to Google Sites, display the filetype as an icon (in the above example, with a PDF icon), display the link using the file name, but not display the keywords or phrases in context.

In contrast, the GSA search result will find and return the same result but display the keywords and/or phrase in context, display the filetype as an acronym (e.g., “PDF”), and display the link as what the algorithm discerns as the document’s title (in this example, “Law School Loan Reimbursement Request Form”).

  • Thursday
  • December 18
  • 2008

Presumptive Shareability

After the first of the year, we’ll be cranking up as we complete porting of our existing target documents into our new taxonomic organization, resolve some filtering and usability touches we want to integrate into our default GSA front end, primp and polish the layout and presentation of the front end, implement a few basic OneBox modules, and set in motion what we’re now referring to as the “Rolling Thunder Roadshow” to all our eight office locations.

The RTR will be our way to recognize and promote among all our staff the changes in how documents and other files are made easily and intuitively findable, and given a new level of access and usability throughout our non-profit organization. After all, that is the core purpose of enterprise search. And a key element of all this is changing deeply rooted individual notions or assumptions about what can or should be “shareable.”

In working on this project within a non-profit environment, we have learned that most employees have an inclination to undershare, not overshare. Not because they are selfish or secretive; rather, because the type of transparent sharing that enterprise search makes possible is foreign to most of them. It is familiar to them to be asked to provide a document to others on request in person, by phone or by email. It is foreign to them to decide in advance that a document they created or have received from someone else should be transparent to the rest of the entire organization. The concepts of creation and possession are severed from the concept of findability.

To be sure, the increasing use of collaborative web-based document tools within our organization — principally our adoption two years ago of Google Apps — has helped us on this journey. Most staff at this point are familiar with the concept, if not the practice in their individual work, of creating or editing or uploading documents that can be “shared” from a common web location. They get that, even if they don’t do it themselves, because increasingly others demand they do so… when they get a “share” message email from Google Docs about a document someone created or edited there; when they get an email with a link to something someone else posted in our domain’s Google Sites; or when they get a message to fill out a Google Docs form for, well, whatever.

As we prepare for the RTR, the team working on this project have brainstormed about what we can say or demonstrate to the staff in each office, to prompt them to rethink (OK, in some cases just think) what types of documents should be shared with others by adding them to the new document repositories.

We now refer to this as “presumptive shareability.” In particular situations, it may not be appropriate to make the document or file transparent through enterprise search, but in most cases it will be, because these are all situations where the document or file has served a shareable purpose, i.e., use by more than one person or re-use by one or more persons.

Among the situations we think should trigger staff to think to add the document or file in question to the shared repository are the following:

  • An attachment to an email message you send or forward to someone else.
  • You request or receive a file as an email attachment from someone within the organization.
  • You receive a non-confidential file attachment from someone outside the organization.
  • Every time you re-use a document or form as part of your work.
  • You learn that the PowerPoint (or other presentation format) for a training or conference event you attended is now available for viewing or downloading.
  • You lug home substantive hard-copy handouts distributed from a training or conference.
  • Can you say, “presentation” and/or “portable”? Whatever it is, if it is a PDF or PPT file it is presumptively shareable.
  • If it is the “final” version of a case-related pleading, memorandum, exhibit or correspondence and you think others may find it usable, share it.
  • Usable documents you discover and think to save to your desktop as part of research on the Web, regardless of file type (PDF, DOC, XLS, etc.)
  • Similarly, when doing work-related research on the Web, anytime you think to bookmark a web page or save the page to your desktop as an HTML or TXT file.
  • … you get the drift.

Shareability promotes findability. That’s our story and we’re stickin’ to it.

  • Thursday
  • December 4
  • 2008

The search box as a findability (design) concept

Fair to say that without a “search box” there is no enterprise search? That being true, consider Designing The Holy Search Box: Examples And Best Practices, yet another interesting design compilation/distillation article from Smashing Magazine. True, this is not The Big Wroblewski (the form abides, dude), but it’s a pretty good read on what to think about when designing, labeling and positioning a basic search form.

  • Wednesday
  • November 12
  • 2008

Converting hard-copy documents for addition to the shared repository

A late October post at the Official Google Blog entitled A picture of a thousand words? prompts me to draw attention to an analogous TFP document protocol we worked out a few months ago. It is worth highlighting because it is so practical and will be an invaluable source of additional knowledge content targeted by our GSA.

But first, the Google post: Read it and you’ll discover that “In the past, scanned documents were rarely included in search results as we couldn’t be sure of their content. We had occasional clues from references to the document, so you might get a search result with a title but no snippet highlighting your query. Today, that changes. We are now able to perform OCR on any scanned documents that we find stored in Adobe’s PDF format.” (As lawyers are so fond of saying, emphasis added.) As the post illustrates by example, do a Google search for repairing aluminum wiring and at the top you’ll see a PDF listed. If you download the PDF and open it, you’ll discover it is an image of a text document. The downloaded file is itself not text searchable. But click View as HTML for that same result and you’ll discover that the text is actually indexed and searchable via Google.

Essentially, we are doing the same thing within our own enterprise search ecosystem, but with an added advantage. Not only have we adopted a document handling protocol for using our networked printers/scanners to convert select hard-copy text documents to PDF image files, we also process the resulting PDF images through Adobe Acrobat’s native “OCR Text Recognition” tool, and then save each file with some basic metadata added.

Once added to the shared document repository, the scanned and OCR’d text document is then fully indexed and searchable by the GSA. And when the user finds and downloads the file, it is fully text searchable itself when opened in Adobe Acrobat or Adobe Reader. One better than what Google itself now does, superbly.

  • Monday
  • November 10
  • 2008

Selecting GSA targets – Part Two: The Practical Realities

In an earlier post about selecting Google Search Appliance (GSA) targets for this project, the narrative definitely edged toward the more abstract. We highlighted four principal sets of GSA targets: files on our newly created “shared document repositories”; repurposed intranet content being moved over from a MediaWiki installation on an old server; cherry-picked content available on LSNC’s varied public websites (LSNC maintains 13 distinct public websites and subsites); and select records in Pika CMS, the secured, web-based case management system used by all our advocates.

As far as it goes, this abstract list of GSA target sets fairly summarizes what we, as an organization, want to make transparent via enterprise search, which is to say make “findable” in ways not practicable without the GSA. This abstract list of GSA targets, however, fails to convey what we have done at a non-abstract, practical level to make those targets useful to our larger search goals.

So, let me hit a few notes about several practical decisions we’ve made at launch as we target the GSA at real files offering real search results.

As described in Part One, when we first unpacked our GSA and aimed it, uh, somewhat aimlessly at any and every file on one of our local servers, the GSA did its job in killer fashion… and blew out our file limit. While one can proceed that way, we were always mindful that we had to sort out how to organize and structure the shared content that we seriously wanted to make searchable and findable. So, one of the first tasks we confronted on this project was to work out our thinking about “taxonomy,” resulting in the basic directory structures we have adopted.

That taxonomic “organization” step was essential to this project, but completing that particular project objective doesn’t translate directly into searchable content organized in a particular way. You see, there is this pesky little detail: Real people need to actually identify the existing and/or newly created files to be included and then somehow get the files in the directories on the shared document repository that are the target of the GSA.

Easier said than done.

In our case — particularly given the limited IT and support staff resources available to us as a typical legal services field program — we had to come up with some practical approaches to move existing files from any number of different locations to the designated shared locations or document repositories. (I will discuss how we handle adding newly created files, in a later post.) Here’s what we did with our existing files to fold them into the content targeted by our GSA:

1. Initially, include all existing “staff-specific” content, with an opt-out

We did find ourselves on the receiving end of a lot of staff enthusiasm about this project. Truly. But it is impractical and unrealistic to expect your individual legal services advocates and other staff to comb through all their thousands of files and then move them over to a different file server location. (Maybe it should be realistic to expect them to do this, but in our experience it just ain’t gonna happen. No way.) But there are tons of content gold in them thar files, so we had to figure out a way to initially get all that good stuff in place, even if not parsed out in a taxonomic sense, so we could target it.

To accomplish this, we first vetted with, and got buy-in from, all our local offices to do the following:

On the local project file server for each local office, we created a special project “archive” directory. Then each local office manager copied each individual staff member’s files wholesale over to a user-specific directory in this so-called archive directory. Having an unequivocal “opt-out” option was important to the success of this approach. Again and again, in formal meetings and informal discussions, we reminded office staff that they could ask that any or all files be removed from these initial archive-file targets. No questions asked. There were a few such requests, but not a lot: One advocate asked that her files not be targeted at all, so we removed all her files; two others had fewer than a dozen files they wanted removed as targets, so we did so. No biggie.

The net effect is that this makes the targeted advocate files initially non-taxonomic, but in short order you have a huge repository that has a (allow me to exaggerate here, for literary effect) 99% chance of including pretty much everything the individual staff members would add if they “woulda, coulda, shoulda,” so to speak. In our case, this initially amounts to about 300,000 document files, the vast majority of which are advocate-generated files.

At launch, this does mean that these office-specific, bulk compilations of existing files added as targets include a significant number of drafts and duplicates that one would normally not include as a shared file if it were being added as a newly created file. For example, within our office culture it is not only common but actually expected that advocates not work in isolation on major cases. (We discourage the “lone eagle” model.) So, our early search-results testing shows that often the same file shows up in more than one target location because more than one advocate has a copy of the file in their archive.

It bears mentioning two other factors we kept in mind as part of this initial targeting of shared files: We double- and triple-checked with all management staff to assure nothing management-sensitive or -confidential was moved to a location where it could be targeted. Also, before we moved anything over wholesale, as described, we asked all staff to remove certain types of files that no one would reasonably expect to be part of the searchable content. Examples: Family photos, MP3 music downloads, YouTube videos, yada yada yada. Enough said.

We do have an approach in mind for “peeling off” these office-specific archives over time, to separate out the drafts and duplicates and place them within our taxonomic directory structures. More about that later.
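
For the duplicates, at least, part of that peeling off could be automated. What follows is only a minimal sketch in Python, not a description of what we actually run: it assumes the archives are reachable as an ordinary directory tree, and the function names are made up for illustration. It groups files whose contents are byte-for-byte identical, so it will catch exact copies held by two advocates but not drafts that differ by even one character.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(archive_root: Path) -> list[list[Path]]:
    """Group files under archive_root that have byte-identical content."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(archive_root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests seen more than once represent duplicated content.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each group returned is a list of paths to the same content, which a person can then review to decide which copy belongs in the taxonomy.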

2. Using Google Sites as the platform for our existing intranet content

I recall having a passing conversation with Gabrielle Hammond at last year’s TIG conference about how we were holding off on further intranet development while waiting to see how Google implements its JotSpot-based wiki application, now known as Google Sites.

Well, people, we now know what Google Sites is all about and we love it! For the last several years we had been using MediaWiki as the publication platform for our intranet, but we are in the process of replacing it for our internal wiki needs. We are about half way through that process, which should be completed shortly after the first of the year.

One big bonus of moving our intranet content to Google Sites is that it is quasi-tailor made to work with both the GSA and Google Analytics. I say “quasi” because the interactions between them are good but hardly optimal at the moment. For example, only days ago Google Analytics for Google Apps was rolled out, but the quality of the data we are getting so far is not so easy to get a handle on. More importantly, Google promises GSA integration with Google Sites, but it is still a buggy implementation. We have easily targeted test site pages within our domain’s Google Sites, but have hit a wall with getting the GSA to properly return search results on the indexed content within files uploaded to Google Sites. Turns out we are one of several organizations that have identified this problem and Google Enterprise support assures us it will have a fix with its next software upgrade, in about a month or two. We (and our GSA consultant) are confident this will work in due time, but it’s one of those details we have to wait on for now.

3. Updating our public web content

Over the last 10 years, LSNC has placed an enormous amount of its advocate content out on the public Web. But one recent example is the California Food Stamp Guide, a prime example of public content that our advocates can search at that site, but would want to be able to search directly via our GSA shared portal. It is also one example of a content cluster that can be part of or its own GSA collection. (“Collections are logical views of information in the index, as defined by URL patterns. This allows you, for example, to index the entire contents of your intranet, but then divide it up into logical groups of content.”)
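
To make the “logical views defined by URL patterns” idea concrete, here is a rough Python sketch. It is not how the GSA is configured; the appliance has its own pattern syntax, and plain prefix matching is a simplified stand-in. The collection names and example.org URLs are hypothetical.

```python
# Hypothetical collection names and URL prefixes, for illustration only.
COLLECTIONS = {
    "food_stamp_guide": ["https://guide.example.org/"],
    "public_web": ["https://guide.example.org/", "https://www.example.org/"],
    "intranet": ["https://sites.example.org/intranet/"],
}


def collections_for(url: str, collections: dict[str, list[str]] = COLLECTIONS) -> list[str]:
    """Return the names of every collection whose URL prefix matches url."""
    return sorted(
        name
        for name, prefixes in collections.items()
        if any(url.startswith(prefix) for prefix in prefixes)
    )
```

A page from the guide lands in both its own collection and the broader public one, which is the “part of or its own collection” point: the same indexed content can be viewed through more than one collection.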

Implementation of The Findability Project has prompted some public housekeeping. Our target testing of our public content, predictably, reveals that we have stuff out there that is, well… past its shelf-life, shall we say. So we are working on a systematic way to thoroughly review and clean up that public content. It is obvious but important: Current and correct public content means better search results via the GSA. (Apologies to the larger legal services community for not doing it sooner.)

4. Targeting our case management system

We consider our Pika case management system a key, long-term GSA target. But we are not there yet. We have prioritized getting all the other targeted content organized and in position, with clear protocols in place. We also are busy reworking our shared portal, which will integrate the GSA search functions and provide users with (hopefully) intuitive ways to filter their search results, search select content collections, and provide the users with some nice Google GSA touches like OneBox searches, among other features.

That all said, being able to target our case management system is a total no-brainer and perhaps the most practical of necessities. In a given day, there is likely nothing more common or more vital to our work for clients than the search for information within our case management system. The native search functions built into the current version 3.07 of Pika are good. But we are optimistic that we can exploit the GSA to make those searches even better. And certainly more integrated with everything else in our new enterprise search universe.

  • Thursday
  • October 23
  • 2008

Going Forward: Document “best practices” and protocols

In earlier posts I have shared memoranda distributed at a recent organization-wide meeting, including an explanation of our taxonomic structures and details of various file-naming conventions adopted for this project. Attached to this post are two additional memos:

As I so often like to say, allow me to explain:

In making practical decisions about handling files targeted by our Google Search Appliance (GSA), we look both backward and forward in time. This dichotomy between the past and the future is one that Google Enterprise itself promotes with its cursory recommendations that its customers decide for themselves where to locate existing content and new content.

Based on our experience working on this project, there are considerable differences in how to handle the “past.” A separate post on those issues will be coming forth, soon. But going forward into the “future,” we have thrashed out the practices and protocols detailed in the two memos linked, above. While there are institutional contexts for some things described in the memos that may be lost on those not part of our organization, the memos are (hopefully) self-explanatory. There are other practical details about the document protocols that will be expanded on in later posts, including how the Shared Repository web interface works, the integration of our metadata models, and so on. (All good things come to those who wait, at least with this project.)

Among the most practical observations in these memos, I think, is breaking through the common but incorrect perception that one needs to save a document to “the” correct directory, as opposed to “a” correct directory, however it is done. And while staff are instinctively bewildered, somewhat, by concepts of “taxonomy” and “metadata” and wonder how they will be able to find things if they are not located in “the” correct directory, it is also extraordinarily reassuring to them to know that we’re talking about Google here. Even if they do not understand how it all works, typically they have great faith that Google search will find it for them, as described in an earlier anecdote.

The memos also attempt to address some of the practical realities and limits of a non-profit, legal services work environment. LSNC has neither the resources nor motivation to micro-manage how users organize their own file directories. Life is too short. But as detailed in the “advocate-user directories” memo, we do now require all LSNC staff to have a user-specific, user-named directory, and that the name used be the user’s full name. The primary motivation for this requirement is a practical need to standardize directory-name conventions throughout the organization, so that the location and targeting of files is predictable, manageable and findable. And, if for no other reason, doing so eliminates the need to guess whether something is located in the Shareen, Shari, Shelly or Sherri directory — a real-world example from our Auburn office, illustrated to the right.

  • Saturday
  • October 11
  • 2008

Rethinking the Pika CMS home page

There are two web projects at LSNC that are moving toward a point of convergence: The most notable of the moment is the ground work we are doing on managing our knowledge content and making it all “findable” via a Google Search Appliance (GSA), which is being documented at The Findability Project (TFP). The other is an in-progress code updating and design refresh of the Pika case management system we have been using with great satisfaction for the last several years, which in earlier iterations we documented at PikaDocs and more recently (and less extensively) here as part of the great and ongoing Pika~palooza.

One of the great lessons we have learned from the whole Pika experience is that our users really like having a commonly shared page from which to begin their work. To some extent, this appetite for a shared page or portal was met by the Pika CMS “home” page, which we modified by using WordPress to generate a functional, customizable message or news screen. Back in the day, which is to say a mere three years ago, when we did our first customizations of Pika, that modification meant our using WordPress 1.5, a suitable plugin to dynamically convert the native WordPress RSS feed into HTML post content, and recoding a few Pika files to get WordPress and Pika to cooperate with each other. Fun to do, but it was still real work to get it done.

That was 2005. Three years later we are implementing enterprise search and now very consciously building an organizational shared portal to supplant the Pika home page. Simply put, in our next iteration of Pika, which we hope to have in place early next year, what Pika people know as the home or start page of Pika CMS will go away. The functional utility of the Pika home page will be folded into our emerging shared portal, the point of search and entry to many things, one of which will be Pika. (It will take work to do it, but we also expect to supplant the Pika native search function and substitute a subset of our Google Search Appliance functionality to accomplish the same thing, only better.)

To get feedback, we are constantly tooling with varied options for the shared portal as we test things out with different LSNC offices. For example, we have already modified the temporary TFP search portal we experimented with earlier this week in a presentation to our Sacramento Office. After that event, we reorganized things to emphasize the location of the news feed, moving it from the right side to the left, and created a temporary second search field to heighten user awareness of our “Google Sites” special collections. (In final implementation, all our document collections will be searchable from an integrated single search field, with various options for filtering the search results.)

And what about that LSNC “news” thing on the shared portal page?

In 2005 we had to do some attentive coding to get it to work with the Pika home page. Three years later, all we needed to do was pop the native WordPress 2.6 feed into FeedBurner’s totally freebie Buzzboost feature, click on a button to generate a small bit of JavaScript, paste that script code into the shared portal page, and there you have it: an HTML standards-compliant, fully CSS customizable, full-text news module.

  • Friday
  • October 3
  • 2008

Don't overthink file naming conventions

This is a quick post to discuss some basics about file naming conventions worked out as part of this project. At a recent program-wide meeting to discuss details of this project with all our office managers, among the memos distributed was the following:

If you take a look at the files in your advocate staff directories, you are likely to see individualized albeit typical patterns in the file names. There is a discernible Darwinism to the conventions individual users adopt: They use both directory structures and name files in a way that makes them “findable” for them later, if not for others. (OK, basically “unfindable” by others, in a lot of instances.) Common naming patterns include, in almost all instances, at least a generic name descriptive of the type of document (e.g., petition or complaint or writ), plus other descriptive elements that help the user to later locate it, such as a client or project name, the date of the document, and/or whether the file is a draft or a final or a version copy.

There are any number of ways one can go with file naming conventions, as well illustrated in the article at CompuJurist, Are there any recognized “best” practices for file naming conventions? Akin to what is discussed in that article, we’ve adopted the following template for use by advocates to name their files:

[draft/final]  [document type]  [party/case]  [subject]  [date] [file extension]
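
For the code-minded, the template can be read as a simple recipe. This is a minimal Python sketch, not a tool we distribute: the function name is hypothetical, the underscore separator follows the memo’s examples discussed later in this post, and the date format in the usage example is an assumption, since the template does not prescribe one.

```python
import re


def build_file_name(status, doc_type, party, subject, date, extension, separator="_"):
    """Join the template elements into a single file name."""
    if status not in ("draft", "final"):
        raise ValueError("status must be 'draft' or 'final'")
    parts = [status, doc_type, party, subject, date]
    # Replace any run of whitespace inside an element with the separator.
    cleaned = [re.sub(r"\s+", separator, part.strip()) for part in parts]
    return separator.join(cleaned) + "." + extension.lstrip(".")
```

For example, build_file_name("final", "complaint", "Doe v Roe", "unlawful detainer", "2008-10-03", "doc") gives final_complaint_Doe_v_Roe_unlawful_detainer_2008-10-03.doc.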

Yes, there are other longstanding concerns about how files are named, beyond what may concern your attorneys and other advocates. These concerns include naming conventions driven by the demands of particular operating systems, all too often non-intuitive but technical project requirements, or the recommendations of 800-pound gorillas like Google (explaining why Google favors dashes over underscores).

All that said, if you look at the memo linked above, you may notice that the examples promote the use of underscores, rather than spaces, between words in the filenames.

Our thinking is this:

  • As a practical matter, using spaces between words in file names creates file transfer problems when moving the file from one server to the other. Not a good thing. Especially when you are dealing with relocating files that count in the hundreds of thousands.
  • Using spaces creates readability problems when viewing the path of the file in a GSA search result, because the GSA normalizes the URL by inserting special characters wherever the file has a space in its name. Even if you didn’t know what to call it, you’ve seen this phenomenon. Here’s a real world example in a GSA search result from one of our test bed sites:

    /Ukiah%20Office/Former%20Staff/Kan's%20Transfer%20File/letter%20to%20jake.wpd

    Look familiar?
  • File names with underscores are easier to read than files with dashes. They just are, OK? To be fair, not everyone is going to agree with that proposition. But we did some admittedly unscientific user testing (hey, Glenn, you get what you pay for), where we asked staff to read the same file name listed three ways: With spaces, with underscores, and with dashes. Without exception, our crack team of testers said they found it easiest to read the file names if they had spaces (duh!); less easy to read if there were underscores; and least easy where dashes were used.
  • A usability corollary: If you use underscores, a linked file name in a search result is easier to read because as a link the file name appears underlined, so words appear as if they have spaces, which is the easiest of the three formats to read (see test results, above).
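
The “special characters” in the search result above are standard URL percent-encoding, where a space becomes %20. A small Python sketch reproduces the effect using the standard library; the helper name is made up, and the apostrophe is deliberately left unescaped only so the output matches the path shown above.

```python
from urllib.parse import quote


def as_url_path(path: str) -> str:
    """Percent-encode a file path the way it appears in a URL."""
    # Keep "/" and "'" as-is so the result matches the example path above;
    # spaces and other reserved characters are percent-encoded.
    return quote(path, safe="/'")


with_spaces = as_url_path("/Ukiah Office/Former Staff/Kan's Transfer File/letter to jake.wpd")
with_underscores = as_url_path("/Ukiah_Office/Former_Staff/letter_to_jake.wpd")
```

The first result is the %20-laden path from the search result; the second comes back unchanged, because underscores need no encoding.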

This is fairly prosaic stuff and bears some thought, but is not worth overthinking. Or much of an enforcement regime. The project goal is not to have nice, neat, compliant file names. That’s an objective. It is not the goal. We are not investing a lot of time worrying about those who paint outside the lines. The point of the project is to get the files targeted properly so that users can find the content they contain.

We are confident that, as users see file names displayed in the GSA search results, it will sink in why it matters how one names the files, and they will adapt.

  • Tuesday
  • September 30
  • 2008

The Findability Project Taxonomy – Part Three: The Anecdotes

This is a non-extra credit read, somewhat tangentially related to “taxonomy.” But, hey, this project is hard work and I’m entitled, as are you, to have some fun, no?

I previously alluded to how I sat down with each of the advocates in our flagship Sacramento office to view and discuss how each organized their files. I’m not suggesting you need to do this with everyone in your organization. But doing so with at least a fair cross-section of your people will teach lessons not likely learned otherwise. Let’s call it “reality.”

Three particular experiences in doing this are favorites of mine.

The first relates to the same advocate who, the good sport that he is, agreed to let me post a photo of his hard-copy file organizational scheme. When I sat down with him to take a look at how he had organized his files on a local server, it was a gloriously indulgent vision of horizontal organization. The guy (who is one of our best welfare lawyers) had 595 MB in 2,623 document files … wait for it … in one folder. Whew, talk about going “broad-and-shallow”! Because his file-naming conventions include the relevant client name, I really can’t give you a screenshot of this Ripley’s moment. There was something really extraordinary about this encounter, almost anthropological about it, akin to witnessing an indigenous tribe in the primeval, untouched by the outside world.

The second involves the polar opposite, another highly regarded lawyer who is hyper-organized. And well he should be, with 3,616 work files totalling 3 GB tightly organized in 671 folders and subfolders. Peter Morville would not likely approve of his organization scheme, I don’t think, since this advocate went for a “narrow-and-deep” hierarchy, with nine top levels and 662 subfolders, as many as five levels down.

And then there is the third anecdote, my favorite of all. As I went through this attorney’s files, I was authentically impressed by how sensible and well organized her directory structure was. While I organize my personal directories differently, hers were organized much the way many advocates in the program do (by cases or projects or substantive area), easily understood and well suited to how she works. Broad enough to hit all her bases, yet with enough subfolder depth for her to “navigate” to find particular files. A good, functional result for her.

As I showed her the project taxonomy, she was fine with the top-level selections. She understood immediately and instinctively why those choices had been made and had no quarrel with them. But when I showed her that the subfolder organization only went one-level deep, her facial expression changed noticeably. She said nothing but I could see her anxiety. So I asked her, “You look worried, a bit. What are you thinking?”

She paused and then she asked, “If you organize the shared directories this way, with only one level below the topics, how can anyone ever find anything? I don’t work that way.”

This was my response:

“Have you ever used Google?” (She good-naturedly looks back at me with her best “give me a break” smirk on her face.) “Well,” I continued, “when you search with Google you are usually able to find what you are looking for, right?”

“Of course,” she answered.

Then I said, rhetorically, “Do you think Google ‘organizes’ the Web in subfolders like you do?”

“Oh, I get it,” she said. “Everything doesn’t have to be organized that way if I have a way to Google it, right?”

  • Tuesday
  • September 30
  • 2008

The Findability Project Taxonomy – Part Two: The Practice

We’ve laid out our take on the theoretical approach to the TFP taxonomy. But in practice, how is LSNC actually implementing those organizational concepts or principles? That is what this post is about.

I’ll just give you the end-product upfront and then explain how LSNC sorted out the basic taxonomic structures for its shared document repository. The two PDF files linked below are copies of what was distributed at a program-wide meeting a few weeks ago to address and resolve what the basic organization structures would look like.

It was actually quite easy to come up with an initial (if bloated) proposed list of likely substantive advocacy content targets, their location, and how the content would be organized, but even that required process.

LSNC has what it calls a “regional counsel” model, which means there are three designated advocacy leaders with senior substantive, litigation and advocacy experience who are expected to provide just that, “leadership.” (One of the three, by the way, is Mona Tawatao who is the recipient of the 2007 NLADA Reginald Heber Smith award.) The regional counsel, with feedback from other management leadership (including the executive director, a few local office managing attorneys interested in this particular project detail, and the senior office manager representing support staff interests) worked up the list, later vetted more broadly with the entire management team, who in turn vetted it with each of their local offices or other program unit.

In the initial proposal, the substantive advocacy content was organized based on the ten LSC Problem categories in current use by legal services programs, plus roughly an additional 30 or so other general categories. The latter included additional substantive categories (economic development, disaster relief, etc.), practice matters (e.g., federal and state court practice issues, discovery, etc.), and other work-related content (self-help clinic content, specialized training materials, etc.) that reflect what LSNC and other legal services field programs actually do for a living. In response to any number of discussions and comments by the smaller group thrashing the details out, the list at times expanded and contracted, went deeper and then sometimes more shallow. This initial organization structure also included targeted content related to local office and central administrative office work. A similar vetting process was undertaken by the senior office manager with all the other office managers in all the core local offices, as well as administrative and business office managers. As mentioned earlier, each of those, in turn, was asked to vet the structures with their respective staff.

This process did not operate in a project vacuum. As not only one of the three regional counsel but also the person responsible for managing this project, I also did what I think managers should always do: I talked to the people affected. I took the time, a lot of it, to speak directly and individually with all of the foregoing to explain the overall project and its technical demands, and in a non-technical fashion (well, at least I tried) the significance of developing an organizational structure, and other, related issues, such as the use of metadata models to attribute value to the targeted content, and so on. The point being, to take the time to assure leadership understood from more than a memo what the project is about, why it matters, and answer their questions or concerns. In response to the vetting and these dialogs, real changes were made in the proposed organization and additional content targets were identified. Time investments paid dividends, at least in this case.

By the time our GSA consultant showed up for a scheduled three-day thrashing of our test-bed installation in Sacramento, we had a taxonomy with over 40 top-level directories and a lot of two- and three-level deep subdirectories. He looked at this, in a non-committal fashion said “that’s fine,” and then began to suggest reasons why it should be simplified. This push by the GSA consultant was prompted by notions of usability and manageability of the content areas. As mentioned in the prior post on project taxonomy, there are not significant advantages or improvements to search results in a repository structure beyond a second-level directory. The consultant also emphasized that most users are not likely to locate or use a directory substructure below the second level. (This has to do with users navigating directory structures to add, remove or modify files, for whatever reason.)

Since a significant portion of the metadata models we are adopting rely on the organization structures in order to build logical, searchable “collections,” we simplified the structures in response to the consultant’s recommendation in this regard. Hence, the 29 top-level directories and the “simplified” taxonomy you can see in the memos linked at the beginning of this post, and the reliance on only one-level deeper for those directories.

As we get life experience with this organization structure, my guess is that we may expand to add a few additional top-level directories but not many, if any. I think we have things pretty much covered at the top-level, at this point. But apart from the rigid yet practical exploitation of the dated — but undeniably familiar — LSC Problem Codes for a large chunk of the substantive organization, my guess also is that the one-level down subdirectory structures will likely change as users give us feedback, and we discover that some subdirectories are not particularly used or useful. Proof’s in the pudding, people.

This all came full circle with our program-wide meeting a few weeks back. By the time of that meeting, every manager within LSNC had seen the organization proposals, every manager had a one-on-one conversation with project staff about the project and the organization structure, every manager had vetted the proposal to his or her people, and the memos you see linked here had been distributed to all offices.

That’s how we roll.

  • Tuesday
  • September 23
  • 2008

The Findability Project Taxonomy – Part One: The Theory

First, a recommendation. Get your hands on a copy of Information Architecture for the World Wide Web (also linked on the right, under “Biblio”) and read chapter 5 about “Organization Systems.”

Why? Well, let me put it to you this way.

We did a lot of homework and scoured a lot of books and, of course, talked to our GSA consultant on what is popularly (if imprecisely) referred to as “taxonomy.” You know, how should we organize all the “stuff” we want our users to be able to find? How hard is that?

As we canvassed widely to get an answer to that basic, practical question, we discovered you can get totally befuddled and sidetracked, not only by any number of levels of abstraction, for example, should you choose to wallow in construction of controlled vocabularies; but also by all too “inside-baseball” discussions by the taxonomy community; or, by yielding to the dark side and joining a formal organization for this sort of thing. Of course, there is also the emerging school of “social organization” of content referred to as folksonomy, more popularly known as tagging. And then there is the school of thought within some sectors of the search community that, after all is said and done, taxonomy may not be particularly useful for enterprise search design.

Needless to say, these initial forays into this subject prompted the thought bubble … “Just shoot me now.”

On this point, the GSA consultant was not as directly helpful as I thought he would be. The short story is that he was supportive of what we thought we needed, but at the end of the day he was essentially agnostic on this point, a view that mirrors Google’s online GSA resources. In discussing how to plan for a GSA implementation, Google says not much more on this point other than “analyze your business’s content and decide which directories and files you want indexed.” (In fairness to our GSA consultant — whose name, by the way, is Igor — you should be sure to read below, for his helpful guidance on simplifying the taxonomy we adopted, and the reasons for doing so.)

Which begs the question, how should we do that?

There are online articles that are straightforward and helpful in grasping, at a rudimentary level, the basics of information architecture, one recent example being Better Living Through Taxonomies, at Digital Web Magazine. But based on our experience, I recommend you pass Go and head straight for Peter Morville and Louis Rosenfeld’s Information Architecture for the World Wide Web, a book that is part of the IA canon, and deservedly so. It is a superbly clear-headed, well written overview of what information architecture is all about, and Chapter 5 on organization systems, specifically, is a model of how to explain a technical and complex subject like “taxonomy,” among other things, in plain, accessible language. And it will hit the mark on the main issues you need to think through to get “stuff” organized.

What are those practical issues? Indulge me a bit, since several of my observations here simply echo what I am recommending you read, but for LSNC we distilled our theoretical approach to taxonomy or organizing our content to these four basic precepts:

1. The directory structures need to be a hierarchical or “top-down” organization of simplified, familiar categories.

In the broadest sense of “organizing” things on a file server, and how that same “organization” is reflected in page menus or page navigation or dialog boxes, users need to know where they are and what the folders or subfolders mean. Lawyers, by training and practice, work in an especially pronounced hierarchical environment. (Can you say, “I, II-A, etc.”) While the work environments of legal services programs are famously “anti-hierarchical,” the practical truth is that almost everyone in that environment organizes their work in some hierarchical fashion. (Certainly, there are exceptions.) Simply put, this is the most common way in which most people organize things, lawyers and non-lawyers alike.

2. Names for content folders, subfolders or categories need to be consistent with the shared vocabulary of your organization.

This may seem self-evident, but in practice may not be what users in your program do or are accustomed to. I actually took the time to look at the folder organization of about a dozen advocates in our Sacramento Office, and while there were predictable folder organizations (for example, organizing files by case or project or substantive area), much of the naming was ambiguous. While no doubt obvious to the advocate who created the directory or subdirectories, to others the same structure or organization may be too subjective, ambiguous or confusing to be useful to anyone other than the person who created it — and even possibly for him or her at some later time, when the subjective rationale for the organization has been long forgotten. So, when working out the naming conventions for folders and subfolders, it was important to focus on commonly understood, familiar shared vocabulary or terminology.

From the perspective of the GSA, the particular names, as such, of directory folders or subfolders are of no consequence. The GSA does not care what you call things, which explains the agnosticism of Google and our GSA consultant on this point. At the blunt-instrument level, all it cares about is the URL, the path to where the content resides. You deal with the Tower of Babel; that’s your problem. The GSA will ferret out the content wherever it resides, regardless.

To be detailed in the next post on this subject, LSNC has adopted the most conventional names for its directories it could come up with, including … I pause, for the pain it causes me to say this … the LSC substantive problem code categories, which comprise roughly half of the directories on our shared document repository. If one were organizing legal services practice today, I am confident it would be organized differently than how LSC organizes it. But roughly 40 years in, LSC still uses an extraordinarily unsubtle and somewhat uninformed organization of legal services practice. But it is what it is, and it is what field programs must use, and it is what users within those organizations know and understand, after decades of use. For better or worse, it is the “shared vocabulary” of our organization, and its use offers consistency with how other information and data is handled, most notably client case data.

3. “Lean toward a broad-and-shallow rather than narrow-and-deep hierarchy.”

That’s a quote from Morville’s book. And his observation is consistent with the advice our GSA consultant gave us. The consultant’s advice was not to go more than two levels down, and really pushed for only one level down. The rationale was two-fold: The more subfolders you have, the less likely users will locate or use content in those folders whenever they are navigating the directory structure, in whatever form it is viewed. From the user side, a deeper vertical hierarchy actually reduces findability.

From the GSA side, deeper hierarchy does little or nothing to improve search results. While the search algorithms baked into the GSA exploit the URL path at the directory and subdirectory and sub-sub-directory to improve search results, having third or fourth or more levels does essentially nothing to improve those results. There’s no harm to doing so. It just doesn’t help you.

A counterpart to this issue is the importance of striking a balance. By going broad-and-shallow, one gets the practical advantage of being able to add content without the need for major restructuring. Assuming you have figured out a set of top-level directories that pretty much covers, in a broad sense, the content your users will want and need to search for, from there on out you can focus on adding content below that level, as warranted.

But if you go too broad, from the user side, things get more cumbersome and impractical. Think about it. Whether your users are advocates or office managers or volunteers, whatever, it is going to be more practical and useful if they can visually and cognitively grok the organization scheme. So it needs to be broad enough to cover the bases, but not so broad that it becomes incomprehensible.

Sure, we could have gone totally nuts with the taxonomy and, say, adopted the thousands-of-points-of-substantive-light offered by the well intentioned but ill fated National Subject Matter Index. (Don’t get me started.) We’re more practical. As detailed in the next article, LSNC is going with a simplified 29 top-level directory structure, and each only going one-level deeper. Works for the users. And works for the GSA.
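For those who like to see a rule of thumb in code, here is a minimal sketch of how one might check a proposed directory list against the "top-level plus one level deeper" rule. It is an illustration only, not anything we actually ran: the function names, the `/repository` root and the folder names are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def subdirectory_depth(root: str, directory: str) -> int:
    """Count the directory levels between root and directory (0 = root itself)."""
    relative = PurePosixPath(directory).relative_to(PurePosixPath(root))
    return len(relative.parts)


def too_deep(root: str, directories: Iterable[str], max_depth: int = 2) -> List[str]:
    """Return the directories nested deeper than max_depth levels below root.

    max_depth=2 means a top-level directory plus one level deeper.
    """
    return [d for d in directories if subdirectory_depth(root, d) > max_depth]


# Hypothetical folder names, for illustration only.
proposed = [
    "/repository/Housing",
    "/repository/Housing/Evictions",
    "/repository/Housing/Evictions/Notices",
]
flagged = too_deep("/repository", proposed)
print(flagged)
```

Anything the check flags is a candidate for being folded back up into its parent folder.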

4. It’s not all about taxonomy.

Having a basic, practical, commonly shared taxonomy or organization structure is essential to a project like this. LSNC content needs to be located somewhere to be targeted by the GSA, and those who add or contribute or remove that content need to be able to comprehend what is where. The practical side of what that all means will make more sense in later articles about the document protocols we have come up with for LSNC users to locate and add content and how to add metadata to that content.

But having a traditional taxonomy is not the whole picture. There are other types of content you may want to target that don’t fit the taxonomic model: targeted database content (case management systems come to mind, but are not the only example); external site content (such as select public website content to which your organization has access or permission); and alternate content sites that you would want to target but over which you don’t have the same level of control (a current example would be domain-hosted Google Sites, a subset of Google Apps, which you can “organize” in a superficial way but which at the level that matters to the Google Search Appliance, not so much).

What this means for LSNC is that we are targeting the GSA at more than just a nominal taxonomy on our shared document repository.

  • Sunday
  • August 31
  • 2008

Selecting GSA targets – Part One: Four abstract targets

It is, of course, not enough to simply build an enterprise search platform. Sure, you can do what we did on day one, when our Google Search Appliance (GSA) arrived and we gleefully hooked it up to our local Sacramento Office network and did a global target of everything. You know, just to see if our GSA worked. It did. And in short order, as we blew out its one-million file crawl limit, we discovered the obvious: LSNC has a whole lot of documents and other files strewn about on various file servers and desktops, like so much digital flotsam. Needless to say, we did not need a TIG-funded GSA to reveal that fact. To know that, all one has to do is invoke Windows Explorer and peruse one’s local office file server. Enough said.

From the perspective of our enterprise search goals, most of these files do not contain content that has what we refer to as “shared value.” Namely, advocacy or other work-related content or information that LSNC staff would want to search for because they want it or need it to get the job done.

This observation does not suggest that all the other individual documents or files have no worth. They do, but to other purpose. For example, on a practical level, an advocate may have any number of drafts or versions of a document or file, but what the organization will want to target and what users will want to get their hands on is the final or more polished version of that content. And that is likely what the original author will intend to share.

But if the organization targets everything, well, in the broadest sense what those who search will get is a lot of extraneous or incorrect or incomplete content. And a less serious but real-world challenge is the organization’s need to separate the true wheat (even if marginal) from the inevitable digital chaff on local office file servers and desktops. (Oh, come on — you know what we’re talking about here! All those personal photos, MP3s, YouTube videos, recipes from the Food Network, National Geographic wallpapers, long forgotten software downloads, … need I go on?)

There is a separate set of challenges to initially identify existing content that one would want to target with a GSA that has, after all, a set file limit. And then one has to work out practical policies and protocols for how to handle new content to be added to those targets. In upcoming posts, we will document how LSNC has approached both of these challenges.
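As a rough illustration of the wheat-versus-chaff problem, here is a minimal sketch that tallies a list of file names into likely search candidates and likely chaff, and reports whether the candidates alone would exceed a file limit. It is not part of our actual process: the extension list, the function name and the sample file names are assumptions made up for this example, and a file extension is at best a crude stand-in for "shared value."

```python
from pathlib import PurePath
from typing import Dict, Iterable

# Hypothetical list of extensions treated as "digital chaff" for this sketch.
CHAFF_EXTENSIONS = {".mp3", ".jpg", ".jpeg", ".png", ".gif", ".avi", ".exe"}


def tally_candidates(file_names: Iterable[str], file_limit: int = 1_000_000) -> Dict[str, object]:
    """Count files that look like search candidates versus chaff, by extension."""
    candidate = 0
    chaff = 0
    for name in file_names:
        extension = PurePath(name).suffix.lower()
        if extension in CHAFF_EXTENSIONS:
            chaff += 1
        else:
            candidate += 1
    return {
        "candidate": candidate,
        "chaff": chaff,
        "over_limit": candidate > file_limit,
    }


# Made-up file names, with a tiny limit so the over-limit flag is visible.
sample = ["brief.doc", "vacation.JPG", "song.mp3", "memo.pdf"]
summary = tally_candidates(sample, file_limit=1)
print(summary)
```

To run this against a real file server, one could feed it the file names produced by Python's `os.walk` over the share in question.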

But for now, here is a macro breakdown of what content we value and are initially targeting with the GSA. It is actually more simple to do than we initially thought it would be:

  • Designated document repository master directory structures – that’s a mouthful, but it turns out that’s how we refer to it. We have worked out what we consider to be a basic, workable “taxonomy” for organizing files, to be detailed in an upcoming post. The short version is that both existing and new content that has been identified as valued will reside on project-specific file servers that have purposefully organized directory structures. This will make more sense once we explain (fairly soon) why we are adopting the structures or organizations we have worked out, and how they will serve the overarching goal of “findability.” Stay tuned.
  • Shared intranet content – within LSNC, we refer to our intranet as the “secured network,” the lingua franca here for what other organizations refer to as their intranet. At this juncture, most legal services programs have some sort of intranet structure already in place, with varied user-side implementations to give staff access to its content. (Currently, ours is built out with MediaWiki as the principal content management tool, but it is soon to be supplanted with WordPress and/or Google Sites. I have posted details on that side story at LSNC’s tech blog, Webdogs 2.0.) By historical definition, everything on our existing intranet is valued. It’s fairly lean, mean, to the point, well organized and includes among other things, in no particular order:
    • Administrative manual
    • Case management manual
    • Development and fund-raising resources
    • LSC policy archive
    • LSNC forms (administrative and case-related)
    • LSNC policy archive
    • MCLE – Training resources and forms
    • Personnel and other shared human resource information
    • Specialized Regional Counsel content (content subject to gatekeeper function)
    • Specialized client content (content targeted for LawHelp access)
  • Select LSNC public web content – LSNC is now reaping dramatic benefits from its decade-long focus on using its public web presence to create and share usable content for advocates. We are still in the process of parsing out those portions of the LSNC public content we want to target with the GSA, but these include our rich reservoir of advocate content on CalWorks (the name of California’s TANF program) and Food Stamps, and special project-specific content that derives from our Race Equity Project and housing and economic development work. The point here is that our enterprise search model will include not just valued content behind our firewall but also select public content that is every bit as valuable to our staff in getting the job done.
  • Pika Case Management System – this will likely be the last piece of the enterprise search puzzle for us, but a major chunk of our GSA file limit will be devoted to exploiting the GSA to alter dramatically how LSNC staff search and locate data within Pika. We have already run some initial targeting tests on Pika and we really, really liked what the search results looked like. It is not a technical challenge to target Pika with a GSA, not at all, but there are some significant challenges in sorting out how best to limit the GSA crawl to target precisely what we really want to make searchable, without blowing out our GSA file limit. Once we work out those kinks, we will likely replace the native Pika search functions (which are little more than a raw SQL search function) with a customized subset of GSA functions.

In the scheme of this project, content is king, knowledge content rules, and the Google Search Appliance is Gandalf, the wizard asking “What do you see? Can you see anything?” Indeed.

  • Wednesday
  • July 30
  • 2008

Enterprise Search: Stating the case for a legal services field program

Whatever you do, please don’t call it a “brief bank.”

Language choices have powerful effects, so it does matter what one calls things, to good or ill effect. And for some 40 years legal services field programs have sought the holy grail of a “brief bank.” Having worked in five different field programs and two support centers in five states over 35+ years, I can personally attest that every one of those organizations thought they had or wanted or envisioned or aspired in some way to a “brief bank.” As if.

There are legions of reasons why, in practice, the brief-bank model never really works for most field programs. Among those are program management and resource priorities that obstruct it or at least don’t value it; lack of a commonly understood and shared purpose among its target users (you know, those pesky “employees”) why it matters to have such a model; and an impractical — or at least poorly designed — approach to creating and maintaining the model (you know, like, no one is really responsible to make it happen and/or actually find the time or resources to maintain the damn thing, whatever form it takes).

Within the legal services community, the notion of a “brief bank” long ago morphed into something akin to a vestigial organ: Not entirely useless or without function, but pretty much something no longer used as it once was. If ever it was. And even by its own self-referential term as a “bank,” one gets the message that this is a model for something that one does not actually use on a daily or regular basis. Rather, things of apparent value are placed there for storage, for safe-keeping, for later retrieval but for good reason not readily accessible because they must be secured. You can count on it being there. You can bank on it.

Actually, you cannot. Because the real purpose for which it exists, more often than not, is typically useless. The old-school model “brief bank” was a collection of hard-copy documents stored in your individual office file cabinet (or that pile of folders over there, in the corner of your office); or down the hall somewhere in a different cabinet maybe maintained by someone else (or in a pile of folders that the “someone” would label and organize “by the end of the week”). On a good day (OK, on a really good day), you or someone else could remember which document was about what and where it was located. On most days, not so much. And with the emergence in the last 20 years of the digital-document work style to which we are now accustomed, the “brief bank” has become a case or project folder on your local or a shared network drive. You know, something like our Auburn Office shared directory, in all its indigenous glory:

Example of a shared drive

Surely, this is an advocate’s digital paradise, right, all there but for the taking? … if you can remember what is there … where it is … and find it. (“Oh, what you’re looking for is in a different office? I’ll get back to you.”) And that’s one of our smaller offices. (However, you’ve got to love the use of caps here, sort of a poor person’s metadata model for attributing value to some files.) I thought to illustrate here the four times as large, charmingly nuanced (née dystopic) horizontal and vertical structure of our flagship Sacramento Office, but it was too vast in dimension to use as a visual example. But you get the point.

I must admit, I cringed a touch when reading the fifth “Purpose Served: Knowledge Management” element in LSC’s recent recommendations on baseline technologies for legal services field programs. Stating “what should be in place,” it invokes “pleading and brief banks” as its primary concrete paradigm. As I was saying, language is powerful and apparently the choice of this concrete terminology in the more abstract context of “knowledge management” has not changed. It should.

The concrete challenge for legal services programs is not to create a “pleading and brief bank.” The challenge is to identify and organize and manage and make “findable” a wide range of documents and other files that have shared value within the organization. (“Sample pleadings and briefs” are only one piece of that paradigm.) Within that larger framework, the LSC baseline technology recommendation regarding the need for knowledge management is right on the money.

And Legal Services of Northern California (LSNC) is a typical example of this challenge within the post-merger world of legal services. The structural scale and geographic reach of and substantive range of advocacy by LSNC exacerbates a fundamental dilemma all modern legal services field programs suffer: How does one make it fast, easy and intuitive for program staff to find and access all the different types of “knowledge content” within the four walls of the organization?

Within LSNC’s organizational structure there is a wide range of substantive advocacy and administrative expertise, specialization and skill sets, all of which are sources for shared information and knowledge. By “information” I mean that the organization has a variety of documents and other digital data types — most commonly, these are word processing files, PDF documents, spreadsheets, presentation files, HTML pages and client databases — that have content the organization perceives as valued and useful. By “knowledge” I mean that the information exists in a context that offers understanding. A “usable” document offers the promise of shared knowledge because it brings understanding of the information it contains from one person to another.

But there’s the rub: What does a non-profit organization like LSNC do to bring to the surface the usable knowledge of all, i.e., all the specifically identified and valued, usable content wherever it exists within the limits of the organization that can and should be shared and available to other LSNC staff? That is the core question the Findability Project will attempt to answer in a practical way that works for a legal services field program.

The LSNC approach is to build a network infrastructure that supports enterprise search, premised on deployment of a Google Search Appliance. It is also premised on thrashing out practical ways to identify, organize and maintain the valued documents and other files that will be the target of enterprise search. It is also premised, as importantly, on figuring out as “user friendly” a way as we can to ensure LSNC staff use the system, want to use the system, know why they would want to use the system … to find what they need.

Hence, the Findability Project.

  • Monday
  • October 8
  • 2007

Explicating the LSNC Findability Project

The day I got on a plane for an extended trip to Europe, Legal Services of Northern California (LSNC) was notified it had been awarded funding for its 2007 Technology Innovation Grant (TIG) application. As it turns out, I was pretty much the last person to know. I spent the month of September without a phone or email or other contact with the office, and so I only heard about the TIG thing and many other developments once I got back to Sacramento.

There was a brief summary of all this year’s TIG awards released on September 12 by LSC but the description there of our project is, well . . . neither apt nor accurate. For those interested, we actually call it the LSNC “Findability Project” and its core purpose is to create a program-wide, highly user-friendly, enterprise-level knowledge-content management system. Here is how we stated the technological challenge in the first three paragraphs of our TIG application:

The structural scale and geographic reach of and substantive range of advocacy by Legal Services of Northern California exacerbates a fundamental dilemma all legal services field programs suffer: How does one make it fast, easy and intuitive for program staff to find and access all the different types of “knowledge content” within the four walls of the organization.

Within this organizational structure there is a wide range of substantive advocacy and administrative expertise, specialization and skill sets, all of which are sources for shared information and knowledge. By “information” we mean that the organization has a variety of documents and other digital data types—word processing files, databases, text files, PDF files, spreadsheets, XML files, presentations, images, video, and so on—that have content the organization perceives as valued and useful; by “knowledge” we mean that the information exists in a context that offers understanding. Random words identified in a document are only pieces of data; words in a document that makes sense and has apparent value is a document that has purpose and is useful; a “usable” document offers the promise of shared knowledge because it brings understanding of the information it contains from one person to another. And knowledge content that can be shared throughout the organization promises that the clients are inherently better served. (The converse is obviously an impossible case to make, i.e., the less the staff knows the better the clients are served.)

But there’s the rub: What does the organization do to assure its staff can find all the knowledge content that should be available to them? Why is this a practical and technological challenge for LSNC? And what does this have to do with more effectively serving its clients?

So, how do we propose to do this? Our approach has three core components:

Hardware and Software Infrastructure

At its heart, the Project will be built on the enterprise-level Google Search Appliance. This will provide the core hardware and software infrastructure for a single, secure, unified tool for accessing all the usable institutional content available to LSNC staff from any internal or external location. By design, it will enable LSNC users to exploit all the familiar features of Google-based search technology to locate with exceptional relevancy any and all types of knowledge content wherever it may be within any and all organizational zones defined by LSNC. Once implemented, it will enable users to search for up to a million content records within the system. Additional layers of the knowledge content system will be the design and implementation of Google One-Box modules tailored for retrieval and interpretation of particular types of data most commonly valued and useful within the legal services work environment, plus integration of select Google APIs that fit into the larger project goals, including Google Analytics.

Standardized Methodologies and Protocols for Managing the Knowledge Content

Partnering with GSA technical experts, LSNC will work aggressively to formulate and finalize standardized methodologies and protocols for management of all the organization’s valued “knowledge content.” This process will include assessment of a range of techniques and practices to enhance and optimize search results for users of the system: institutional protocols and standards for identifying and tagging knowledge content; effective use of metadata; record naming conventions; vertical and hierarchical organization of data; and so on.

Project Transparency

The LSNC Findability Project will be a public project. LSNC will create a public web-based workspace to document in detail: the planning process for the Findability Project; identify and evaluate resources for the larger legal services community on searchability in general and the Google Search Appliance in particular; and create technical and tutorial content so that those who are interested can more readily understand and replicate the Project. This public aspect of the Project will provide a highly practical way for the LSC and others in the legal services community to monitor and evaluate the Project, i.e., see what is planned, what choices were made and why, how things were designed, what is the pertinent technical information implicated by the Project, what works and what doesn’t, and what has been accomplished.

Hey, we’ve been down this road before. The Pika people will remember our initial foray into this approach toward tech project transparency three years ago with Project Claire: Redesigning Pika, where we put it all out on the table to see what we were doing with Pika implementation at LSNC. So, we’re going to create another Project-specific web development site where you can follow the progress and all the nitty-gritty detail of the LSNC Findability Project and perhaps learn a few things here and there about using modern search technologies to get the job done. Call it the good, the bad and the ugly but whatever it is we’re going to share it all with you as we work our way through it over the 18-month life of the project.

Why do it this way? We’re Webdogs. It’s what we’re all about.

  • Sunday
  • June 3
  • 2007

NYT, QDF and other Google search mysteries

God, do I ever love the New York Times. Most of my days start with Starbucks and my home-delivered copy of the NYT, and Sundays are always the best of all days because there is always something wonderful to read. Today is no exception. For my fellow geekmeisters, here’s a must-read from today’s edition: Google Keeps Tweaking Its Search Engine, a long-form article with predictable bits about the Google corporate culture but with an emphasis on the mindsets of Google search engineers and their prime directive to make search better. Don’t know what QDF is? Read on.

  • Wednesday
  • May 16
  • 2007

Getting ready for Google 2.0

Hardly a day goes by without something new being announced about the growing Google web industrial complex, which increasingly is the central presence for the mainstream web user. And the tech lists we are all on will be weighing in over the next few days about the Google push toward more apparent, more usable vertical search. You’re not likely to find a better article from a better source than Search Engine Land’s substantial post today about the coming changes: Google 2.0: Google Universal Search. Plus, the site includes a spiffy speed-date laying it all out in Google’s New Navigational Links: An Illustrated Guide. (Go ahead. Kick yourself again for not buying at $85 a share.)

  • Tuesday
  • March 6
  • 2007

Justia beta Federal Register search site

Justia has announced a public beta of its Federal Register Rules and Regulations site, a (for now) free, specialized search engine for locating recent Federal Register notices, rules and proposed rules. A quick spin around the block with the optional “full text” search suggests that feature is not as helpful as it should be. As far as I can tell, there is no real search syntax. Every keyword entered seems to be treated as having an AND connector, regardless of other search syntax conventions, e.g., putting the words in quotes, inserting a plus or minus sign, etc. The keywords you do enter, however, are shown in bold in the search results, which is helpful.

This new search site has other features that are more helpful and better implemented: You can drill down by federal agency and sub-agency (e.g., Department of Agriculture -> Food and Nutrition Service) and then do a search for recently published items within that particular agency subset, within a specific set of dates, and limited to only notices, rules and/or proposed rules.

Plus, once you find the type of search results that you want, you can then subscribe to that particular search result using the RSS feed displayed with the search result. That is, if you can get the feed to work. When I clicked on the RSS feed button for this sample search of recently published Food Stamp rules (illustrated below), the feed results opened up in a Firefox window, but I couldn’t get the same feed to work either in FeedDemon or as a Firefox Live Bookmark. Your feed mileage may vary.

UPDATE:

Here’s a sample search of recently published Food Stamp rules, illustrated below.

Within minutes of my initial post, I heard from Nicholas Moline, a programmer at Justia, who quite properly corrected me on how the keyword connectors work, and provided other clarification about how its beta Federal Register search works: “We are still working on the full text search to bring in the abilities to put in your own search criteria, however I did want to let you know that it does indeed use AND, not OR if you enter in multiple words. We do however have simple English stemming on words, so your search for food stamps also brought in stamp, and stamping, which appear in the first several documents in your search result. I am hoping to have phrase capabilities and search criteria implemented soon.”

Nicholas also fixed the apparent feed problem I reported, and now it seems to be working just fine both in FeedDemon and Firefox Live Bookmarks. Hence, the strike outs, above. You gotta give ’em props for jumpin’ on it immediately!
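
For the curious, here is a rough Python sketch of what “AND plus simple English stemming” can look like in practice. To be clear, this is not Justia’s code: the suffix list and the function names `naive_stem` and `matches_all` are made up for illustration, and a real search engine would use a proper stemmer. It only shows why a search for food stamps can also pull in documents that say stamp or stamping, while still requiring every keyword to be present.

```python
import re


def naive_stem(word: str) -> str:
    """Strip a few common English suffixes (illustrative only, not a real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches_all(query: str, document: str) -> bool:
    """AND semantics: every stemmed query word must appear among the document's stems."""
    doc_stems = {naive_stem(w) for w in re.findall(r"[A-Za-z]+", document)}
    query_stems = [naive_stem(w) for w in re.findall(r"[A-Za-z]+", query)]
    return all(stem in doc_stems for stem in query_stems)


if __name__ == "__main__":
    # "stamps" and "Stamping" both reduce to "stamp", so this document matches.
    print(matches_all("food stamps", "Stamping rules for food programs"))
    # No form of "food" appears here, so the AND connector rejects it.
    print(matches_all("food stamps", "Postage stamp prices"))
```

With OR instead of AND, the second document would have matched on stamp alone, which is the distinction Nicholas was drawing.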

  • Monday
  • February 5
  • 2007

Googling California attorneys

As posted earlier to a broader audience at the LSNC main website, the Webdogs have created a very simple custom button for the Google Toolbar so that you can use the search box in the toolbar to do a direct name search for California attorneys at the California State Bar site. For our readers in California (or anyone else who wants to try it), if you have a current version of the toolbar installed, just click on Add California Attorney Search Button, follow the prompts and there you have it. To try it out, do a search for “Tony Joe Whi …,” uh, “Anthony Gilbert White.” Folks in California will appreciate the utility of this, since so many of us use the California State Bar site as an attorney “contacts” address book.

This was pretty easy to do by just following Google’s handy-dandy, step-by-step Guide to Making Custom Buttons for Google Toolbar 4. The guide refers to Internet Explorer, but it works with Firefox as well now that the latest version of the toolbar for Firefox supports custom buttons. The “hardest” thing involved is actually not hard at all, assuming you know how to create a custom icon image. The guide provides a link to a site where you can encode your custom icon into ASCII text using base64 encoding.
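
If you would rather not paste your icon into a website, the base64 step is easy to do yourself. Here is a minimal Python sketch using only the standard library. The function name `encode_icon` is just a placeholder of mine, and the four sample bytes stand in for a real icon file, which you would read from disk in binary mode.

```python
import base64


def encode_icon(icon_bytes: bytes) -> str:
    """Return the base64 ASCII text for the raw bytes of an icon file."""
    return base64.b64encode(icon_bytes).decode("ascii")


if __name__ == "__main__":
    # Stand-in data: the first four bytes of an ICO file header.
    # For a real icon, use something like: open("my-icon.ico", "rb").read()
    # ("my-icon.ico" is a hypothetical file name).
    sample = b"\x00\x00\x01\x00"
    print(encode_icon(sample))  # prints AAABAA==
```

The resulting text is what you paste in wherever the guide asks for the encoded icon.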

  • Saturday
  • January 20
  • 2007

THOMAS beta and “findability”

For both technological and broader advocate “usability” reasons, it may be of interest to take a look at Peggy Garvin’s recent article, The Government Domain: Testing the THOMAS Beta at LLRX. The THOMAS site has, since its inception, been a virtual leviathan and among the most widely used federal government sites within the legal services community. It is notable and of practical consequence that THOMAS has made available a beta version of significant changes on the horizon with how it alters user access to its vast sources of legislative and other legal information. Garvin highlights a number of those changes, including a change in the underlying search software, a move towards more unified, single search functions (further evidence of how the Google global search paradigm increasingly impacts approaches toward search usability), an apparent move away from separate or categorized keynumber-specific searches (e.g., searching by a specific bill number), and changes in the presentation of search results and search navigation. As the age of findability matures, there are likely to be more and more articles of this kind as findability becomes a more widely understood but challenging element of good web design and web application development. Ambient Findability is part of the current canon on these sorts of issues.