Corporate Coder: It's all a numbers game

At what level does tagging a document in a hierarchy become meaningless? A breakdown of our term usage ranges from "United States", used 585 times (2.178 percent of our total documents), to "Fund Strategy - SWF", used once (0.004 percent). Now, this is all the terms against all the documents. This list of 1639 terms is usually broken into several hierarchies. Classification can range from 20 percent to less than a hundredth of a percent.
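To make the arithmetic concrete, here is a minimal sketch of how those shares could be computed and how rarely used terms could be flagged. The total of 26,860 documents is not stated anywhere above; it is an assumption back-calculated from 585 uses being 2.178 percent. The function names and the 0.01 percent cutoff are also illustrative, not part of any real tool we use.

```python
def tag_shares(tag_counts, total_documents):
    """Return each tag's share of all documents, as a percentage."""
    return {tag: 100.0 * count / total_documents
            for tag, count in tag_counts.items()}


def rare_tags(tag_counts, total_documents, min_percent):
    """Return, sorted by name, the tags whose share falls below min_percent."""
    shares = tag_shares(tag_counts, total_documents)
    return sorted(tag for tag, pct in shares.items() if pct < min_percent)


# ASSUMPTION: total inferred from 585 / 0.02178; the real count is not given.
TOTAL_DOCUMENTS = 26_860

# The two extremes quoted in the post.
counts = {"United States": 585, "Fund Strategy - SWF": 1}

shares = tag_shares(counts, TOTAL_DOCUMENTS)
for tag, pct in shares.items():
    print(f"{tag}: {pct:.3f} percent")

# Hypothetical cutoff: anything under a hundredth of a percent is suspect.
print(rare_tags(counts, TOTAL_DOCUMENTS, min_percent=0.01))
```

Where the cutoff belongs is the real question; the code only makes it easy to see how many of the 1639 terms would fall under any given line.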

What is tragic and comic all at once is that in the tagging of case information the most frequent tag is "No Tag", with 23% for the capability hierarchy and 24% for the industry hierarchy.

If such a high percentage of documents can go untagged, is it then necessary to tag cases at the opposite extreme? There is one case tagged "Enterprise ASP", for 0.003 percent. What makes it worse is that if a tag is used so infrequently, what is the likelihood that someone will (a) search for the term or (b) even know what the abbreviation stands for?

If these were internal-use-only taxonomies used to maintain directories and file manipulation, then such organization might matter. Unfortunately, these are customer-facing choices and complexities.

What is getting confused is taxonomical search vs. keyword search. At the level where terms are arbitrarily added onto documents, you have moved from a purposeful taxonomy into scattered keywords. While, from a user's perspective, it does not matter how you find the document, different strategies should be used for tagging data when filling out taxonomies vs. keywords. We expect rigidity in taxonomies and fluidity in keywords. Right now we have neither.

Wednesday, March 11, 2009
