Friday, February 27, 2009

Sometimes it's the simple things

Sometimes the solution is so obvious:

Func<Document, bool> predicate = a => a.FileDate > start_date && a.FileDate < end_date;

BuildAndAddRows(table, predicate);

//inside the function

var query = (from d in this.DocumentList orderby d.VehicleName select d).Where(predicate).Select(a => new { a.VehicleName, a.VehicleID });

Thursday, February 26, 2009

Searching for Bibby Fisjer

I am ramping up to start our search project again. The search engine we are using is a third-party tool, so most of the development team's work is going to lie in coding the front end.

But there is much more involved than just coding to an API. There is the actual business implementation of the search as well as the migration from our old system to the new one.

I plan to document as much of this process as possible here, as this project should provide me, if not the reader, with hours of enjoyment. Here is problem #1:

Jargon


The following refers only to navigational searching, not full-text or keyword searching.
Suppose there exists a standardized list of industries which the majority of large businesses use to catalog customers. Let's call it an ISO Industry List. For argument's sake, let's say this list has 5 industries listed:
1 - Finance
2 - Tourism
3 - Healthcare
4 - Technology
5 - Manufacturing
These are common industries which are likely to exist into the foreseeable future. They are distinct enough that two will not merge and become a hybrid, even if aspects of one incorporate features of another.

Let us also assume a company exists which prefers to use its own jargon to label documents.* For whatever reason, whether valid or not, the ISO codes are not acceptable to this company and so they start creating their own list of industries. Let us assume the decision to do so was motivated by two main reasons:
1 - The ISO codes do not divide sectors appropriately for the business
2 - They just don't like the names in the ISO List.

As a result the company begins producing a list of its own based on what it perceives as industries pertinent to its business. Not having the resources to survey and create their own ISO type list, people begin just adding to the list industries they think are valid. They create a first pass:
1 - Banking
2 - Tourism
3 - Healthcare
4 - Medical Devices
5 - Manufacturing

For now this appears to do the trick. Then the company gets a new client which does not fit into this list and a new category is created:
6 - Private Equity
And then another:
7 - Biotechnology
And another:
8 - Pharmaceuticals

But then someone thinks that a client actually belongs in two industries and creates:
9 - Biotech & Pharma.

Each of these categories has been attached to a document in our search engine. But before we get to the actual documents, let's look at how we have already created a mess.

Industries are not atomistic


In our list of industries, even before we get to #9, we have created a list of industries which are neither atomistic nor reside on the same level of any reasonable hierarchy. Ideally a list like the one above should have each leaf of the tree on the same level. Biotechnology is far more specific than Manufacturing. Creating this disjunction leads to confusion when adding new items, since the person adding the new category is uncertain as to the level of specificity they should use.

Ever Changing Titles


Having this confusion leads to categories which reflect jargon more than concrete types. The list grows each time a new category is added. The more industries are added, the harder it becomes to determine which are valid and which are invalid. Adding industries for corner cases leads us to create industries for only one, maybe two, clients.

The documents are tainted


All of the above would not be such an issue if this list were not actually tied to document searching. Creating a list based on such whim creates the following issues when trying to search on the documents:
1 - If the jargon industry is changed, the document must be changed and re-indexed. In a system of thousands of documents there is a large overhead in terms of management and time.
2 - If the jargon industry represents corner cases of clients, the likelihood that someone will search on the word is slim. Although the precision of such a search is high, the recall for the document is low. In addition, documents are classified at different levels of a tree. When recall is high, the specificity of the document as it relates to the search may not be clear.
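
To make the precision and recall point concrete, here is a minimal sketch in Python (the language this search project already uses for its dictionaries). Precision is the share of returned documents that are relevant; recall is the share of relevant documents that are returned. The document IDs, tags and relevance judgments below are invented purely for illustration.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for sets of returned and relevant document IDs."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical corpus: each document carries exactly one jargon industry tag.
TAGS = {
    "doc1": "Biotech & Pharma",
    "doc2": "Biotechnology",
    "doc3": "Pharmaceuticals",
    "doc4": "Pharmaceuticals",
    "doc5": "Banking",
}

# Assumed relevance judgment: a user interested in drug makers wants doc1..doc4.
RELEVANT = {"doc1", "doc2", "doc3", "doc4"}


def search_by_tag(tag):
    """Navigational search: return every document tagged with exactly this label."""
    return {doc for doc, t in TAGS.items() if t == tag}


# Searching on the corner-case label finds only the one document that carries it.
narrow = precision_recall(search_by_tag("Biotech & Pharma"), RELEVANT)
print(narrow)
```

Everything the narrow search returns is relevant, but three of the four relevant documents are never found because they sit under sibling labels.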

Is tagging hierarchy the answer?


One might think the way to resolve the above issue is to tag each document with a hierarchy. But, besides the overhead of creating that much more data to index, it does not solve the underlying problem of non-atomistic industries. A hierarchy can suffer from all of the pitfalls of a jargon-induced list.

A way out


One way to solve this issue is to use the ISO list to tag documents. In fact that is the only way out, short of creating a new list formed in the same manner and with the same rigid standards as ISO. What we can then do is create a translation dictionary to translate jargon industries to ISO industries at search time. This allows us to maintain a certain sense of identity for the user while preserving the integrity of the documents. We can then create hierarchies, either using the ISO lists or by creating our own, without affecting the location of that document or having to re-index. We use these hierarchies to direct the search, but use the levels of the hierarchy to do the actual searching.
This solution addresses the two reasons given above for creating one's own list. First, we can divide the industries as the business sees fit, and rename the categories, by using our custom mapping dictionary. At the base level, however, we retain our rigid atomistic separation. What we gain is much more valuable. By using a standardized set of industries we are much less likely to end up with corner cases. An ISO type list has already gone through rigid scrutiny, to a level most businesses can't match.
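
A minimal sketch of such a translation dictionary, in Python (the language this search project already uses for its dictionaries). The ISO names come from the five-industry example list above; every mapping, document ID and function name here is an assumption made up for illustration, not the actual implementation.

```python
# Hypothetical jargon-to-ISO translation dictionary (many to many).
JARGON_TO_ISO = {
    "Banking": {"Finance"},
    "Private Equity": {"Finance"},
    "Medical Devices": {"Healthcare", "Manufacturing"},
    "Biotechnology": {"Healthcare", "Technology"},
    "Pharmaceuticals": {"Healthcare"},
    "Biotech & Pharma": {"Healthcare", "Technology"},
}

# Documents are tagged once, with ISO industries only, and never re-tagged.
DOCUMENTS = {
    "doc1": {"Finance"},
    "doc2": {"Healthcare"},
    "doc3": {"Manufacturing"},
    "doc4": {"Technology"},
}


def translate(term):
    """Map a jargon term to ISO industries; unknown terms pass through unchanged."""
    return JARGON_TO_ISO.get(term, {term})


def search(term):
    """Return IDs of documents tagged with any ISO industry the term maps to."""
    iso_industries = translate(term)
    return sorted(doc for doc, tags in DOCUMENTS.items() if tags & iso_industries)


print(search("Private Equity"))
print(search("Medical Devices"))
```

Renaming or retiring a jargon industry then means editing `JARGON_TO_ISO` only; the document tags, and therefore the index, stay untouched.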

But how can we map the categories?


All of the above sounds great in theory, but the technical part is not so clear. Here is what I propose. Using an ISO list of industries and their hierarchies, we tag each document with the highest level and the lowest level that document refers to. For example:
Document #1
Title: Manufacturing in a Post Modern Era
High Industry: Manufacturing
Low Industry: Manufacturing
This document deals with general aspects of manufacturing so we tag the highest level and the lowest level the same.

Document #2
Title: Computer Chip Processing
High Industry: Manufacturing
Low Industry: Computer Components
This document deals with a specific type of manufacturing, so we label the low industry at that more specific level.

With our mapping dictionary we can map "Chip Processing" to "Computer Components" if all the computer components we ever deal with are microchips. If the company changes direction, we can expand our dictionary or shrink it as necessary. The dictionary allows for many-to-many relationships and can be used to expand or shrink our search as well. Again we must remember that this is for directed navigational searching and not keyword searching. Whereas keyword and full-text searches are clouded and nebulous, navigational searches should be uniform and atomistic.
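
The tagging scheme above can be sketched as follows, again in Python. This is one possible reading of the proposal, not a finished design: the `PARENT` table is a made-up slice of an ISO-style hierarchy, and a document is treated as matching an industry when that industry lies between its low and high tags.

```python
# Hypothetical slice of an ISO-style hierarchy: child industry -> parent industry.
PARENT = {
    "Computer Components": "Manufacturing",
    "Manufacturing": None,
}

# Each document carries the highest and lowest industry level it refers to.
# The sketch assumes the high tag is the low tag itself or one of its ancestors.
DOCUMENTS = [
    {"id": "Document #1", "high": "Manufacturing", "low": "Manufacturing"},
    {"id": "Document #2", "high": "Manufacturing", "low": "Computer Components"},
]

# Jargon dictionary, many to many (assumed entry taken from the example).
JARGON_TO_ISO = {"Chip Processing": {"Computer Components"}}


def path_to_root(industry):
    """Return the industry followed by all of its ancestors, most specific first."""
    path = []
    while industry is not None:
        path.append(industry)
        industry = PARENT.get(industry)
    return path


def covers(doc, industry):
    """True if the industry lies between the document's low and high levels."""
    span = path_to_root(doc["low"])
    span = span[: span.index(doc["high"]) + 1]
    return industry in span


def navigate(term):
    """Translate a jargon term to ISO industries, then match on the tagged levels."""
    targets = JARGON_TO_ISO.get(term, {term})
    return [d["id"] for d in DOCUMENTS if any(covers(d, t) for t in targets)]


print(navigate("Chip Processing"))
print(navigate("Manufacturing"))
```

Searching on the jargon term narrows to the specific document, while searching on the parent industry returns both, and neither document needs re-indexing if the dictionary changes.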


*a document is any piece of content we wish to add to our search system

Day of Reckoning

The Ideal World


The time has come yet again for our annual review cycle. Once a year we have a 360 degree review of peers and managers in hopes of making us all better people. I would say in hope of making better workers, but you will shortly see why this is not the case. At least not for me.

To start with, let me say that in an ideal world, annual reviews are a great thing. Sometimes we get so wrapped up in our work that we fail to see the bigger picture. Sometimes we continue to make the same mistakes or fall into the same patterns because that is what people do. A little heads up, a little light on the path, is a good thing.

In addition, being able to give feedback on/to your manager is also a good thing. Managers fall into the same ruts and mediocre decisions as everyone else. Peers are the same way. We can raise issues with peers at any time, but at review time we have the opportunity to track overall trends. If one person tells you something it might be a fluke, but if the consensus comes back with that same issue, you might actually need to reconsider how you do things.

Reviews also provide an opportunity for meritocracy to shine through in compensation reviews and bonuses.

And all this is great, isn't it?

The Real World


But this is not where I live. And if I had to venture a guess, most of the world doesn't live here either.

Let me revisit the points I just made but in the opposite order.

Compensation


Regardless of the state of the current economy, reviews in most companies arrive at the time of cost of living adjustments, not raises. For many reasons, most of them petty and bogus, real raises based on performance can't be given out. Why?
1 - Everyone will know when someone makes more in a current cycle. This causes resentment. People will want to know who sabotaged them, even if they are a horrible employee.
2 - Often managers just don't have the ability to determine % increase.
3 - Managers will attempt to be fair offering everyone the same %.

Manager Input


People also aren't going to tell their manager if something is wrong. People leave jobs; they usually don't try to work things out. Persistent and permeating institutionalized issues are not something the average employee has the will to take on. Instead the whole thing becomes a farce. After all, we can't all be Erin Brockovich.

Peer input


Either they like you or they don't. There is no such thing as an objective review. What can I say about my colleague when I have never seen their work? As developers, most of our value is in the code we write and the applications we produce, but since I neither see the code nor use the application, I am at best guessing.

My Case


And here is the rub and the totality of it all. In my case, there is no review of the actual work I have done in the last year. No one looks at my code, reviews my scheduling, the level of bugs in my code, etc. No mention of working late, or my sick days. No code reviews.

So what does the annual review become? It becomes a personality test.

The review takes on the feel of a counselling session much more than a discussion of how to become a better developer. Without the proper infrastructure in place, reviewers cling to the only thing they can: how you make them feel. What makes it much worse on the peer side is that so many of the people who need to provide input either have no idea what you do or have never actually seen your work. Our teams are so small and so separated from business users that the actual human interaction performance (the measure of the review at this point) is not actually assessed by people I have worked with.

Many people I have talked to have been told they need to participate more in company social events, but very few, if any, have been told they need to start brushing up on their skills.

In this scenario, those who are more vocal and social are then promoted to greater responsibilities ahead of those who actually perform. This does nothing for morale, since those who are upset with the outcome usually don't stick around for the next one.

Monday, February 23, 2009

Monday, February 9, 2009

I AM NOT A BOBBLEHEAD

A Dark and Dreary Winter Day


Corporate coders don't take ownership. The one thing that allows them to remain stoic, well obtuse, is the fact that each line of code is like the one before it, the one after it and every other one in their repository. There is no context, no concern. They just don't care.

Today I found out officially that I am no longer on a project that I did a ton of work for a little over a year ago. This annoys me not only because of the time invested, but also because this was a project I had a lot of passion for, had an educational background in, and was more than just fitting data into grids.

Why am I annoyed?


The paragraph above gives a good indication as to why this annoys me on one very personal level, but it exposes a larger issue here:

We as developers are treated as totally and uniformly interchangeable

At some level, each developer should understand the basics and the standards of their company. But at a deeper level we are each individuals who have different interests, strengths and weaknesses. Failing to recognize these traits in your developers will only lead to dissatisfaction and demoralization of the very people who create great applications. But failing to recognize the individuality of the developers and then failing to recognize the fallout is maddening.

But these things happen when the 80/20 rule has 80% corporate coders and 20% individual contributors.

I can't document that...


It is one thing to do a handoff of a project that is basically data in and data out. It is an entirely different thing to hand off concepts and understanding. This project required me to understand search engines in general and our new product in particular. These concepts are deeper than just settings and flags in a UI. In addition I worked on creating API calls for custom front end searching, paging, and navigation. Python had to be learned, so that dictionaries could be compiled and merged at indexing time. These are not just step one, two, three types of operations. There has to be a careful overview and understanding as a whole, not only of the technology, but also of human use of language. And this is just the search side; forget about even discussing querying the search engine.

Repetition and waste


Two of my biggest pet peeves are having to repeat myself and negligent waste. Trying to teach someone what took me several months to learn and assimilate, and which was built on a certain educational background, is nearly impossible. More time would be spent re-iterating all that had happened than would be needed to actually develop the application.

But surely developer two can come right in and take over the work of developer one, right? But now, since developer one is no longer on the project, he has to help developer two. But if the goal was to keep developer one on another project, what is the point? His time will still be spent on the old project, just in a different role.

Does any of this make any sense?

....this is so incoherent.....

Friday, February 6, 2009

Lost in Abstraction

There are a million and one posts as to what makes a good programmer/developer. Knowing how to find answers is one of the top criteria. I used to think (though shallowly) that being able to use Google was a good place to start judging. The more I thought about how we use tools, however, the more I came to believe that there are two underlying competencies: Abstraction and Translation.

Abstraction


Making objects and classes and decoupling and all that jazz is fine once you get to the stage of actually designing an application. That is not the abstraction I am talking about. The abstraction I refer to is the ability to pull out general principles from specific instances. For example, when you search for something in Google which does not appear in the first two pages, how confident are you that you can find what you are looking for? Are you able to abstract out the bigger picture and re-scope your search based on your initial findings, or do you give up? As someone without a degree in computer science, I am often at a loss for the precise technical word for a general set of concepts.

Intelligence is being able to traverse the tree up and down in terms of generality and specificity. In essence, it is being able to cut away what is unnecessary, take what is left and reapply the core to something else. If all of this is too vague, let me give an example.

Let’s say I have a co-worker. I mean, I do have a co-worker, but this is just an example. We have a meeting to discuss how we should implement a business rule. If your first reaction is to talk about web control functionality, you are starting on the wrong level. During one such meeting a co-worker asked “How are they going to key in the values?”. This might seem like it has nothing to do with abstraction, but it does.

Some people are so tied to a technology, or what is in front of them that they are unable to talk about the abstract

This skill is important because it allows you to engage the second competency necessary for quality development, which is Translation.

Translation


I purposely chose the word translation over interpretation for this very reason:

Every translation is also an interpretation, but every interpretation does not include a translation

Translation is a more complicated competency. Here is a simple example. Looking at a photo of Man Ray’s, you can interpret the photo and ascribe to it a meaning wholly of your own. It can be totally internal without reference to the creator, sometimes with little reference to the work itself. Translation requires that one’s interpretation takes place within a framework that has a verifiable connection to the origin of the phenomenon. While there is seldom, if ever, a right or wrong translation, there are better and worse translations. We then settle on the best translation available. Referring to my earlier post: Translation is an ethics.

In development, translation occurs on several levels, which is why it is vital to quality development. Translation occurs first in gathering business requirements and then in turning those requirements into business rules. As with translating natural human languages, there is seldom a one to one ratio of input to possible outputs. We take requirements and filter out the sediment to examine what the essential rules are.

Those business rules then become translated again into models and architecture. Good database design is based on translating actual business needs into properly designed schemas. Classes are then used to translate rules and data into applications. I realize this is all very simplified, but the concept is still there.

Put Together


Abstraction and translation together then allow us to move away from previous experiences (i.e. previous applications, meetings, etc.) to gain insight on the current project. Quality developers know when to stop abstracting out; they know when they have reached the proper abstraction and are then able to translate and integrate that abstraction with the current application parameters. Frameworks are an example of how this is put to work. The .Net framework is built basically on the most common set of classes. Microsoft has abstracted out the majority of functions necessary to program applications and has translated/compiled them into a framework. While some may argue that there is not enough abstracted out, that again is a subjective argument, based on better or worse translations. Both C# and VB are translations of the very same basic concepts.

So now what?


As developers we need to develop these skills constantly. It is easy to become comfortable and stop abstracting and translating. In the worst case scenario we rebuild the same application over and over again. We make the same mistakes because we never take a higher view of our work. In less noticeable ways, we succumb to boredom and malaise by becoming lazy at re-examination of our work. If we consider our work done as soon as it works we will eventually end up in the worst case scenario. Once something actually works we need to re-examine what we did to make it work, see if it can be refactored for better performance and see how the whole works together. What can be applied to the next application? What can be used to maintain existing applications?

For me, this all comes before being able to write the fanciest hashtable.