Friday, April 22, 2005

Another post intended for the future-me, but I don't mind at all if you read it, gentle reader! Yet it is long, and ultimately but an expression of happiness at having found some joy with programming, for the first time in a long time!

The latest assignment for software has provided an excellent opportunity to try out test-driven development. I started off well enough, writing a decent (if excessively verbose) set of tests. A little background on the project - it is meant to be an implementation of a mathematical set. Fairly interesting, and it seems reasonably doable to boot. I learnt an important lesson, and that's that it's good to design your test suite too! I had the impression that I could get away with hacking it together, but aside from the fact that it's "good practice" to have clean design, there are the obvious benefits that come with clean code; primarily among them, reduced repetition and updating.

I could have been fancy and used std::map to help automate the tests completely, but I decided that it would be simpler and quicker to just hand code the tests and the desired output. It seemed easy enough, so I rushed in and made a few tests for equality and subset testing. It all worked, and I was happy. However, there were a few problems; chief among them was that I was having to initialize the set like so:


int elements [] = { 0, 1, 2, 3, 4 };
IntSet set (elements, 5);


As you can see, I was forcing myself to manually specify the length of the array. This could be overcome by using the sizeof (elements) / sizeof (elements [0]) trick, but it still didn't seem "right". Since arrays are evil, I thought there must be an elegant way to do it with a vector instead. No luck, because a vector can be initialized "as-is" with an array: so,


IntSet set (std::vector <int> ({ 0, 1, 2, 3, 4 }));


doesn't compile.

It dawned on me that an easier way would be to read in data from a file, as then I could read directly into a vector and then everything would be automated, and I wouldn't have to specify array lengths and muddy up the code. But I plodded on anyway, with the (very dangerous) thought that it would be too much effort to change. This is usually a very bad mistake to make! There are too many places to name where this very style of thinking is rejected completely, but reading and rationalizing is never quite enough for me. It took me a few hours to see that I was digging myself into a hole, because there seemed to be no end to the amount of repetition in the code. The turning point was when I had made a mistake in one of the statements which led to an obscure bug with one of the tests. Much toil and sweat later, I decided it was time to refactor.

So, last evening, I spent a good hour or so trying to get the tests to be read in from a file. In the process, I realized that this allowed me to be far more flexible with my tests - it became trivial to add new tests, and maintaining them was again a piece of cake. Previously, I was comparing each set to the very first set in the test method; I thought then I would code up separate tests for separate first sets. Terrible idea! With a test file, it took a few minutes to make the code able to read in a pair of sets, and perform the needed comparison. All of a sudden, the tests became much more robust! But I'm jumping ahead of myself, because there was another interesting problem to tackle - how could there be a generic method that worked for all the operator comparisons? I needed the code to work for equality (==), less than equals (<=), and so on - the basic idea was the same, but a different operator needed to be invoked in each case. "Pity you can't pass in an operator", I said to myself, before suddenly remembering the documents I'd read on SGI's STL reference about functors. I was overjoyed, because thanks to std::equal_to and friends, I could effectively pass in an operator as a functor! At this point, I recalled the older days of programming, the unbridled joy to be found when discovering something new. Here, I was realizing how something I'd seen so many times before could be of practical use and massively reduce the work needed.

I was quite satisfied with the progress made, and so I was prepared to move on and test more operators. Again, I came across a design flaw that I should have paid more attention to at the start - all my code was built under the assumption that the return type of the operators being tested was a boolean. When it came to set unions and what have you, the code had no chance of working. You'd think by now I'd learnt the art of patient design, but I was on a roll, and thought to myself "This can't be too hard to fix". Bold words! I quickly wrote up something that would work where the result was another set. It all seemed to compile, and I thought by now that I must be done.

But this time I decided not to let repetition go unnoticed; the code was full of it now, since the method for a boolean comparison and a set comparison were nearly identical, except for a few lines. Essentially, there are two things that are different. First, that we must read in the data from the file differently - namely, we should make sure that we read into a set, not to a boolean. Secondly, we should invoke the appropriate comparisons. Similar behaviour, yet different types..ring a bell?

Templates! I'd used them before, but not very seriously, definitely not with anything of any practical use. So it was rough going at first, trying to make it work. It seemed first that I needed a template of templates: sounds nasty, but it isn't really. Essentially, I needed to make sure that the invoked function could be called either for a pair of sets and a bool, or for a pair of sets and a set. "Seems easy enough", I said, and tried to do this in a few minutes. But I came across a brick wall - since we can't overload based on return types, my code didn't know which function to invoke in order to read the test data. I had originally given them the names getBoolData () and getSetData () (not very informative names, I know), and I obviously needed some way of making them have the same name if I were to invoke a different one based on type. However, the prototype for both methods was identical, except for the different return type, so the compiler couldn't decide which one to choose.

At this point I thought to myself "Let's look at what's different with these two functions". They were virtually identical, except that they have different ways of converting the result of the test-operation (equality would convert it to bool, since the result of equality is a bool). Common functionality everywhere else, so I was pleased with myself and thought "Easy enough, let's move the conversion code outside and merge both methods together". Sounds good, but I ran across the same problem - the comparison code was also being overloaded based on type! So again, the compiler was at odds to choose the right function.

Faced with the bleak prospect of giving in to mindless repetition, I flipped through Stroustrup's book, hoping to find some clues to help me. I came across template specialization, and it struck me that it could be useful - define a generic conversion function like template <class T> convert (const string &);. Then define specializations for bool and IntSet. It seemed easy enough to code, but I was skeptical as to whether this would work. But it did! The now generic getData () function was able to call convert <T>, and the appropriate type was filled in at compile time! Very nice!

Once this was done, the code looked much neater. It worked seamlessly for comparisons returning either booleans or sets, and there was no further redundant updating required. All this effort just for a test mechanism, yes, but it was a valuable learning experience. And, to boot, I can save myself the future worry of a buggy program, thanks to an automated suite!

One might be tempted to say that the proof really lies in the final mark, but I disagree. I found this to be a very valuable learning experience, and that, I think, is what matters more, even if it's very easy to be led otherwise.

There are probably lots of errors in the code fragments, most of them are off the top of my head.

No comments: