A good development system will permit many approaches to the design of an application or service. Some of these approaches will be flexible, some will be rigid; some stable (even in the face of change), and others brittle and prone to failure.
What makes the difference between systems that come together easily and work well, and systems that seem to fight their authors at every step? Why is it that one approach can work so well in one tool, but be a proverbial pain in another?
There is for any development system a preferred way of operation - a natural way of using the system that makes it easier to get results. Work with the flow of the tool, and everything comes together. Work against it, and it falls apart. This preferred way comes from the consistency (or not) behind the language and its supporting systems. More than that, it stems from the vision (or lack of it) held by the architects of the language and those supporting systems.
We can call this preferred approach The One True Way. (Note that the One True Way will be different for different development systems.)
Once you know and understand the underlying paradigm of a system, it becomes much easier to find the resources you need - and easier to fit them into place once found.
Knowing the One True Way for a tool is one of the characteristics that separates the community's gurus from those who are merely competent. What makes life interesting for one's career as a software developer is that the best approach for one tool is often not so good for another.
In this article, we will explore some of the common elements of the One True Way as it applies to various tools and see how common approaches from one system can fall apart in another. We will also examine the consequences of each preferred technique and perhaps learn when we should adhere to convention, and when we should depart from it for good reasons.
To begin with, let us look at how some different development environments approach a simple problem - when one class needs to wrap a collection of instances.
Problem: Container Classes
You have a class that, amongst other things, acts as a container for a number of other instances - say, an Order containing OrderLines. How do you give client code controlled access to those objects?
The Best of Breed solution for this common situation can be quite different, depending on the development culture you’re working within.
Let’s look at the way this gets solved in three different environments:
- Delphi;
- Java;
- Smalltalk.
A Delphi Solution
An experienced Delphi developer would do the following: create an indexed property to give access to the instances as though indexing into an array, together with an integer property giving the count of instances.
Client code may then iterate over the instances using a simple for loop, indexing directly into the main object as though it were a container:
To simplify client code further, the designer of the original class can opt to mark the indexed property as "default". The client code may then read:
A Java Solution
I’m told that an experienced Java developer will solve this in a slightly different way: provide a factory method that returns a List instance containing the OrderLines in order.
Client code may then do anything it likes with the returned list.
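As a minimal sketch of this convention: the Order and OrderLine classes below, and their method names, are hypothetical and chosen only for illustration. The accessor is assumed to return a fresh copy of the internal list, which is what makes the later discussion of copying costs relevant.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderExample {
    // Hypothetical line item; the fields are illustrative only.
    static final class OrderLine {
        final String product;
        final int quantity;

        OrderLine(String product, int quantity) {
            this.product = product;
            this.quantity = quantity;
        }
    }

    // Hypothetical container class.
    static class Order {
        private final List<OrderLine> lines = new ArrayList<>();

        void addLine(String product, int quantity) {
            lines.add(new OrderLine(product, quantity));
        }

        // Returns a snapshot copy, so callers cannot disturb the order's own list.
        List<OrderLine> getLines() {
            return new ArrayList<>(lines);
        }
    }

    // Client code: fetch the list once, keep it in a local, then iterate.
    static int totalQuantity(Order order) {
        List<OrderLine> lines = order.getLines();
        int total = 0;
        for (OrderLine line : lines) {
            total += line.quantity;
        }
        return total;
    }

    public static void main(String[] args) {
        Order order = new Order();
        order.addLine("widget", 2);
        order.addLine("gadget", 3);
        System.out.println(totalQuantity(order));
    }
}
```

Because the client holds its own copy, it can sort, filter or clear that list without affecting the Order.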
A Smalltalk Solution
According to Kent Beck’s book, Smalltalk Best Practice Patterns, the Smalltalk solution would be to implement the do: method from the collection protocol.
This allows clients to perform an action on each item without having to handle the iteration themselves. In effect, client code treats the main container as though it were itself a collection.
Delphi Consequences
The Delphi approach gives easy to understand code, with a simple implementation as well, at the expense of being reliant on the list not changing in the midst of the loop. Among other things, this means that the construct is not thread safe - if the list may be modified by another thread, this client code runs the risk of accessing items that have been removed, or of skipping new items.
As should be clear, accessing items that have been removed from the list can have effects ranging from irritating (showing the wrong information to the user) to catastrophic (a crash).
Java Consequences
In Java, the returned list of instances requires a local variable for storage to avoid making (and discarding) multiple copies of the list.
The time taken to instantiate the return list is (at best) proportional to the number of items in the list. This is acceptable if all the items in the list will be used, but if only a few (or one) will be accessed, this becomes very inefficient.
The upside of this approach is its inherent thread safety - as long as the list is generated once in a thread-safe way, the client code is insulated from concern - and the instances in the list won’t be garbage collected until the list itself is discarded.
A Hash Table Example
In whatever language you solve this problem, there is one significant issue to be addressed. The use of linear indexing (as into a one dimensional list) implies that it is possible (and sensible) to reduce the data structure into an ordered collection of items. In many cases this is correct. In some cases reducing the data structure to this form can be expensive in time or space, and it often imposes an artificial structure that doesn’t really fit.
An example of this would be a conventional Hash Table. By design, a hash table has no native notion of order, yet to provide indexed access as though the hash table were an array, a sequence needs to be imposed.
One danger of exposing a hash table through array access is that client code can become dependent on the ordering, even though that ordering is artificial.
For example, an observant coder might notice that items added to the hash table later show up later in the list, and thus draw the incorrect conclusion that the ordering reflects the order that items were originally added. The coder may even choose to write code that depends on this behaviour.
Code that relies upon this would almost certainly pass testing, and perhaps work well in production for a time. As soon as the flawed assumption changes (say, after a system upgrade or on a different version of the virtual machine or the operating system) the code would break, probably in ways that produce odd side effects.
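The trap can be sketched in Java. The IndexedHashTable class below is hypothetical: it wraps a java.util.HashMap and exposes count() and itemAt(int) as though it were an array. The sequence a client sees is simply whatever order the map happens to yield, which the HashMap documentation explicitly does not guarantee.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexedHashTable {
    private final Map<String, String> items = new HashMap<>();

    void put(String key, String value) {
        items.put(key, value);
    }

    int count() {
        return items.size();
    }

    // Imposes a sequence on an unordered structure. The order is an artefact
    // of the map's internals, not of the order in which items were added.
    String itemAt(int index) {
        List<String> snapshot = new ArrayList<>(items.values());
        return snapshot.get(index);
    }

    public static void main(String[] args) {
        IndexedHashTable table = new IndexedHashTable();
        table.put("first", "added first");
        table.put("second", "added second");
        table.put("third", "added third");
        // Every item is reachable by index, but nothing promises which
        // index a given item will appear at.
        for (int i = 0; i < table.count(); i++) {
            System.out.println(i + ": " + table.itemAt(i));
        }
    }
}
```

Client code that treats index 0 as "the oldest item" is relying on behaviour this class never promised. Note also that itemAt copies every value on each call, so the artificial ordering is paid for in time as well.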
Forcing a linear, iterative approach to traversing the list is not the only way to solve the problem.
Iterators
Smalltalk takes a different approach. Instead of exposing external access to the list of instances, the standard approach is to provide an iterator method that accepts a block as a parameter.
A block is an anonymous method, passed as a parameter. Blocks may be simulated in Delphi and the current version of C# by placing the relevant code in a named method and passing a reference to it. Support for them is supposed to be included in the next edition of C#.
This moves the problem of iteration out of the client code and into the instance containing the items to be iterated. Problems of thread safety and how to implement the iteration then need to be solved only once, with the iterator method available for all clients to use.
Better yet, there is no need to expose an artificial ordering to client code, and a reduced risk that badly written client code could come to depend on that ordering.
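The same idea can be approximated in Java, where a lambda plays the role of the block. This is a sketch under assumptions rather than a translation of any Smalltalk code: the IterableOrder class and its forEachLine method are hypothetical names, and plain synchronized methods stand in for whatever locking policy a real class would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class IterableOrder {
    private final List<String> lines = new ArrayList<>();

    synchronized void addLine(String line) {
        lines.add(line);
    }

    // The container owns the iteration. The lock is held for the whole
    // traversal, so other threads cannot modify the list mid-loop.
    synchronized void forEachLine(Consumer<String> action) {
        for (String line : lines) {
            action.accept(line);
        }
    }

    public static void main(String[] args) {
        IterableOrder order = new IterableOrder();
        order.addLine("2 x widget");
        order.addLine("3 x gadget");
        // The client supplies only the action, never an index or a loop.
        order.forEachLine(line -> System.out.println(line));
    }
}
```

The client never sees an index, so there is no ordering for it to depend on, and the locking decision lives in one place. One limitation of this particular sketch is that an action which adds lines to the same order from within the traversal would still fail.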
When Paradigms go Bad
Consider what might happen if a Delphi developer naively used her regular convention when writing some Java code. The result might look like this:
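As a minimal sketch, assuming a hypothetical Order class whose getLines() accessor builds a fresh list on every call, the transplanted Delphi idiom could come out as follows. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveLoop {
    // Hypothetical container following the Java convention of returning a copy.
    static class Order {
        private final List<String> lines = new ArrayList<>();

        void addLine(String line) {
            lines.add(line);
        }

        List<String> getLines() {
            return new ArrayList<>(lines); // a full copy on every call
        }
    }

    // Delphi habit carried over: treat getLines() as if it were a cheap
    // indexed property and call it inside the loop.
    static String describe(Order order) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < order.getLines().size(); i++) {   // one copy per loop test
            out.append(order.getLines().get(i)).append('\n'); // one copy per item
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Order order = new Order();
        order.addLine("2 x widget");
        order.addLine("3 x gadget");
        System.out.print(describe(order));
    }
}
```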
While any experienced Java (or C#) developer would instantly take issue with this code, the nature of the problem isn’t obvious to the inexperienced. When tested, the code would run fine - in fact, the code will work well even in many production environments.
There is almost certainly production Java code suffering this flaw.
The key problem is that every time the list is accessed, a full list containing all the items is constructed, used for a single item and then discarded for the garbage collector to gather later.
In other words, instead of the list being created once and then iterated over, a separate copy of the list is created every time the list is accessed. Since this happens at least twice per loop (once for the item and once for the count), the total workload is extremely high.
Assuming that the method that returns the list is as efficient as possible, it will return the new list in a time proportional to the list length - that is, it is an O(n) method. Because it is invoked twice for every item in the list, the loop as a whole becomes an O(n^2) operation that runs slowly, consumes a lot of memory and creates a very large amount of debris for the garbage collector to sweep up behind it.
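The arithmetic can be checked with a small instrumented sketch. The CountingOrder class and its counters are hypothetical and exist only to count the work done; the accessor is again assumed to copy the list on each call. For n items, the naive loop calls the accessor 2n + 1 times (n + 1 loop tests plus n item fetches) and so copies n(2n + 1) elements, while the idiomatic version calls it once and copies n.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyCount {
    // Hypothetical container that counts how much copying its accessor does.
    static class CountingOrder {
        private final List<Integer> lines = new ArrayList<>();
        long calls = 0;          // number of getLines() invocations
        long elementsCopied = 0; // total elements copied across all calls

        CountingOrder(int n) {
            for (int i = 0; i < n; i++) {
                lines.add(i);
            }
        }

        List<Integer> getLines() {
            calls++;
            elementsCopied += lines.size();
            return new ArrayList<>(lines);
        }
    }

    // Accessor called in the loop test and in the loop body.
    static long naiveSum(CountingOrder order) {
        long sum = 0;
        for (int i = 0; i < order.getLines().size(); i++) {
            sum += order.getLines().get(i);
        }
        return sum;
    }

    // Accessor called once, result held in a local variable.
    static long idiomaticSum(CountingOrder order) {
        List<Integer> lines = order.getLines();
        long sum = 0;
        for (int value : lines) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        CountingOrder naive = new CountingOrder(100);
        naiveSum(naive);
        CountingOrder idiomatic = new CountingOrder(100);
        idiomaticSum(idiomatic);
        System.out.println("naive: " + naive.calls + " calls, "
                + naive.elementsCopied + " elements copied");
        System.out.println("idiomatic: " + idiomatic.calls + " calls, "
                + idiomatic.elementsCopied + " elements copied");
    }
}
```

With 100 items the naive loop makes 201 calls and copies 20,100 elements, against 1 call and 100 elements for the version that keeps the list in a local variable.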
Conclusion
Merely knowing the syntax of a language may be enough to get past a recruiter and may be sufficient to effectively maintain an existing system. It isn’t enough for the long term, however.
Every development system has a specific style - a common approach that, more often than not, won’t be explicitly written down. Instead, you’ll need to pick it up by reading other people’s code, lots of it, to discover the common thread through it all.
If you want to call yourself a professional developer in a language, you need to know that language’s underlying paradigms well.