Michael Feathers' ideas on unconditional code.
I’ve recently come across some comments by Michael Feathers relating to unconditional code. The basic idea is that code containing lots of if-statements or nested if-statements should be viewed with suspicion. He extends this suspicion to switch-statements and loops.
It’s an incredibly powerful idea.
One concept he discusses for eliminating if-statements in a function is to look at error conditions and consider expanding the domain of the function so that the error condition disappears. I mentioned this to a couple of colleagues and both asked what that means. I assume that domain refers to the function's inputs (just like the mathematical definition of a function over a domain and range).
If I apply this meaning to a function that is seemingly nonmathematical in nature, what can I come up with?
Here’s some Python code from Pandas that creates a Pandas data frame from a CSV file.
pandas.read_csv(file_handle, usecols = [ 'Index', 'Column One' ], index_col = [ 'Index' ], parse_dates = [ 'Index' ])
What’s interesting about this function is that it will accept a CSV file containing only the header row (defined by usecols).
The data frame created by read_csv under these conditions is empty.
Nothing unreasonable about this.
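As a minimal sketch of that behaviour, the snippet below reads a header-only CSV from an in-memory buffer. The io.StringIO buffer stands in for a real file and is an assumption for illustration, and parse_dates is left out to keep the sketch small.

```python
import io

import pandas as pd

# Hypothetical in-memory "file" that holds only the header row.
file_handle = io.StringIO("Index,Column One\n")

frame = pd.read_csv(
    file_handle,
    usecols=["Index", "Column One"],
    index_col=["Index"],
)

# No exception is raised; the result is simply a frame without rows.
print(frame.empty)
print(len(frame))
print(list(frame.columns))
```

The call succeeds without any special handling on my side, so the empty file is already part of the function's domain.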
If my application uses this function and goes off to compute something, it needs to gracefully handle the case where no data is present. If my application just counted rows of data, I could achieve this by producing a count of zero. The point is that zero is as reasonable an answer as one, two or three if this is the number of rows present.
The problem with this example is that it’s not clear I’ve extended the domain of the count function.
If my application computes the average value of a column in this file then things get more interesting. Using the above example, I get a count of 0 rows and my data value is undefined (because it doesn’t exist).
Pandas deals with this very gracefully.
Run read_csv on a CSV file containing only the header row and you can use pandas.isna on a column value to determine if the value is unavailable.
My average calculation now needs to be aware of this situation, and we can check for it using a count or pandas.isna.
This awareness is likely to come in the form of an if-statement.
This brings me back to the initial question of how to extend the domain of my average calculator in the absence of any column values. In Michael’s example, he introduces the concept of a null object that participates in a computation but essentially does nothing.
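To make the null object idea concrete, here is a small sketch in plain Python. It is not Michael's own example; the Notifier, NullNotifier and process_order names are hypothetical and exist only for illustration.

```python
class Notifier:
    """Hypothetical collaborator that records every message it is given."""

    def __init__(self):
        self.sent = []

    def notify(self, message):
        self.sent.append(message)


class NullNotifier:
    """Null object: same interface as Notifier, but it does nothing."""

    def notify(self, message):
        pass


def process_order(total, notifier):
    # No "if notifier is not None" check: every caller passes an object
    # that understands notify(), even when nothing should happen.
    notifier.notify(f"order processed: {total}")
    return total


real = Notifier()
process_order(10, real)            # records one message
process_order(10, NullNotifier())  # silently does nothing
```

The caller decides once which object to pass in, and the function body stays free of conditionals.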
In my average calculator, my average function should understand and operate on empty data frames by returning a value for which pandas.isna returns True.
If it does, then the domain of the average function includes both columns of numeric values and columns with no values at all, and its result is either a number or “NA”.
My application doesn’t contain an if-statement for this extended domain.
It lives somewhere and that’s ok.
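A rough sketch of such an average function is shown below. The average helper and the sample frames are hypothetical; the sketch assumes the column can be converted to float64 and relies on the mean of an empty pandas Series being NaN.

```python
import pandas as pd


def average(frame, column):
    """Hypothetical helper: mean of a column, or NaN when there are no rows."""
    # No if-statement here: the mean of an empty float64 Series is NaN,
    # so the empty case is handled inside pandas.
    return frame[column].astype("float64").mean()


# Stand-in for the frame produced from a header-only CSV file.
empty = pd.DataFrame({"Column One": pd.Series([], dtype="float64")})
full = pd.DataFrame({"Column One": [1.0, 2.0, 3.0]})

print(pd.isna(average(empty, "Column One")))
print(average(full, "Column One"))
```

Whatever conditional logic exists for the empty case sits inside the library rather than in the application code.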
The point Michael is making is that lots of if-statements are generally a bad sign. Eliminating them entirely is not the goal.
Check out the Anti-If Campaign.