Sometimes a person’s reasoning is flawed. Sometimes that affects a whole team.
I’m astounded that a memory leak was allowed to persist for several years in a product I work on. What’s more, the history of this leak includes discussion, capture, diagnosis, and even tests to ensure its size didn’t change. Fixing it took four days.
To put this in perspective, each occurrence of the leak consumes 0.0875% of system memory, and the triggering event would need to occur 300 times to bring the system down. The trigger is the result of a user-initiated operation.
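A quick back-of-the-envelope check of those figures (a minimal sketch in Python; the headroom conclusion at the end is my inference from the two numbers above, not something measured):

```python
# Back-of-the-envelope arithmetic using the figures above.
leak_per_trigger = 0.0875   # % of system memory leaked per triggering event
triggers_to_fail = 300      # triggers needed to bring the system down

leaked_at_failure = leak_per_trigger * triggers_to_fail
print(f"memory leaked at failure: {leaked_at_failure:.2f}%")  # 26.25%

# Inference: the system falls over with only ~26% of memory leaked, so the
# remaining ~74% is presumably claimed by everything else. The leak never
# had the whole machine to itself.
```

In other words, the 300-trigger figure only makes sense once you account for what the rest of the system is using, which is exactly the shared-resource point I’ll come back to below.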
Okay. These things happen. What I didn’t count on was the response to the rediscovery of this issue. I guess I should have expected it, given the history. Still, it was weird.
Which brings me to the logical fallacy in the reasoning that gave this leak its long life.
Several arguments were made for why we shouldn’t fix the leak:
Cost: we should work on something “more important”. This reflects how long the leak has existed, but not the effort already spent managing it.
Use case: no user would ever trigger this many of these events. An arrogant, or at best naive, opinion about how real users use the system, offered with no acknowledgement that there is no concrete data to support it.
Availability: without an availability requirement, we can’t evaluate the trade-off. A variation of the cost argument veiled as a requirements issue (i.e., we don’t know whether we even need to fix it).
Okay. These are rational arguments on some level. They are irrational on many.
First, the cost argument makes no sense unless we know the cost of fixing the leak. The cost of discovering and rediscovering this issue was far larger than the cost of correcting it.
Second, the use-case argument isn’t helpful unless we know how every user uses the system. We don’t.
Third, the availability argument doesn’t make sense unless we have some way to reason about the second argument: an availability target tells us nothing if we can’t predict how often users will trigger the leak.
The biggest issue is that memory is a shared resource. You can’t reason about the memory usage of one component without considering the others.
Reasoning about the memory-allocation behaviour of a system component in isolation is a base rate fallacy: you can’t make a probability judgement from conditional probabilities alone, without taking the prior probabilities into account.
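To make that concrete, here’s a minimal sketch, with entirely hypothetical numbers, of how the “no user would ever do that” judgement collapses once priors enter the picture:

```python
# Hypothetical numbers only -- we have no real data on user behaviour,
# which is precisely the problem.

priors = {            # P(user type): the base rates we don't actually know
    "light": 0.90,
    "heavy": 0.10,    # even a small heavy-user population matters
}
p_outage_given = {    # P(300+ leak triggers | user type)
    "light": 0.0001,
    "heavy": 0.05,
}

# Law of total probability: P(outage) = sum of P(type) * P(outage | type)
p_outage = sum(priors[t] * p_outage_given[t] for t in priors)
print(f"P(outage) = {p_outage:.4f}")  # 0.0051, dominated by heavy users

# Bayes' theorem: given an outage, which users caused it?
p_heavy = priors["heavy"] * p_outage_given["heavy"] / p_outage
print(f"P(heavy user | outage) = {p_heavy:.2f}")  # ~0.98
```

Judging the risk from the “typical user” alone, as the use-case argument does, silently sets the heavy-user prior to zero, and the conclusion is only as good as that unstated assumption.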
Since we don’t know the prior probabilities of these events, we can’t make a judgement about the leak’s impact on users. We really have no choice except to fix it.
My stakeholders will rejoice.
For another look at the fallacy of reasoning about bugs in isolation, see Code Matters.