January 2, 2019

The Problem You Solve Is More Important Than The Code You Write

  —Recognition that the less code you write the better.

In The Problem You Solve Is More Important Than The Code You Write, Fagner Brack reminds us that not writing code is a better solution. Fagner’s thesis is that choosing what to write is secondary to the problem you solve.

He writes:

There’s a difference between encapsulation of complex logic and abstraction of useful knowledge. Sometimes, information should be made explicit to be comprehensible. If you abstract them, they can have the opposite effect and be harder to understand.

Abstraction as a tool of obfuscation.

This is an elegant framing of a problem I keep running into: when do you stop introducing abstraction? Fagner provides the best answer I’ve seen to this question: don’t introduce an abstraction if it obscures important information or impedes understanding.

A corollary to Fagner’s thesis, then, might be that the abstraction employed to create an implementation must draw from, and not obscure, domain knowledge.

December 23, 2018

What is Good Design?

  —What characteristics make up a good design?

I’m trying to improve the design activities conducted by my software team. In the course of making these improvements I began to look for ways to better characterize design. This is important to help align the team around the objective of improving our designs–how do we know good design when we see it?

My initial attempt to characterize design involved reading Christopher Alexander’s Notes on the Synthesis of Form. In this book, design has to consider fit (or the context in which it is employed). This implies that design solves a problem whose constraints and use help define it.

Robert C. Martin provides a definition for bad design in The Dependency Inversion Principle. In defining this principle, Robert says:

A piece of software that fulfills its requirements and yet exhibits any of the following traits has a bad design.

  1. It is hard to change because every change affects too many other parts of the system. (Rigidity)
  2. When you make a change, unexpected parts of the system break. (Fragility)
  3. It is hard to reuse in another application because it cannot be disentangled from the current application. (Immobility)

What causes bad design?

  • Rigidity: A system exhibits rigidity if there is heavy interdependence between system components.

  • Fragility: A system exhibits fragility if it has the tendency to break in many places when a single change is made. Often the new problems are in areas that have no conceptual relationship with the area that was changed.

  • Immobility: A design is immobile when desirable parts of the design are highly dependent upon other details that are not desired. The separation of undesired parts is usually expensive.

I’d argue that in this definition, the functional requirements are met but the non-functional requirements are not. Furthermore, all three causes of bad design come down to high coupling between system components.

So far, the best characterization I’ve got is that good design must solve a problem and the solution to that problem is context (or fit) dependent. Bad designs are highly coupled.

This implies that the definition of good design might lie in the direction of defining how to create loosely coupled elements that meet both functional and non-functional requirements.
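To make "loosely coupled" a little more concrete, here is a minimal sketch of the dependency inversion idea in Python. Every name in it (MessageSender, OrderNotifier, InMemorySender) is hypothetical and invented for illustration; none of it comes from Robert's article. The high-level policy depends only on an abstraction, so the low-level detail can be replaced without touching it.

```python
from typing import Protocol


class MessageSender(Protocol):
    """Abstraction the high-level policy depends on (hypothetical name)."""

    def send(self, recipient: str, body: str) -> None: ...


class OrderNotifier:
    """High-level policy: knows nothing about how messages are delivered."""

    def __init__(self, sender: MessageSender) -> None:
        self._sender = sender

    def order_shipped(self, customer: str, order_id: int) -> None:
        self._sender.send(customer, f"Order {order_id} has shipped")


class InMemorySender:
    """Low-level detail: could be swapped for email or SMS delivery
    without changing OrderNotifier."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, recipient: str, body: str) -> None:
        self.sent.append((recipient, body))


sender = InMemorySender()
OrderNotifier(sender).order_shipped("ada@example.com", 42)
print(sender.sent)
```

Because OrderNotifier never names a concrete sender, a change to the delivery mechanism stays local (less rigidity and fragility), and the notifier can be lifted into another application (less immobility).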

December 17, 2018

A Simple Pythonic Context Manager

  —A context manager exploiting Python's _with_ statement.

I recently came across the question How do I “cd” in Python? on Stack Overflow. It provides an easy-to-understand example and use case for writing code that leverages Python’s with statement.

Python’s with statement is defined in PEP 343 – The “with” Statement. It introduces two methods __enter__() and __exit__() that are invoked upon entry and exit of the with statement. PEP 343 was written in 2005.

If you haven’t read PEP 343, the discussion on the semantics of with is worth your time.
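A minimal sketch of what a "cd" context manager might look like, built directly on __enter__() and __exit__(). The class name WorkingDirectory is my own placeholder, not the name used in the Stack Overflow answer or in PEP 343, and the temporary directory exists only to make the example runnable.

```python
import os
import tempfile


class WorkingDirectory:
    """Hypothetical helper: change directory on entry, restore it on exit."""

    def __init__(self, path):
        self.path = path
        self.previous = None

    def __enter__(self):
        # Remember where we were before changing directory.
        self.previous = os.getcwd()
        os.chdir(self.path)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and when the body raises.
        os.chdir(self.previous)
        return False  # do not suppress exceptions from the body


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with WorkingDirectory(tmp):
        print("inside:", os.getcwd())
    print("restored:", os.getcwd() == start)
```

The value of the with statement here is that the restore step cannot be forgotten: __exit__() runs even if the body raises.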

Other resources:

December 4, 2018

Observability: Deliver Reliable Software Faster

  —A look at how to deliver reliable software.

In Observability: Deliver Reliable Software Faster, Marcelo Boeira asks “How does one ensure code works the way it was design to?”. Indeed.

He looks to control theory for an answer.

Control theory in control systems engineering deals with the control of continuously operating dynamical systems in engineered processes and machines. The objective is to develop a control model for controlling such systems using a control action in an optimum manner without delay or overshoot and ensuring control stability.  — Wikipedia

And focuses on observability:

In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.  — Wikipedia

The focus of observability is to introduce machinery to monitor system performance during use. This has a distinct advantage over tests, which are confined to the build environment. I can buy this.

The application Marcelo has in mind requires a priori knowledge of user behavior. I base this on the motivating example for observability. In this example, a “collector” collects information relevant to the feature “my users need be able to search at any time”.

The collector is used to collect data based upon the observable outcome of having this need fulfilled. That is, the collector monitors search times and frequency and leads to the discovery that “my users usually search 10 times per second from 8am to 10pm”. The collector drives the construction of alarms and triggers for when behaviour lies outside of the expected norms. A recipe is provided to expand upon how to develop the collector and triggers.
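A rough sketch of how a collector and a trigger of this kind might fit together. This is my own illustration, not Marcelo's recipe: SearchCollector, out_of_norm, the ten-second window and the 50% tolerance are all assumptions.

```python
from collections import deque


class SearchCollector:
    """Hypothetical collector: records the time (in seconds) of each search."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp: float) -> None:
        self._timestamps.append(timestamp)

    def rate(self, now: float) -> float:
        """Searches per second over the trailing window ending at `now`."""
        cutoff = now - self.window_seconds
        # Drop observations that fell out of the window.
        while self._timestamps and self._timestamps[0] < cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) / self.window_seconds


def out_of_norm(rate: float, expected: float, tolerance: float) -> bool:
    """Trigger: True when the rate deviates from the expected norm
    by more than `tolerance`, expressed as a fraction of the norm."""
    return abs(rate - expected) > tolerance * expected


collector = SearchCollector(window_seconds=10.0)
for second in range(10):
    for _ in range(10):  # ten searches in each of ten seconds
        collector.record(float(second))

busy = collector.rate(now=10.0)   # 100 searches / 10 s
quiet = collector.rate(now=18.0)  # only seconds 8 and 9 remain in the window
print(busy, out_of_norm(busy, expected=10.0, tolerance=0.5))
print(quiet, out_of_norm(quiet, expected=10.0, tolerance=0.5))
```

Note that the expected value of 10 searches per second has to be known before the trigger can be written.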

But what might an implementation look like? An answer to this question lies in Capturing and enhancing in situ system observability for failure detection. This paper describes a system called Panorama. Panorama is a system for

detecting complex production failures by enhancing observability (a measure of how well component’s internal states can be inferred from their external interactions) … when a component becomes unhealthy, the issue is likely observable through its effects on the execution of some, if not all, other components.

The only components considered are processes and threads. Components may be observers and subjects. An observer reports status on a subject.

The authors of Capturing and enhancing in situ system observability for failure detection focus on detecting unhealthy systems using the client and caller perspective. These perspectives are critical in detecting gray failures:

a system is defined to experience gray failure when at least one app makes the observation that the system is unhealthy, but the observer observes that the system is healthy.
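Read literally, that definition is a simple predicate. A minimal sketch, where the function and parameter names are mine and not the paper's:

```python
from typing import Iterable


def is_gray_failure(app_sees_healthy: Iterable[bool],
                    observer_sees_healthy: bool) -> bool:
    """True when the observer reports the system healthy while at least
    one app reports it unhealthy (hypothetical helper)."""
    any_app_unhealthy = any(not healthy for healthy in app_sees_healthy)
    return observer_sees_healthy and any_app_unhealthy


# Two apps are fine, one is not, and the observer sees nothing wrong.
print(is_gray_failure([True, True, False], observer_sees_healthy=True))
# Everyone agrees the system is down: a clear failure, not a gray one.
print(is_gray_failure([False, False], observer_sees_healthy=False))
```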

The advantage of Panorama is its use of aspect-oriented programming to create perspectives between clients and callers to detect gray failures in components. The focus on gray states, not just clear failed states, and the focus on detection are both critical.

Can you argue that Panorama doesn’t require a priori knowledge? I think so. Panorama uses a bounded-look-back majority algorithm to determine the health of a system. The description is in the paper, but the key point is that it uses current status to determine health. It moves Marcelo’s notion of a number of searches during a specific period of time to the question of whether a search succeeded when requested. That’s a better position overall because you don’t have to worry about changes in user behaviour.
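To show the flavour of a look-back majority decision, here is a deliberately simplified sketch. It is my own reading of the idea, not the algorithm as specified in the paper: the class name, the fixed-size window, and the tie-breaking and empty-window rules are all assumptions.

```python
from collections import deque


class LookBackVerdict:
    """Simplified, hypothetical verdict: majority vote over the most
    recent `look_back` status reports about one subject."""

    def __init__(self, look_back: int) -> None:
        # deque with maxlen discards the oldest report automatically.
        self._recent = deque(maxlen=look_back)

    def report(self, healthy: bool) -> None:
        self._recent.append(healthy)

    def is_healthy(self) -> bool:
        if not self._recent:
            return True  # assumption: no evidence means no alarm
        unhealthy = sum(1 for healthy in self._recent if not healthy)
        # Assumption: a tie counts as healthy.
        return unhealthy * 2 <= len(self._recent)


verdict = LookBackVerdict(look_back=3)
for status in (True, False, False):
    verdict.report(status)
print(verdict.is_healthy())  # two of the last three reports are unhealthy

for status in (True, True):
    verdict.report(status)
print(verdict.is_healthy())  # window is now False, True, True
```

Nothing in the verdict encodes how often users search or at what time of day. It only asks whether recent requests succeeded.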

Am I being too harsh in saying that a priori knowledge of user behaviour is a dangerous criterion to use to develop triggers and alerts? Maybe. The application Marcelo has in mind may embody a truth regarding how many searches occur during a specified time of day. I’m skeptical that it’s a better solution even if that’s true.

The patterns of observability is a worthwhile section to read. It has applications outside of Panorama.

November 24, 2018

Domain Decomposition

  —It's a what changes together stays together world-view.

In Documenting Architecture Decisions I look at Ruth Malan’s 2017 Linda M. Northrop Software Architecture Award keynote. In this article, I take another look at Ruth’s keynote with a focus on the Big Ball of Mud. The theme in Ruth’s keynote is the notion that architecture embodies a set of decisions whose importance reflects the cost of change.

Ruth references an article by Bjørn Einar Bjartnes titled Undoing the harm of layers. Bjørn makes an argument for avoiding technology-oriented decomposition in favor of domain-oriented decomposition.

A technology-oriented decomposition manifests as having your top-level architecture devoid of domain concepts. This type of decomposition spreads domain concepts across different technologies instead of localizing them like you’d expect. This increases the number of modules to change when you update a domain concept.
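A toy illustration of that last point. The two layouts below are hypothetical file trees I made up, not taken from Bjørn's article, and the function simply counts which top-level modules mention a given domain concept.

```python
from typing import Dict, Set

# Hypothetical layouts: module path -> domain concepts it contains.
TECHNOLOGY_ORIENTED: Dict[str, Set[str]] = {
    "controllers/invoice.py": {"invoice"},
    "controllers/customer.py": {"customer"},
    "services/invoice.py": {"invoice"},
    "services/customer.py": {"customer"},
    "repositories/invoice.py": {"invoice"},
    "repositories/customer.py": {"customer"},
}

DOMAIN_ORIENTED: Dict[str, Set[str]] = {
    "invoice/controller.py": {"invoice"},
    "invoice/service.py": {"invoice"},
    "invoice/repository.py": {"invoice"},
    "customer/controller.py": {"customer"},
    "customer/service.py": {"customer"},
    "customer/repository.py": {"customer"},
}


def top_level_modules_touched(layout: Dict[str, Set[str]], concept: str) -> Set[str]:
    """Top-level modules that must change when `concept` changes."""
    return {
        path.split("/")[0]
        for path, concepts in layout.items()
        if concept in concepts
    }


print(sorted(top_level_modules_touched(TECHNOLOGY_ORIENTED, "invoice")))
print(sorted(top_level_modules_touched(DOMAIN_ORIENTED, "invoice")))
```

In the technology-oriented layout a change to the invoice concept reaches into three top-level modules. In the domain-oriented layout it stays inside one.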

Bjørn quotes the criteria David Parnas proposed in On the Criteria To Be Used in Decomposing Systems into Modules:

… it is almost always incorrect to begin the decomposition of a system into modules on the basis of a flowchart. We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others.

Bjørn summarizes this as

what-changes-together-lives-together

A flowchart refers to a structured programming technique. Flowchart Techniques for Structured Programming was published in 1973, a year after Parnas’ paper. I assume Parnas was warning against decomposing systems using the process flow depicted in a flowchart. The process flow used to solve a problem doesn’t organize the solution using the structure of the domain. Your architecture should reflect the structure of the domain.