November 5, 2018

Using Entropy to Measure Software Maturity

  —A look at using information theory to measure software.

In Using Entropy to Measure Software Maturity, Kamiel Wanrooij explores a metric based upon changed files as a measure of software maturity. Kamiel calls this entropy and uses Shannon’s information-theoretic definition. I applaud Kamiel’s desire to create a metric to measure software change.

Kamiel’s thesis is that software files can be viewed as providing information regarding the ability of the software to support change. That is, a project comprising \(n\) files contains \(\log_{2}{n}\) bits of information. This model ignores the changes within a file. It relies on a measure of the number of changed files.
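To make the \(\log_{2}{n}\) figure concrete, here is a minimal sketch. The function names and the normalized ratio are my own assumptions for illustration and are not taken from Kamiel's article; the sketch only shows how a count of changed files could be turned into bits and compared against the project size.

```python
import math


def file_bits(n_files: int) -> float:
    """Bits needed to single out one file among n_files equally likely files."""
    if n_files < 1:
        raise ValueError("a project needs at least one file")
    return math.log2(n_files)


def change_ratio(changed: int, total: int) -> float:
    """Hypothetical normalized score: log2(changed) / log2(total).

    0.0 means a single file changed; 1.0 means every file changed.
    Changes *within* a file are ignored, as in the model described above.
    """
    if not 1 <= changed <= total:
        raise ValueError("changed must be between 1 and total")
    if total == 1:
        return 0.0
    return math.log2(changed) / math.log2(total)


project_bits = file_bits(1024)      # a 1024-file project
score = change_ratio(32, 1024)      # a change touching 32 of those files
```

Note that a change touching 32 of 1024 files scores halfway up the scale, even though it touches only about 3% of the files. That is the logarithm at work.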

The utility of this measure is limited. In C and C++, the file is the unit of compilation. I’ve worked in environments that adhered to a function-per-file rule, in environments where all related elements were added to the same file, and in environments with an annual copyright update that touches every file in the project. Your mileage will vary considerably depending upon how you use files.

The value of this measure lies in the charts that present the number of files changed. You can monitor these and investigate the peaks and valleys. (Peaks for the reasons discussed in the article; valleys because someone might add a lot to a few files.) The use of a logarithmic scale needs to be weighed against a raw count; perhaps you want the peaks and valleys amplified.
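A small sketch of the trade-off between a logarithmic scale and a raw count. The numbers are invented for illustration; they are not data from the article.

```python
import math

# Hypothetical number of files changed in five successive commits.
changed_per_commit = [1, 2, 8, 64, 512]

# The same series on a log2 scale.
log_scaled = [math.log2(c) for c in changed_per_commit]

typical, peak = 8, 512

# How much the peak stands out from a typical commit on each scale.
count_ratio = peak / typical
log_ratio = math.log2(peak) / math.log2(typical)
```

With these numbers the peak is 64 times the typical commit as a count but only 3 times on the log scale. A commit that rewrites a single file scores 0 on the log scale no matter how large the rewrite, which is why valleys deserve a look too.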

Files changed is a response to the change being made to the project. In statistics, a response depends upon one or more explanatory variables. The article points out that coupling is an explanatory variable.

The implication is that high coupling causes a lot of files to change. It might. It might not. It depends upon how files are used in the project.

The best case for using Kamiel’s proposal as a software metric is an environment where the following are true.

  1. Files and entities of interest positively correlate. For example, you enforce a rule of one function/method/class per file.
  2. There is a tendency to increase the number of files as the structure of the project unfolds. For example, you enforce a rule where refactoring adheres to the principles mentioned in Kamiel’s article.
  3. There are mitigations in the process to prevent adding a lot of changes to a single file. Such changes would represent a valley in the charts and the logarithmic scale would diminish this.

In all, I think the metric proposed by Kamiel has value but needs to be acknowledged as a response. Understanding, nay controlling, the explanatory variables driving the response is paramount to getting high value from this measure. Even then, directly measuring the explanatory variables is better.

See the Reddit discussion on this article. I agree with the notion that averaging over time is a poorer choice than using a histogram or heatmap to measure change. Using a heatmap has the advantage of highlighting areas where change is consistently high.
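As a rough sketch of why a heatmap can say more than an average, the snippet below buckets changes by file and week. The change log, file names, and helper names are all hypothetical, and the cells would still need a plotting library to be drawn as an actual heatmap.

```python
from collections import Counter

# Hypothetical change log: one (ISO week, file path) pair per file touched.
changes = [
    ("2018-W40", "core/parser.c"),
    ("2018-W40", "core/parser.c"),
    ("2018-W40", "ui/menu.c"),
    ("2018-W41", "core/parser.c"),
    ("2018-W42", "core/parser.c"),
    ("2018-W42", "docs/README"),
]


def heatmap_cells(changes):
    """Count touches per (file, week) cell of the heatmap."""
    return Counter((path, week) for week, path in changes)


def weeks_touched(changes):
    """Number of distinct weeks in which each file changed."""
    weeks = {}
    for week, path in changes:
        weeks.setdefault(path, set()).add(week)
    return {path: len(seen) for path, seen in weeks.items()}


cells = heatmap_cells(changes)
hot = weeks_touched(changes)

# The single number an average over time would report.
average_per_week = len(changes) / len({week for week, _ in changes})
```

The average reports two changes per week and nothing else. The per-file view shows that core/parser.c changed in every one of the three weeks, which is the kind of consistently hot area a heatmap makes visible.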

October 26, 2018

Documenting Architecture Decisions

  —Lightweight Approach to Documenting Architecture

In Visual Architecting Keynote, Ruth Malan shares a keynote she presented as part of the 2017 Linda M. Northrop Software Architecture Award. I find the “trace” for Ruth’s talk very helpful to understand the context of her slides. The slides are worth the time to review. Just chasing down the references turned up a rich set of insights for me.

Ruth does a nice job of conveying the notion of architecture. She quotes Grady Booch: “Architecture represents the significant design decisions that shape a system” and “Significance is measured by the cost of change.” Marcel Weiher points out that this echoes the criteria used to decompose systems into modules.

The modularizations are intended to describe the design decisions which must be made before the work on independent modules can begin. … the intention is to describe all “system level” decisions (i.e., decisions which affect more than one module).

Significant cost of change implies decisions affecting multiple modules.

The Architecture Decision Record (ADR) to the rescue? The ADR is a decision template proposed by Michael Nygard to capture design decisions. The template is based upon the Alexandrian Form used in pattern literature. Michael Nygard’s article includes hints on a workflow for the decision template. Comments in the article link to this SATURN 2017 Talk, which provides insight into the use of ADRs along with examples.

During the SATURN Talk, the presenter mentions that when ADRs were first employed, it seemed like everything was in an ADR. A combination of governance and workflow helped make the ADR effective. Workflow changes included using ADRs during reviews and making sure that they were top-of-mind during design. The talk also includes a little history on the notion of ADR.

October 20, 2018

Vagrant Boxes on CentOS

  —Using Vagrant boxes effectively.

I’ve been working with Vagrant for some time now (see Experiments with Packer and Vagrant (Fini)). One effective practice I’ve developed is the construction of Vagrant boxes for specific applications. A couple of examples: I’ve created Vagrant boxes for Review Board and CPython.

I use the CPython Vagrant box to build and examine CPython source code. I’ve found that the ability to automate the construction of the box and its configuration to build CPython is handy.

No rocket science here, just a pragmatic approach to constructing environments for specific purposes.

My Vagrant boxes are available on GitHub.

September 27, 2018

Resources for Exceptions in .NET

  —Exceptions in .NET and C#

Some resources for exception handling in .NET:

ADVANCED EXCEPTIONS IN .NET is a nice, recent introduction to the exception handling mechanisms in the CLR. It references Handling and throwing exceptions in .NET.

ECMA C# and Common Language Infrastructure Standards. Exceptions are discussed in:

  • ECMA-334 defines the standard exception classes and when they are thrown. It also includes a description of the try-catch mechanism used by the language.
  • ECMA-335 defines the exception handling model used by the CLR.

Collecting User-Mode Dumps defines a registry key to enable application dumps.

Application Recovery and Restart defines a mechanism for collecting application information on unhandled exceptions. It is intended for C and C++ developers.

Capturing unhandled exceptions in a mixed native/CLR environment discusses how to manage unhandled exceptions in the CLR and native code.

Tools for Exploring .NET Internals is a curated list of resources for debugging .NET applications.

September 21, 2018

Global Warming and Climate Change

  —What's the difference between global warming and climate change?

During a conversation about the weather with a colleague I mentioned global warming. They immediately corrected me, stating we were experiencing climate change.

Since that conversation, I’ve worried about my understanding of these terms and gotten a glimpse of how words can shape our thinking.

By shifting the conversation from global warming to climate change, my colleague attempted to reframe the conversation in a way that presented recent weather patterns as a natural phenomenon.

The difference between these two terms is embedded in whether humanity has some responsibility for the change in climate.

Turning to Wikipedia:

Climate change is a change in the statistical distribution of weather patterns when that change lasts for an extended period of time (i.e., decades to millions of years). Climate change may refer to a change in average weather conditions, or in the time variation of weather around longer-term average conditions (i.e., more or fewer extreme weather events). Climate change is caused by factors such as biotic processes, variations in solar radiation received by Earth, plate tectonics, and volcanic eruptions. Certain human activities have also been identified as significant causes of recent climate change, often referred to as global warming.

It looks like global warming is attributed to human activity and contributes to climate change, but climate change is not entirely the result of human activity.

In my colleague’s mind, the temperature increases over the last 18 months are a change in average and longer-term average weather conditions.

In effect, I associate new weather patterns with human activity. My colleague much less so (perhaps, not at all).

I don’t know what scares me more:

  • accepting that humans can’t influence global warming, or
  • that people think these trends are a natural phenomenon.

Only time will tell.