The software team and I have been struggling with how to improve our stand up.
One challenge was improving communication.
This challenge involves creating an environment where the team is not stifled by outsiders.

When I watch the team interact I’m happy with what I see.
I see a team that’s engaged in achieving its objectives and heavily embracing collaboration.
When I watch the team interact during stand up I’m dismayed by what I see.
It’s like I’m looking at two different groups of people.
In reality, it’s the same people in different situations.
One appears comfortable.
The other not so much.

My team hates their stand up.
As their manager, it’s painful to watch.
As I watch, I am reminded of awful circumstances.

I’ve long acknowledged that the outside presence is too heavy for this team.
It smothers and stifles them.
The best characterization I can give is that the team is a collection of gentle souls.
The heaviness comes from the weight of opinion and direction from outside the team.
For some reason, the team collapses in the face of this onslaught.
For me, it’s a failure of the Manager and the Scrum Master.

My challenge as the manager is to inject a change into the stand up that achieves two goals.

Doesn’t censor people.

Provides the team with breathing space.

We’ve played with the notion of asking for huddles up front.
That didn’t work because the team didn’t execute.
This makes the situation complicated, because it implies one of two things:

they don’t need the huddles.

they don’t understand how to use the huddles.

The solution to address the challenges and to flesh out the huddle is to force the huddle.
The idea is to use the huddle to provide the team a way to extract themselves to make a decision without outside influence.
Basically, I’m going to trust my instincts: whenever I perceive the team being pushed around, I will tell, not ask, for a huddle and assign someone to communicate the decision.

It’s the best that I can do.
It’s also the smallest change I can make to enable better outcomes.

I was looking for a simple, pragmatic example of how to inject logging into an application and came across this solution.
Game Programming Patterns provides a collection of design patterns for, well, game programming.
I’m not a game programmer, but I found the writing style engaging and the information pragmatic.

On the writing style:

But the architecture this brilliant code hung from was often an afterthought.
They were so focused on features that organization went overlooked.
Coupling was rife between modules.
New features were often bolted onto the codebase wherever they could be made to fit.
To my disillusioned eyes, it looked like many programmers, if they ever cracked open Design Patterns at all, never got past Singleton.

It goes on to describe how this brilliant code base contained gems that were difficult to unearth because the programmers creating the work had obscured things.
They obscured things because of schedule pressure.
This sentiment reflects “Awful Code. Awful Circumstances.”.

Then there is the Service Locator pattern:

Provide a global point of access to a service without coupling users to the concrete class that implements it.

2010: Service Locator is an Anti-Pattern describes an abstract service locator as an improvement, but still a poor idea.
The pattern in the book provides a solution for the runtime issues described here.
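The pattern's intent can be sketched in a few lines of Python (an illustration only; the class names are mine, not the book's, and the book pairs the locator with a null service so it is safe before anything is registered):

```python
class NullLogger:
    """Default service: safely does nothing when no logger is registered."""
    def log(self, message):
        pass

class ConsoleLogger:
    """A concrete service the locator can hand out."""
    def __init__(self):
        self.lines = []
    def log(self, message):
        self.lines.append(message)

class Locator:
    """Global point of access; callers never name the concrete class."""
    _service = NullLogger()

    @classmethod
    def provide(cls, service):
        cls._service = service if service is not None else NullLogger()

    @classmethod
    def get(cls):
        return cls._service

# Client code depends only on the locator and the logging interface.
Locator.get().log("ignored by the null service")
Locator.provide(ConsoleLogger())
Locator.get().log("recorded by the console service")
```

The null default is what distinguishes this from a bare global: a lookup before configuration degrades gracefully instead of crashing at runtime.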

The Python code behind Revising the Pareto Chart

This Jupyter notebook reproduces the experiment in Revising the Pareto Chart. It includes the source code for all of the graphs in that article and the Monte Carlo algorithm used.

It’s important to note that what appears here and in Revising the Pareto Chart may not accurately reproduce the experiments described in Wilkinson’s original paper.

What appears in my articles is my attempt to reproduce Wilkinson’s work. Deviations between Wilkinson’s results and my own are noted.

Defect Types

There are 6 defect types.
The nature of these defects is irrelevant.
It is sufficient that the categories be identified.

The number of trials is key to reproducing the experiment. You get different results for $5000$, $20000$ and $40000$ trials.

An open question regarding the original paper and these experiments is what “a few thousand trials” means. I suspect that the original paper used closer to 5,000 trials. These experiments and the results in my article show that the order statistics change through the $30000$ to $35000$ trials.

num_trials = 5000

Sample Generation

For each sample, randomly assign the defects to each category.

def generate_sample():
    """Obtain a single sample."""
    return defect_distribution_per_trial.rvs()[0]

Two collections of trials are retained. The sorted trials mimic the algorithm in the paper. The unsorted trials show what happens if the results are not sorted according to the Pareto method.

The value of sorted and unsorted trials lies in observing the banding that occurs in the sorted trials. This banding reflects the Pareto distribution and is critical to reproducing the experiment.

The value of charting the unsorted frequency lies in ensuring that sorting the trials reproduces the Pareto distribution required by the experiment.

import matplotlib.pyplot as plt

categories = pd.DataFrame(trial, columns=defect_types)
categories.plot(kind='line', subplots=True, sharex=True, sharey=True,
                legend=False, title='Frequency by Category (Unsorted)', rot=0)

Frequency By Category (Sorted)

The value of charting the sorted frequency is the insight it provides on the effect of sorting the data in each trial.

sorted_categories = pd.DataFrame(sorted_trial, columns=defect_types)
sorted_categories.plot(kind='line', subplots=True, sharex=True, sharey=True,
                       legend=False, title='Frequency by Category (Sorted)', rot=0)

Order Statistics

Wilkinson describes the requirement for order statistics to stabilize. The paper doesn’t describe the use of any order statistics, except upper and lower frequency bounds. I explore several different order statistics in this section.

Minimum Order Statistic

minimum_statistic.plot(kind='line', subplots=True, sharex=True, sharey=True,
                       legend=False, title='Minimum Frequency by Category', rot=0)

Maximum Order Statistic

maximum_statistic.plot(kind='line', subplots=True, sharex=True, sharey=True,
                       legend=False, title='Maximum Frequency by Category', rot=0)

Median Order Statistic

median_statistic.plot(kind='line', subplots=True, sharex=True, sharey=True,
                      legend=False, title='Median Frequency by Category', rot=0)

Range Order Statistic

range_statistic.plot(kind='line', subplots=True, sharex=True, sharey=True,
                     legend=False, title='Frequency Range by Category', rot=0)

Histogram by Category

The value of charting a histogram by category is that it shows the distribution of the sorted data and permits the exploration of whether this sorting matches a normal distribution.

Determination of whether this is a normal distribution is important to understanding if the calculation of the confidence intervals for a percentile should use the normal distribution’s $Z$ value.

num_bins = num_defect_types
fig, ax = plt.subplots(nrows=3, ncols=2, sharex=True, sharey=True)
fig.suptitle('Histogram of Categories')
fig.subplots_adjust(hspace=0.6)
for i in range(0, num_defect_types):
    mu = round(sorted_categories.T.iloc[i].mean())
    std = round(sorted_categories.T.iloc[i].std())
    col = i % 2
    row = i // 2
    n, bins, patches = ax[row, col].hist(sorted_categories.T.iloc[i], num_bins, density=1)
    xmin, xmax = plt.xlim()
    y = np.linspace(xmin, xmax, (num_bins + 1)**2)  # Selected points to create rounded curve.
    p = norm.pdf(y, mu, std)
    ax[row, col].plot(y, p, '--')
    ax[row, col].set_xlabel('Category {}'.format(i + 1))
    ax[row, col].set_title('$\mu={:10.0f}$, $\sigma={:10.0f}$'.format(mu, std))
    ax[row, col].set_ylabel('Probability Density')

Confidence Interval for a Percentile

I chose to interpret $\alpha = 0.05$ and the calculation of the frequency bounds $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ using the confidence interval for a percentile. Both frequency bounds are calculated using the high and low percentiles derived from this $\alpha$.

This shows only that the required percentiles are 95% likely to fall into the calculated interval. It does not mean that these percentiles will always fall into this interval.

lower_boundary_statistics = pd.DataFrame(index=defect_types)
upper_boundary_statistics = pd.DataFrame(index=defect_types)
for i in range(1, num_trials):
    sorted_observations = np.sort(sorted_categories.head(i), axis=0)
    R_lower = max(0, round(i * lower_quartile - Z * math.sqrt(i * lower_quartile * (1 - lower_quartile))) - 1)
    R_upper = min(i - 1, round(i * upper_quartile + Z * math.sqrt(i * upper_quartile * (1 - upper_quartile))) - 1)
    lower_boundary_statistics[i] = sorted_observations[R_lower, :]
    upper_boundary_statistics[i] = sorted_observations[R_upper, :]
lower_boundary_statistics.T.plot(kind='line', subplots=True, sharex=True, sharey=True,
                                 legend=False, title='Lower Bounds by Category', rot=0)
upper_boundary_statistics.T.plot(kind='line', subplots=True, sharex=True, sharey=True,
                                 legend=False, title='Upper Bounds by Category', rot=0)
R_lower = max(0, round(num_trials * lower_quartile - Z * math.sqrt(num_trials * lower_quartile * (1 - lower_quartile))) - 1)
R_upper = min(num_trials - 1, round(num_trials * upper_quartile + Z * math.sqrt(num_trials * upper_quartile * (1 - upper_quartile))) - 1)
acceptance_levels = pd.DataFrame.from_dict(
    {'data': [17, 9, 5, 4, 3, 2],
     'lower': list(sorted_observations[R_lower, :]),
     'upper': list(sorted_observations[R_upper, :] - sorted_observations[R_lower, :])},
    orient='index',
)
acceptance_levels.columns = defect_types
plt.show()

A look at the statistics underlying the Pareto Chart and how to improve it.

In Data Measurement and Analysis for Software Engineers, I go through Part 4 of a talk by Dennis J. Frailey.
Part 4 covers statistical techniques for data analysis that are well suited for software.
A number of references provided in Dennis’ talks are valuable for understanding measurement:
A Look at Scales of Measurement explores the use of ordinal scales of defect severity.
Part 4 makes reference to Revising the Pareto Chart, a paper by Leland Wilkinson.
Wilkinson’s revision includes introducing acceptance intervals to address his concerns with the Pareto Chart.

The Pareto Chart

Joseph Juran proposed the Pareto Chart on the left and then updated it to the diagram on the right.

Both versions are in Juran’s Quality Handbook.
Both charts are from Wilkinson’s paper.
Neither chart includes acceptance intervals but the one on the right eliminates dual-scale plots and better conveys intent.

Improving the Pareto Chart

Wilkinson proposes improving the Pareto chart by introducing acceptance intervals.
An acceptance interval is derived from the assumption that all categories are equally probable before sorting the data by frequency.
This interval is generated using the Monte Carlo Method.
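The Monte Carlo generation can be sketched in Python with NumPy (parameters follow the paper's setup described later, 40 defects over 6 equally probable categories; the seed and trial count here are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)
num_categories, num_defects, num_trials = 6, 40, 5000

# Each trial assigns 40 defects to 6 equally probable categories...
trials = rng.multinomial(num_defects, [1 / num_categories] * num_categories,
                         size=num_trials)

# ...then imposes the Pareto ordering (largest frequency first).
sorted_trials = np.sort(trials, axis=1)[:, ::-1]

# The acceptance interval per category is a pair of percentiles of the
# sorted frequencies (alpha = 0.05).
lower = np.percentile(sorted_trials, 2.5, axis=0)
upper = np.percentile(sorted_trials, 97.5, axis=0)
```

A real bar falling outside its category's [lower, upper] interval is the signal that the data departs from what post-hoc sorting alone would produce.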

The creation of a Pareto Chart involves sorting unordered categorical data.
A comparison to sorted but equally probable occurrences of unordered categorical data provides a way to identify outliers.
This comparison is designed to challenge the notion that the most frequently occurring issue is the one to address.
This is important when the costs of remedies are not uniform.

Wilkinson wanted to achieve two goals in revising the Pareto chart.

To point out there is no point in attacking the vital few failures unless the data fits a Pareto or similar power distribution.

To improve the Pareto chart so that acceptance intervals could be incorporated.

Wilkinson provides a way to compare the data distribution of the Pareto against a random distribution to identify when the data does not fit a Pareto distribution.
The lack of fit provides motivation to dig a little deeper into the data and hopefully improve decision making.

What follows is an exploration of Wilkinson’s approach, including a Jupyter Notebook that I believe faithfully reproduces Wilkinson’s experiments.

Reproducing the Results

Wilkinson’s thesis challenges the wisdom of attacking Category 1 if frequencies are random and costs are not uniform.
It seeks to remedy this by introducing acceptance intervals derived from assuming that all categories are equally probable before sorting.

From the paper:

Recall that the motivation for the Pareto chart was Juran’s interpretation of Pareto’s rule, that a small percent of defect types causes most defects.
To identify this situation, Juran sorted the Pareto chart by frequency.
Post-hoc sorting of unordered categories by frequency induces a downward trend on random data, however, so care is needed in interpreting heights of bars.

Wilkinson provides a recipe for reproducing his results.
His results provide an opportunity to explore how to employ different statistics to generate acceptance intervals.
What follows is how I reproduced these results.

Use a multinomial distribution with \(n = 40\) and \(\mbox{p-values} = \{ 1/6, 1/6, 1/6, 1/6, 1/6, 1/6 \}\) for each category.
The identical p-values ensure all categories are equally probable.

In this case, the multinomial is a random variable \(X\), such that \(X = x_{i}, \forall i, 1 \le i \le 6\).

For \(N = 40000\) trials, generate a multinomial \(X_{i}, \forall i, 1 \le i \le N\) and sort each trial from the highest to lowest value.
This ensures the Pareto ordering is imposed upon the random data generated during each trial.

In my experiment, the order statistics changed up to the \(30000^{th}\) to \(35000^{th}\) trial.
My experiments were with 20,000 and 40,000 trials.
The charts below capture the results of 40,000 trials.

Order each trial from smallest to largest.
This simulates a Pareto distribution using random data.

Calculate the lower \(\frac{\alpha}{2}\) and upper \(1 - \frac{\alpha}{2}\) frequencies, where \(\alpha = 0.05\) and \(Z = 1.96\).

A non-parametric way of generating these frequencies uses the formula \(Y = N \times p \pm Z \times \sqrt{N \times p \times (1 - p)}\),
where \(p = \frac{\alpha}{2}\) for the lower bound and \(p = 1 - \frac{\alpha}{2}\) for the upper.
In this case, \(Y\) is the position of the \(Y^{th}\) value in the ordered trials created in the previous step.

Importantly, this step identifies the \(95^{th}\) percentile using a \(95\%\) confidence interval.
This is not a prediction interval for future events.
It shows only where the \(95^{th}\) percentile is likely to lie.
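The rank computation above can be sketched directly in Python (mirroring the notebook's formula; the function name and the `sign` parameter are my own):

```python
import math

def percentile_rank(n, p, z=1.96, sign=-1):
    """Zero-based rank of the pth percentile among n ordered trials,
    shifted by a z-based confidence adjustment (sign=-1 lower, +1 upper)."""
    r = round(n * p + sign * z * math.sqrt(n * p * (1 - p))) - 1
    return max(0, min(n - 1, r))

n, alpha = 40000, 0.05
r_lower = percentile_rank(n, alpha / 2, sign=-1)       # rank 938
r_upper = percentile_rank(n, 1 - alpha / 2, sign=+1)   # rank 39060
```

The two ranks index into the per-category ordered frequencies; the values found there are the lower and upper frequency bounds.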

(Category 1 is the top subplot.)
The interesting artifact in the frequency by category chart is the banding introduced by sorting the multinomial in each trial.
This sorting applies the Pareto method to random data.

By comparison, the unsorted distribution:

The unsorted distribution has bands that are of equal range for every category.
The sorted distribution has the effect of limiting bands in each category to a much narrower range.

Order statistics include minimum and maximum frequency along with range and median.
It’s unclear whether these statistics were used to ensure the Monte Carlo simulation stabilized.
It’s possible that only the upper and lower frequencies were used.

These charts compute their statistic for the \(n^{th}\) trial of each category.
This computation includes all trials from \(1\) through \(n, n \le N\).
Similarly, the \(n + 1^{st}\) trial includes all trials from \(1\) through \(n + 1\).

The minimum value of all trials up to and including the \(n^{th}\) trial.

The maximum value of all trials up to and including the \(n^{th}\) trial.

The median of all trials up to and including the \(n^{th}\) trial.
It is \(Y_{\frac{n + 1}{2}}\) if \(n\) is odd; \(\frac{1}{2}(Y_{\frac{n}{2}} + Y_{\frac{n}{2} + 1})\) otherwise.

The range is \(max - min\) of all trials up to and including the \(n^{th}\) trial.
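These running statistics can be sketched in Python with NumPy (a single synthetic category for brevity; np.median applies the odd/even rule above):

```python
import numpy as np

rng = np.random.default_rng(1)
# Frequencies of one category across 1,000 trials of 40 defects over 6 categories.
frequencies = rng.multinomial(40, [1 / 6] * 6, size=1000)[:, 0]

# Each statistic at trial n covers all trials 1 through n.
running_min = np.minimum.accumulate(frequencies)
running_max = np.maximum.accumulate(frequencies)
running_median = np.array([np.median(frequencies[:n])
                           for n in range(1, len(frequencies) + 1)])
running_range = running_max - running_min
```

Plotting any of these against the trial number shows when it stops changing, which is the stabilization the study looks for.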

The objective of the study was to stabilize the upper and lower frequencies.
These frequencies match those in the original paper, implying this method has successfully reproduced the one used in the paper.

The paper says only a “few thousand” Monte Carlo simulations are required to compute the frequency bounds.
Indeed, these frequencies stabilize at less than 5,000 trials for most of the Monte Carlo runs.
An exception exists for Category 3’s lower bound which changes between the \(17500^{th}\) and \(30000^{th}\) trial.

The upper frequency bound stabilizes before \(5000\) trials.

The stabilization of the upper and lower bounds happens just after the \(30000^{th}\) trial.
This may be what Wilkinson refers to when he says the statistic stabilizes in a few thousand trials.
I don’t have a better explanation for the phrase “a few thousand trials”.

In all, the use of so many order statistics may not be necessary as the results imply that the frequency bounds are the focus.
An interesting question is whether the use of the non-parametric statistics for these bounds is appropriate.

Non-Parametric Statistics

Is the non-parametric approach for calculating bounds meaningful?
The formula used above works well if the frequency of each trial for each category has a normal distribution.
To see this I plotted a histogram for each category.

These distributions look good.
Some of the misalignment with the normal curve is due to rounding.

Acceptance Levels

What do the acceptance levels look like for these results in comparison to those Wilkinson achieved?

The original acceptance intervals from the paper, with 40 defects allocated to 6 categories.

The experimental version below.

Looks like a good reproduction of the original paper.

References

Some excellent explanations of the Monte Carlo method.

A database migration involves executing queries to alter the database structure of a live application so the schema contains what the application expects.
It’s useful for application upgrades (and downgrades).

Imagine you have an application, App and that App version 1 (v1) is deployed.
When v2 is deployed, there are two important use cases to handle.
Assume that v2 contains database schema changes (e.g., v2 might require a table column that doesn’t exist in v1).

A new user installs v2.
This experience should be identical to a user installing v1.
This user has no data and should be able to install and use the application immediately.

An existing user upgrades from v1 to v2.
This experience should be identical to installing v1 but it’s likely that they have information they want to keep in their existing database.
Since the database schema has changed you need a way to update the database structure to the revision required by v2.

It is this situation where the notion of a database migration is helpful.
In a migration you write the SQL statements to move the database schema from revision 1 to revision 2 and execute them.
This gets more interesting when moving from v1 to a higher version, say v3, v5 or v10.

The idea provided in Diesel is to write queries for up- and downgrading the database schema by creating SQL statements that change the schema or table structure.
My example only supports schema upgrades.
It also assumes upgrading from App, v1, schema revision 1 to the current schema revision.

The essential idea is to recognize that a database schema is a whole-part hierarchy and that the Composite design pattern might be ideal to represent it.

In my implementation, the Component (Schema) is a base class to the Leaf (TableRevision) and the Composite (TableRevisionHistory).

The TableRevisionHistory contains a vector of TableRevisions that the migrate() method executes in the order they were created.
Following the execution of migrate() the database structure reflects the latest revision of the schema.
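The whole-part structure can be sketched in Python with sqlite3 (a cross-language illustration of the pattern only; the article's own implementation is the C++ below):

```python
import sqlite3

class Schema:  # Component: the common migration interface.
    def migrate(self, database):
        raise NotImplementedError

class TableRevision(Schema):  # Leaf: one SQL statement revising the schema.
    def __init__(self, sql_statement):
        self.sql_statement = sql_statement
    def migrate(self, database):
        database.execute(self.sql_statement)
        return True

class TableRevisionHistory(Schema):  # Composite: revisions applied in order.
    def __init__(self):
        self.revisions = []
    def add(self, revision):
        self.revisions.append(revision)
    def migrate(self, database):
        return all(r.migrate(database) for r in self.revisions)

database = sqlite3.connect(':memory:')
history = TableRevisionHistory()
history.add(TableRevision('CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)'))
history.add(TableRevision('ALTER TABLE people ADD COLUMN email TEXT'))
assert history.migrate(database)
```

The client calls migrate() on the composite exactly as it would on a single revision, which is the point of the pattern.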

The Schema interface.
The database is a parameter and not a state variable because the schema state does not depend upon a database.

A table revision contains only the SQL statement defining the revision.
A better choice for the SQL statement type might be a QSqlQuery or a QString.
Using the definition of an SQL statement below has the advantage of keeping Qt dependencies out of this class.

class TableRevision // Leaf
    : public Schema
{
public:
    typedef const char* sql_statement_type;

    TableRevision(sql_statement_type s)
        : sql_statement(s) { }
    ~TableRevision() { }

    virtual const bool migrate(Database&) override;

private:
    sql_statement_type sql_statement; // The SQL statement used to revise the table structure.
};

To define a table and create a revision.
Strictly speaking I don’t need a table revision class (e.g., Table could inherit directly from Schema).
Its presence conveys the intent of the pattern more clearly.

class Table
    : public TableRevision
{
public:
    typedef TableRevision::sql_statement_type sql_statement_type;

    Table(sql_statement_type s)
        : TableRevision(s) { }
    ~Table() { }
};

Table revision0("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)");

Schema revision isn’t supported explicitly.
It’s implied by the number of table revisions stored in the table revision history.
This was a surprise, as I thought revision would be part of the schema state.

One way to manage schema revision is to use a database metadata table that includes schema revision information.
For example, the table could store the current revision of each table (e.g., the highest index used during the last update).
An upgrade would require that migrate() begin its migration at the metadata revision plus one.

Another issue with schema revision is how to store it given that there are any number of tables whose revisions might all be different.
It seems there is either a lot of calculation (e.g., schema revision is the sum of all table revisions) or a maintenance issue (e.g., a hard coded revision number).
Neither approach is strictly necessary.
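The metadata-table idea can be sketched in Python with sqlite3 (my own illustration, not part of the C++ example; the table and column names are assumptions):

```python
import sqlite3

def current_revision(database):
    """Read the stored schema revision, creating the metadata table on first use."""
    database.execute('CREATE TABLE IF NOT EXISTS metadata (revision INTEGER)')
    row = database.execute('SELECT revision FROM metadata').fetchone()
    return row[0] if row else -1

def migrate(database, revisions):
    """Apply only the revisions after the stored revision, then record the new one."""
    start = current_revision(database) + 1
    for sql in revisions[start:]:
        database.execute(sql)
    database.execute('DELETE FROM metadata')
    database.execute('INSERT INTO metadata VALUES (?)', (len(revisions) - 1,))

revisions = [
    'CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)',
    'ALTER TABLE people ADD COLUMN email TEXT',
]
database = sqlite3.connect(':memory:')
migrate(database, revisions)  # Fresh install: applies both revisions.
migrate(database, revisions)  # Re-run: nothing left to apply.
```

A fresh install and an upgrade run the same code path; the stored revision decides where in the history to begin.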

To add an entry to the revision history.

class TableHistory
    : public TableRevisionHistory
{
public:
    /*! @brief Create the table history object. */
    TableHistory()
        : TableRevisionHistory() { }

    /*! @brief Destroy the table history object. */
    ~TableHistory() { }
};

TableHistory history;
history.add(&revision0);
if (history.migrate(database)) {
    return true;
}
else {
    return false; // Handle error.
}

Of course, you could manage the revision history any number of ways.
A simple vector would suffice.
The advantage of using the Composite pattern is the client can use the same interface regardless of the number of tables or the number of revisions of those tables.

About

Brian Minard's blog.
I live and work in the Greater Toronto Area, Canada.

My posts don't represent the positions, strategies or opinions of my employer.
Really.
They don't.