In Graphical Methods, I described a problem I was having with an implementation. Basically, I was having difficulty in finding what felt like the right abstraction for a collection of methods that looked like this:
@gm.command()
@click.argument('column')
@click.pass_context
def box_plot(ctx, column):
    fig, ax = plt.subplots(1, 1, figsize = FIGURE_SIZE, tight_layout = True)
    ax.set_title('Box Plot')
    ax.set_xlabel(column)
    ax.set_ylabel('Data Set')
    ax.boxplot(ctx.obj['data frame'][column], vert=False)
    doWritePngFile(ctx.obj['output file'] + '_' + 'box_plot', doSaveFigure(fig))
I ended up with the following:
@gm.command()
@click.argument('column')
@click.argument('output', type=click.File('wb'))
@click.option('--format', type=click.Choice([ 'png', ], case_sensitive = False), default = 'png')
@click.pass_context
def box_plot(ctx, column, output, format):
    """Create a box plot from a COLUMN of data in the CSV-FILE.

    COLUMN is the name of a single column in the CSV-FILE. This column
    is used in the univariate plot.

    OUTPUT is the name of the output file.

    FORMAT is the type of output format.
    """
    plot = BoxPlot(figsize = FIGURE_SIZE, tight_layout = True)
    plot.plot(ctx.obj['data frame'], column)
    plot.save(output, format)
This still follows the setup, plot, and write pattern that I liked from the last post, but introduces BoxPlot to manage the box plot lifecycle.
The BoxPlot class looks like this:
import matplotlib.pyplot as plt


class BoxPlot(object):
    def __init__(self, **kwargs):
        """Construct the box plot object."""
        self._figure, self._axis = plt.subplots(**kwargs)

    def __del__(self):
        """Destroy the figure once it's finished being used."""
        plt.close(self._figure)

    def plot(self, data_set, column_name):
        """Create a univariate box plot of the data set's column."""
        self._axis.set_title('Box Plot')
        self._axis.set_ylabel('Data Set')
        self._axis.set_xlabel(column_name)
        self._axis.boxplot(data_set[column_name], vert=False)

    def save(self, file_pointer, file_format: str):
        """Write the box plot to the specified file or file-like object."""
        # Save this object's own figure; plt.savefig() writes whichever
        # figure is current, which may belong to a different plot.
        self._figure.savefig(file_pointer, format = file_format)
What works for me here is that the box plot has become the focus of the domain, not the Matplotlib figure and axis. Identifying the domain is the same challenge I described in Better Class Design. That’s the hard part of getting this correct: finding the right concept in the domain.
I still have a lot of duplicate code: the __init__(), __del__() and save() methods are the same for all of my plot classes.
The next refactor will clean that up.
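One way that cleanup might look is a shared base class that owns the figure lifecycle, leaving each plot class with only its plot() method. This is a minimal sketch, not the refactor I actually did: the Plot base class name is hypothetical, the Agg backend is selected only so the example runs without a display, and a plain dict stands in for the data frame.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt


class Plot:
    """Hypothetical base class holding the lifecycle every plot shares."""

    def __init__(self, **kwargs):
        self._figure, self._axis = plt.subplots(**kwargs)

    def __del__(self):
        # Guard against a constructor that failed before _figure was set.
        figure = getattr(self, "_figure", None)
        if figure is not None:
            plt.close(figure)

    def save(self, file_pointer, file_format: str):
        # Save this object's own figure to a file or file-like object.
        self._figure.savefig(file_pointer, format=file_format)


class BoxPlot(Plot):
    """Only the plot-specific behaviour remains in the subclass."""

    def plot(self, data_set, column_name):
        self._axis.set_title('Box Plot')
        self._axis.set_ylabel('Data Set')
        self._axis.set_xlabel(column_name)
        self._axis.boxplot(data_set[column_name], vert=False)


# Usage: a dict of lists stands in for the data frame.
box = BoxPlot(figsize=(4, 3), tight_layout=True)
box.plot({"height": [1.0, 2.0, 3.0, 10.0]}, "height")
buffer = io.BytesIO()
box.save(buffer, "png")
```

With this shape, adding another plot type would mean writing one subclass with one method, which is roughly the duplication I want to remove.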
Every function like box_plot() and every class like BoxPlot follows the same pattern.
This tells me there are still a couple of abstractions missing from the implementation, but that I’m on the right track.
Where this last refactor has paid off is the elimination of doWritePngFile().
There was an excessive amount of layering there that was overly focused on writing to just a file.
The save() API is much more general, taking only a file-like object and a format string supported by Matplotlib’s savefig().
This was achieved at the expense of pushing the format parameter through to the command line.
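To illustrate what "more general" buys, here is a small sketch that writes the same figure to an in-memory buffer and to a binary file handle, using two different formats. The free function save() is a stand-in for the method on the plot class, the sample data is made up, and the Agg backend is an assumption so the example runs without a display. Note that the command above only offers png as a choice; svg is used here purely to show that the format string is passed straight through.

```python
import io
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt


def save(figure, file_pointer, file_format):
    """Stand-in for the plot class's save(): write to any binary file-like object."""
    figure.savefig(file_pointer, format=file_format)


figure, axis = plt.subplots(figsize=(4, 3))
axis.boxplot([1.0, 2.0, 3.0, 10.0], vert=False)  # made-up sample data

# An in-memory buffer works because nothing assumes a path on disk.
memory = io.BytesIO()
save(figure, memory, "png")

# A real file opened in binary mode, like the handle click.File('wb') provides.
with tempfile.TemporaryFile(mode="w+b") as handle:
    save(figure, handle, "svg")
    handle.seek(0)
    svg_head = handle.read(5)

plt.close(figure)
```

The caller decides where the bytes go and in what format; the plot code never builds a file name or opens anything itself.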
Another important point is that this refactor moved me away from the recurring problem of the Data Class. That is another indication that this last refactor was a positive step forward.