November 24, 2021

Graphical Methods for Statistical Analysis (Part III)

In Graphical Methods, I described an improvement I made to an implementation and showed how it was on its way to being better.

I ended up with the following:

import matplotlib.pyplot as plt

class Plot(object):
    def __init__(self, **kwargs):
        """Construct the plot object."""
        self._figure, self._axis = plt.subplots(**kwargs)

    def __del__(self):
        """Destroy the figure once it's finished being used."""
        plt.close(self._figure)

    def plot(self, data_set, column_name):
        """Placeholder for univariate data plot."""
        pass

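    def save(self, file_pointer, file_format: str):
        """Write the plot to the specified file or file-like object."""
        # Save through this object's figure; plt.savefig() writes whichever
        # figure is current, which may belong to another Plot.
        self._figure.savefig(file_pointer, format=file_format)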

and

from ..plot import Plot

class BoxPlot(Plot):
    def __init__(self, **kwargs):
        """Construct the box plot object."""
        super().__init__(**kwargs)

    def plot(self, data_set, column_name):
        """Create a univariate box plot of the data set's column."""
        super().plot(data_set, column_name)
        self._axis.set_title('Box Plot')
        self._axis.set_ylabel('Data Set')
        self._axis.set_xlabel(column_name)
        self._axis.boxplot(data_set[column_name], vert=False)

This is the same class hierarchy I proposed in Graphical Methods for Statistical Analysis. The difference here is that the list of arguments provided to Plot is no longer focused on plot attributes (title, labels, etc.). Instead, it’s focused on the construction of the figure and axes required to make the plot: the keyword arguments are passed straight through to plt.subplots().
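As a minimal sketch of what this means for a caller, the construction arguments below are forwarded untouched to plt.subplots(). The classes are trimmed-down copies of the ones above, the figure size and sample data are made-up values, and the Agg backend is an assumption that lets the sketch run without a display. The sketch saves through the figure object rather than pyplot's current figure, so it writes the right figure even when several plots exist, and it leaves out vert=False and the axis labels for brevity.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt


class Plot:
    """Trimmed-down copy of the base class: it only builds and saves the figure."""

    def __init__(self, **kwargs):
        # Every keyword argument goes to plt.subplots(); none describe the plot.
        self._figure, self._axis = plt.subplots(**kwargs)

    def save(self, file_pointer, file_format):
        self._figure.savefig(file_pointer, format=file_format)


class BoxPlot(Plot):
    """Trimmed-down copy of the derived class: it owns the plot attributes."""

    def plot(self, data_set, column_name):
        self._axis.set_title('Box Plot')
        self._axis.boxplot(data_set[column_name])


data = {"height": [1.2, 1.5, 1.7, 1.8, 2.4]}  # made-up sample data
plot = BoxPlot(figsize=(8, 6), tight_layout=True)  # figure options only
plot.plot(data, "height")

buffer = io.BytesIO()  # any binary file-like object works
plot.save(buffer, "png")
png_bytes = buffer.getvalue()
```

The caller never passes a title or a label; those stay inside BoxPlot.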

The plot attributes are delegated to the derived classes (BoxPlot in this example). One nice property of this is that BoxPlot’s plot function can change the plot attributes without affecting anything else. Another is that changing the number of axes (i.e., from the default of 1) doesn’t affect the clients or the parent class. Everything relating to a box plot is in this class.
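To illustrate the point about axes, here is a sketch of a hypothetical subclass that asks for two axes instead of one. TwoPanelBoxPlot is not part of the original code, the sample data is made up, and Plot is a trimmed-down copy of the base class. The base class is unchanged; only the subclass knows that self._axis now holds two axes.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt


class Plot:
    """Trimmed-down copy of the base class; unchanged by the new subclass."""

    def __init__(self, **kwargs):
        self._figure, self._axis = plt.subplots(**kwargs)

    def save(self, file_pointer, file_format):
        self._figure.savefig(file_pointer, format=file_format)


class TwoPanelBoxPlot(Plot):
    """Hypothetical subclass that draws the same column on two axes."""

    def __init__(self, **kwargs):
        # With nrows=1 and ncols=2, plt.subplots() returns an array of two axes.
        super().__init__(nrows=1, ncols=2, **kwargs)

    def plot(self, data_set, column_name):
        all_points, no_outliers = self._axis
        all_points.set_title('Box Plot')
        all_points.boxplot(data_set[column_name])
        no_outliers.set_title('Box Plot (outliers hidden)')
        no_outliers.boxplot(data_set[column_name], showfliers=False)


data = {"height": [1.2, 1.5, 1.7, 1.8, 9.0]}  # made-up data with one outlier
panel_plot = TwoPanelBoxPlot(figsize=(10, 4), tight_layout=True)
panel_plot.plot(data, "height")

buffer = io.BytesIO()
panel_plot.save(buffer, "png")
panel_bytes = buffer.getvalue()
```

The client code keeps the same three steps of construct, plot and save.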

This design doesn’t currently support bivariate analysis, and the plot method, which takes a single column name, won’t adapt well to that requirement.

About the only thing that I debate in the implementation of BoxPlot is whether the titles and labels should be set by the plot() method. I intend to leave them there for a couple of reasons. First, the distance between their use and the univariate plot is small. Second, extending the box plots to bivariate values is going to force me to extend the implementation in some way that I haven’t fully thought through.

The only other point of note in the implementation is the recurrence of the pattern of setup, plot and write:

@gm.command()
@click.argument('column')
@click.argument('output', type=click.File('wb'))
@click.option('--format', type=click.Choice([ 'png', ], case_sensitive = False), default = 'png')
@click.pass_context
def box_plot(ctx, column, output, format):
    """Create a box plot from a COLUMN of data in the CSV-FILE.

    COLUMN is the name of a single column in the CSV-FILE. This column
    is used in the univariate plot.

    OUTPUT is the name of the output file.

    FORMAT is the type of output format.
    """

    plot = BoxPlot(figsize = FIGURE_SIZE, tight_layout = True)
    plot.plot(ctx.obj['data frame'], column)
    plot.save(output, format)

I haven’t convinced myself that taking advantage of the Plot base class to write a function like the following is a good idea.

def plot_it(plot: Plot):
    plot.plot(...)
    plot.save(...)

It’s tempting, but the problem is that it’s too soon. Bivariate plots are coming and it’s not clear how to change the implementation to support this. Better to wait.
