November 1, 2019

Class Breakdown for a JIRA Worklog

—Domain entities are the most important elements of models.

I’ve had occasion to write some tools that explore ticket data using JIRA. JIRA provides a rich REST API for doing these kinds of explorations.

One of the problems I ran into was the concept of a worklog. JIRA associates a worklog with each JIRA issue.

The basic relationship is that a project is associated with zero or more issues. Each issue is associated with a worklog. Each worklog has zero or more worklogs (I’ll call these worklog entries to differentiate them from the issue worklog).

My problem was finding a good class breakdown for this structure. I started by focusing on the collections of worklogs associated with a project.

The code looked something like this:

from collections import namedtuple

Worklog = namedtuple('Worklog', 'created comment timeSpentSeconds')

class IssueWorklog(object):
    def __init__(self, issue_key):
        # Build one Worklog tuple per raw worklog record returned for the issue.
        self._worklog = list(map(lambda worklog: Worklog(worklog['created'], worklog['comment'], worklog['timeSpentSeconds']), GetIssueWorklog(issue_key)))

    def get_worklog_entry(self):
        for worklog in self._worklog:
            yield worklog

class ProjectWorklog(object):
    def __init__(self, project_key):
        self._worklog = list(map(lambda issue: IssueWorklog(issue['key']), SearchUri(project_key)))

    def get_worklog_entry(self):
        for worklog in self._worklog:
            yield worklog

(GetIssueWorklog() and SearchUri() are functions that manage calls to the Get Issue Worklog and Search resources defined in JIRA’s REST API.)

Needless to say, I hated this implementation. I hated it because the use of classes in this instance is overkill. All this does is create a collection of worklog entries.

Other parts of my applications took the worklog entries and generated plots from them. In all, you could do this with three functions (or a nested for-loop):

  For each issue in the project:
     For each worklog entry in the issue:
         Add worklog entry to a container.
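As a rough sketch, the nested loop above might look like this in Python. The function name collect_worklog_entries and the in-memory sample data are made up for illustration; the two fetcher arguments stand in for SearchUri() and GetIssueWorklog(), whose real implementations call the JIRA REST API.

```python
from collections import namedtuple

WorklogEntry = namedtuple('WorklogEntry', 'created comment timeSpentSeconds')


def collect_worklog_entries(project_key, search_issues, get_issue_worklog):
    """Flatten every worklog entry in a project into a single list.

    search_issues and get_issue_worklog are hypothetical stand-ins for
    SearchUri() and GetIssueWorklog().
    """
    entries = []
    for issue in search_issues(project_key):
        for raw in get_issue_worklog(issue['key']):
            entries.append(
                WorklogEntry(raw['created'], raw['comment'], raw['timeSpentSeconds'])
            )
    return entries


# Made-up data standing in for JIRA responses.
FAKE_ISSUES = {'DEMO': [{'key': 'DEMO-1'}, {'key': 'DEMO-2'}]}
FAKE_WORKLOGS = {
    'DEMO-1': [
        {'created': '2019-10-01T09:00:00', 'comment': 'triage', 'timeSpentSeconds': 1800},
    ],
    'DEMO-2': [
        {'created': '2019-10-02T10:00:00', 'comment': 'fix', 'timeSpentSeconds': 3600},
        {'created': '2019-10-03T11:00:00', 'comment': 'review', 'timeSpentSeconds': 900},
    ],
}

entries = collect_worklog_entries(
    'DEMO',
    lambda project_key: FAKE_ISSUES[project_key],
    lambda issue_key: FAKE_WORKLOGS[issue_key],
)
total_seconds = sum(entry.timeSpentSeconds for entry in entries)
```

Passing the fetchers in as arguments is only a convenience for the sketch; it lets the loop run against canned data without touching the network.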

In my opinion, I’d completely missed the point of good design.

An alternative design that I like much better looks like this.

from collections import namedtuple
from enum import Enum

import pandas as pd

WorklogEntry = namedtuple('WorklogEntry', 'created comment timeSpentSeconds')

class ResampleFrequency(Enum):
    MONTH_END = 'M'

class Worklog(object):
    """A collection of worklog entries from a project, an issue, or both."""

    def __init__(self, worklog_entries):
        self._df = pd.DataFrame(sorted(worklog_entries, key = lambda worklog: worklog.created), columns = WorklogEntry._fields)

    def doSumAndResample(self, new_frequency):
        # resample() needs a datetime-like index; the later version of this class sets one.
        return pd.DataFrame(self._df.resample(new_frequency.value).sum(), columns = [ 'timeSpentSeconds' ])

The important part here is the shift in emphasis from operations that build a collection of worklog entries to operations on the WorklogEntry objects themselves. Here a worklog entry is a plain data container, and the Worklog class does all of the heavy lifting.

Can I do better than the doSumAndResample() method? Well, let’s see.

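from collections import namedtuple
from enum import Enum
from typing import List

import pandas as pd

# Worklog information collected from a JIRA issue.
WorklogEntry = namedtuple('WorklogEntry', 'key summary comment created timeSpentSeconds')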

class ResampleFrequency(Enum):
    """Enumerate pandas resample frequencies using pandas offset aliases."""
    MONTH_END = 'M'  # pandas 2.2 and later prefer the alias 'ME' for month end

class Worklog(object):
    """Construct a pandas data frame from collected worklogs.

    A worklog can be viewed as a time series depicting time spent, along with other metadata.
    """
    def __init__(self, worklog_entries: List[WorklogEntry]):
        """Construct the worklog object using the provided worklog entries."""
        df = pd.DataFrame.from_records(sorted(worklog_entries, key = lambda worklog: worklog.created), columns = WorklogEntry._fields)
        self._df = df.set_index(pd.DatetimeIndex(pd.to_datetime(df['created'], utc = True)))

    def doResampleAndSumTimeSeries(self, resample_frequency: ResampleFrequency):
        """Resample and sum the worklog's time spent values using the provided resample frequency."""
        self._df = pd.DataFrame(self._df.resample(resample_frequency.value).sum(), columns = [ 'timeSpentSeconds' ])

    def doCalculateRollingAverage(self, window_size: int):
        """Calculate a rolling average of the worklog's time spent values."""
        self._df['rollingAverageTimeSpentSeconds'] = self._df['timeSpentSeconds'].rolling(window = window_size).mean()

    def doMakeDataFrame(self) -> pd.DataFrame:
        """Make a data frame from the worklog object."""
        return self._df
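To make the two operations concrete, here is a minimal sketch of the same resample-and-sum and rolling-average steps applied directly to a few made-up worklog entries, outside the class. The timestamps and durations are invented, and pd.offsets.MonthEnd() is used in place of the 'M' alias only so the sketch does not depend on which alias spelling a given pandas version accepts.

```python
import pandas as pd

# Made-up worklog entries: (created timestamp, seconds logged).
entries = [
    ("2019-08-05T09:00:00+00:00", 3600),
    ("2019-08-20T14:30:00+00:00", 1800),
    ("2019-09-10T10:00:00+00:00", 7200),
    ("2019-10-01T16:00:00+00:00", 5400),
]

df = pd.DataFrame(entries, columns=["created", "timeSpentSeconds"])

# Index the frame by creation time so it can be treated as a time series.
df = df.set_index(pd.DatetimeIndex(pd.to_datetime(df["created"], utc=True)))

# Resample to month end and sum the seconds logged in each month.
monthly = df["timeSpentSeconds"].resample(pd.offsets.MonthEnd()).sum()

# Two-month rolling average of the monthly totals; the first value is NaN
# because there is no earlier month to average with.
rolling = monthly.rolling(window=2).mean()

print(monthly)
print(rolling)
```

With this data the monthly totals are 5400, 7200, and 5400 seconds for August, September, and October, and the rolling average is 6300 for both September and October.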

It’s not perfect, because I wrap the pandas data frame with operations that manipulate the JIRA worklogs. The advantage is that the manipulation of the data frame is contained within the class. The disadvantage is that doMakeDataFrame() exports the data frame, which seems disappointing in terms of extending the class.