May 5, 2018

Exploring an Integer Overflow

  —Someone reported an overflow in Boost.Regex.

I was lucky to find a ticket against Boost.Regex describing an integer overflow. Never having worked on Boost, I took the opportunity to create a patch.

Boost.Regex was the basis for std::regex in C++11, so I don’t know if my pull request will ever get merged. I provided a patch for the experience, and because the reporter may not be able to use C++11.

The source code:

  std::ptrdiff_t states = re.size();
  if(states == 0)
    states = 1;
  states *= states; // overflows here on 32 bit platforms if regex
                    // string length greater than 2**16

There are two challenges in tackling this bug.

  1. The signed integer overflow.

    Clearly, \((2^{16})^{2}\) is \(2^{32}\), so an integer overflow occurs. Unfortunately, states is a signed integral type, so the overflow occurs sooner: as soon as the value of states is greater than \(\sqrt{2^{31}} \approx 46340.95\), that is, at \(46341\) or more.

  2. Implementation-defined behaviour in some circumstances during initialization.

    Implementation-defined because the conversion of an unsigned value to a signed type works as expected only if the unsigned value can be represented in the signed type. Otherwise the result is implementation-defined.

    The function declaration for re.size() sets the return type as std::size_t. Unfortunately, whenever re.size() returns a value greater than std::numeric_limits<std::ptrdiff_t>::max() the value of states is implementation-defined.
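
One way to sidestep the implementation-defined conversion is to check the range before converting. This is a minimal sketch, not the Boost.Regex code; the helper name fits_in_ptrdiff is hypothetical and exists only for illustration.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper (not part of Boost.Regex): reports whether a
// std::size_t value can be stored in a std::ptrdiff_t without changing.
bool fits_in_ptrdiff(std::size_t value) {
    // The maximum of ptrdiff_t is non-negative, so converting it to the
    // unsigned type is always well defined.
    const std::size_t max = static_cast<std::size_t>(
        (std::numeric_limits<std::ptrdiff_t>::max)());
    return value <= max;
}
```

A caller would test fits_in_ptrdiff(re.size()) first and only then initialize a signed variable from the result.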

The issue exists on 64-bit platforms as well. Only the magnitude of the values changes.
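
To put numbers on that, the sketch below searches for the largest value whose square still fits in a given signed type. The function name largest_safe_square_root is my own invention for illustration. It relies on a division test so that the search itself never overflows.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from Boost): returns the largest n for which
// n * n is still representable in the signed integer type T.
template <typename T>
T largest_safe_square_root() {
    const T max = (std::numeric_limits<T>::max)();
    T lo = 0;    // lo * lo is known to fit
    T hi = max;  // upper bound of the search
    while (lo < hi) {
        // Upper midpoint, computed without overflowing T; always >= 1.
        const T mid = hi - (hi - lo) / 2;
        if (mid <= max / mid) {
            lo = mid;      // mid * mid fits in T
        } else {
            hi = mid - 1;  // mid * mid would overflow T
        }
    }
    return lo;
}
```

For std::int32_t this yields 46340 and for std::int64_t it yields 3037000499, so the next value up is the first to overflow in each case.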

Signed integer overflow can be detected using CERT’s INT32-C secure coding rule. CERT provides a compliant solution that works with all compilers.

#include <limits.h>
 
void func(signed int si_a, signed int si_b) {
  signed int result; 
  if (si_a > 0) {  /* si_a is positive */
    if (si_b > 0) {  /* si_a and si_b are positive */
      if (si_a > (INT_MAX / si_b)) {
        /* Handle error */
      }
    } else { /* si_a positive, si_b nonpositive */
      if (si_b < (INT_MIN / si_a)) {
        /* Handle error */
      }
    } /* si_a positive, si_b nonpositive */
  } else { /* si_a is nonpositive */
    if (si_b > 0) { /* si_a is nonpositive, si_b is positive */
      if (si_a < (INT_MIN / si_b)) {
        /* Handle error */
      }
    } else { /* si_a and si_b are nonpositive */
      if ( (si_a != 0) && (si_b < (INT_MAX / si_a))) {
        /* Handle error */
      }
    } /* End if si_a and si_b are nonpositive */
  } /* End if si_a is nonpositive */
 
  result = si_a * si_b;
}

Fortunately, the algorithm provided by CERT can be simplified in this case. The states variable is initialized from an unsigned integral value, it is guaranteed to be greater than zero, and si_a = si_b. So we are really looking at:

#include <limits.h>

void func(signed int si_a, signed int si_b) {
  /* si_a and si_b are both positive (and equal) here */
  if (si_a > (INT_MAX / si_b)) {
    /* Handle error */
  }
}

The initialization of std::ptrdiff_t with a std::size_t is strange. My best guess at the rationale is that no regex state space is expected to be greater than (std::numeric_limits<std::ptrdiff_t>::max)(). I never established why the declaration of states used a type unable to contain the size of the regex. I figure the pull request will tell the tale.

In any event, changing the type declaration of states to match that of the value used to initialize it permits unsigned integer overflow checks.
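
As a rough illustration of what such a check could look like once states is unsigned, here is a minimal sketch. It is not the patch I submitted; the name checked_square and its interface are assumptions made for this example.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper (not the actual Boost.Regex patch): squares n and
// stores the product in result. Returns false, leaving result untouched,
// when n * n would not fit in a std::size_t.
bool checked_square(std::size_t n, std::size_t& result) {
    const std::size_t max = (std::numeric_limits<std::size_t>::max)();
    // Division-based test: n * n > max exactly when n > max / n (n != 0).
    if (n != 0 && n > max / n) {
        return false;
    }
    result = n * n;
    return true;
}
```

Unsigned arithmetic wraps rather than invoking undefined behaviour, but a wrapped state count would still be wrong, so testing before multiplying remains the safer choice.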

Side note:

I’d be interested in understanding what the original reporter is doing that requires so many states in a regular expression.

April 29, 2018

The Temporary Scrum Master

  —Not sure I like rotating the Scrum Master role through a team.

I'm curious how rotating the Scrum Master role through the Development Team works out for the Development Team and the Organization as a whole.  I'm not sure that rotating the Scrum Master role is healthy. Selecting one member of the Development Team to permanently become the Scrum Master seems the better choice.

The Scrum Guide permits the Product Owner and Scrum Master to execute work in the Sprint Backlog. I take this to mean both roles can be carried out by someone in the Development Team.

A review of the servant-leadership philosophy applied to the Scrum Master role provides insight on the challenges:
  • service to the Product Owner: the support the Scrum Master provides the Product Owner is not focused solely upon the domain. It includes Product Backlog management.
  • service to the Development Team: this focuses on the organization and the Scrum Team. It includes building bridges to other parts of the organization. It includes coaching the Development Team on self-organization and cross-functionality.
  • service to the Organization: this includes helping the organization leverage Scrum better.
When I hear about the Scrum Master role being filled by a member of the Development Team, it usually includes a concession to ensure the Scrum Master isn't taking on that role permanently. The motivation behind this concession is interesting.

Rotation implies that the organization isn't fully vested in Scrum. Further:
  • it implies the Scrum Master role is less valuable than the "other" role the Scrum Master has. 
  • it implies that Scrum Master isn't a good career choice for domain experts. 
  • it subjects the team to different Scrum Masters, each with their own set of values and approaches.
Different values and approaches aren't bad. They are opportunities for learning. But they may cause confusion if you are just rolling out your Scrum implementation.

In all, rotating the Scrum Master doesn't sit well with me. The Scrum Master seems better suited as a permanent role. Even if I assume the Developer turned Scrum Master is a domain expert there are significant trade-offs involved in this approach, especially if you view Scrum as an important initiative that can benefit the entire organization.

April 6, 2018

GDB Command Files

  —Text files for storing gdb commands.

A favorite feature of mine in GDB is the command file. A command file is a simple text file used to store GDB commands.

I usually use these files to store breakpoints, which simplifies setting up a debugging session.

For example, from within gdb run:

source command.txt

where command.txt contains:

set breakpoint pending on
b main

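I use set breakpoint pending on so that a breakpoint whose location is not yet known, for example one in a shared library that has not been loaded, is created as a pending breakpoint and resolved once the library loads.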

The fantastic thing about these files is that you can store them under revision control and build them up while you chase down an issue, then use them to confirm the correction.

March 31, 2018

RBTLIB v0.3.0 On Travis CI

  —A client-side library for Review Board.

In RBTLIB v0.3.0 On Read The Docs, I discussed adding support for Read the Docs to RBTLIB. Recently, I added RBTLIB to Travis CI. Travis CI is super easy to work with. It provided the opportunity to eliminate deployment issues. This is important, as my ultimate goal for RBTLIB is availability through PyPI.

The main advantage Travis CI provides is the ability to test on different platforms and to eliminate issues of portability. I lack experience with Python's setuptools so there are likely to be issues as I move RBTLIB to PyPI.

All in all, v0.3.0 has significant infrastructure improvements over v0.2. Functionally, v0.3.0 targets posting of review requests through rbt.

March 20, 2018

Experiments with Packer and Vagrant (Fini)

  —A look at Vagrant and Packer (Lessons Learned).

I spent time exploring HashiCorp’s Packer and Vagrant tools. My objective for this exploration was to understand how Packer and Vagrant could help me develop and maintain my infrastructure. I like both tools. I like them a lot.

The power of Packer is that it turns infrastructure into code. You can configure virtual machines using Packer with a small collection of scripts. The advantage Packer introduces is the ability to use source control to manage the configuration. Updating the virtual machine then becomes a matter of modifying these scripts.

The power of Vagrant is that it enables deployment of the virtual machine. Its genius is that you can use Vagrant to deploy clusters of virtual machines. My use case is the deployment of continuous integration servers, but I have other use cases wherein web servers and application servers can be created with the Packer and Vagrant combination and then deployed into a test and production environment.

The main contribution my exploration makes is that I introduced my own SSH key pairs into my Vagrant Boxes and took steps to update the Kickstart configuration and Preseed files with encrypted root passwords. I also locked out the Vagrant user account so that access to the virtual machine can only occur over SSH using my key pair.

I developed a collection of make files to coordinate provisioning the Vagrant Boxes. I don’t actually like the target structure used by these make files. In hindsight they would be more useful if the target names reflected the purpose of the Vagrant Box (e.g., web server instead of debian-jessie).

I use a script to generate my Kickstart and Preseed files. It may be useful as an example.

I don’t like the way my Packer template provisioning scripts are structured. I initially thought that separating scripts by service (e.g., nfs instead of networking) was the right approach. Ultimately I think that a better structure for provisioning scripts is closer to the purpose of the Vagrant Box.

For example, to build a developer and a production environment I want makefile targets like:
  • base_box (provision to enable vagrant user)
  • developer_box (provision to enable vagrant user and developer tool chain)
  • production_box (provision to enable vagrant user and no developer tool chain)
and scripts that provision these boxes. In this example, creating a developer box should rely on the base box script and the developer box script. This enables a minimal approach to creating additional boxes.

My production box need never be provisioned to include a developer tool chain, which ensures that only production services and applications flow into the production environment.

The source: experiments with Vagrant