—Someone reported an overflow in Boost.Regex.
I was lucky to find a ticket against Boost.Regex describing an integer overflow. Never having worked on Boost, I took the opportunity to create a patch.
Boost.Regex was added to C++11, so I don’t know if my pull request will ever get merged. My rationale for providing a patch was for the experience and because the reporter may not be able to use C++11.
The source code:
std::ptrdiff_t states = re.size();
if(states == 0)
states = 1;
states *= states; // overflows here on 32 bit platforms if regex
// string length greater than 2**16
Two challenges in tackling this bug.
The signed integer overflow.
Clearly, \((2^{16})^{2}\) is \(2^{32}\), so an integer overflow occurs. Unfortunately, states is a signed integral type, so the overflow occurs when the value of states is greater than \(\sqrt{2^{31}}\), or \(46,341\).
Implementation specific behaviour in some circumstances during initialization.
Implementation specific because the conversion of an unsigned value to a signed value works as expected if the unsigned value can be represented in the signed value. Otherwise the result is implementation dependent.
The function declaration for re.size()
sets the return type as
std::size_t
. Unfortunately, whenever re.size()
returns a
value greater than std::numeric_limits<std::ptrdiff_t>::max()
the value of states is implementation dependent.
The issue exists on 64-bit platforms as well. Only the magnitude of the values changes.
Signed integer overflow can be detected using CERTS’s INT32.C secure coding recommendation. They provide a compliant algorithm that works with all compilers.
#include <limits.h>
void func(signed int si_a, signed int si_b) {
signed int result;
if (si_a > 0) { /* si_a is positive */
if (si_b > 0) { /* si_a and si_b are positive */
if (si_a > (INT_MAX / si_b)) {
/* Handle error */
}
} else { /* si_a positive, si_b nonpositive */
if (si_b < (INT_MIN / si_a)) {
/* Handle error */
}
} /* si_a positive, si_b nonpositive */
} else { /* si_a is nonpositive */
if (si_b > 0) { /* si_a is nonpositive, si_b is positive */
if (si_a < (INT_MIN / si_b)) {
/* Handle error */
}
} else { /* si_a and si_b are nonpositive */
if ( (si_a != 0) && (si_b < (INT_MAX / si_a))) {
/* Handle error */
}
} /* End if si_a and si_b are nonpositive */
} /* End if si_a is nonpositive */
result = si_a * si_b;
}
Fortunately, the algorithm provided by CERT can be simplified in this case. The
states variable is initialized with an unsigned integral type whose value is
guaranteed to be greater than zero and _a = si_b
. So we are really looking at:
#include <limits.h>
void func(signed int si_a, signed int si_b) {
if (si_a > (INT_MAX / si_b)) {
/* Handle error */
}
The initialization of std::ptrdiff_t
with a std::size_t
is
strange. My best rationale for why this is written like this is that
no regex state space is greater than
(std::numeric_limits<std::ptrdiff_t>::max)()
. I never established
why the declaration of states
used a type unable to contain the size
of the regex. I figure the pull request will tell the tail.
In any event, changing the type declaration of states
to match that
of the value used to initialize it permits unsigned integer overflow
checks.
Side note:
I’d be interested in understanding what the original reporter is doing that requires so many states in a regular expression.