You are here

An Inefficiency in the PHP Source Code?

Corey Pennycuff's picture

I was browsing through the PHP source code the other day when two things struck me. First, I must be weird for being curious enough to look up the source code. Second, I found an inefficiency in a seemingly straightforward function that I would like to discuss here.

In the PHP source code for quote_print.c, the first function is php_hex2int(), which I will provide here:

/*
*  Converting HEX char to INT value
*/
static char php_hex2int(int c) /* {{{ */
{
        if (isdigit(c)) {
                return c - '0';
        }
        else if (c >= 'A' && c <= 'F') {
                return c - 'A' + 10;
        }
        else if (c >= 'a' && c <= 'f') {
                return c - 'a' + 10;
        }
        else {
                return -1;
        }
}
/* }}} */

The purpose of this function is easy to see, in that it converts an ASCII value into an integer value. The code is undoubtedly easy to read, but look at the number of comparisons that must be made. For any single request, there is either 1, 3, or 5 comparisons that will be executed. isdigit(), in the first if() statement, does not perform any comparisons itself, but does involve bit operations according to this source. There is also at least one arithmetical operation that will be performed if the value is a valid digit. Two possible optimizations will be examined.

Binary Search

Knowing that the desired return value is derived from the ASCII value of an integer, and knowing that ASCII values are always in the same order with the same value, we can use binary search principles to reduce the maximum number of comparisons.

/*
*  Converting HEX char to INT value using a binary search
*/
static char php_hex2int(int c) /* {{{ */
{
        if (c > 'F') {
                if (c >= 'a' && c <= 'f') {
                        return c - 'a' + 10;
                }
        }
        else if (isdigit(c)) {
                return c - '0';
        }
        else if (c >= 'A') {
                return c - 'A' + 10;
        }
        return -1;
}
/* }}} */

The code provided here assures us a maximum of three comparisons per function call, but it loses the clarity of the original version.

Lookup Table

A different option would be to use a lookup table. The code is provided below:

/*
*  Converting HEX char to INT value using a lookup table
*/
static char php_hex2int(int c) /* {{{ */
{
        static char lookup[] = {
                -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
                -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
                -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
                 0, 1, 2, 3, 4, 5, 6, 7,  8, 9,-1,-1,-1,-1,-1,-1,
                -1,10,11,12,13,14,15,-1, -1,-1,-1,-1,-1,-1,-1,-1,
                -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
                -1,10,11,12,13,14,15};
        if ((unsigned int)c <= 'f') {
                return lookup[c];
        }
        return -1;
}
/* }}} */

This method is perhaps the least readable, but it is undoubtedly the fastest. It still performs one comparison per function call for range checking, but it is limited to that one comparison in every case. I avoid having to do the lower array bound check by casting c to an unsigned int (because c could be negative), so any negative values will now be quite large and therefore invalid, returning -1 as in the original function.

Conclusion

It is entirely possible that this function is called very little in actual code execution, and it is possible that optimization of an obscure function does little in the overall context of performance, but it is always interesting to see real-world applications of the techniques learned in class. Yeah, I probably won't win the Turing award for this, but I did have fun!

Tags: