
We are scanning a third-party library using a well-known static analysis tool.

We keep getting:

"Failure to Handle Missing Parameter CWE ID 234"

I'm not a C developer, and I can't see what is different or wrong with the examples that have been highlighted (lines 1845, 1853, 2276, 2301).

I understand the underlying threat, I just don't understand why the code is wrong.

Can anyone shed some light on this please as I'm a bit stuck!

Affected lines are:

1845:

snprintf(buff, sizeof(buff), "/BitsPerComponent %d", cid->bps);

1853:

snprintf(buff, sizeof(buff),
                     "/DecodeParms\n"
                     "<<\n"
                     " /Columns %d\n"
                     " /Predictor 14\n"
                     " /Colors %d\n"
                     " /BitsPerComponent %d\n"
                     ">>\n", cid->w, cid->spp, cid->bps);

2276:

snprintf((char *)buf, 32, "%d", objout);

2301:

snprintf((char *)buf, 32, "%d", objout);

************** Failure to handle missing parameter - leptonica/src/colormap.c

2045:

snprintf(buf, sizeof(buf), "%02x", data[3 * i + 1]);

**************
slayer
Smitch

1 Answer


As you may know, a missing parameter can lead to undefined behavior, with consequences such as format string vulnerabilities, memory disclosure, denial of service, or worse.

First, though, here are some points to keep in mind about the snprintf(3) function.
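To make the warning concrete, here is a minimal sketch of what CWE-234 looks like for the printf family: a format string that contains more conversions than there are arguments. The helper name format_bps is hypothetical (it only mirrors the call at line 1845), and the flawed call is kept inside a comment so the block stays well defined.

```c
#include <stdio.h>

/* Hypothetical helper mirroring line 1845: one %d, one matching int. */
int format_bps(char *out, size_t outsize, int bps)
{
    /* Flawed variant, shown only as a comment: two conversions, one argument.
     *     snprintf(out, outsize, "/BitsPerComponent %d %d", bps);
     * The second %d would read an argument that was never passed,
     * which is undefined behavior. */
    return snprintf(out, outsize, "/BitsPerComponent %d", bps);
}
```

In the lines flagged by the tool, every conversion in the format string has a matching argument, just as in the correct call above.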


Difference between snprintf(3) and sprintf(3)

Both functions send the output of printf(3) to a buffer, so you can build the desired content from a single format string. They differ in that snprintf(3) takes an explicit bound: its size parameter keeps the output from exceeding what the buffer (the first parameter) can hold. That is what the letter 'n' signals in similar functions (strncat, strncmp, strncpy, strndup, ...): they take a maximum length. Not all of them guarantee a null byte (null terminator) at the end of the string; strncpy, for example, leaves the destination unterminated when the source does not fit, whereas snprintf always terminates its output when size is nonzero. So although the bound helps avoid errors such as stack-based buffer overflows [1], it cannot by itself protect against adjacent memory attacks [2].

Some technical notes about snprintf(3):

One of the most important things to know is that the size parameter must never exceed the actual size of the buffer.
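As a rough illustration of the bound, the sketch below copies a string with snprintf and uses the return value to detect truncation. The function name copy_bounded is made up for this example.

```c
#include <stdio.h>

/* Returns 1 if src fit in dst, 0 if it was truncated
 * (or if snprintf reported an error). */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    /* snprintf writes at most dstsize bytes, including the '\0',
     * and returns the length the full output would have had. */
    int n = snprintf(dst, dstsize, "%s", src);
    return n >= 0 && (size_t)n < dstsize;
}
```

With a 4-byte destination and the source "slayer", dst ends up holding "sla" plus the terminator and the function returns 0; sprintf(3) given the same input would write past the end of the buffer.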

In detail:

According to the GNU C Library (glibc) documentation, the snprintf(3) prototype is:

int
snprintf(char * restrict str, size_t size, const char * restrict format,
    ...);

char * restrict str: A restrict-qualified char pointer. In C99, the restrict keyword is a promise that, for the lifetime of the pointer, the object it points to is accessed only through that pointer (or values derived from it). It exists to rule out other pointers accessing the same memory (so-called "pointer aliasing"). For more information, see references [3] (Wikipedia) and [4] (Stack Overflow, taking the second example).

size_t size: The maximum number of bytes, including the terminating null byte, that snprintf may write to the buffer. Suppose the buffer is to hold "s|l|a|y|e|r|\0", one character per byte: the buffer itself (the 1st parameter, not the 2nd, which only states how many bytes the buffer can take) must then be at least 7 bytes long. If the buffer is actually smaller than the size you pass, you are exposed to the undefined behavior mentioned at the beginning of this answer. Conversely, if size is smaller than the output needs, the string is truncated.

const char * restrict format: A restrict-qualified pointer to const char that holds the format string [5].

...: The optional arguments. Functions with this parameter are called variadic functions [6] and access the arguments through the macros in the stdarg.h header [7].
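One way to check how many bytes a buffer needs is to ask snprintf itself: C99 allows a size of 0 (with a null buffer), in which case nothing is written and the return value is the length of the would-be output. The helper name bytes_needed below is an assumption for illustration.

```c
#include <stdio.h>

/* Bytes required to store s, including the terminating '\0'.
 * Returns 0 if snprintf reports an error. */
size_t bytes_needed(const char *s)
{
    int n = snprintf(NULL, 0, "%s", s);  /* size 0: nothing is written */
    return n < 0 ? 0 : (size_t)n + 1;
}
```

For "slayer" this yields 7: six characters plus the terminator.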
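For readers unfamiliar with variadic functions, here is a small sketch of a wrapper that forwards its optional arguments to vsnprintf through the stdarg.h macros. The name format_checked is hypothetical and not part of leptonica.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: formats into out and returns 1 if the whole
 * output fit, 0 if it was truncated or an error occurred. */
int format_checked(char *out, size_t outsize, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);                     /* begin reading the "..." part */
    n = vsnprintf(out, outsize, fmt, ap);  /* same bound as snprintf */
    va_end(ap);

    return n >= 0 && (size_t)n < outsize;
}
```

The callee has no way of knowing how many arguments were actually passed; it trusts the format string, which is why a format with more conversions than arguments is a real weakness.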


Analyzing a part of pdfio2.c source code

A signed char ranges from -128 to 127 (8 bits), and an unsigned char, still using 8 bits, ranges from 0 to 255 (127 + 128 = 255); whether plain char is signed is implementation-defined. So, given the complete code that you linked, at line 2276:

snprintf((char *)buf, 32, "%d", objout);

Looking a few lines earlier, at lines 2262, 2263 and 2264, you can see the declarations of several variables, including buf, objout and objin:

2262: l_uint8   buf[32];  /* only needs to hold one integer in ascii format */
2263: l_int32 start, nrepl, i, j, objin, objout, found;
2264: l_int32 *objs, *matches;

l_uint8 [8]: Stands for "leptonica unsigned int8". Its signed counterpart, l_int8, is equivalent to signed char, with a range from -128 to 127, namely a signed 1-byte (8-bit) integer. A byte is the smallest object that is not a bit-field.

l_int32 [9]: The same logic as above, but now the type is 4 bytes wide; in other words, it ranges from -2^31 to 2^31 - 1.

The buffer named buf is declared as l_uint8, so each element holds 8 bits, from 0 to 255 in decimal (unsigned char). snprintf expects a char pointer as its first argument, so buf is cast to char * to make it compatible with the text produced from the 3rd and 4th arguments; without the cast, the compiler would diagnose a pointer type mismatch, because an l_uint8 buffer is being passed where char * is expected.

Continuing with the substituteObjectNumbers() function:

At lines 2241 to 2255 you'll see a brief comment about what the function is supposed to do:
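The sketch below reproduces the shape of the call at line 2276 under stated assumptions: uint8_t and int32_t from stdint.h stand in for leptonica's l_uint8 and l_int32, and format_objnum is a made-up name. It shows that a 32-byte buffer is more than enough for any 32-bit integer printed with %d.

```c
#include <stdint.h>
#include <stdio.h>

/* Formats value into an unsigned-byte buffer, as line 2276 does.
 * Returns the number of characters written, or -1 if it did not fit. */
int format_objnum(uint8_t *buf, size_t bufsize, int32_t value)
{
    /* The cast only changes the pointer type; the bytes are the same. */
    int n = snprintf((char *)buf, bufsize, "%d", (int)value);
    return (n >= 0 && (size_t)n < bufsize) ? n : -1;
}
```

The longest possible output is "-2147483648", which is 11 characters, so 12 bytes including the terminator are sufficient.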

/*!
 * \brief   substituteObjectNumbers()
 *
 *  Input:  bas (lba of a pdf object)
 *          na_objs (object number mapping array)
 *  Return: bad (lba of rewritten pdf for the object)
 *
 *  Notes:
 *      (1) Interpret the first set of bytes as the object number,
 *          map to the new number, and write it out.
 *      (2) Find all occurrences of this 4-byte sequence: " 0 R"
 *      (3) Find the location and value of the integer preceding this,
 *          and map it to the new value.
 *      (4) Rewrite the object with new object numbers.
 */

A little bit of PDF syntax:

Note: this section is a brief summary based on the book Developing with PDF [10].

With that in mind, the first note in the comment above is about the PDF file format. A PDF file has 9 types of objects (also referred to as COS objects; COS stands for Carousel Object System). They can be represented as indirect objects or direct objects, and they can reference each other.

Types of objects: The PDF file format supports:

1. null;
2. boolean;
3. integer;
4. real;
5. name;
6. string;
7. array;
8. dictionary;
9. and, stream.

Direct object: Appears inline, usually as the value of a dictionary key or as an entry in an array, and is identified by its object type.

Example 1 – Indirect objects made entirely from direct objects

3 0 obj        % object ID 3, generation 0
<<
 /ProcSet [ /PDF /Text /ImageC /ImageI ]
 /Font <<
     /F1 <<
        /Type /Font
        /Subtype /Type1
        /Name /F1
        /BaseFont/Helvetica
        >>
>> >>
endobj
5 0 obj
(an indirect string)
endobj
% an indirect number
4 0 obj
1234567890
endobj

Indirect object: An object that is referred to by reference. To identify which object is meant, every indirect object has a unique ID (per PDF), expressed as a positive number, and a generation number, which is a non-negative number and is usually zero (0). Both are used to identify the object and to reference it.

Example 2 – An indirect object that references other indirect objects

3 0 obj                           % object ID 3, generation 0
<<
 /ProcSet 5 0 R                   % reference the indirect object with ID 5, generation 0
 /Font <</F1 4 0 R >>             % reference the indirect object with ID 4, generation 0
>>
endobj
4 0 obj                           % object ID 4, generation 0
<<
 /Type /Font
 /Subtype /Type1
 /Name /F1
 /BaseFont/Helvetica
>>
endobj
5 0 obj                           % object ID 5, generation 0
[ /PDF /Text /ImageC /ImageI ]
endobj

If you would like more examples, take a look at Minimal PDF [11] as well.
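Example 2 contains references such as "5 0 R", and step (2) of the function's notes looks for the 4-byte sequence " 0 R". A minimal sketch of that search follows. It assumes the object text is a NUL-terminated string, which real PDF data need not be, and count_references is a hypothetical name, not leptonica's implementation.

```c
#include <string.h>

/* Counts occurrences of the 4-byte sequence " 0 R" in a
 * NUL-terminated string (an assumption made for this sketch). */
int count_references(const char *text)
{
    int count = 0;
    const char *p = text;

    while ((p = strstr(p, " 0 R")) != NULL) {
        count++;
        p += 4;   /* continue after this match */
    }
    return count;
}
```

Applied to the dictionary body of object 3 in Example 2, this finds two references.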

Back to the function code:

The objout variable passed to snprintf() at line 2276 holds the value of objs[objin]. The index objin is filled in through the 3rd parameter of sscanf() at line 2274, and that data comes from its 1st parameter, datas, a pointer to the data returned by l_byteaGetData(bas, &size) at line 2269. The function l_byteaGetData() returns the data array stored in the byte array bas.
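The data flow just described can be sketched roughly as follows. This is not the leptonica source: remap_object_number and its parameters are hypothetical, and the bounds check on the index is an addition made for the sketch.

```c
#include <stdio.h>

/* Hypothetical sketch: read the leading object number from data,
 * map it through objs, and write the new number into buf.
 * Returns 1 on success, 0 otherwise. */
int remap_object_number(const char *data, const int *objs, int nobjs,
                        char *buf, size_t bufsize)
{
    int objin, n;

    if (sscanf(data, "%d", &objin) != 1)
        return 0;                       /* no leading integer */
    if (objin < 0 || objin >= nobjs)
        return 0;                       /* outside the mapping array */

    /* One %d conversion, one int argument: nothing is missing. */
    n = snprintf(buf, bufsize, "%d", objs[objin]);
    return n >= 0 && (size_t)n < bufsize;
}
```

For the input "3 0 obj" and a mapping where objs[3] is 12, buf receives "12".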

To conclude...

So the function needs input data to operate on, and I am curious how you are running the static analysis on this code. A sanitized PDF is expected to be passed to substituteObjectNumbers(), which finds every " 0 R" (meaning a reference to an object) and then performs the 3rd and final 4th steps described in the comment at line 2252: remapping each object through the number that was found.

I don't know which static analysis tool you are using, or how the tool decides that this type of weakness is present in the code, so I cannot say whether this is a false positive or not.

slayer