Unpacking a struct ending with an ASCIIZ string

Posted 2019-04-29 09:24

Question:

I am trying to use struct.unpack() to take apart a data record that ends with an ASCII string.

The record (it happens to be a TomTom ov2 record) has this format (stored little-endian):

  • 1 byte
  • 4 byte int for total record size (including this field)
  • 4 byte int
  • 4 byte int
  • variable-length string, null-terminated

unpack() requires that the string's length be included in the format you pass it. I can use the second field and the known size of the rest of the record -- 13 bytes -- to get the string length:

str_len = struct.unpack("<xi", record[:5])[0] - 13
fmt = "<biii{0}s".format(str_len)

then proceed with the full unpacking, but since the string is null-terminated, I really wish unpack() would do it for me. It'd also be nice to have this should I run across a struct that doesn't include its own size.

How can I make that happen?
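
For reference, the manual two-step approach described above could be sketched roughly as follows. The function name unpack_ov2_record and the sample bytes are made up for illustration; only the field layout comes from the description above.

```python
import struct

HEADER_FMT = "<biii"                        # 1 byte + three 4-byte ints, little-endian
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 13 bytes

def unpack_ov2_record(record):
    """Hypothetical helper: unpack one record whose second field is its total size."""
    # Step 1: skip the leading byte ('x') and read the total record size.
    total_size = struct.unpack("<xi", record[:5])[0]
    # Step 2: whatever is not header must be the string, terminator included.
    str_len = total_size - HEADER_SIZE
    fmt = "{0}{1}s".format(HEADER_FMT, str_len)
    kind, size, a, b, text = struct.unpack(fmt, record[:total_size])
    # Drop the trailing NUL by hand, since struct will not do it.
    return kind, size, a, b, text.rstrip(b"\x00")

# Made-up sample: type 2, total size 18, two ints, then b"Cafe" plus its NUL.
sample = struct.pack("<biii5s", 2, 18, 100, 200, b"Cafe")
print(unpack_ov2_record(sample))
```

This works, but the size lookup and the NUL stripping are exactly the steps it would be nice to have handled automatically.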

Answer 1:

I made two new functions that should be useable as drop-in replacements for the standard pack and unpack functions. They both support the 'z' character to pack/unpack an ASCIIZ string. There are no restrictions to the location or number of occurrences of the 'z' character in the format string:

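import struct

def unpack(format, buffer):
    # Rewrite each 'z' as '<n>sx': an n-byte string plus one pad byte for the NUL.
    while True:
        pos = format.find('z')
        if pos < 0:
            break
        # The string starts right after everything that precedes it in the format.
        asciiz_start = struct.calcsize(format[:pos])
        # Search with a bytes NUL; a str '\0' raises TypeError on Python 3.
        asciiz_len = buffer[asciiz_start:].find(b'\0')
        if asciiz_len < 0:
            raise struct.error('no NUL terminator found for a z field')
        format = '%s%dsx%s' % (format[:pos], asciiz_len, format[pos+1:])
    return struct.unpack(format, buffer)

def pack(format, *args):
    # Caveat: arg_number advances once per format character, so a repeat
    # count such as '3i' in front of a 'z' makes it pick the wrong argument.
    new_format = ''
    arg_number = 0
    for c in format:
        if c == 'z':
            # '<len+1>s' makes struct.pack pad the string with one NUL byte.
            new_format += '%ds' % (len(args[arg_number]) + 1)
            arg_number += 1
        else:
            new_format += c
            if c in 'cbB?hHiIlLqQfdspP':
                arg_number += 1
    return struct.pack(new_format, *args)

Here's an example of how to use them (Python 3, where the strings are passed as bytes):

>>> from struct_z import pack, unpack
>>> line = pack('<izizi', 1, b'Hello', 2, b' world!', 3)
>>> print(line.hex())
0100000048656c6c6f000200000020776f726c64210003000000
>>> print(unpack('<izizi', line))
(1, b'Hello', 2, b' world!', 3)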
>>>
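
To make the rewriting step in unpack easier to follow, here is a small sketch that returns the expanded format string instead of unpacking. The name expand_z is invented for this illustration, and it assumes the buffer is a bytes object with a NUL after every 'z' string.

```python
import struct

def expand_z(fmt, buffer):
    """Illustrative only: replace each 'z' with '<n>sx' sized from the buffer."""
    while 'z' in fmt:
        pos = fmt.find('z')
        # Everything before this 'z' already has a fixed size.
        start = struct.calcsize(fmt[:pos])
        # bytes.index raises ValueError if no terminator is present.
        length = buffer.index(b'\0', start) - start
        fmt = '%s%dsx%s' % (fmt[:pos], length, fmt[pos + 1:])
    return fmt

data = struct.pack('<i6si8si', 1, b'Hello', 2, b' world!', 3)
print(expand_z('<izizi', data))   # <i5sxi7sxi
```

Each 'z' becomes a fixed-length 's' followed by an 'x' pad byte, so the standard struct.unpack() skips the terminator on its own.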


Answer 2:

The size-less record is fairly easy to handle, actually, since struct.calcsize() will tell you the length it expects. You can use that and the actual length of the data to construct a new format string for unpack() that includes the correct string length.

This function is just a wrapper for unpack(), allowing a new format character in the last position that will drop the terminal NUL:

import struct
def unpack_with_final_asciiz(fmt, dat):
    """
    Unpack binary data, handling a null-terminated string at the end 
    (and only at the end) automatically.

    The first argument, fmt, is a struct.unpack() format string with the
    following modifications:
    If fmt's last character is 'z', the returned string will drop the NUL.
    If it is 's' with no length, the string including NUL will be returned.
    If it is 's' with a length, behavior is identical to normal unpack().
    """
    # Just pass on if no special behavior is required.
    # Slices are used so a one-character or empty fmt cannot raise IndexError.
    if fmt[-1:] not in ('z', 's') or (fmt[-1:] == 's' and fmt[-2:-1].isdigit()):
        return struct.unpack(fmt, dat)

    # Use format string to get size of contained string and rest of record
    non_str_len = struct.calcsize(fmt[:-1])
    str_len = len(dat) - non_str_len

    # Set up new format string
    # If passed 'z', treat terminating NUL as a "pad byte"
    if fmt[-1] == 'z':
        str_fmt = "{0}sx".format(str_len - 1)
    else:
        str_fmt = "{0}s".format(str_len)
    new_fmt = fmt[:-1] + str_fmt

    return struct.unpack(new_fmt, dat)

>>> dat = b'\x02\x1e\x00\x00\x00z\x8eJ\x00\xb1\x7f\x03\x00Down by the river\x00'
>>> unpack_with_final_asciiz("<biiiz", dat)
(2, 30, 4886138, 229297, b'Down by the river')
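
The wrapper above relies on dat holding exactly one record. If several size-less records sit back to back in one buffer, the length of the data no longer gives the string length, and one option is to search for the terminator instead. The following is a rough sketch of that alternative, not part of the answer above; iter_records, the "<bii" header layout and the sample data are all invented for illustration.

```python
import struct

def iter_records(buf, header_fmt="<bii"):
    """Hypothetical helper: yield (fields..., string) for NUL-terminated records."""
    header_size = struct.calcsize(header_fmt)
    offset = 0
    while offset < len(buf):
        # Fixed-size part first; unpack_from reads at an offset without slicing.
        header = struct.unpack_from(header_fmt, buf, offset)
        str_start = offset + header_size
        # Locate the terminator; raises ValueError if the record is truncated.
        str_end = buf.index(b"\x00", str_start)
        yield header + (buf[str_start:str_end],)
        offset = str_end + 1   # step past the NUL to the next record

buf = (struct.pack("<bii", 2, 10, 20) + b"first\x00" +
       struct.pack("<bii", 2, 30, 40) + b"second\x00")
for rec in iter_records(buf):
    print(rec)
```

The search starts after the fixed-size header, so zero bytes inside the integer fields are not mistaken for the terminator.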


Tags: python struct