pyshark: access raw udp payload

Posted 2019-08-02 23:08

Question:

I'm new to pyshark. I'm trying to write a parser for custom UDP packets. I'm using the FileCapture object to read packets from a file.

>>> cap = pyshark.FileCapture('sample.pcap')
>>> pkt = cap.next()
>>> pkt
<UDP/DATA Packet>
>>> pkt.data.data
'01ca00040500a4700500a22a5af20f830000b3aa000110da5af20f7c000bde1a000006390000666e000067f900000ba7000026ce000001d00000000100001726000100000000000000000000000017260500a4700500a22a608600250500a8c10500a22a608601310500a8c10500a22b608601200500a8cc0500a22a6086000c'
>>> dir(pkt.udp)
['DATA_LAYER', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__format__', '__getattr__', '__getattribute__', '__getstate__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_all_fields', '_field_prefix', '_get_all_field_lines', '_get_all_fields_with_alternates', '_get_field_or_layer_repr', '_get_field_repr', '_layer_name', '_sanitize_field_name', 'checksum', 'checksum_status', 'dstport', 'field_names', 'get', 'get_field', 'get_field_by_showname', 'get_field_value', 'layer_name', 'length', 'port', 'pretty_print', 'raw_mode', 'srcport', 'stream']

I need a simple way to access the packet's UDP payload. The only way I have found to access raw packet data is pkt.data.data, but this returns the entire content of the packet, while I'm only interested in the UDP portion. I'm looking for something like pkt.udp.data. Is there a simple way to do that, or do I need to use pkt.data.data and calculate the offset at which my data is located?

Answer 1:

pyshark_parser might help you out: https://github.com/jlents/pyshark_parser/blob/master/pyshark_parser/

I was looking at their code, and what you are looking for might be here: https://github.com/jlents/pyshark_parser/blob/master/pyshark_parser/packet_util.py

def get_all_field_names(packet, layer=None):
    '''
    Builds a unique list of field names, that exist in the packet,
    for the specified layer.
    If no layer is provided, all layers are considered.
    Args:
        packet: the pyshark packet object the fields will be gathered from
        layer: the string name of the layer that will be targeted
    Returns:
        a set containing all unique field names
        or None, if packet is None
    '''

    if not packet:
        return None

    field_names = set()
    for current_layer in packet.layers:
        if not layer or layer == current_layer.__dict__['_layer_name']:
            for field in current_layer.__dict__['_all_fields']:
                field_names.add(field)
    return field_names

and

def get_value_from_packet_for_layer_field(packet, layer, field):
    '''
    Gets the value from the packet for the specified 'layer' and 'field'
    Args:
        packet: The packet where you'll be retrieving the value from
        layer: The layer that contains the field
        field: The field that contains the value
    Returns:
        the value at packet[layer][key] or None
        or None, if any of the arguments are None
    '''
    if not packet or not layer or not field:
        return None
    for current_layer in packet.layers:
        if layer == current_layer.__dict__['_layer_name'] and \
           current_layer.__dict__['_all_fields']:
            return current_layer.__dict__['_all_fields'][field]
    return None
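
Both helpers above reach into each layer's __dict__. The dir(pkt.udp) output in the question also lists the public attributes layer_name, field_names and get_field_value, so a similar lookup can probably be written without touching private state. This is a minimal sketch, assuming those three attributes behave as their names suggest (layer_name is a string, field_names is iterable, get_field_value(name) returns the value or None). The function names are made up for illustration and are not part of pyshark or pyshark_parser:

```python
def list_field_names(packet, layer=None):
    """Collect the unique field names in a packet, optionally for one layer only."""
    names = set()
    if packet is None:
        return names
    for current_layer in packet.layers:
        # Assumption: layer_name is the public counterpart of _layer_name
        if layer is None or current_layer.layer_name == layer:
            names.update(current_layer.field_names)
    return names


def get_layer_field(packet, layer, field):
    """Return the value of `field` in the first layer named `layer`, or None."""
    if packet is None or not layer or not field:
        return None
    for current_layer in packet.layers:
        if current_layer.layer_name == layer:
            # Assumption: get_field_value returns None for an unknown field
            return current_layer.get_field_value(field)
    return None
```

Under these assumptions, get_layer_field(pkt, 'udp', 'srcport') should give the same value as pkt.udp.srcport.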


Answer 2:

"The only method I found to access raw packet data is pkt.data.data,"

Correct.

"but this returns the entire content of the packet while I'm only interested to UDP portion."

Incorrect. The .data.data attribute is a hex string representation of just the UDP payload itself.

For example, if your UDP payload is the ASCII string "hello", you can recover it with: bytearray.fromhex(pkt.data.data).decode()
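
If the payload is binary rather than text, the same hex string can be turned into a bytes object and unpacked with the standard library. This is a minimal sketch using the first 8 bytes of the sample value from the question. The colon stripping is only a precaution in case the field is printed as colon-separated hex. Reading the first four bytes as two big-endian 16-bit integers is purely hypothetical, since the custom packet format isn't described:

```python
import struct


def payload_bytes(hex_string):
    """Convert a hex string such as pkt.data.data into raw bytes."""
    # bytes.fromhex rejects colons, so drop them if they are present
    return bytes.fromhex(hex_string.replace(":", ""))


sample = "01ca00040500a470"  # first 8 bytes of the payload shown in the question
raw = payload_bytes(sample)

# Hypothetical layout: two big-endian unsigned 16-bit values at offset 0
first, second = struct.unpack_from("!HH", raw, 0)
print(len(raw), hex(first), second)
```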

(You can easily test this yourself from a Bash console, e.g., with echo -n hello >/dev/udp/localhost/12345 while doing a pyshark capture on lo:12345.)
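
The same test datagram can be sent from Python instead of Bash. This is a minimal sketch, assuming the port 12345 from the command above and the IPv4 loopback address 127.0.0.1 in place of localhost. The send_udp name is made up for illustration. Nothing needs to be listening on the port for the datagram to be sent:

```python
import socket


def send_udp(payload, host="127.0.0.1", port=12345):
    """Send one UDP datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


sent = send_udp(b"hello")
```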