How can I filter a pcap file by specific protocol

2019-01-31 09:51发布

问题:

I have some pcap files and I want to filter by protocol, i.e., if I want to filter by HTTP protocol, anything but HTTP packets will remain in the pcap file.

There is a tool called openDPI, and it's perfect for what I need, but there is no wrapper for python language.

Does anyone knows any python modules that can do what I need?

Thanks

Edit 1:

HTTP filtering was just an example, there is a lot of protocols that I want to filter.

Edit 2:

I tried Scapy, but I don't figure how to filter correctly. The filter only accepts Berkeley Packet Filter expression, i.e., I can't apply a msn, or HTTP, or another specific filter from upper layer. Can anyone help me?

回答1:

maybe this can help Scapy?



回答2:

A quick example using Scapy, since I just wrote one:

pkts = rdpcap('packets.pcap')
ports = [80, 25]
filtered = (pkt for pkt in pkts if
    TCP in pkt and
    (pkt[TCP].sport in ports or pkt[TCP].dport in ports))
wrpcap('filtered.pcap', filtered)

That will filter out packets that are neither HTTP nor SMTP. If you want all the packets but HTTP and SMTP, the third line should be:

filtered = (pkt for pkt in pkts if
    not (TCP in pkt and
    (pkt[TCP].sport in ports or pkt[TCP].dport in ports)))
wrpcap('filtered.pcap', filtered)


回答3:

I know this is a super-old question, but I just ran across it thought I'd provide my answer. This is a problem I've encountered several times over the years, and I keep finding myself falling back to dpkt. Originally from the very capable dugsong, dpkt is primarily a packet creation/parsing library. I get the sense the pcap parsing was an afterthought, but it turns out to be a very useful one, because parsing pcaps, IP, TCP and and TCP headers is straightforward. It's parsing all the higher-level protocols that becomes the time sink! (I wrote my own python pcap parsing library before finding dpkt)

The documentation on using the pcap parsing functionality is a little thin. Here's an example from my files:

import socket
import dpkt
import sys
pcapReader = dpkt.pcap.Reader(file(sys.argv[1], "rb"))
for ts, data in pcapReader:
    ether = dpkt.ethernet.Ethernet(data)
    if ether.type != dpkt.ethernet.ETH_TYPE_IP: raise
    ip = ether.data
    src = socket.inet_ntoa(ip.src)
    dst = socket.inet_ntoa(ip.dst)
    print "%s -> %s" % (src, dst)

Hope this helps the next guy to run across this post!



回答4:

Something along the lines of

from pcapy import open_offline
from impacket.ImpactDecoder import EthDecoder
from impacket.ImpactPacket import IP, TCP, UDP, ICMP

decoder = EthDecoder()

def callback(jdr, data):
    packet = decoder.decode(data)
    child = packet.child()
    if isinstance(child, IP):
        child = packet.child()
        if isinstance(child, TCP):
            if child.get_th_dport() == 80:
                print 'HTTP'

pcap = open_offline('net.cap')
pcap.loop(0, callback)

using

http://oss.coresecurity.com/projects/impacket.html



回答5:

Try pylibpcap.



回答6:

sniff supports a offline option wherein you can provide the pcap file as input. This way you can use the filtering advantages of sniff command on pcap file.

>>> packets = sniff(offline='mypackets.pcap')
>>>
>>> packets
<Sniffed: TCP:17 UDP:0 ICMP:0 Other:0>

Hope that helps !



回答7:

to filter in/out a specific protocol you have to do a per packet analysis otherwise you could miss some http traffic on a non-conventional port that is flowing in your network. of course if you want a loose system, you could check just for source and destination port number but that wont give you exact results. you have to look for specific feature of a protocol like GET, POST, HEAD etc keywords for HTTP and others for other protocol and check each TCP packets.



回答8:

I have tried the same using @nmichaels method, but it becomes cumbersome when I want to iterate it over multiple protocols. I tried finding ways to read the .pcap file and then filter it but found no help. Basically, when one reads a .pcap file there is no function in Scapy which allows to filter these packets, on the other hand using a command like,

a=sniff(filter="tcp and ( port 25 or port 110 )",prn=lambda x: x.sprintf("%IP.src%:%TCP.sport% -> %IP.dst%:%TCP.dport%  %2s,TCP.flags% : %TCP.payload%"))

helps to filter but only while sniffing.

If anyone knows of any other method where we can use a BPF syntax instead of the for statement?