How to derive KDD99 Features from DARPA pcap file?

2019-01-12 01:26发布

问题:

I have worked recently with the DARPA network traffic packets and the derived version of it used in KDD99 for intrusion detection evaluation.

Excuse my limited domain knowledge in computer networks, I could only derive 9 features from the DARPA packet headers. and Not the 41 features used in KDD99.

I am intending to continue my work on the UNB ISCX Intrusion Detection Evaluation DataSet. However, I want to derive from the pcap files the 41 features used in the KDD99 and save it in a CSV format. Is there a fast/easy way to achieve this?

as it was already been done previously for the KDD99, is there a library or converter that can do this for me ? if not, is there a guide of how to derive these features from a pcap file ?

回答1:

Be careful with this data set.

http://www.kdnuggets.com/news/2007/n18/4i.html

Some excerpts:

the artificial data was generated using a closed network, some proprietary network traffic generators, and hand-injected attacks

Among the issues raised, the most important seemed to be that no validation was ever performed to show that the DARPA dataset actually looked like real network traffic.

In 2003, Mahoney and Chan built a trivial intrusion detection system and ran it against the DARPA tcpdump data. They found numerous irregularities, including that -- due to the way the data was generated -- all the malicious packets had a TTL of 126 or 253 whereas almost all the benign packets had a TTL of 127 or 254.

the DARPA dataset (and by extension, the KDD Cup '99 dataset) was fundamentally broken, and one could not draw any conclusions from any experiments run using them

we strongly recommend that (1) all researchers stop using the KDD Cup '99 dataset

As for the feature extraction used. IIRC the majority of features simply were attributes of the parsed IP/TCP/UDP headers. Such as, port number, last octet of IP, and some packet flags.

As such, these findings no longer reflect realistic attacks anymore anyway. Todays TCP/IP stacks are much more robust than at the time the data set was created, where a "ping of death" would instantly lock up a windows host. Every developer of a TCP/IP stack should by now be aware of the risk of such malformed packets and stress-test the stack against such things.

With this, these features have become pretty much meaningless. Incorrectly set SYN flags etc. are no longer used in network attacks; these are much more sophisticated; and most likely no longer attacking the TCP/IP stack, but the services running on the next layer. So I would not bother finding out which low level packet flags were used in that '99 flawed simulation using attacks that worked in the early '90s...