How can I extract HTML code with Scapy?

Posted 2019-07-26 09:38

I recently began to use the scapy library for Python 2.x. I found there to be minimal documentation on the sniff() function. I began to play around with it and found that I can view TCP packets at a very low level. So far I have only found informational data. For example:

Here is what I put in the scapy terminal:

A = sniff(filter="tcp and host 216.58.193.78", count=2)

This is a request to google.com asking for the homepage:

<Ether  dst=e8:de:27:55:17:f3 src=00:24:1d:20:a6:1b type=0x800 |<IP  version=4L ihl=5L tos=0x0 len=60 id=46627 flags=DF frag=0L ttl=64 proto=tcp chksum=0x2a65 src=192.168.0.2 dst=216.58.193.78 options=[] |<TCP  sport=54036 dport=www seq=2948286264 ack=0 dataofs=10L reserved=0L flags=S window=29200 chksum=0x5a62 urgptr=0 options=[('MSS', 1460), ('SAckOK', ''), ('Timestamp', (389403, 0)), ('NOP', None), ('WScale', 7)] |>>>

Here is the response:

<Ether  dst=00:24:1d:20:a6:1b src=e8:de:27:55:17:f3 type=0x800 |<IP  version=4L ihl=5L tos=0x0 len=60 id=42380 flags= frag=0L ttl=55 proto=tcp chksum=0x83fc src=216.58.193.78 dst=192.168.0.2 options=[] |<TCP  sport=www dport=54036 seq=3087468609 ack=2948286265 dataofs=10L reserved=0L flags=SA window=42540 chksum=0xecaf urgptr=0 options=[('MSS', 1430), ('SAckOK', ''), ('Timestamp', (2823173876, 389403)), ('NOP', None), ('WScale', 7)] |>>>

Using this function, is there a way that I can extract HTML code from the response?

Also, what do those packets look like?

And finally, why are both of these packets nearly identical?

2 Answers
Evening l夕情丶
#2 · 2019-07-26 10:12

Have you tried using scapy-http? It's a great scapy extension that helps with this exact issue.

在下西门庆
#3 · 2019-07-26 10:21

The segments in your example are "nearly identical" because they are the TCP SYN and SYN-ACK segments, which are part of the TCP handshake. The HTTP request and response come after that, once the connection is open (usually in the ESTABLISHED state, except when the TCP Fast Open option is used), so you need to look at the segments after the handshake to get the data you are interested in.

         SYN
C ---------------> S
       SYN-ACK
C <--------------- S
         ACK
C ---------------> S
    HTTP request
C ---------------> S
         ACK
C <--------------- S
    HTTP response
C <--------------- S  <= Here is the server's answer
         ACK
C ---------------> S
...
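
As a rough sketch of how the handshake segments could be told apart from the ones that carry data, the helper below looks at the TCP flag bits and the payload length. `describe_segment` is a made-up name for illustration, not part of Scapy; the bit values are the standard TCP ones.

```python
# Standard TCP flag bits (SYN and ACK are the ones used in the handshake).
SYN = 0x02
PSH = 0x08
ACK = 0x10


def describe_segment(flags, payload_len):
    """Label a TCP segment from its flag bits and payload length.

    Hypothetical helper for illustration; not a Scapy function.
    """
    if flags & SYN and flags & ACK:
        return "SYN-ACK"      # second step of the handshake
    if flags & SYN:
        return "SYN"          # first step of the handshake
    if payload_len > 0:
        return "data"         # e.g. an HTTP request or response
    return "no payload"       # e.g. a bare ACK


print(describe_segment(SYN, 0))
print(describe_segment(SYN | ACK, 0))
print(describe_segment(PSH | ACK, 512))
```

With Scapy, the two arguments could probably be taken from `int(pkt[TCP].flags)` and `len(pkt[TCP].payload)`; that part is an assumption about how you would wire it up, and link-layer padding may need to be excluded from the length.
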

You can use Scapy's Raw layer to extract the data carried above TCP in your case (e.g. pkt[Raw], whose load field holds the payload bytes).
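
A minimal sketch of that idea, assuming plain HTTP on port 80 (not HTTPS, where the payload is encrypted), an uncompressed and unchunked response, and segments captured in order without retransmissions. `extract_body`, `collect_payloads`, and `sniff_html` are hypothetical helper names; `sniff`, `IP`, `TCP`, `Raw`, and the `load` field come from Scapy.

```python
def extract_body(stream):
    """Return the bytes after the HTTP header block, or None if the
    blank line that ends the headers has not been seen."""
    _head, sep, body = stream.partition(b"\r\n\r\n")
    return body if sep else None


def collect_payloads(packets, server_ip):
    """Join the Raw payloads of the TCP segments sent by server_ip."""
    # Imported here so extract_body can be tried without Scapy installed.
    from scapy.all import IP, TCP, Raw

    chunks = []
    for pkt in packets:
        if not (pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt.haslayer(Raw)):
            continue  # skips SYN, SYN-ACK and bare ACK segments
        if pkt[IP].src == server_ip:
            chunks.append(bytes(pkt[Raw].load))
    return b"".join(chunks)


def sniff_html(server_ip, count=20):
    """Capture some segments and return the HTTP body, if any.

    Not called here: sniffing usually needs root privileges.
    """
    from scapy.all import sniff

    packets = sniff(filter="tcp port 80 and host " + server_ip, count=count)
    return extract_body(collect_payloads(packets, server_ip))


# Offline demonstration on a hand-written response.
sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html><body>hi</body></html>"
print(extract_body(sample))
```

A larger `count` than the 2 used in the question is needed, since the first two segments are only the handshake.
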
