How can I extract HTML code with Scapy?

I recently began using the Scapy library for Python 2.x and found minimal documentation on the sniff() function. I began to play around with it and found that I can view TCP packets at a very low level. So far I have only found informational data. For example:

Here is what I put in the scapy terminal:

A = sniff(filter="tcp and host 216.58.193.78", count=2)

This is a request to google.com asking for the homepage:

<Ether  dst=e8:de:27:55:17:f3 src=00:24:1d:20:a6:1b type=0x800 |<IP  version=4L ihl=5L tos=0x0 len=60 id=46627 flags=DF frag=0L ttl=64 proto=tcp chksum=0x2a65 src=192.168.0.2 dst=216.58.193.78 options=[] |<TCP  sport=54036 dport=www seq=2948286264 ack=0 dataofs=10L reserved=0L flags=S window=29200 chksum=0x5a62 urgptr=0 options=[('MSS', 1460), ('SAckOK', ''), ('Timestamp', (389403, 0)), ('NOP', None), ('WScale', 7)] |>>>

Here is the response:

<Ether  dst=00:24:1d:20:a6:1b src=e8:de:27:55:17:f3 type=0x800 |<IP  version=4L ihl=5L tos=0x0 len=60 id=42380 flags= frag=0L ttl=55 proto=tcp chksum=0x83fc src=216.58.193.78 dst=192.168.0.2 options=[] |<TCP  sport=www dport=54036 seq=3087468609 ack=2948286265 dataofs=10L reserved=0L flags=SA window=42540 chksum=0xecaf urgptr=0 options=[('MSS', 1430), ('SAckOK', ''), ('Timestamp', (2823173876, 389403)), ('NOP', None), ('WScale', 7)] |>>>

Using this function, is there a way that I can extract HTML code from the response?

Also, what do those packets look like?

And finally, why are both of these packets nearly identical?

Tags: python-2.7 tcp scapy packet-capture packets

2 answers

Evening l夕情丶

#2 · 2019-07-26 10:12

Have you tried using scapy-http? It's a great Scapy extension that helps with this exact issue.


在下西门庆

#3 · 2019-07-26 10:21

The segments in your example are "nearly identical" because they are the TCP SYN and SYN-ACK segments, which are part of the TCP three-way handshake. The HTTP request and response come after that, once the connection is set up (usually in the ESTABLISHED state, except when the TCP Fast Open option is used). You need to look at the segments after the handshake to get the data you are interested in.

         SYN
C ---------------> S
       SYN-ACK
C <--------------- S
         ACK
C ---------------> S
    HTTP request
C ---------------> S
         ACK
C <--------------- S
    HTTP response
C <--------------- S  <= Here is the server's answer
         ACK
C ---------------> S
...
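
To tell the handshake segments apart from the ones that carry data, you can look at the TCP flag bits. Here is a minimal sketch; the helper name handshake_stage and its labels are made up for illustration and are not part of Scapy. The bit values are the standard TCP flag bits.

```python
# Standard TCP flag bits (the low bits of the TCP flags field).
FIN = 0x01
SYN = 0x02
RST = 0x04
ACK = 0x10


def handshake_stage(flags, payload_len=0):
    """Label a TCP segment from its flag bitmask and payload size.

    Hypothetical helper for illustration only, not a Scapy API.
    """
    if flags & RST:
        return "RST"
    if (flags & SYN) and (flags & ACK):
        return "SYN-ACK"
    if flags & SYN:
        return "SYN"
    if flags & FIN:
        return "FIN"
    if payload_len > 0:
        return "DATA"      # e.g. an HTTP request or response segment
    if flags & ACK:
        return "ACK"       # bare acknowledgement, no payload
    return "OTHER"


# The two packets from the question show flags=S and flags=SA.
print(handshake_stage(SYN))        # first packet
print(handshake_stage(SYN | ACK))  # second packet
```

With Scapy you could probably feed it int(pkt[TCP].flags) and the length of pkt[Raw].load when the packet has a Raw layer (0 otherwise). I am assuming here that int() works on the flags field in your Scapy version.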

You can use Scapy's Raw layer to extract the data carried above TCP in your case (e.g. pkt[Raw]); its load field holds the raw bytes.
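
A rough sketch of how that could look, assuming Python 3, a recent Scapy, and plain HTTP on port 80 (over HTTPS the Raw payload is encrypted, so no HTML will be visible). The function names reassemble, split_http and capture_html are hypothetical helpers, not Scapy functions. The sketch ignores sequence-number wraparound, lost segments, chunked transfer encoding and compression.

```python
def reassemble(segments):
    """Join (seq, payload) pairs from one direction of a TCP stream.

    Segments are ordered by sequence number; a retransmitted segment
    with an already-seen sequence number is ignored.
    """
    seen = {}
    for seq, data in segments:
        if data:
            seen.setdefault(seq, data)
    return b"".join(seen[seq] for seq in sorted(seen))


def split_http(stream):
    """Split a raw HTTP message into (header text, body bytes)."""
    head, _, body = stream.partition(b"\r\n\r\n")
    return head.decode("latin-1"), body


def capture_html(server_ip, count=50):
    """Sniff `count` packets and return the server's (headers, body).

    Assumes `server_ip` is an IP address string and that the capture
    contains a single HTTP response. Sniffing usually needs root.
    """
    # Imported here so the helpers above work without Scapy installed.
    from scapy.all import sniff, IP, TCP, Raw

    pkts = sniff(filter="tcp port 80 and host %s" % server_ip, count=count)
    segments = [
        (p[TCP].seq, bytes(p[Raw].load))
        for p in pkts
        if p.haslayer(Raw) and p[IP].src == server_ip
    ]
    return split_http(reassemble(segments))


# The helpers can be tried on hand-made segments, here out of order:
demo = [(1019, b"<html>hi</html>"), (1000, b"HTTP/1.1 200 OK\r\n\r\n")]
headers, body = split_http(reassemble(demo))
print(headers)
print(body)
```

Note that count=2, as in the question, only captures the handshake. You need a larger count (or a stop condition) to reach the segments that carry the response.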
