how to parse user agent string? python

2020-08-24 03:24发布

问题:

<field name="http.user_agent" showname="User-Agent: CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)\r\n" size="62" pos="542" show="CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)" value="557365722d4167656e743a20434f52452f362e3530362e342e31204f70656e434f52452f322e303220284c696e75783b416e64726f696420322e32290d0a"/>

<field name="http.user_agent" showname="User-Agent: HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5\r\n" size="67" pos="570" show="HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5" value="557365722d4167656e743a204854432053747265616d696e6720506c61796572206874635f777765202f20312e30202f206874635f7669766f202f20322e332e350d0a"/>

<field name="http.user_agent" showname="User-Agent: AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)\r\n" size="85" pos="639" show="AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)" value="557365722d4167656e743a204170706c65436f72654d656469612f312e302e302e38433134382028695061643b20553b20435055204f5320345f325f31206c696b65204d6163204f5320583b2073765f7365290d0a"/>

The samples of the urls I've got are listed above. I am wondering if there is any module in Python which I can use to parse the user-agent. I want to get the output from these samples like:

Android
HTC Streaming player
ipad

and if it is a PC user, I want to get the web browser type.

回答1:

There is a library called httpagentparser for that:

import httpagentparser
>>> s = "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 (KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9"
>>> print httpagentparser.simple_detect(s)
('Linux', 'Chrome 5.0.307.11')
>>> print httpagentparser.detect(s)
{'os': {'name': 'Linux'},
 'browser': {'version': '5.0.307.11', 'name': 'Chrome'}}


回答2:

Werkzeug has a user agent parser built in.

http://werkzeug.pocoo.org/docs/quickstart/?highlight=user_agent#header-parsing

from werkzeug.test import create_environ
from werkzeug.wrappers import Request

environ = create_environ()
environ.update(HTTP_USER_AGENT=('Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    ' AppleWebKit/537.36 (KHTML, like Gecko)'
    ' Chrome/76.0.3809.100 Safari/537.36'))
request = Request(environ)

request.user_agent.browser
'chrome'


回答3:

You can try to write your own with regular expression : http://docs.python.org/library/re.html or take a look at this : http://pypi.python.org/pypi/httpagentparser