我需要一个标题是这样的:
Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foobear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41"
并解析它这个使用Python:
{'protocol':'Digest',
'qop':'chap',
'realm':'testrealm@host.com',
'username':'Foobear',
'response':'6629fae49393a05397450978507c4ef1',
'cnonce':'5ccc069c403ebaf9f0171e9517f40e41'}
是否有一个图书馆要做到这一点,或者说我可以看看灵感?
我在谷歌应用程序引擎这样做,我不知道,如果Pyparsing库是可用的,但也许我可以用我的应用程序包括它,如果它是最好的解决办法。
目前我创造我自己的MyHeaderParser对象,并使用它与减少()上的头部字符串。 它的工作,但很脆弱。
辉煌的解决方案由以下纳迪亚:
import re
reg = re.compile('(\w+)[=] ?"?(\w+)"?')
s = """Digest
realm="stackoverflow.com", username="kixx"
"""
print str(dict(reg.findall(s)))
Answer 1:
有一点正则表达式:
import re
reg=re.compile('(\w+)[:=] ?"?(\w+)"?')
>>>dict(reg.findall(headers))
{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}
Answer 2:
您还可以使用的urllib2作为CheryPy一样。
这里是片段:
input= """
Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foobear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
import urllib2
field, sep, value = input.partition("Authorization: Digest ")
if value:
items = urllib2.parse_http_list(value)
opts = urllib2.parse_keqv_list(items)
opts['protocol'] = 'Digest'
print opts
它输出:
{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': 'testrealm@host.com', 'response': '6629fae49393a05397450978507c4ef1'}
Answer 3:
这里是我的pyparsing尝试:
text = """Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foobear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41" """
from pyparsing import *
AUTH = Keyword("Authorization")
ident = Word(alphas,alphanums)
EQ = Suppress("=")
quotedString.setParseAction(removeQuotes)
valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
authentry = AUTH + ":" + ident("protocol") + valueDict
print authentry.parseString(text).dump()
其打印:
['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', 'testrealm@host.com'],
['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'],
['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41
- protocol: Digest
- qop: chap
- realm: testrealm@host.com
- response: 6629fae49393a05397450978507c4ef1
- username: Foobear
我不熟悉的RFC,但我希望这可以让你滚。
Answer 4:
如果这些组件将永远在那里,然后一个正则表达式会做的伎俩:
test = '''Authorization: Digest qop="chap", realm="testrealm@host.com", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''
import re
re_auth = re.compile(r"""
Authorization:\s*(?P<protocol>[^ ]+)\s+
qop="(?P<qop>[^"]+)",\s+
realm="(?P<realm>[^"]+)",\s+
username="(?P<username>[^"]+)",\s+
response="(?P<response>[^"]+)",\s+
cnonce="(?P<cnonce>[^"]+)"
""", re.VERBOSE)
m = re_auth.match(test)
print m.groupdict()
生产:
{ 'username': 'Foobear',
'protocol': 'Digest',
'qop': 'chap',
'cnonce': '5ccc069c403ebaf9f0171e9517f40e41',
'realm': 'testrealm@host.com',
'response': '6629fae49393a05397450978507c4ef1'
}
Answer 5:
我建议找到一个正确的库来解析HTTP头不幸的是我不能reacall任何。 :(
有一段时间检查下面的代码段(应该主要工作):
input= """
Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foob,ear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
field, sep, value = input.partition(":")
if field.endswith('Authorization'):
protocol, sep, opts_str = value.strip().partition(" ")
opts = {}
for opt in opts_str.split(",\n"):
key, value = opt.strip().split('=')
key = key.strip(" ")
value = value.strip(' "')
opts[key] = value
opts['protocol'] = protocol
print opts
Answer 6:
您原来使用PyParsing的概念将是最好的办法。 什么你已经隐含要求的东西,需要一个语法...也就是说,正则表达式或简单的解析程序总是会变脆,这听起来就像是你试图避免的事情。
看来,在谷歌应用程序引擎越来越pyparsing很简单: 我如何获得PyParsing建立在谷歌应用程序引擎?
所以我想与去,然后实现从RFC2617的完整的HTTP认证/授权头的支持。
Answer 7:
HTTP摘要授权报头字段是有点奇数兽。 它的格式是类似的RFC 2616的Cache-Control和Content-Type头字段,但只是不同够不兼容。 如果你还在寻找一个更聪明一点,比正则表达式更易读库,你可以尝试删除授权:摘要与部分str.split()和解析,其余parse_dict_header)(从WERKZEUG的HTTP模块。 (WERKZEUG可以安装在应用程序引擎。)
Answer 8:
Nadia的正则表达式只匹配字母数字字符的参数值。 这意味着它无法解析至少两个字段。 即,uri和QOP。 根据RFC 2617,该URI字段是在请求线(即,HTTP请求的第一行)的字符串的副本。 和QOP无法正确解析,如果值是“AUTH-INT”由于非字母数字“ - ”。
这种改进的正则表达式允许URI(或任何其他值),以遏制任何东西,但“”(空格),“'(qoute),或‘’(逗号)。这可能是更宽松的比它需要的,但不应该”吨导致与正确形成的HTTP请求的任何问题。
reg re.compile('(\w+)[:=] ?"?([^" ,]+)"?')
特别提示:从那里,这是相当简单的示例代码转换在RFC-2617到Python。 使用Python的MD5 API “MD5Init()” 变为 “M = md5.new()”, “MD5Update()” 变为 “m.update()” 和 “MD5Final()” 变为 “m.digest()”。
Answer 9:
一个旧的问题,但一个我觉得非常有帮助。
我需要一个解析器来处理任何适当形成Authorization头,通过定义RFC7235 (举起你的手,如果你喜欢阅读ABNF)。
Authorization = credentials
BWS = <BWS, see [RFC7230], Section 3.2.3>
OWS = <OWS, see [RFC7230], Section 3.2.3>
Proxy-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS
challenge ] )
Proxy-Authorization = credentials
WWW-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS challenge
] )
auth-param = token BWS "=" BWS ( token / quoted-string )
auth-scheme = token
challenge = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *(
OWS "," [ OWS auth-param ] ) ] ) ]
credentials = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param )
*( OWS "," [ OWS auth-param ] ) ] ) ]
quoted-string = <quoted-string, see [RFC7230], Section 3.2.6>
token = <token, see [RFC7230], Section 3.2.6>
token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" )
*"="
与开始PaulMcG的答案,我想出了这个:
import pyparsing as pp
tchar = '!#$%&\'*+-.^_`|~' + pp.nums + pp.alphas
t68char = '-._~+/' + pp.nums + pp.alphas
token = pp.Word(tchar)
token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))
scheme = token('scheme')
header = pp.Keyword('Authorization')
name = pp.Word(pp.alphas, pp.alphanums)
value = pp.quotedString.setParseAction(pp.removeQuotes)
name_value_pair = name + pp.Suppress('=') + value
params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))
credentials = scheme + (token68('token') ^ params('params'))
auth_parser = header + pp.Suppress(':') + credentials
这使得在分析任何授权头:
parsed = auth_parser.parseString('Authorization: Basic Zm9vOmJhcg==')
print('Authenticating with {0} scheme, token: {1}'.format(parsed['scheme'], parsed['token']))
其输出:
Authenticating with Basic scheme, token: Zm9vOmJhcg==
把它全部的Authenticator
类:
import pyparsing as pp
from base64 import b64decode
import re
class Authenticator:
def __init__(self):
"""
Use pyparsing to create a parser for Authentication headers
"""
tchar = "!#$%&'*+-.^_`|~" + pp.nums + pp.alphas
t68char = '-._~+/' + pp.nums + pp.alphas
token = pp.Word(tchar)
token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))
scheme = token('scheme')
auth_header = pp.Keyword('Authorization')
name = pp.Word(pp.alphas, pp.alphanums)
value = pp.quotedString.setParseAction(pp.removeQuotes)
name_value_pair = name + pp.Suppress('=') + value
params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))
credentials = scheme + (token68('token') ^ params('params'))
# the moment of truth...
self.auth_parser = auth_header + pp.Suppress(':') + credentials
def authenticate(self, auth_header):
"""
Parse auth_header and call the correct authentication handler
"""
authenticated = False
try:
parsed = self.auth_parser.parseString(auth_header)
scheme = parsed['scheme']
details = parsed['token'] if 'token' in parsed.keys() else parsed['params']
print('Authenticating using {0} scheme'.format(scheme))
try:
safe_scheme = re.sub("[!#$%&'*+-.^_`|~]", '_', scheme.lower())
handler = getattr(self, 'auth_handle_' + safe_scheme)
authenticated = handler(details)
except AttributeError:
print('This is a valid Authorization header, but we do not handle this scheme yet.')
except pp.ParseException as ex:
print('Not a valid Authorization header')
print(ex)
return authenticated
# The following methods are fake, of course. They should use what's passed
# to them to actually authenticate, and return True/False if successful.
# For this demo I'll just print some of the values used to authenticate.
@staticmethod
def auth_handle_basic(token):
print('- token is {0}'.format(token))
try:
username, password = b64decode(token).decode().split(':', 1)
except Exception:
raise DecodeError
print('- username is {0}'.format(username))
print('- password is {0}'.format(password))
return True
@staticmethod
def auth_handle_bearer(token):
print('- token is {0}'.format(token))
return True
@staticmethod
def auth_handle_digest(params):
print('- username is {0}'.format(params['username']))
print('- cnonce is {0}'.format(params['cnonce']))
return True
@staticmethod
def auth_handle_aws4_hmac_sha256(params):
print('- Signature is {0}'.format(params['Signature']))
return True
为了测试这个类:
tests = [
'Authorization: Digest qop="chap", realm="testrealm@example.com", username="Foobar", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"',
'Authorization: Bearer cn389ncoiwuencr',
'Authorization: Basic Zm9vOmJhcg==',
'Authorization: AWS4-HMAC-SHA256 Credential="AKIAIOSFODNN7EXAMPLE/20130524/us-east-1/s3/aws4_request", SignedHeaders="host;range;x-amz-date", Signature="fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024"',
'Authorization: CrazyCustom foo="bar", fizz="buzz"',
]
authenticator = Authenticator()
for test in tests:
authenticator.authenticate(test)
print()
其输出:
Authenticating using Digest scheme
- username is Foobar
- cnonce is 5ccc069c403ebaf9f0171e9517f40e41
Authenticating using Bearer scheme
- token is cn389ncoiwuencr
Authenticating using Basic scheme
- token is Zm9vOmJhcg==
- username is foo
- password is bar
Authenticating using AWS4-HMAC-SHA256 scheme
- signature is fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024
Authenticating using CrazyCustom scheme
This is a valid Authorization header, but we do not handle this scheme yet.
在未来,如果我们想处理CrazyCustom我们只是添加
def auth_handle_crazycustom(params):
Answer 10:
如果你的反应是在一个单一的字符串, 从来没有变化,因为有表达式匹配有尽可能多的行 ,你可以把它拆分成所谓新行的数组authentication_array
和使用正则表达式:
pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']
i = 0
parsed_dict = {}
for line in authentication_array:
pattern = "(" + pattern_array[i] + ")" + "=(\".*\")" # build a matching pattern
match = re.search(re.compile(pattern), line) # make the match
if match:
parsed_dict[match.group(1)] = match.group(2)
i += 1
文章来源: Parse an HTTP request Authorization header with Python