sscanf in Python

I'm looking for an equivalent to sscanf() in Python. I want to parse /proc/net/* files, in C I could do something like this:

int matches = sscanf(
        buffer,
        "%*d: %64[0-9A-Fa-f]:%X %64[0-9A-Fa-f]:%X %*X %*X:%*X %*X:%*X %*X %*d %*d %ld %*512s\n",
        local_addr, &local_port, rem_addr, &rem_port, &inode);

I thought at first to use str.split, however it doesn't split on the given characters, but the sep string as a whole:

>>> lines = open("/proc/net/dev").readlines()
>>> for l in lines[2:]:
>>>     cols = l.split(string.whitespace + ":")
>>>     print len(cols)
1

Which should be returning 17, as explained above.

Is there a Python equivalent to sscanf (not RE), or a string splitting function in the standard library that splits on any of a range of characters that I'm not aware of?

标签： python parsing split scanf procfs

11条回答

劫难

2楼-- · 2019-01-04 01:23

There is also the parse module.

parse() is designed to be the opposite of format() (the newer string formatting function in Python 2.6 and higher).

>>> from parse import parse
>>> parse('{} fish', '1')
>>> parse('{} fish', '1 fish')
<Result ('1',) {}>
>>> parse('{} fish', '2 fish')
<Result ('2',) {}>
>>> parse('{} fish', 'red fish')
<Result ('red',) {}>
>>> parse('{} fish', 'blue fish')
<Result ('blue',) {}>

0人赞添加讨论(0) 举报

时光不老，我们不散

3楼-- · 2019-01-04 01:29

There is an ActiveState recipe which implements a basic scanf http://code.activestate.com/recipes/502213-simple-scanf-implementation/

0人赞添加讨论(0) 举报

做自己的国王

4楼-- · 2019-01-04 01:33

If the separators are ':', you can split on ':', and then use x.strip() on the strings to get rid of any leading or trailing whitespace. int() will ignore the spaces.

0人赞添加讨论(0) 举报

Melony?

5楼-- · 2019-01-04 01:36

You can split on a range of characters using the re module.

>>> import re
>>> r = re.compile('[ \t\n\r:]+')
>>> r.split("abc:def  ghi")
['abc', 'def', 'ghi']

0人赞添加讨论(0) 举报

Emotional °昔

6楼-- · 2019-01-04 01:36

You can parse with module re using named groups. It won't parse the substrings to their actual datatypes (e.g. int) but it's very convenient when parsing strings.

Given this sample line from /proc/net/tcp:

line="   0: 00000000:0203 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 335 1 c1674320 300 0 0 0"

An example mimicking your sscanf example with the variable could be:

import re
hex_digit_pattern = r"[\dA-Fa-f]"
pat = r"\d+: " + \
      r"(?P<local_addr>HEX+):(?P<local_port>HEX+) " + \
      r"(?P<rem_addr>HEX+):(?P<rem_port>HEX+) " + \
      r"HEX+ HEX+:HEX+ HEX+:HEX+ HEX+ +\d+ +\d+ " + \
      r"(?P<inode>\d+)"
pat = pat.replace("HEX", hex_digit_pattern)

values = re.search(pat, line).groupdict()

import pprint; pprint values
# prints:
# {'inode': '335',
#  'local_addr': '00000000',
#  'local_port': '0203',
#  'rem_addr': '00000000',
#  'rem_port': '0000'}

0人赞添加讨论(0) 举报

姐就是有狂的资本

7楼-- · 2019-01-04 01:40

Upvoted orip's answer. I think it is sound advice to use re module. The Kodos application is helpful when approaching a complex regexp task with Python.

http://kodos.sourceforge.net/home.html

0人赞添加讨论(0) 举报

1 2 下一页

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间