I'm using Python regular expressions to extract dimensional information from a database. The entries in the relevant column look like this:

23 cm
43 1/2 cm

20cm
15 cm x 30 cm

What I need from this is only the width of the entry (so for the entries with an 'x', only the first number), but as you can see the values are all over the place.

From what I understood in the documentation, you can access the groups in a match using their position, so I was thinking I could determine the type of the entry based on how many groups are returned and what is found at each index.

The expression I have used so far is ^(\d{2})\s?(x\s?(\d{2}))?(\d+/\d+)?$, but it's not perfect and it returns a number of useless groups. Is there something more efficient and appropriate?

Edit: I need the number from every line. When there is only one number, it is implied that only the width was measured (including any fractional component, as in line 2). When there are two numbers, the height was also measured, but I only need the width, which is the first number (as in the last line).

Tags: python regex csv numbers data-processing

3 Answers

聊天终结者

#2 · 2019-08-05 02:26

Here's a sample of how to do it from a text file. It works for the provided data.

     # Print the width (the text before any 'x') for every non-empty line
     with open("textfile.txt", "r") as f:
         for line in f:
             width = line.partition('x')[0].strip()
             if width:
                 print(width)


Ridiculous、

#3 · 2019-08-05 02:37

This regex should work (Live Demo)

^(\d+)(?:\s*cm\s+[xX])

Explanation

^(\d+) - capture at least one digit at the beginning of the line
(?: - start non-capturing group
\s* - followed by zero or more whitespace characters
cm - followed by a literal c and m
\s+ - followed by at least one whitespace character
[xX] - followed by a literal x or X
) - end non-capturing group

You shouldn't need to bother matching the rest of the line.
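
A minimal sketch of applying this pattern from Python, assuming the sample entries from the question are held in a list (the `entries` and `widths` names are only for illustration). Note that, as written, the pattern requires the `x`, so it matches only the two-dimension entries; entries with a single number come back as no match and would need separate handling.

```python
import re

# Pattern from the answer above: leading digits, then "cm" and an x/X.
pattern = re.compile(r'^(\d+)(?:\s*cm\s+[xX])')

# Sample entries taken from the question (hypothetical list name).
entries = ['23 cm', '43 1/2 cm', '20cm', '15 cm x 30 cm']

widths = []
for entry in entries:
    m = pattern.match(entry)
    # m is None when the entry has no "x", so record None for those.
    widths.append(m.group(1) if m else None)

print(widths)
```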


ら.Afraid

#4 · 2019-08-05 02:40

Try the regex below. It captures the first run of digits, plus an optional fraction that follows it, before the first 'cm'.

import re
regex = re.compile(r'(\d+.*?)\s?cm')  # this works for all of your example data
# or
# this asserts that whatever comes after the first digit group can only be a fraction
regex = re.compile(r'(\d+(?:\s+\d+/\d+)?)\s?cm')


>>> regex.match('23 cm').group(1)
'23'
>>> regex.match('43 1/2 cm').group(1)
'43 1/2'
>>> regex.match('20cm').group(1)
'20'
>>> regex.match('15 cm x 30 cm').group(1)
'15'

regex101 demo
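
If the width is needed as a number rather than a string such as '43 1/2', the captured text could be converted afterwards. The following is only a sketch under assumptions: the `extract_width` helper, the `WIDTH_RE` name, and the group names are hypothetical and not part of the answer above, and it assumes fractions are always written as `whole num/den` with a non-zero denominator. Empty or unrecognised entries return `None`.

```python
import re
from fractions import Fraction

# Same idea as the second pattern above, with the whole part and the
# optional fraction captured separately (group names are illustrative).
WIDTH_RE = re.compile(
    r'\s*(?P<whole>\d+)(?:\s+(?P<num>\d+)/(?P<den>[1-9]\d*))?\s*cm'
)

def extract_width(entry):
    """Return the first measurement in `entry` as a Fraction, or None."""
    m = WIDTH_RE.match(entry)
    if m is None:
        return None  # empty or unrecognised entry
    width = Fraction(int(m.group('whole')))
    if m.group('num') is not None:
        # Add the fractional part, e.g. "1/2" in "43 1/2 cm".
        width += Fraction(int(m.group('num')), int(m.group('den')))
    return width

for entry in ['23 cm', '43 1/2 cm', '', '20cm', '15 cm x 30 cm']:
    print(repr(entry), '->', extract_width(entry))
```

`Fraction` keeps values like 43 1/2 exact; calling `float()` on the result gives a decimal if that is more convenient for the database.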
