Extracting number preceding a particular text usin

Posted 2019-07-20 01:44

Question:

I'm looking for a regex to extract two numbers from the same text. They can be extracted independently; there is no need to get both in one go.

I'm using Yahoo Pipes.

Source Text: S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) - Apartment, 10 Anson Road (D02)

I need to extract 1,475 as a number, and also 137 as a number (the second one can be extracted in a separate run).

I got the following pattern from someone quite helpful on a different forum:

\b(\d+(,\d+)*)\s+(sqft|sqm)

But when I use it with a replacement of $1, it returns the whole source text instead of just the number I want (i.e. 1,475 or 137, depending on whether I run \b(\d+(,\d+))\s+(sqft) or \b(\d+(,\d+))\s+(sqm)).

What am I doing wrong?

Answer 1:

Well, you could do this by iterating through the matches and reading the captured number from each one.
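As a rough sketch of the iterating approach, here is how it might look in Python (Yahoo Pipes itself is not shown; the variable names `text`, `pattern`, and `areas` are just illustrative). It reuses the pattern from the question and collects the number found in front of each unit:

```python
import re

text = ("S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) "
        "- Apartment, 10 Anson Road (D02)")

# The pattern from the question: group 1 is the number, group 3 is the unit.
pattern = re.compile(r'\b(\d+(,\d+)*)\s+(sqft|sqm)')

# Walk over every match and map each unit to the number preceding it.
areas = {m.group(3): m.group(1) for m in pattern.finditer(text)}

print(areas)  # {'sqft': '1,475', 'sqm': '137'}
```

Note that 5,200 and 10 are skipped because neither is followed by sqft or sqm.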

But if you want to use the replace method then this could work:

^.*?(?<sqft>\d+(,\d+)*)\s?sqft.*?(?<sqm>\d+(,\d+)*)\s?sqm.*$

And then replace with one of the following, depending on which number you want:

${sqft}
${sqm}

Here it is in action.

This works with or without a comma in the sqft or sqm numbers. The .*? at the beginning and in the middle, together with the .* at the end, force the pattern to match the entire string, so the replacement eliminates everything except what you're after.
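To make the difference concrete, here is a minimal Python sketch (an assumption on my part that Python's `re.sub` behaves like the replace step in question; note Python spells named groups `(?P<name>...)` and references them as `\g<name>`). A replace only rewrites the part of the text that matched, so an unanchored pattern leaves the rest of the string in place:

```python
import re

text = ("S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) "
        "- Apartment, 10 Anson Road (D02)")

# Unanchored pattern: only "1,475 sqft" is replaced; the rest of the text stays.
partial = re.sub(r'\b(\d+(,\d+)*)\s+sqft', r'\1', text)

# Whole-string pattern: the match covers everything, so only the group remains.
whole = re.compile(
    r'^.*?(?P<sqft>\d+(,\d+)*)\s?sqft.*?(?P<sqm>\d+(,\d+)*)\s?sqm.*$'
)
sqft_only = whole.sub(r'\g<sqft>', text)
sqm_only = whole.sub(r'\g<sqm>', text)

print(partial)    # still the full listing, minus the word "sqft"
print(sqft_only)  # 1,475
print(sqm_only)   # 137
```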



Answer 2:

Since you didn't specify a language, here is some Python:

import re

s = "$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) - Apartment, 10 Anson Road (D02)"
print(re.search(r'\b([0-9.,]+) ?sqft ?/ ?([0-9.,]+) ?sqm', s).groups())
# prints ('1,475', '137')

This searches, after a word boundary, for a run of one or more digits, commas, or periods, followed by an optional space and the word 'sqft', then an optional space, a slash, and an optional space, followed by another run of digits, commas, or periods, an optional space, and the word 'sqm'.

This should allow your formatting to be pretty loose (optional spaces, thousands and decimal separators).
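Since the question asks for the values "as a number", the captured strings still need converting. A small sketch, assuming the comma is a thousands separator and the period a decimal point (`to_number` is a hypothetical helper, not part of any library):

```python
import re

text = "S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in)"

match = re.search(r'\b([0-9.,]+) ?sqft ?/ ?([0-9.,]+) ?sqm', text)

def to_number(raw):
    """Hypothetical helper: drop thousands commas, then parse as a float."""
    return float(raw.replace(',', ''))

if match:
    sqft, sqm = (to_number(g) for g in match.groups())
    print(sqft, sqm)  # 1475.0 137.0
```

If the listings use a different locale (period for thousands, comma for decimals), the helper would need adjusting.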



Answer 3:

In Perl, I would write something like:

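# Capture the number in front of "sqft" (the binding operator is =~, not ~=).
if ($line =~ m/\b([0-9.,]+) sqft/)
{
  $sqft = $1;
}
else
{
  $sqft = undef;
}

# Same again for the number in front of "sqm".
if ($line =~ m/\b([0-9.,]+) sqm/)
{
  $sqm = $1;
}
else
{
  $sqm = undef;
}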


Answer 4:

You may wish to consider the situations discussed in this answer in crafting a regex for numbers.
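One such situation, offered here as an illustration rather than as a summary of that answer: a character class like [0-9.,]+ also accepts a lone comma or period. The Python sketch below contrasts it with a stricter pattern of my own (an assumption: comma-grouped thousands or plain digits, with an optional decimal part):

```python
import re

loose = re.compile(r'\b([0-9.,]+) sqft')
# Stricter: comma-grouped thousands or plain digits, optional decimal part.
strict = re.compile(r'\b((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?) sqft')

stray = "floor area, sqft not stated"
listing = "1,475 sqft / 137 sqm"

loose_stray = loose.search(stray)        # matches the lone comma
strict_stray = strict.search(stray)      # no digits, so no match
strict_listing = strict.search(listing)  # matches the real number

print(repr(loose_stray.group(1)))  # ','
print(strict_stray)                # None
print(strict_listing.group(1))     # 1,475
```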