Matching Product Prices from an HTML text

Posted 2019-03-03 06:56

Question:

I'm trying a simple regex on a string for pricing information, but my preg_match_all is simply not finding what it should.

I'm looking for instances of e.g. $**.** or £**.**; sometimes the currency symbol might be encoded as an HTML entity, e.g. for GBP &pound; or &#163;

Is there an issue with using preg_match_all to find html entities?

Here's what I'm trying:

$price = preg_match_all(
    '#(?:\$|\£|\€|\&pound;|\&#163;)(\d+(?:\.\d+)?)#',
    $string, 
    $matches
);

But I get: Unknown modifier '1'

Answer 1:

Here are some obvious errors:

1) preg_match_all() expects at least 3 parameters (the third, $matches, only became optional in PHP 5.4), so it has to be

preg_match_all(
    '#(?:\$|\£|\€|\&pound;|\&#163;)(\d+(?:\.\d+)?)#',
    $string, 
    $matches
);

The $matches variable will contain the matched strings. Your $price will contain the number of times the pattern matched. Please see http://php.net/preg_match_all for further information.
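To illustrate the difference between the return value and $matches, here is a minimal sketch. The sample string and the simplified dollar-only pattern are assumptions made for the example, not taken from the question.

```php
<?php
// Hypothetical sample input; the pattern is deliberately simplified to "$" only.
$string = 'Was $19.99, now $14.50';

// The return value is the number of full matches, not the matched text.
$count = preg_match_all('/\$(\d+(?:\.\d+)?)/', $string, $matches);

var_dump($count);      // number of full pattern matches
print_r($matches[0]);  // the full matches, symbol included
print_r($matches[1]);  // the first capture group, i.e. the bare amounts
```

With the default PREG_PATTERN_ORDER flag, $matches[0] holds the full matches and $matches[1] holds whatever the first pair of parentheses captured.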

2) You have an unescaped delimiter:

'#(?:\$|\£|\€|\&pound;|\&#163;)(\d+(?:\.\d+)?)#'
 ^                       ^                    ^
 Start                   Unescaped            End

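Fixing these two issues (passing $matches, and either escaping the inner # as \# or using a delimiter that does not occur in the pattern) will make the code run without any parsing errors. It should also answer your literal question about matching entities: preg_match_all has no problem with them, the # inside &#163; simply ended the pattern early, so the following 1 was read as a modifier.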
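One way to apply the delimiter fix is sketched here. It uses ~ as the delimiter so the # in &#163; needs no escaping. The sample string is hypothetical, and the u modifier assumes the input is UTF-8.

```php
<?php
// Hypothetical input mixing literal symbols and HTML entities.
$string = 'A: $5, B: £12.50, C: &pound;3.99, D: &#163;7, E: €20.00';

// '~' is the delimiter, so the '#' in '&#163;' is an ordinary character.
// The 'u' modifier treats pattern and subject as UTF-8 (an assumption about the input).
$pattern = '~(?:\$|£|€|&pound;|&#163;)(\d+(?:\.\d+)?)~u';

$count = preg_match_all($pattern, $string, $matches);

print_r($matches[1]); // the captured amounts, without the currency symbol
```

Keeping # as the delimiter works as well, as long as the inner one is written \#.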

However, I somewhat doubt the regex achieves what you are trying to do. Prices are not always written as [CurrencySymbol][Amount]. For instance, euro amounts are often written as 100€ or 100 €, so you'd also have to check for digits before the symbol, with optional whitespace in between.
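A rough sketch of a pattern that accepts the symbol on either side of the amount follows. The sample string and the variable names $symbol and $amount are made up for illustration. It does not handle thousands separators, decimal commas, or non-breaking spaces.

```php
<?php
// Hypothetical input with the symbol before or after the amount.
$string = 'Shirt $19.99, shoes 100 €, hat 25€, scarf &pound;8';

// Reusable pieces: a currency symbol (or its entity) and a decimal amount.
$symbol = '(?:\$|£|€|&pound;|&#163;)';
$amount = '\d+(?:\.\d+)?';

// (?| ... ) is a branch-reset group: both alternatives capture into group 1,
// so the amount lands in $matches[1] whichever side the symbol is on.
$pattern = '~(?|' . $symbol . '\s*(' . $amount . ')|(' . $amount . ')\s*' . $symbol . ')~u';

$count = preg_match_all($pattern, $string, $matches);

print_r($matches[1]); // the amounts, in order of appearance
```

A string such as "$5 €" is ambiguous under this pattern, so real-world input would probably need rules tailored to the pages being scraped.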