I have tried everything I know but still can't figure out how to resolve this problem.
I have strings such as:
"--included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees"
"--not included-- in selling price: us$ 35.00 express fees 2 % notifying fees"
I want to know whether the taxes are "included" or "excluded", and whether each fee is a percentage ("%") or a currency amount. The problem is that the currency "usd" is not detected when it directly follows the tax name, as in "vat usd".
How can I separate the currency from the tax name into different groups?
Here is what I tried:
(--excluded--|--included--|--not included--)([a-z ]*)?:?(usd | aed | mad | € | us\$ )?([ \. 0-9 ]*)(%)?([a-z A-z ?]*) (aed|mad|€|us\$)*((aed|mad|€|us\$)+)?([\. 0-9 ]*)(%)?([a-z A-z]*)(.*)?
And here is what I got:
Match 1
Full match 0-83 --included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees
Group 1. 0-12 --included--
Group 2. 12-29 in selling price
Group 4. 30-33 5
Group 5. 33-34 %
Group 6. 34-42 vat usd
Group 10. 43-49 10.00
Group 12. 49-64 packaging fees
Group 13. 64-82 2 % notifying fees
And here is what I want:
Match 1
Full match 0-83 --included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees
Group 1. 0-12 --included--
Group 2. 12-29 in selling price
Group 4. 30-33 5
Group 5. 33-34 %
Group 6. 34-38 vat
Group 7. 38-42 usd
Group 10. 43-49 10.00
Group 12. 49-64 packaging fees
Group 13. 64-82 2 % notifying fees
Here is the solution:
See the PHP demo.
Notes:
The first regex extracts each match you need to parse. See the first regex demo. It means:
- (--(?:(?:not )?in|ex)cluded--) - Group 1: a shorter version of (--excluded--|--included--|--not included--): --excluded--, --included-- or --not included--
- (?:\s+([a-zA-Z ]+))? - an optional sequence: 1+ whitespaces and then Group 2: 1+ ASCII letters or spaces
- :+ - 1 or more colons
- \s* - 0+ whitespaces
- ((?:(?!--(?:(?:not )?in|ex)cluded--).)*) - Group 3: any char, 0+ occurrences, as many as possible, not starting any of the three char sequences: --excluded--, --included--, --not included--
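The first step described above can be tried out with a minimal sketch. It is written in Python rather than the PHP of the demo, and the names MARKER, SECTION_RE and split_sections are hypothetical, introduced only for illustration. The pattern itself is assembled from the pieces listed in the notes.

```python
import re

# The three markers, in the shortened form used by the first regex.
MARKER = r"--(?:(?:not )?in|ex)cluded--"

# Assembled from the pieces in the notes (names here are illustrative only).
SECTION_RE = re.compile(
    "(" + MARKER + ")"                 # Group 1: the marker itself
    + r"(?:\s+([a-zA-Z ]+))?"          # Group 2: optional label after the marker
    + r":+\s*"                         # one or more colons, then optional whitespace
    + "((?:(?!" + MARKER + ").)*)"     # Group 3: everything up to the next marker
)

def split_sections(text):
    """Return a list of (marker, label, body) tuples, one per section found."""
    return [m.groups() for m in SECTION_RE.finditer(text)]

text = (
    "--included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees\n"
    "--not included-- in selling price: us$ 35.00 express fees 2 % notifying fees"
)
sections = split_sections(text)
for marker, label, body in sections:
    print(marker, "|", label, "|", body)
```

Each tuple carries the marker (which answers the included/excluded question), the label, and the body that still has to be parsed in the second step.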
Then, the Group 3 value needs to be further parsed to grab all the details. The second regex is used for that; it matches:
- (?:(\b(?:usd|aed|mad|usd)\b|\B€|\bus\$)\s*)? - an optional occurrence of
  - (\b(?:usd|aed|mad|usd)\b|\B€|\bus\$) - Group 1:
    - \b(?:usd|aed|mad|usd)\b - usd, aed or mad as whole words (the second usd in the alternation is redundant)
    - \B€ - € not preceded with a word char
    - \bus\$ - us$ not preceded with a word char
  - \s* - 0+ whitespaces
- \d+ - 1+ digits
- (?:\.\d+)? - an optional sequence of a dot (.) and 1+ digits
- (?:(?!(?1))\D)* - any non-digit char, 0 or more occurrences, as many as possible, not starting the same pattern as in Group 1
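A minimal sketch of this second step follows, again in Python rather than PHP. Python's re module has no (?1) subroutine call, so the currency pattern is kept in a string and inlined twice. The extra capture groups around the amount and the trailing text, the percent check, and the names CURRENCY, ITEM_RE and parse_items are assumptions added for illustration, not part of the regex described above.

```python
import re

# Currency alternatives from Group 1 of the second regex (duplicate "usd" dropped).
CURRENCY = r"\b(?:usd|aed|mad)\b|\B€|\bus\$"

ITEM_RE = re.compile(
    r"(?:(" + CURRENCY + r")\s*)?"            # Group 1: optional currency
    + r"(\d+(?:\.\d+)?)"                      # Group 2: amount (capture added here)
    + r"((?:(?!" + CURRENCY + r")\D)*)"       # Group 3: text up to next currency or digit
)

def parse_items(body):
    """Split a section body into items with currency, amount, percent flag and name."""
    items = []
    for m in ITEM_RE.finditer(body):
        label = m.group(3).strip()
        is_percent = label.startswith("%")
        items.append({
            "currency": m.group(1),
            "amount": m.group(2),
            "percent": is_percent,
            "name": label[1:].strip() if is_percent else label,
        })
    return items

items = parse_items("5 % vat usd 10.00 packaging fees 2 % notifying fees")
other = parse_items("us$ 35.00 express fees 2 % notifying fees")
for item in items + other:
    print(item)
```

Because the trailing text stops as soon as a currency pattern would start, "vat" ends up as the name of the first item and "usd" as the currency of the second, which is the separation asked for in the question.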