Regex - Remove everything before first comma and e

Posted 2020-04-30 19:38

Question:

I have the following string:

55,1001wuensche.com,0,354137264,1,"0.00 %",0,"0.00 %","2016-04-24 09:00:24"
56,100hoch3.de,47,2757361,2,"0.00 %",0,"0.00 %","2016-02-11 00:42:10"

I want to remove everything up to and including the first comma (55, and 56,) and everything from the second comma onward.

The result should look like this, where only the domain name is left:

1001wuensche.com
100hoch3.de

I'm using Notepad++ to accomplish this. Anybody got an idea? Thanks for your help in advance!

Answer 1:

^.*?,(.*?),.*$

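Search for this pattern and replace it with $1: the capture group $1 holds everything between the first two commas. In Notepad++, use the "Regular expression" search mode and leave ". matches newline" unchecked so that each match stays on a single line.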
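If you want to check the pattern outside Notepad++, here is a minimal Python sketch. It assumes Python's re module, where the replacement is written \1 instead of $1; the two sample lines are taken from the question, and the name extract_domain is made up for this example.

```python
import re

# Sample lines copied from the question.
LINES = [
    '55,1001wuensche.com,0,354137264,1,"0.00 %",0,"0.00 %","2016-04-24 09:00:24"',
    '56,100hoch3.de,47,2757361,2,"0.00 %",0,"0.00 %","2016-02-11 00:42:10"',
]

# The lazy .*? stops at the first comma; the group captures up to the second.
PATTERN = re.compile(r'^.*?,(.*?),.*$')

def extract_domain(line):
    """Replace the whole line with the text between the first two commas."""
    return PATTERN.sub(r'\1', line)

domains = [extract_domain(line) for line in LINES]
print(domains)
```

A line with fewer than two commas does not match at all, so this sketch returns it unchanged.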



Answer 2:

You could search for ^[^,]+,([^,]+).* and replace it with $1.

If there is a chance of malformed lines (an empty field before the first comma, or lines without any comma), you could use a stricter pattern like ^[^,\r\n]*,([^,\r\n]+).+ instead. Excluding \r and \n from the character classes keeps a match from running across a line break.
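The difference between the two patterns is easiest to see on a small multi-line input. The following is only a sketch in Python's re module (with re.MULTILINE so that ^ matches at each line start), not Notepad++ itself; the three-line test text is invented for the example.

```python
import re

# Both patterns from this answer, applied to a whole multi-line text.
LOOSE = re.compile(r'^[^,]+,([^,]+).*', re.MULTILINE)
STRICT = re.compile(r'^[^,\r\n]*,([^,\r\n]+).+', re.MULTILINE)

# Invented input: a line without a comma, a normal line,
# and a line with an empty first field.
text = "no comma here\n56,100hoch3.de,47,2757361\n,1001wuensche.com,0"

# [^,]+ also matches newlines, so the loose pattern swallows the
# comma-less first line together with "56" and skips the third line.
loose_result = LOOSE.sub(r'\1', text)

# The strict pattern cannot cross a line break and accepts an empty
# first field, so every line is handled on its own.
strict_result = STRICT.sub(r'\1', text)

print(repr(loose_result))
print(repr(strict_result))
```

Here the loose pattern silently drops the first line and leaves the third untouched, while the strict one keeps the comma-less line as-is and extracts the domain from the other two.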



Answer 3:

Another way to do this sort of thing (in a more general sense) is to split the line by commas into an array, then take only the second element of that array.
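As a rough illustration of the split idea, here is a short Python sketch. The helper name second_field is hypothetical, and it assumes the field you want contains no quoted commas, which holds for the sample data because the quoted fields come later in the line.

```python
from typing import Optional

def second_field(line: str, sep: str = ",") -> Optional[str]:
    """Return the second separator-delimited field, or None if there is none."""
    # maxsplit=2 stops after the second separator, so the rest of the
    # line (including the quoted fields) is never split further.
    parts = line.split(sep, 2)
    return parts[1] if len(parts) > 1 else None

sample = '55,1001wuensche.com,0,354137264,1,"0.00 %",0,"0.00 %","2016-04-24 09:00:24"'
print(second_field(sample))
```

If the wanted field could itself be quoted and contain commas, a real CSV parser such as Python's csv module would be the safer choice than a plain split.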

Yet another way to do it is to execute two "substitute" regexes, one anchored to the beginning of the line and the other to the end, with the first being "non-greedy", e.g.:

s/^.*?,//

s/,.*$//

The concept of "greediness" is quite important, because in the first case we want to match the least number of characters, so as to stop at the first comma that is encountered. (Hence, "non-greedy.") Whereas, in the second case, you do want to "greedily" identify (and set to empty-string) the biggest match that you can find: namely, "the rest of the string."
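To make the greedy versus non-greedy point concrete, here is a small Python sketch of the two-step approach. It is an assumption-laden translation of the substitute commands into re.sub calls; the function name keep_second_field is invented, and the sample line is from the question.

```python
import re

def keep_second_field(line):
    # Step 1 (non-greedy): .*? matches as little as possible, so only the
    # text up to and including the FIRST comma is removed.
    line = re.sub(r'^.*?,', '', line, count=1)
    # Step 2 (greedy): the match starts at the next comma and .* runs to
    # the end of the line, removing everything after the field.
    line = re.sub(r',.*$', '', line, count=1)
    return line

sample = '55,1001wuensche.com,0,354137264,1,"0.00 %",0,"0.00 %","2016-04-24 09:00:24"'
result = keep_second_field(sample)

# For contrast: a greedy first step eats everything up to the LAST comma.
greedy_first_step = re.sub(r'^.*,', '', sample, count=1)

print(result)
print(greedy_first_step)
```

With the greedy variant of the first step, only the trailing timestamp field survives, which is exactly the failure the non-greedy quantifier avoids.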

Find the simplest and most obvious way to do it, because, quite inevitably, someone's going to want to change this logic someday. Or, someone will hand you a file that breaks your "clever, elegant" approach. Think "testable, and maintainable."