I am trying to extract filenames from a comma-separated list such as:
text.txt,temp_doc.doc,template.tmpl,empty.zip
I use Java's regex implementation. Requirements for output are as follows:
- Display only filenames and not their respective extensions
- Exclude files that begin with "temp_"
It should look like:
text
template
empty
So far I have managed to write a more or less satisfactory regex that handles the first requirement:
[^\\.,]++(?=\\.[^,]*+,?+)
I believe the best way to make it meet the second requirement is to use lookaround constructs, but I am not sure how to write a reliable and optimized expression. The following regex does seem to do what is required, but it is obviously flawed, if for no other reason than that it relies on an explicit maximum filename length.
(?!temp_|emp_|mp_|p_|_)(?<!temp_\\w{0,50})[^\\.,]++(?=\\.[^,]*+,?+)
P.S. I've been studying regexes only for a few days, so please don't laugh at this newbie-style overcomplicated code :)
How about this:
This also allows dots within filenames.
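For illustration, here is a minimal Java sketch of one lookaround-only pattern that behaves this way. It is an assumption on my part, not necessarily the pattern this answer had in mind, and the class and method names are hypothetical. The negative lookbehind (?<![^,]) pins the match to the start of an entry, (?!temp_) rejects unwanted entries, and the trailing lookahead requires a final extension, so no maximum filename length is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the pattern below is an assumed sketch, not the original answer's regex.
class DotFriendlyNames {
    // (?<![^,])  : we are at the start of the input or right after a comma
    // (?!temp_)  : the entry must not begin with "temp_"
    // [^,]+      : the name, which may itself contain dots
    // (?=\.[^,.]*(?![^,])) : followed by the last ".ext" of the entry
    private static final Pattern NAME =
            Pattern.compile("(?<![^,])(?!temp_)[^,]+(?=\\.[^,.]*(?![^,]))");

    static List<String> names(String csv) {
        List<String> result = new ArrayList<>();
        Matcher m = NAME.matcher(csv);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(names("text.txt,temp_doc.doc,template.tmpl,empty.zip"));
    }
}
```

Because [^,]+ is greedy, the match backs off only as far as the last dot of an entry, which is why a name such as file.name.ext would come out as file.name.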
Another option:
That pattern will match all file names, but will capture only valid names.
- If it can match temp_file.ext, it matches it and does not capture.
- If it cannot match temp_, it tries to match ([^,.]*)\.[^,]* and captures the file's name.
You can see an example here: http://www.rubular.com/r/QywiDgFxww
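To make the "match everything, capture only the valid names" idea concrete in Java, here is a rough sketch. The second alternative ([^,.]*)\.[^,]* is the fragment quoted above; the first alternative temp_[^,]* is my assumption about how the unwanted entries get consumed, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the match-but-don't-capture technique.
class CaptureOnlyValidNames {
    // Branch 1 (assumed): swallow a whole "temp_..." entry without capturing anything.
    // Branch 2 (from the answer): capture the name up to the first dot in group 1.
    private static final Pattern ENTRY =
            Pattern.compile("temp_[^,]*|([^,.]*)\\.[^,]*");

    static List<String> validNames(String csv) {
        List<String> names = new ArrayList<>();
        Matcher m = ENTRY.matcher(csv);
        while (m.find()) {
            // group(1) is null when the "temp_" branch matched
            if (m.group(1) != null) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(validNames("text.txt,temp_doc.doc,template.tmpl,empty.zip"));
    }
}
```

Note that with this second branch the captured name stops at the first dot, so file.name.ext would be captured as file.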
One variant would be like this:
This allows dots within the file name (file.name.ext will be matched as file.name).
But actually, this is really complex. You'll be better off writing a small function that splits the input at the commas and strips the extension from the parts.
Anyway, here's the tear-down:
http://rubular.com/r/4jeHhsDuJG
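For comparison, the non-regex route suggested above (split at the commas, then strip the extension) could look roughly like this in Java. It is only a sketch; the class and method names are hypothetical, and stripping at the last dot is my assumption, chosen so that file.name.ext yields file.name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split the list, skip "temp_" entries, strip the extension.
class SplitAndStrip {
    static List<String> baseNames(String csv) {
        List<String> names = new ArrayList<>();
        for (String entry : csv.split(",")) {
            String file = entry.trim();
            if (file.isEmpty() || file.startsWith("temp_")) {
                continue; // excluded by the second requirement
            }
            int dot = file.lastIndexOf('.');
            // dot > 0 keeps names without an extension (and dot-files) unchanged
            names.add(dot > 0 ? file.substring(0, dot) : file);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(baseNames("text.txt,temp_doc.doc,template.tmpl,empty.zip"));
    }
}
```

This is longer than a one-line regex, but each rule is a separate, readable statement, and nothing depends on a maximum filename length.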