Good morning guys,
Is there a good way to use regular expressions in C# to find all file names and their paths within a string variable?
For example, if you have this string:
string s = @"Hello John
these are the files you have to send us today: <file>C:\Development\Projects 2010\Accounting\file20101130.csv</file>, <file>C:\Development\Projects 2010\Accounting\orders20101130.docx</file>
also we would like you to send <file>C:\Development\Projects 2010\Accounting\customersupdated.xls</file>
thank you";
The result would be:
C:\Development\Projects 2010\Accounting\file20101130.csv
C:\Development\Projects 2010\Accounting\orders20101130.docx
C:\Development\Projects 2010\Accounting\customersupdated.xls
EDIT: Following @Jim's suggestion, I added tags to the string to make it easier to extract the needed file names.
Here's something I came up with:
Produces: (see on ideone)
The regex is not extremely robust (it does make a few assumptions) but it worked for your examples as well.
Here is a version of the program if you use <file> tags. Change the regex and Extract to:
Also available on ideone.
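For readers without access to the ideone link, here is a minimal sketch of how the tagged variant could look. It is not the answerer's original code: the class name `FileTagDemo` and the helper `ExtractTagged` are names made up for illustration, and it assumes tags are never nested and that every `<file>` has a matching `</file>`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class FileTagDemo
{
    // Hypothetical helper: returns the text between each <file> and </file> pair.
    public static List<string> ExtractTagged(string input)
    {
        var paths = new List<string>();
        // The lazy quantifier stops at the first closing tag;
        // Singleline lets '.' match across line breaks.
        MatchCollection matches =
            Regex.Matches(input, @"<file>(.*?)</file>", RegexOptions.Singleline);
        foreach (Match m in matches)
        {
            paths.Add(m.Groups[1].Value.Trim());
        }
        return paths;
    }

    static void Main()
    {
        string s = @"please send <file>C:\Development\Projects 2010\Accounting\file20101130.csv</file>, <file>C:\Development\Projects 2010\Accounting\orders20101130.docx</file>";
        foreach (string path in ExtractTagged(s))
        {
            Console.WriteLine(path);
        }
    }
}
```

Because the tags mark where each path starts and ends, the pattern does not need to know anything about which characters are valid in a file name, so spaces in folder names such as "Projects 2010" are not a problem.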
If you use the <file> tag and the final text can be represented as a well-formed XML document (as inner XML, i.e. text without root tags), you can probably do:
or
Both methods work and are highly object-oriented, especially the second one. They are also likely to perform better.
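As one possible illustration of the XML idea (not necessarily either of the two variants meant above), the text can be wrapped in a synthetic root element and read with LINQ to XML. The names `XmlFileDemo` and `ExtractWithXml` are hypothetical, and the sketch assumes the message contains no stray `<` or `&` characters, since `XElement.Parse` throws an `XmlException` on input that is not well-formed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class XmlFileDemo
{
    // Hypothetical helper: treats the message as inner XML and reads every <file> element.
    public static List<string> ExtractWithXml(string input)
    {
        // Wrap the fragment in a synthetic root so it parses as a single element tree.
        XElement root = XElement.Parse("<root>" + input + "</root>");
        return root.Descendants("file")
                   .Select(e => e.Value.Trim())
                   .ToList();
    }

    static void Main()
    {
        string s = @"Hello John, please send <file>C:\Development\Projects 2010\Accounting\customersupdated.xls</file> thank you";
        foreach (string path in ExtractWithXml(s))
        {
            Console.WriteLine(path);
        }
    }
}
```

Backslashes and spaces are ordinary characters in XML text, so Windows paths pass through unchanged.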
See also - Don't parse (X)HTML using RegEx
If you put some constraints on your filename requirements, you can use code similar to this:
In this case, I limited extensions to a length of 1-5 characters. You can obviously use another value or restrict the characters allowed in filename extensions further. The list of valid characters is taken from the MSDN article Naming Files, Paths, and Namespaces.
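A rough sketch of what such a constrained pattern might look like follows. It is an approximation, not the answerer's original code: the names `ConstrainedPathDemo`, `PathPattern` and `ExtractPaths` are invented here, and the pattern assumes drive-letter paths only (no UNC paths), extensions of 1 to 5 letters or digits, and file names that contain a single dot.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ConstrainedPathDemo
{
    // Assumed pattern: drive letter, zero or more folder segments, then a file name
    // with an extension of 1-5 alphanumeric characters. The excluded characters
    // < > : " / \ | ? * are the ones Windows reserves in file names.
    const string PathPattern =
        @"[A-Za-z]:\\(?:[^<>:""/\\|?*\r\n]+\\)*[^<>:""/\\|?*\r\n]+?\.[A-Za-z0-9]{1,5}\b";

    // Hypothetical helper: returns every substring that looks like a full file path.
    public static List<string> ExtractPaths(string input)
    {
        var paths = new List<string>();
        foreach (Match m in Regex.Matches(input, PathPattern))
        {
            paths.Add(m.Value);
        }
        return paths;
    }

    static void Main()
    {
        string s = @"files: C:\Development\Projects 2010\Accounting\file20101130.csv, C:\Development\Projects 2010\Accounting\orders20101130.docx";
        foreach (string path in ExtractPaths(s))
        {
            Console.WriteLine(path);
        }
    }
}
```

The file name part is matched lazily so that the match ends at the first extension instead of running on into the following text. The trade-off is that a name with several dots, such as archive.tar.gz, would be cut short.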