I'm trying to write a regex that will parse out the directory and filename of a fully qualified path using matching groups.
So, for example,
/var/log/xyz/10032008.log
would match with group 1 being "/var/log/xyz"
and group 2 being "10032008.log".
Seems simple but I can't get the matching groups to work for the life of me.
NOTE: As pointed out by some of the respondents this is probably not a good use of regular expressions. Generally I'd prefer to use the file API of the language I was using. What I'm actually trying to do is a little more complicated than this but would have been much more difficult to explain, so I chose a domain that everyone would be familiar with in order to most succinctly describe the root problem.
Reasoning:
I did a little research by trial and error and found that every character available on the keyboard is allowed in a file or directory name on a *nix machine, except '/'.
I used the touch command to create a file for each of these characters, and it created a file.
It failed only when I tried to create / (because it is the root directory) and a filename containing / (because it is the path separator). When I ran touch . it just changed the modified time of the current directory (.). However, file.log is possible. And of course a-z, A-Z, 0-9, - (hyphen), and _ (underscore) work.
Outcome
So, by the above reasoning, a file or directory name can contain anything except / (forward slash). Our regex will therefore be derived from what cannot appear in a file or directory name.
Step by Step regexp creation process
Pattern Explanation
Step-1: Start with matching root directory
A directory can start with / when it is an absolute path, and with a directory name when it is relative. Hence, look for / with zero or one occurrence.
Step-2: Try to find the first directory.
Next, a directory and its child are always separated by /, and a directory name can be anything except /. Let's match /var/ first, then.
Step-3: Get full directory path for the file
Next, let's match all directories.
Here, single_dir is xyz/ because it first matched var/, then found the next occurrence of the same pattern, i.e. log/, and then the next occurrence, xyz/. A repeated group keeps only its last match, so it shows the last occurrence of the pattern.
Step-4: Match filename and clean up
Now, we know that we're never going to use groups like single_dir, filepath, and root, so let's clean that up.
Let's keep them as groups, but not capture them.
And rest_of_the_path is just the filename, so rename it. A file name cannot contain /, so it's better to use [^/] for it.
This brings us to the final result. Of course, there are several other ways you can do it. I am just mentioning one of the ways here.
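The steps above can be sketched in Python as follows. The group names (root, single_dir, filepath, rest_of_the_path, dir, file) come from the text, but the exact patterns are reconstructed from the step descriptions and are an assumption, not necessarily the ones originally used.

```python
import re

path = "/var/log/xyz/10032008.log"

# Step 1: an optional leading slash (present only for absolute paths).
step1 = re.match(r"^(?P<root>[/]?)", path)

# Step 2: one directory is one or more non-slash characters followed by a slash.
step2 = re.match(r"^(?P<root>[/]?)(?P<single_dir>[^/]+/)", path)

# Step 3: repeat the directory group. A repeated capturing group keeps only
# its last match, so single_dir ends up holding the final directory.
step3 = re.match(
    r"^(?P<filepath>(?P<root>[/]?)(?P<single_dir>[^/]+/)+)(?P<rest_of_the_path>.+)$",
    path,
)

# Step 4: make the helper groups non-capturing and restrict the file name
# to non-slash characters.
final = re.match(r"^(?P<dir>[/]?(?:[^/]+/)+)(?P<file>[^/]+)$", path)

print(step3.group("single_dir"))  # xyz/
print(final.group("dir"))         # /var/log/xyz/
print(final.group("file"))        # 10032008.log
```

Note that in this sketch the dir group keeps its trailing slash; if the directory is wanted without it, the slash can be stripped afterwards.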
Regex rules used above are listed here:
^ means the string starts with.
(?P<dir>pattern) means capture the group under a group name. We have two groups, named dir and file.
(?:pattern) means a non-capturing group: group the pattern but don't capture it.
? means match zero or one.
+ means match one or more.
[^\/] means match any character except forward slash (/).
[/]? means that an absolute path can start with / while a relative one won't, so match zero or one occurrence of /.
[^\/]+/ means one or more characters that aren't a forward slash (/), followed by a forward slash (/). This will match var/ or xyz/, one directory at a time.
In languages that support regular expressions with non-capturing groups:
I'll explain the gnarly regex by exploding it...
What the parts mean:
Example
To test the regular expression, I used the following Perl script...
The output of the script...
Try this:
EDIT: escaped the forward slash to prevent problems when copy/pasting the Regex
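Escaping the slash only matters where / also delimits the regex literal (for example in Perl, JavaScript, or sed-style substitutions). As a minimal sketch, assuming Python's re module and a hypothetical pattern chosen for illustration (not the one from this answer), the escaped and unescaped forms behave identically:

```python
import re

path = "/var/log/xyz/10032008.log"

# Hypothetical pattern: greedy directory part, then a slash-free file name.
unescaped = re.compile(r"^(.*)/([^/]*)$")
# Same pattern with every forward slash escaped.
escaped = re.compile(r"^(.*)\/([^\/]*)$")

# Both forms produce the same two groups.
assert unescaped.match(path).groups() == escaped.match(path).groups()
print(escaped.match(path).groups())  # ('/var/log/xyz', '10032008.log')
```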
A very late answer, but hope this will help
This uses a lazy check for /; I just modified the accepted answer:
http://regex101.com/r/gV2xB7/1
What about this?
Deterministic :
Strict :
What language? And why use a regex for this simple task?
If you must:
gives you the two parts you wanted. You might need to quote the parentheses:
depending on your preferred language syntax.
But I suggest you just use your language's string search function that finds the last "/" character, and split the string on that index.
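As a rough sketch of that non-regex approach, assuming Python (split_path is a hypothetical helper name, not from any library):

```python
def split_path(path):
    """Split a path on its last '/' without using a regex."""
    idx = path.rfind("/")  # index of the last slash, or -1 if there is none
    if idx == -1:
        # No slash at all: the whole string is the file name.
        return "", path
    # Everything before the last slash is the directory, the rest is the file.
    return path[:idx], path[idx + 1:]


print(split_path("/var/log/xyz/10032008.log"))  # ('/var/log/xyz', '10032008.log')
```

This sketch returns an empty directory for a file directly under the root (such as /file.log), so that case would need special handling. Python's own os.path.split performs this kind of split and is usually the better choice there.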