What's the correct regexp pattern to match a V

2019-05-15 10:52发布

问题:

The documentation at http://h71000.www7.hp.com/doc/731final/documentation/pdf/ovms_731_file_app.pdf (section 5-1) says the filename should look like this:

node::device:[root.][directory-name]filename.type;version

Most of them are optional (like node, device, version) - not sure which ones and how to correctly write this in a regexp, (including the directory name):

DISK1:[MYROOT.][MYDIR]FILE.DAT

DISK1:[MYDIR]FILE.DAT

[MYDIR]FILE.DAT

FILE.DAT;10

NODE::DISK5:[REMOTE.ACCESS]FILE.DAT

回答1:

See the documentation and source for the VMS::Filespec Perl module.



回答2:

From wikipedia, the full form is actually a bit more than that:

NODE"accountname password"::device:[directory.subdirectory]filename.type;ver

This one took a while, but here is an expression that should accept all valid variations, and place the components into capture groups.

(?:(?:(?:([^\s:\[\]]+)(?:"([^\s"]+) ([^\s"]+)")?::)?([^\s:\[\]]+):)?\[([^\s:\[\]]+)\])?([^\s:\[\]\.]+)(\.[^\s:\[\];]+)?(;\d+)?

Also, from what I can tell, your example of

DISK1:[MYROOT.][MYDIR]FILE.DAT

is not a valid name. I believe only one pair of brackets are allowed. I hope this helps!



回答3:

You could probably come up with a single complicated regex for this, but it will be much easier to read your code if you work your way from left to right stripping off each section if it is there. The following is some Python code that does just that:

lines = ["DISK1:[MYROOT.][MYDIR]FILE.DAT", "DISK1:[MYDIR]FILE.DAT", "[MYDIR]FILE.DAT", "FILE.DAT;10", "NODE::DISK5:[REMOTE.ACCESS]FILE.DAT"]
node_re = "(\w+)::"
device_re = "(\w+):"
root_re = "\[(\w+)\.]"
dir_re = "\[(\w+)]"
file_re = "(\w+)\."
type_re = "(\w+)"
version_re = ";(.*)"
re_dict = {"node": node_re, "device": device_re, "root": root_re, "directory": dir_re, "file": file_re, "type": type_re, "version": version_re}
order = ["node", "device", "root", "directory", "file", "type", "version"]
for line in lines:
    i = 0
    print line
    for item in order:
        m = re.search(re_dict[item], line[i:])
        if m is not None:
            print "  " + item + ": " + m.group(1)
            i += len(m.group(0))

and the output is

DISK1:[MYROOT.][MYDIR]FILE.DAT
  device: DISK1
  root: MYROOT
  directory: MYDIR
  file: FILE
  type: DAT
DISK1:[MYDIR]FILE.DAT
  device: DISK1
  directory: MYDIR
  file: FILE
  type: DAT
[MYDIR]FILE.DAT
  directory: MYDIR
  file: FILE
  type: DAT
FILE.DAT;10
  file: FILE
  type: DAT
  version: 10
NODE::DISK5:[REMOTE.ACCESS]FILE.DAT
  node: NODE
  device: DISK5
  directory: REMOTE.ACCESS
  file: FILE
  type: DAT