I have a little script that lists the paths of all files in a directory and its subdirectories and parses each path with a regex in Perl.
#!/bin/sh
find * -type f | while read j; do
echo $j | perl -n -e '/\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/ && print "\"0\";\"$1$2$3\";\"$4\";\"$5\";$fl\""' >> bss.csv
echo | readlink -f -n "$j" >>bss.csv
echo \">>bss.csv
done
Output:
"0";"13957";"4121113";"2";"/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg"
I am using readlink from GNU coreutils: -n suppresses the newline at the end, and -f performs canonicalization by recursively following symlinks on the path.
The problem is that when the input string does not match the regex, I still get a line with just the file path.
How can I add a condition that checks whether the regex matched: show the path if it did, and nothing otherwise? I have racked my brain over various combinations, but didn't find any that work properly.
Description of solution
In Perl, instead of /…/ && …, use if (/…/) {…} else {…}. This way you can execute print if the match is successful and some other code otherwise.

If this is not the problem and you only want to get rid of the readlink output and the closing quote on lines that do not match, you can call readlink from Perl using backticks, so that it only runs when the match succeeds.

Resulting code
I turned everything into a single Perl program, used File::Find instead of the find command, assumed that $fl at the end of print in Perl is a relic (and ignored it), and used Cwd::realpath() to find the canonical path of the file instead of readlink -f from GNU coreutils. If you still want to use readlink -f, feel free to change Cwd::realpath($_) to `readlink -f '$_'` (including the backticks!), but then it will not work for filenames containing a single quote.

You should call this script as

./script-name starting-directory > bss.csv

If you put the script in the directory you are examining, the output would contain it too, along with bss.csv itself.

For reference, I also enclose a polished version of the original program. It calls readlink from Perl as I suggested above and really utilizes Perl's -n option, avoiding the while read loop.

Other remarks to the original code
echo | before the readlink does nothing and should be removed: readlink does not read its stdin.

Where does the $fl at the end of print in Perl come from? I assume it is a relic.

qq{} and thoughtful use of delimiters (e.g. in regex matching and other quote-like operators) can save you from quoting hell. I already used this tip above: /…/ → m{…} and "…" → qq{…}. Thx, Slade! See the perlop manpage for more info.

If I understand you, you want to capture the parts of the filename that appear in your sample output: 13, 95 and 7 (joined into 13957), 4121113, and the trailing 2.
But your Perl regex doesn't do that. Let's break it apart for better understanding:

/\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/

Sliced into pieces, this would be...
\/(\d{2}) - a slash, then two digits (with the digits captured)
\/(\d{2}) - another slash and two captured digits
\/(\d+) - one more slash and one or more captured digits
.*- - any run of characters up to the last hyphen that is followed by a letter (.* is greedy)
([a-zA-Z]+) - one or more alphabetic characters
(?:_(\d{1}))? - an optional underscore followed by a single digit; the outer (?:...) group does not capture, but the inner (\d{1}) still captures that digit into $5 when it is present

If you step through your filename, you'll see that there is nothing here to handle the second-to-last string of digits (4121113): the regex expects letters after the hyphen, but in your filename the hyphen is followed by digits, so the whole match fails.
I'd do this using simpler tools. Sed, for example:
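Assembled from the breakdown below, the sed command would look roughly like this. It assumes a sed that supports -E for extended regexes (GNU sed does) and uses -n so that the final p prints each record only once; the wrapper name bss_sed is hypothetical.

```shell
#!/bin/sh
# bss_sed (hypothetical name): the sed script from the breakdown, on one line.
# -E selects extended regexes; -n stops sed from auto-printing, so the final p
# prints each record exactly once.
bss_sed() {
  sed -nE 's/.*/"&"/;h;s:.*/([0-9]{2})/([0-9]{2})/([0-9]+)[^[a-zA-Z]]*[^-]+-([0-9]+)(_([0-9]+))?.*:"0";"\1\2\3";"\4";"\6":;G;s/\n/;/;p'
}

printf '%s\n' '/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg' | bss_sed
```

For the sample path it prints "0";"13957";"4121113";"2";"/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg". For a whole tree you would pipe find . -type f into it; unlike readlink -f, sed does not canonicalize the path.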
I'll break up the sed script for easier reading:
s/.*/"&"/; - Put quotes around the filename.
h; - Store the filename in sed's "hold" space, for future use...
s: - Start the big substitution...
.*/([0-9]{2})/([0-9]{2})/([0-9]+)[^[a-zA-Z]]*[^-]+-([0-9]+)(_([0-9]+))?.* - This is the pattern we want to match for substitution. It is similar to what you did in Perl, obviously, but uses POSIX ERE instead of Perl regex syntax.
:"0";"\1\2\3";"\4";"\6":; - The replacement pattern, with \1 to \6 being replaced by the bracketed subexpressions of the RE. Note that \5 is skipped in the replacement string, as that subexpression (which includes the underscore) is only being used for the match.
G; - Append a newline and the "hold" space to the pattern space...
s/\n/;/; - ...and replace the newline between them with a semicolon.
p - Print the result.

Note that this solution, as is, assumes that all input lines match the pattern you're looking for. If that's not the case, a non-matching line comes out as its quoted path twice, separated by a semicolon, so you should put some pattern matching into the script.
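One way to add that pattern matching is to save the raw path first, let the big substitution decide, and use sed's t command to skip lines where it did not succeed. This is a sketch under the same assumptions (a sed with -E; bss_sed_strict is a hypothetical name). Because h now comes before any other substitution, t only reacts to the big one; the quotes around the path are added when the two parts are joined.

```shell
#!/bin/sh
# bss_sed_strict (hypothetical name): like the one-liner, but lines that do not
# match the pattern produce no output at all.
#   h        save the raw path in the hold space
#   s:...:   build the CSV fields (succeeds only on matching lines)
#   t found  jump to :found if that substitution succeeded,
#   d        otherwise drop the line (with -n nothing is printed)
#   G, s     append the raw path and join it on as ;"path"
bss_sed_strict() {
  sed -nE '
h
s:.*/([0-9]{2})/([0-9]{2})/([0-9]+)[^[a-zA-Z]]*[^-]+-([0-9]+)(_([0-9]+))?.*:"0";"\1\2\3";"\4";"\6":
t found
d
:found
G
s/\n(.*)/;"\1"/
p
'
}

printf '%s\n' \
  '/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg' \
  '/home/root/dir1/readme.txt' | bss_sed_strict
```

Here the readme.txt line produces no output, while the sample path gives the same record as the one-liner.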