Help using the command line to extract snippets of data

Posted 2019-09-03 09:03

Question:

I would like the option of extracting the following string/data:

/work/foo/processed/25
/work/foo/processed/myproxy
/work/foo/processed/sample

=or=

25
myproxy
sample

But it would help to see both.

From this output using cut or perl or anything else that would work:

Found 3 items
drwxr-xr-x   - foo_hd foo_users          0 2011-03-16 18:46 /work/foo/processed/25
drwxr-xr-x   - foo_hd foo_users          0 2011-04-05 07:10 /work/foo/processed/myproxy
drwxr-x---   - foo_hd testcont           0 2011-04-08 07:19 /work/foo/processed/sample

Doing a cut -d" " -f6 will get me foo_users, testcont. I tried increasing the field to higher values and I'm just not able to get what I want.

I'm not sure whether cut is suitable for this, or whether something like perl would be better. The base directory, /work/foo/processed, will remain static.

Also, I need the first line (Found N items) removed. Thanks.

Answer 1:

Try this:

<Your Command> | grep -P -o '[\/\.\w]+$' 

OR if the directory '/work/foo/processed' is always static then:

<Your Command> | grep -P -o '\/work\/foo\/processed\/.+$'

-o : Show only the part of a matching line that matches PATTERN.
-P : Interpret PATTERN as a Perl regular expression.

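The first pattern matches the last word on each line; the character class also allows dots, so file names like 'text_file1.txt' are matched. Note that it also matches the word items at the end of the Found 3 items header, whereas the second pattern skips that line because it contains no path. Of course, you can change the pattern to suit your requirements.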
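Not every grep supports -P, so the same idea can be sketched with plain grep -o plus sed, which also gives both forms of output the question asks for. This is only a minimal sketch: listing is a hypothetical variable standing in for the output of your command (filled with the sample from the question), and it assumes the base directory really is static.

```shell
# Hypothetical stand-in for the output of your command (sample from the question).
listing='Found 3 items
drwxr-xr-x   - foo_hd foo_users          0 2011-03-16 18:46 /work/foo/processed/25
drwxr-xr-x   - foo_hd foo_users          0 2011-04-05 07:10 /work/foo/processed/myproxy
drwxr-x---   - foo_hd testcont           0 2011-04-08 07:19 /work/foo/processed/sample'

# Full paths: keep only the part of each line starting at the static base directory.
# The "Found 3 items" header contains no such path, so grep prints nothing for it.
full_paths=$(printf '%s\n' "$listing" | grep -o '/work/foo/processed/.*$')

# Short names: strip everything up to and including the last slash.
names=$(printf '%s\n' "$full_paths" | sed 's#.*/##')

printf '%s\n' "$full_paths"
printf '%s\n' "$names"
```

In real use you would replace the printf '%s\n' "$listing" part with your own command.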



Answer 2:

You can substitute everything from the beginning of the line up to the first occurrence of / (matching non-greedily):

$ your_command | ruby -ne  'print $_.sub(/.*?\/(.*)/,"/\\1") if /\//'
/work/foo/processed/25
/work/foo/processed/myproxy
/work/foo/processed/sample

Or you can find a unique separator (field delimiter) to split on. For example, the time portion is unique, so you can split on that and take the last element (here, the second).

$ ruby -ne  'print $_.split(/\s+\d+:\d+\s+/)[-1] if /\//' file
/work/foo/processed/25
/work/foo/processed/myproxy
/work/foo/processed/sample

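With awk (each output line keeps the space that followed the time, since only the time itself is the separator):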

$ awk -F"[0-9][0-9]:[0-9][0-9]" '/\//{print $NF}' file
 /work/foo/processed/25
 /work/foo/processed/myproxy
 /work/foo/processed/sample
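
Since the path is also the last whitespace-separated field, awk's default field splitting may be enough on its own, and it leaves no leading space. A minimal sketch, assuming the paths contain no spaces; listing is a hypothetical variable holding the sample output from the question.

```shell
# Hypothetical stand-in for the output of your command (sample from the question).
listing='Found 3 items
drwxr-xr-x   - foo_hd foo_users          0 2011-03-16 18:46 /work/foo/processed/25
drwxr-xr-x   - foo_hd foo_users          0 2011-04-05 07:10 /work/foo/processed/myproxy
drwxr-x---   - foo_hd testcont           0 2011-04-08 07:19 /work/foo/processed/sample'

# NR > 1 skips the "Found 3 items" header; $NF is the last whitespace-separated field.
full_paths=$(printf '%s\n' "$listing" | awk 'NR > 1 { print $NF }')

# Split the last field on "/" and keep the final component to get the short name.
names=$(printf '%s\n' "$listing" | awk 'NR > 1 { n = split($NF, parts, "/"); print parts[n] }')

printf '%s\n' "$full_paths"
printf '%s\n' "$names"
```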


Answer 3:

perl -lanF"\s+" -e 'print $F[-1] unless /^Found/' file

Here is an explanation of the command-line switches used:

-l: remove line break from each line of input, then add one back on print
-a: auto-split each line of input into an @F array
-n: loop through each line of input
-F: the regexp pattern to use for the auto-split (with -a)
-e: the perl code to execute (for each line of input if using -n or -p)

If you want to just output the last portion of your directory path, and the basedir is always '/work/foo/processed', I would do this:

perl -nle 'print $1 if m|/work/foo/processed/(\S+)|' file
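
If perl is not at hand, the same two steps (take the last field, then take the last path component) can be sketched in plain shell with parameter expansion, printing both forms side by side. This is a rough sketch rather than a drop-in replacement: listing is a hypothetical variable holding the sample from the question, and it assumes the paths contain no spaces.

```shell
# Hypothetical stand-in for the output of your command (sample from the question).
listing='Found 3 items
drwxr-xr-x   - foo_hd foo_users          0 2011-03-16 18:46 /work/foo/processed/25
drwxr-xr-x   - foo_hd foo_users          0 2011-04-05 07:10 /work/foo/processed/myproxy
drwxr-x---   - foo_hd testcont           0 2011-04-08 07:19 /work/foo/processed/sample'

result=$(
    printf '%s\n' "$listing" | while IFS= read -r line; do
        case "$line" in
            (Found*) continue ;;   # skip the "Found N items" header
        esac
        path=${line##* }           # remove everything up to the last space
        name=${path##*/}           # remove everything up to the last slash
        printf '%s %s\n' "$path" "$name"
    done
)

printf '%s\n' "$result"
```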


Answer 4:

If you know the columns will be the same, and you always list the full path name, you could try something like:

ls -l | cut -c79-

which keeps everything from the 79th character to the end of each line (adjust the offset to wherever the name starts in your output). That might work in this exact case, but I think it would be better to take the basename of the last field. You could easily do this in awk or perl. Respond if this is not what you want and I'll add the awk and perl versions.
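
Fixed character columns are fragile, and cut -d" " treats every single space as a delimiter, so runs of spaces produce empty fields. One hedged alternative is to squeeze the spaces with tr first and then cut by field. The sketch below assumes the listing always has the eight columns shown in the question and that paths contain no spaces; listing is a hypothetical variable holding that sample.

```shell
# Hypothetical stand-in for the output of your command (sample from the question).
listing='Found 3 items
drwxr-xr-x   - foo_hd foo_users          0 2011-03-16 18:46 /work/foo/processed/25
drwxr-xr-x   - foo_hd foo_users          0 2011-04-05 07:10 /work/foo/processed/myproxy
drwxr-x---   - foo_hd testcont           0 2011-04-08 07:19 /work/foo/processed/sample'

# tail -n +2 drops the header line; tr -s ' ' squeezes runs of spaces so that
# cut sees exactly one delimiter between fields; the path is then field 8.
full_paths=$(printf '%s\n' "$listing" | tail -n +2 | tr -s ' ' | cut -d' ' -f8)

# Split on "/": the fields are "", work, foo, processed, NAME, so the name is field 5.
names=$(printf '%s\n' "$full_paths" | cut -d/ -f5)

printf '%s\n' "$full_paths"
printf '%s\n' "$names"
```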



Answer 5:

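Take the output of your ls command and pipe it to awk, splitting on / and printing the last field. The NF > 1 condition skips lines that contain no slash, such as the Found N items header:

your_command | awk -F'/' 'NF > 1 {print $NF}'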


Answer 6:

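your_command | perl -ne 'print if s#.*/##'

The greedy .*/ strips everything up to the last slash; with -n, only lines where the substitution succeeds are printed, so the Found N items header is dropped.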