Linux Command Line using for loop and formatting r

Posted 2020-04-17 06:43

Question:

How can I use a single command line to list all files within a certain size range, and then format the output as file name, MD5 hash, and file size?

The example output should be

file1.***     MD5 value   size
file2.***     MD5 value   size etc.

I've tried the following, but it displays the MD5 on a separate line:

find 'directory' -size +30000c -size -50000c | 
while read filename 
   do ls -l "$filename" | awk '{print $9 "\t" $5}' 
   md5sum "$filename" | awk '{print $1}' 
done

It outputs the following, with the MD5 on a separate line:

file1.***   size
MD5

file2.***   size
MD5

Answer 1:

You are very close, just a few fixes needed:

#!/bin/bash
find ./path/to/dir -type f -size +30000c -size -50000c -printf '%s %p\n' |
while read -r size filename; do
    md5=$(md5sum "$filename" | awk '{print $1}')
    printf "%-30s %s %10s\n" "$filename" "$md5" "$size"
done

To produce something like:

./CHECKSUM                     36e371280a17372537a78167ce22b773        30400
./Makefile                     d21464a020be753a9d821cba58f046bc        40000

Let's start with find. We can get the filename (path) and size directly from find via the -printf action. %p specifies the file name as find sees it (the starting point plus the relative path) and %s the size of the file in bytes. We put %s first so that read can still split the line correctly if the filename contains spaces. Also, we're only interested in regular files, so we add the -type f filter.
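To see what find hands to the loop, here is a minimal sketch that wraps the same find call in a function. The name list_sized_files and its three arguments are hypothetical, introduced only for illustration, and the sketch assumes GNU find, because -printf is a GNU extension.

```shell
# Hypothetical helper, not part of the original answer.
# $1 = directory, $2 = lower size bound, $3 = upper size bound
# (both in bytes, both exclusive). Assumes GNU find for -printf.
list_sized_files() {
    # Prints one "size path" record per matching regular file.
    find "$1" -type f -size "+${2}c" -size "-${3}c" -printf '%s %p\n'
}
```

Calling list_sized_files ./path/to/dir 30000 50000 should print the same records that the while loop above consumes.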

Next, read can read multiple fields (separated by IFS, which defaults to space, newline and tab). If there are more fields than variables given, the last variable will hold all the remaining fields. Also, we use -r to prevent (special) interpretation of escaped characters in input. For each line read (assuming your filenames do not contain newlines), we calculate the MD5 sum with the command you already use.
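The field-splitting behaviour of read can be tried in isolation. This is a small sketch, assuming a made-up record; the function name split_record is hypothetical and only serves the demonstration.

```shell
# Split one "size path" record the same way the loop's read does.
# split_record and the sample record are illustrative assumptions.
split_record() {
    # The first field goes to size; everything left over, spaces
    # included, goes to the last variable, filename.
    read -r size filename <<EOF
$1
EOF
    printf 'size=%s\n' "$size"
    printf 'filename=%s\n' "$filename"
}

split_record '30400 ./my report (final).txt'
```

Because the size never contains whitespace, putting it first keeps the split unambiguous no matter how many spaces the path has.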

Finally, we use the shell's built-in printf to format and print all the fields. The formatting mini-language is similar to C's printf: for example, %-30s means a left-aligned string field 30 characters wide.
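The format string can be checked on its own with fixed values. The sketch below reuses the format from the script; the function name format_row is a hypothetical wrapper added for illustration.

```shell
# Hypothetical wrapper around the printf format used in the script.
# $1 = file name, $2 = MD5 sum, $3 = size
format_row() {
    # %-30s : name, left-aligned, padded to 30 characters
    # %s    : MD5 sum, printed as-is
    # %10s  : size, right-aligned in a 10-character field
    printf '%-30s %s %10s\n' "$1" "$2" "$3"
}

format_row ./CHECKSUM 36e371280a17372537a78167ce22b773 30400
```

Names longer than 30 characters are not truncated; they simply push the remaining columns to the right.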

Bonus points: handling filenames with newlines. The one character a Unix path can never contain is the NUL (\0) character. Although bash is not particularly good at processing binary (non-text) data, we can still do it:

#!/bin/bash
find ./path/to/dir -type f -size +30000c -size -50000c -printf '%s %p\0' |
while read -r -d '' size filename; do
    md5=$(md5sum "$filename" | awk '{print $1}')
    display_name=$(echo -n "$filename" | tr '\n' '?')
    printf "%-30s %s %10s\n" "$display_name" "$md5" "$size"
done

First, we use \0 in -printf to separate the records that find outputs, with the matching read -d ''. To make filenames suitable for printing on one line, we have to replace (but only for display) all newlines \n with something like ?. We can use tr for that, combined with echo -n (note that we can't use a here-string <<<"$filename" instead of echo, because a here-string adds a trailing newline).
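As a variation on the display step, the sketch below swaps echo -n for printf '%s', which also emits no trailing newline and whose behaviour is specified by POSIX, whereas echo -n differs between shells. The function name display_name is hypothetical.

```shell
# Hypothetical helper: make a file name safe to print on one line.
# Uses printf '%s' instead of echo -n, so no trailing newline is
# added and the result does not depend on the shell's echo.
display_name() {
    printf '%s' "$1" | tr '\n' '?'
}

# A made-up name containing an embedded newline.
name=$(printf 'two\nlines.txt')
display_name "$name"
echo
```

Only the displayed copy is altered; md5sum should still be given the original "$filename".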



Answer 2:

You can use rhash for this simple task:

find dir/ -type f -size +30000c -size -50000c -exec rhash -p "%p %m %s\n" {} \;
  • -p prints in custom format
  • %p for file path, %m for md5sum and %s for file size in bytes


Answer 3:

Rather than piping find to a set of commands, just call those commands directly in find:

find /p/a/t/h -type f -size +30000c -size -50000c -exec sh -c '
    printf "%s\t" "$1"; md5sum "$1" | cut -d " " -f 1 | tr -d \\n;
    printf "\t";
    stat -c %s "$1"' _ {} \;

Note that stat is non-standard, but the above works on Debian (GNU coreutils). On BSD-derived systems you may need stat -f %z instead. YMMV.
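If portability across stat variants matters, one possible substitute (a swapped-in technique, not what this answer uses) is wc -c, which POSIX specifies. The helper name file_size is hypothetical. The trade-off is that wc may read the file's contents rather than only its metadata.

```shell
# Hypothetical portable replacement for "stat -c %s" / "stat -f %z".
# $1 = path to a regular file
file_size() {
    # Reading from stdin makes wc print only the byte count.
    # Some implementations pad the number with blanks, so strip them.
    wc -c < "$1" | tr -d ' \t'
}
```

Inside the sh -c script above, file_size "$1" could then stand in for the stat call.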