How can I use a single command line to list all files within a certain size range and print each file's name, MD5 hash and size?
The example output should be
file1.*** MD5 value size
file2.*** MD5 value size etc.
I've tried the following, but it displays the MD5 on a separate line:
find 'directory' -size +30000c -size -50000c |
while read filename
do ls -l "$filename" | awk '{print $9 "\t" $5}'
md5sum "$filename" | awk '{print $1}'
done
It outputs the following, with the MD5 on a separate line:
file1.*** size
MD5
file2.*** size
MD5
You are very close; just a few fixes are needed:
#!/bin/bash
find ./path/to/dir -type f -size +30000c -size -50000c -printf '%s %p\n' |
while read -r size filename; do
    md5=$(md5sum "$filename" | awk '{print $1}')
    printf "%-30s %s %10s\n" "$filename" "$md5" "$size"
done
To produce something like:
./CHECKSUM 36e371280a17372537a78167ce22b773 30400
./Makefile d21464a020be753a9d821cba58f046bc 40000
Let's start with find. We can get the filename (path) and size directly from find via the -printf action (a GNU find extension). %p specifies the full file name (the path, including the starting directory given to find) and %s the size of the file in bytes. We put %s first so that read can parse it even when the filename contains spaces. Also, we're only interested in files, so we use the -type f filter.
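To see what that find invocation emits on its own, here is a minimal sketch. It assumes GNU find (for -printf), and the scratch directory and file names are made up purely for illustration:

```shell
#!/bin/bash
# Scratch directory with two hypothetical files; names and sizes are made up.
demo_dir=$(mktemp -d)
head -c 40000 /dev/zero > "$demo_dir/in range.bin"   # 40000 bytes, name contains a space
head -c 10 /dev/zero > "$demo_dir/small.bin"         # too small, so it is filtered out

# -printf is a GNU find extension: %s is the size in bytes, %p the path.
listing=$(find "$demo_dir" -type f -size +30000c -size -50000c -printf '%s %p\n')
echo "$listing"

rm -rf "$demo_dir"
```

Only the 40000-byte file is listed, with its size as the first field and the path (space included) after it.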
Next, read can read multiple fields (separated by the characters in IFS, which defaults to space, tab and newline). If there are more fields than variables, the last variable holds all the remaining fields. We also use -r to prevent special interpretation of backslash-escaped characters in the input. For each line read (assuming your filenames do not contain newlines), we calculate the MD5 sum with the command you already use.
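The field splitting is easy to check in isolation. A small sketch, using made-up records in the same "size path" layout that find produces:

```shell
#!/bin/bash
# A made-up record in the same "size path" layout that find emits.
line='40000 ./my docs/annual report.pdf'

# Two variables, several fields: size takes the first field and
# filename takes the rest of the line, inner spaces included.
read -r size filename <<< "$line"
printf 'size=%s\n' "$size"
printf 'filename=%s\n' "$filename"

# With -r, a backslash in the input is kept literally.
read -r size2 filename2 <<< '123 ./back\slash.txt'
printf 'filename2=%s\n' "$filename2"
```

Without -r, read would treat the backslash as an escape character and drop it from the second filename.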
Finally, we use the shell's built-in printf to format and print all the fields. The formatting mini-language is similar to C's printf: %-30s, for example, means a left-aligned string field 30 characters wide.
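A quick sketch of the padding behaviour, with narrower fields and a | between them so the alignment is visible. The values are illustrative only (the hash is the MD5 of empty input):

```shell
#!/bin/bash
# %-12s pads on the right (left-aligned); %8s pads on the left (right-aligned).
row=$(printf '%-12s|%s|%8s' "./a.txt" "d41d8cd98f00b204e9800998ecf8427e" "30400")
echo "$row"
# prints: ./a.txt     |d41d8cd98f00b204e9800998ecf8427e|   30400
```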
Bonus points: handling filenames with newlines. The only character a Unix path cannot contain is NUL (\0). Although bash is not particularly good at processing binary (non-text) data, we can still do it:
#!/bin/bash
find ./path/to/dir -type f -size +30000c -size -50000c -printf '%s %p\0' |
while read -r -d '' size filename; do
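    # Hash via stdin: GNU md5sum prefixes its output with a backslash
    # when a filename argument contains a newline or a backslash.
    md5=$(md5sum < "$filename" | awk '{print $1}')
    display_name=$(echo -n "$filename" | tr '\n' '?')
    printf "%-30s %s %10s\n" "$display_name" "$md5" "$size"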
done
First, we use \0 in -printf to separate the records that find outputs, with the matching read -d ''. To make filenames suitable for printing on one line, we have to replace (for display only) every newline \n with something like ?. We can use tr for that, combined with echo -n (note that we can't use a here-string <<<"$filename" instead of echo, because a here-string adds a trailing newline).
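The trailing newline added by a here-string can be confirmed by counting bytes. In this sketch the filename is hypothetical, and printf '%s' is swapped in for echo -n (it also emits the string with no trailing newline):

```shell
#!/bin/bash
name=$'two\nlines'   # hypothetical filename containing a newline (9 bytes)

# Count bytes to see the extra newline the here-string appends.
with_printf=$(printf '%s' "$name" | wc -c)
with_herestring=$(wc -c <<< "$name")
echo "printf: $with_printf bytes, here-string: $with_herestring bytes"

# Replace newlines for display only.
display_name=$(printf '%s' "$name" | tr '\n' '?')
echo "$display_name"
```

With the here-string, tr would also turn that extra newline into a trailing ? in the displayed name.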
You can use rhash for this simple task:
find dir/ -type f -size +30000c -size -50000c -exec rhash -p "%p %m %s\n" {} \;
-p prints in a custom format: %p for the file path, %m for the MD5 sum and %s for the file size in bytes.
Rather than piping find to a set of commands, just call those commands directly in find:
find /p/a/t/h -type f -size +30000c -size -50000c -exec sh -c '
    printf "%s\t" "$1"; md5sum "$1" | cut -d " " -f 1 | tr -d \\n;
    printf "\t";
    stat -c %s "$1"' _ {} \;
Note that stat is non-standard, but the above works on Debian. On systems with BSD stat you may need stat -f %z instead. YMMV.
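If the script has to run on both kinds of system, one option is a small wrapper that tries each variant in turn. This is only a sketch: file_size is a hypothetical helper name, and the final fallback uses wc -c, which is specified by POSIX but reads the whole file:

```shell
#!/bin/bash
# Hypothetical helper: try GNU stat, then BSD stat, then POSIX wc.
file_size() {
    stat -c %s "$1" 2>/dev/null ||
    stat -f %z "$1" 2>/dev/null ||
    wc -c < "$1" | tr -d ' '     # some wc implementations pad the count with spaces
}

# Usage on a throwaway file of known size.
sample=$(mktemp)
head -c 1234 /dev/zero > "$sample"
sample_size=$(file_size "$sample")
echo "$sample_size"
rm -f "$sample"
```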