Extracting unique id from file name

Posted 2019-08-08 12:56

Question:

I am organizing the text files in a directory by moving each one into a subdirectory whose name is derived from the original file name, so it is easy to tell which file belongs in which folder. A bash for loop iterates through all the .txt files and creates the folders accordingly. The text files are named like xxxx-test_file1-aa1-a2.txt or xxxx-test_file1-aa1--2.txt. Taking the first example, the only two parts that matter for the name of the new folder are xxxx and -aa1-a2 (every name ends with a 6-character unique ID just before the extension), so the new folder would be named xxxx-aa1-a2. The code below extracts the right name for some files but breaks for others.

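FILE_PATH="/my_files"
for file in "$FILE_PATH"/*.txt; do
    name=${file##*/}                         # file name without the directory part
    tmp=${name#*-}; head=${name%-"$tmp"}     # head: text before the first dash
    mid=${tmp%-*}; tail=${tmp#"$mid"-}       # tail: text after the last dash
    base="${head,,}-${tail,,}"
    dir="$FILE_PATH/${base%.txt}"
    mkdir -p "$dir"
    mv "$file" "$dir/$base"
done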

${var#pattern} expands to the value of var with the shortest prefix matching pattern removed, and ${var%pattern} does the same with a suffix; the doubled forms ${var##pattern} and ${var%%pattern} remove the longest match instead. Finally, ${var,,} (bash 4 and later) produces the lowercase version of the value. The desired file name is then assembled from those parts.
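To see how these expansions behave, here is a small demo on one sample name. The uppercase XXXX is only there so that ${name,,} has something to change; the doubled ## and %% forms are included for contrast. It assumes bash 4 or later.

```bash
# Parameter expansion on one sample name (bash 4+ for ${name,,}).
name="XXXX-test_file1-aa1-a2.txt"

echo "${name#*-}"    # test_file1-aa1-a2.txt  (shortest prefix '*-' removed)
echo "${name##*-}"   # a2.txt                 (longest prefix '*-' removed)
echo "${name%-*}"    # XXXX-test_file1-aa1    (shortest suffix '-*' removed)
echo "${name%%-*}"   # XXXX                   (longest suffix '-*' removed)
echo "${name,,}"     # xxxx-test_file1-aa1-a2.txt
```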

The above works if the file name contains only two dashes, such as xxxx-test_file-aaasw1 or xxxx-test_file-bswb2u:

|-- ./
|   |-- xxxx-aaasw1
|   |   |-- xxxx-test_file-aaasw1.txt
|   |-- xxxx-bswb2u
|       |-- xxxx-test_file-bswb2u.txt

But it breaks if the file name contains more than two dashes, such as xxxx-test_file-caa-v3u or xxxx-test_file-caa-v3-:

|-- ./
|   |-- xxxx-v3u
|   |   |-- xxxx-test_file-caa-v3u.txt
|   |-- xxxx-
|       |-- xxxx-test_file-caa-v3-.txt
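The failure can be reproduced without moving any files by wrapping the split in a function and printing the folder name it would produce. This is only a sketch: original_dir_name is a made-up helper name, and the loop just runs it on the sample names above. Because ${tmp%-*} removes the shortest suffix that starts with a dash, mid always ends at the last dash, so only the text after the last dash reaches the folder name.

```bash
# Reproduces the question's split on a bare file name (bash 4+).
# original_dir_name is a hypothetical helper used only for this trace.
original_dir_name() {
    local file=$1 tmp head mid tail base
    tmp=${file#*-}; head=${file%-"$tmp"}   # head: text before the first dash
    mid=${tmp%-*}; tail=${tmp#"$mid"-}     # '%-*' cuts at the LAST dash
    base="${head,,}-${tail,,}"
    printf '%s\n' "${base%.txt}"           # folder name = base without .txt
}

for f in xxxx-test_file-aaasw1.txt xxxx-test_file-caa-v3u.txt xxxx-test_file-caa-v3-.txt; do
    printf '%-28s -> %s\n' "$f" "$(original_dir_name "$f")"
done
```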

Answer 1:

So, you want the directory to be named "a-b", where a is everything up to the first dash and b is everything after the second dash up to the first dot?

touch xxxx-test_file-aaasw1
touch xxxx-test_file-bswb2u
touch xxxx-test_file-caa-v3u.txt
touch xxxx-test_file-caa-v3-.txt

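for f in *
do
    [ -f "$f" ] || continue                              # skip directories
    head=$(cut -f1  -d'-' <<< "$f")                      # up to the first dash
     mid=$(cut -f2  -d'-' <<< "$f")                      # between the first and second dash
    tail=$(cut -f3- -d'-' <<< "$f" | cut -f 1 -d .)      # after the second dash, up to the first dot
     ext=$(cut -f3- -d'-' <<< "$f" | cut -s -f 2- -d .)  # after the first dot; empty if there is none
    echo "[$head][$mid][$tail][$ext]"
    mkdir -p "${head}-${tail}"
    mv "${f}" "${head}-${tail}/${head}-${tail}${ext:+.$ext}"
    echo "${mid}" > "${head}-${tail}"/title_info.txt
done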

tree

Outputs:

|-- xxxx-aaasw1
|   `-- xxxx-test_file-aaasw1
|-- xxxx-bswb2u
|   `-- xxxx-test_file-bswb2u
|-- xxxx-caa-v3-
|   `-- xxxx-test_file-caa-v3-.txt
`-- xxxx-caa-v3u
    `-- xxxx-test_file-caa-v3u.txt

There are several other ways to go about this, but the ones I can think of are more cryptic than this straightforward approach, which is not terribly efficient because it starts several cut processes for every file.
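One of those less straightforward alternatives is to do the same split with bash parameter expansion, which starts no extra processes per file. This is an illustration rather than the answerer's code: split_name is a hypothetical helper that prints the four parts in the same [head][mid][tail][ext] format as the echo in the loop, and it assumes every name contains at least two dashes.

```bash
# Same split as the cut-based loop, using only parameter expansion.
# split_name is a hypothetical helper; it assumes at least two dashes.
split_name() {
    local f=$1 head mid rest tail ext=""
    head=${f%%-*}                          # field 1: before the first dash
    rest=${f#*-}                           # drop field 1 and its dash
    mid=${rest%%-*}                        # field 2: up to the next dash
    rest=${rest#*-}                        # fields 3+: id plus any extension
    tail=${rest%%.*}                       # id: everything before the first dot
    [[ $rest == *.* ]] && ext=${rest#*.}   # extension only if there is a dot
    printf '[%s][%s][%s][%s]\n' "$head" "$mid" "$tail" "$ext"
}

split_name xxxx-test_file-aaasw1         # prints [xxxx][test_file][aaasw1][]
split_name xxxx-test_file-caa-v3-.txt    # prints [xxxx][test_file][caa-v3-][txt]
```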



Answer 2:

Just change the mid assignment so that it always strips a dash followed by exactly six characters, a dot, and the extension; mid is then everything before the six-character ID.

mid=${tmp%-??????.*}; tail=${tmp#"$mid"-}
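Put together, the loop with this mid assignment might look like the sketch below. Assumptions: every name ends in a dash, a 6-character ID, a dot and an extension, and names that do not fit are skipped; the original file name is kept inside the new folder, as in the trees in the question; organize is a hypothetical function name; and the demo runs on a throwaway directory from mktemp holding the two sample names from the question. It needs bash 4 or later for ${var,,}.

```bash
# Question's loop with Answer 2's mid pattern (bash 4+).
# organize is a hypothetical name; it assumes "-<6-char id>.<ext>" endings.
organize() {
    local dir_path=$1 file name tmp head mid tail base dir
    for file in "$dir_path"/*.txt; do
        [[ -f $file ]] || continue             # no match, or not a regular file
        name=${file##*/}                       # work on the bare file name
        tmp=${name#*-}; head=${name%-"$tmp"}   # head: text before the first dash
        mid=${tmp%-??????.*}                   # strip "-", 6 chars, ".", extension
        [[ $mid == "$tmp" ]] && continue       # skip names without a 6-char id
        tail=${tmp#"$mid"-}                    # tail: the 6-char id plus extension
        base="${head,,}-${tail,,}"
        dir="$dir_path/${base%.txt}"
        mkdir -p "$dir"
        mv -- "$file" "$dir/$name"             # keep the original file name
    done
}

demo=$(mktemp -d)
touch "$demo"/xxxx-test_file1-aa1-a2.txt "$demo"/xxxx-test_file1-aa1--2.txt
organize "$demo"
ls -R "$demo"
```

After the call, the temporary directory contains the folders xxxx-aa1-a2 and xxxx-aa1--2, each holding its original file.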