I would like to rename a Linux file to a filename that is legal in Windows. It should be no longer than Windows allows and should not contain characters that Windows forbids. Sometimes I copy the title of a paper into a filename, and titles contain special characters such as –, ®, or ?.
Also, stray characters sometimes appear at the ends of lines when I copy and paste a title from a PDF. You can see them with sed -n 'l':
echo 'Estrogen receptor agonists and estrogen attenuate TNF-α induced
α
apoptosis in VSC4.1 motoneurons.pdf' | sed -n 'l'
Estrogen receptor agonists and estrogen attenuate TNF-\316\261 induce\
d$
\316\261$
apoptosis in VSC4.1 motoneurons.pdf$
or
echo 'A synthetic review of the ﬁve molecular Sorlie’s subtypes in
breast cancer' | sed -n 'l'
A synthetic review of the \357\254\201ve molecular Sorlie\342\200\231\
s subtypes in$
breast cancer$
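In the sed -n 'l' output above, each `$` marks the end of a line and a trailing `\` is only sed wrapping a long line for display, so the extra characters are the newlines pasted in with the title. A minimal sketch of flattening them, assuming a hypothetical helper named flatten_title (not part of the original script) and that tabs and carriage returns should be handled the same way:

```shell
# flatten_title is a hypothetical helper: it collapses a pasted,
# multi-line title into a single line.
flatten_title() {
    printf '%s' "$1" |
        tr -d '\r' |          # drop carriage returns from DOS-style line ends
        tr '\n\t' '  ' |      # turn newlines and tabs into spaces
        tr -s ' ' |           # squeeze runs of spaces into one
        sed 's/^ //; s/ $//'  # trim a single leading or trailing space
}
```

It could be called as flatten_title "$fn2win". Only ASCII bytes are touched, so UTF-8 characters such as α pass through unchanged.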
I have started a script, but it is inelegant and incomplete. Has someone already done something like this, or is there a fast, elegant way to do it?
fn2win="$1"
testFn=$(echo "$fn2win" | sed -n 'l')
#SPEC_CHAR="ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞàáâãäåçèéêëìíîïðñòóôõöøùúûüýþÿ"
#NORM_CHAR="AAAAAACEEEEIIIIDNOOOOOOUUUUYPaaaaaaceeeeiiiionoooooouuuuyby"
#SPEC_LOW_CHAR="aàáâãäåāăąbḃcćçčĉċdḑďḋđeèéěêëēĕęėfḟƒgǵģǧĝğġǥhĥħiìíîĩïīĭįıjĵkḱķǩlĺļľłmṁnńņňñoòóôõöōŏøpṗqrŕŗřsśşšŝṡſtţťṫŧuùúûũüůūŭųvwẁẃŵẅxyỳýŷÿzźžż"
#NORM_LOW_CHAR="aaaaaaaaaabbccccccdddddeeeeeeeeeefffgggggggghhhiiiiiiiiiijjkkkklllllmmnnnnnoooooooooppqrrrrssssssstttttuuuuuuuuuuvwwwwwxyyyyyzzzz"
#SPEC_CAP_CHAR="AÀÁÂÃÄÅĀĂĄBḂCĆÇČĈĊDḐĎḊĐEÈÉĚÊËĒĔĘĖFḞGǴĢǦĜĞĠǤHĤĦIÌÍÎĨÏĪĬĮİJĴKḰĶǨĸLĹĻĽŁMṀNŃŅŇÑOÒÓÔÕÖŌŎØPṖQRŔŖŘSŚŞŠŜṠTŢŤṪŦUÙÚÛŨÜŮŪŬŲVWẀẂŴẄXYỲÝŶŸZŹŽŻ"
#NORM_CAP_CHAR="AAAAAAAAAABBCCCCCCDDDDDEEEEEEEEEEFFGGGGGGGGHHHIIIIIIIIIIJJKKKKKLLLLLMMNNNNNOOOOOOOOOPPQRRRRSSSSSSTTTTTUUUUUUUUUUVWWWWWXYYYYYZZZZ"
#sed -e "y/'$SPEC_CHAR'/'$NORM_CHAR'/"
if [ "$fn2win" != "$testFn" ]; then
newLinFn=$(echo "$fn2win" | fromdos | tr "\n" " " |\
sed -e "
s/[][?()=+<>:;©®”,*|]/_/g
s/"$'\t'"/ /g
s/–/-/g
s/’/'/g
s/α/alpha/g
s/β/beta/g
s/µ/micro/g
s/Æ/AE/g
s/Ǽ/AE/g
s/æ/ae/g
s/ǽ/ae/g
s/Ǳ/DZ/g
s/Ǆ/DZ/g
s/ǅ/Dz/g
s/ǲ/Dz/g
s/ǳ/dz/g
s/ǆ/dz/g
s/ﬀ/ff/g
s/ﬁ/fi/g
s/ﬂ/fl/g
s/ﬃ/ffi/g
s/ﬄ/ffl/g
s/ſt/st/g
s/Ĳ/IJ/g
s/ĳ/ij/g
s/Ǉ/LJ/g
s/ǈ/Lj/g
s/ǉ/lj/g
s/Ǌ/NJ/g
s/ǋ/Nj/g
s/ǌ/nj/g
s/Œ/OE/g
s/œ/oe/g
s/ß/SZ/g
s/\"/_/g
s/[[:cntrl:]]/_/g
s/\ $//g
" |\
fold -s -w 251 | head -1 | sed 's/\ $/.pdf/')
if [ "$fn2win" != "$newLinFn" ]; then
mv "$fn2win" "$newLinFn"
fi
fi
winFn=$(echo "z:$newLinFn" | sed 's/\//\\/g')
This looks like it should do it: http://pwet.fr/man/linux/commandes/konwert
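For the Windows-specific part (forbidden characters, trailing dots and spaces, length), a minimal sketch in plain shell might look like the following. The name win_safe_name and its three parameters are hypothetical, the default budget of 251 is simply the width used with fold in the script above, and the default extension is .pdf because that is what the script assumes. It does not transliterate accented or Greek letters, and it does not check for reserved device names such as CON or NUL.

```shell
# win_safe_name is a hypothetical helper, not an existing tool.
# Usage: win_safe_name NAME [EXT] [MAXLEN]
# NAME is a single-line file name without a directory part.
win_safe_name() {
    name=$1
    ext=${2:-.pdf}   # extension to preserve (assumption: PDFs, as in the script above)
    max=${3:-251}    # length budget for base plus extension (assumption)

    # Remove the extension if present, so truncation cannot cut it off.
    base=${name%"$ext"}

    base=$(printf '%s\n' "$base" |
        # Replace the characters Windows forbids in names: < > : " / \ | ? *
        # and any control characters.
        sed -e 's#[<>:"/\\|?*]#_#g' -e 's#[[:cntrl:]]#_#g' |
        # Truncate. Whether substr counts characters or bytes depends on
        # the awk implementation and the locale.
        awk -v n="$((max - ${#ext}))" '{ print substr($0, 1, n) }' |
        # Windows does not accept names that end in a space or a dot.
        sed 's#[ .]*$##')

    printf '%s%s\n' "$base" "$ext"
}
```

A rename could then be written as mv "$fn2win" "$(win_safe_name "$fn2win")". If the name does not already end in the given extension, the extension is appended.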