I have a string that I want to use as a filename, so I want to remove all characters that wouldn't be allowed in filenames, using Python.
I'd rather be strict than otherwise, so let's say I want to retain only letters, digits, and a small set of other characters like "_-.() ". What's the most elegant solution?
The filename needs to be valid on multiple operating systems (Windows, Linux and Mac OS) - it's an MP3 file in my library with the song title as the filename, and is shared and backed up between 3 machines.
There is a nice project on GitHub called python-slugify:
Install:
Then use:
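For reference, the package is installed with `pip install python-slugify` and imported as `slugify`. A minimal sketch of how it might be used for this question follows; the helper name `slug_filename` and the `.mp3` default are my own assumptions, not part of the library, and the import is guarded so the snippet still loads when the package is missing.

```python
# Install from a shell first:  pip install python-slugify
try:
    from slugify import slugify  # module provided by the python-slugify package
except ImportError:
    slugify = None  # library not installed; slug_filename() will refuse to run


def slug_filename(title, ext=".mp3"):
    """Hypothetical helper: slugify the title, then re-attach the extension.

    The slug itself keeps no dots, so the extension is added separately.
    """
    if slugify is None:
        raise RuntimeError("python-slugify is not installed")
    return slugify(title) + ext


if slugify is not None:
    print(slug_filename("Hello, World! (Live)"))
```

Note that the default slug is lowercased and uses hyphens as separators, so this is stricter than the "_-.() " set in the question.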
Just to further complicate things, you are not guaranteed to get a valid filename just by removing invalid characters. Since the allowed characters differ between filesystems, a conservative approach could end up turning a valid name into an invalid one. You may want to add special handling for the cases where:
The string is all invalid characters (leaving you with an empty string)
You end up with a string with a special meaning, e.g. "." or ".."
On Windows, certain device names are reserved. For instance, you can't create a file named "nul" or "nul.txt" (or nul.anything, in fact). The reserved names are:
CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9
You can probably work around these issues by prepending some string to the filenames that can never result in one of these cases, and stripping invalid characters.
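As a rough sketch of that special handling: the function name `safe_filename` and the `file_` prefix below are made up for illustration, and unlike the suggestion above this version only prepends the prefix when one of the problem cases is actually hit. It also trims trailing dots and spaces, since Windows silently drops those.

```python
import string

# Characters the question proposes to keep.
ALLOWED = set(string.ascii_letters + string.digits + "_-.() ")

# Windows device names from the list above (matched case-insensitively).
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def safe_filename(name, prefix="file_"):
    """Hypothetical helper: strip invalid characters, then patch up edge cases."""
    cleaned = "".join(c for c in name if c in ALLOWED)
    # Drop surrounding whitespace and trailing dots/spaces;
    # this also turns "." and ".." into an empty string.
    cleaned = cleaned.strip().rstrip(". ")
    # The part before the first dot decides whether the name is reserved,
    # so "nul.txt" is caught as well as "nul".
    stem = cleaned.split(".", 1)[0].strip().upper()
    if not cleaned or stem in RESERVED:
        cleaned = prefix + cleaned
    return cleaned


print(safe_filename("nul.txt"))
print(safe_filename("AC/DC: Back in Black.mp3"))
```

Two different inputs can still collapse to the same output, so check for collisions before writing files.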
You can use a list comprehension together with the string methods.
In one line:
You can also substitute a '_' character instead of dropping the character, to keep the result readable (when replacing slashes, for example).
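Something along these lines, as a sketch rather than the exact original snippet. The function names and the extra allowed characters are taken from the question or invented for illustration; note that `str.isalnum()` also accepts non-ASCII letters and digits, which may or may not be what you want.

```python
EXTRA = "_-.() "  # punctuation the question wants to keep


def strip_invalid(name):
    # One-line version: keep alphanumerics and the extra characters, drop the rest.
    return "".join([c for c in name if c.isalnum() or c in EXTRA])


def replace_invalid(name, filler="_"):
    # Same idea, but substitute a filler so word boundaries stay visible.
    return "".join([c if c.isalnum() or c in EXTRA else filler for c in name])


print(strip_invalid("AC/DC: Back in Black.mp3"))
print(replace_invalid("AC/DC.mp3"))
```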
Keep in mind, there are actually no restrictions on filenames on Unix systems other than these two: a name may not contain the NUL byte (\0) or a slash (/).
Everything else is fair game.
Yes, I just stored ANSI colour codes in a file name and had them take effect.
For entertainment, put a BEL character in a directory name and watch the fun that ensues when you CD into it ;)
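To make the point concrete, here is a small check for a single Unix path component, assuming the only forbidden characters are the NUL byte and the slash. The function name is hypothetical, and the empty string, "." and ".." are rejected too because they cannot be created as ordinary entries. Length limits depend on the filesystem and are ignored here.

```python
def is_valid_unix_component(name):
    """Hypothetical check: True if name could be a single Unix file name."""
    if name in ("", ".", ".."):
        return False  # empty, or already taken by the directory entries
    return "/" not in name and "\0" not in name


# Escape sequences and control characters pass, which is the point above.
print(is_valid_unix_component("\x1b[31mred\x1b[0m"))
print(is_valid_unix_component("a/b"))
```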
I'm sure this isn't a great answer, since it modifies the string it's looping over, but it seems to work alright:
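A sketch of that kind of loop, using the character set from the question (the names `valid_chars` and `clean` are illustrative). Rebinding `filename` inside the loop is safe in Python because strings are immutable: the `for` keeps iterating over the original string object while the name points at each new, shorter copy.

```python
import string

valid_chars = "-_.() " + string.ascii_letters + string.digits


def clean(filename):
    # The loop walks the original string; replace() builds a new one each time.
    for ch in filename:
        if ch not in valid_chars:
            filename = filename.replace(ch, "")
    return filename


print(clean("What?! (Remix) <2>.mp3"))
```

It does more work than a single `join`, since every invalid character triggers a full pass over the string, but for song titles that hardly matters.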