Turn a string into a valid filename?

Posted 2019-01-03 07:39

I have a string that I want to use as a filename, so I want to remove all characters that wouldn't be allowed in filenames, using Python.

I'd rather be strict than otherwise, so let's say I want to retain only letters, digits, and a small set of other characters like "_-.() ". What's the most elegant solution?

The filename needs to be valid on multiple operating systems (Windows, Linux and Mac OS) - it's an MP3 file in my library with the song title as the filename, and is shared and backed up between 3 machines.

20 answers
叼着烟拽天下
#2 · 2019-01-03 08:12

There is a nice project on GitHub called python-slugify:

Install:

pip install python-slugify

Then use:

>>> from slugify import slugify
>>> txt = r"This\ is/ a%#$ test ---"
>>> slugify(txt)
'this-is-a-test'
该账号已被封号
#3 · 2019-01-03 08:13

Just to further complicate things, you are not guaranteed to get a valid filename just by removing invalid characters. Since the allowed characters differ between filesystems, a conservative approach could end up turning a valid name into an invalid one. You may want to add special handling for the cases where:

  • The string is all invalid characters (leaving you with an empty string)

  • You end up with a string with a special meaning, e.g. "." or ".."

  • On Windows, certain device names are reserved. For instance, you can't create a file named "nul" or "nul.txt" (or nul.anything, in fact). The reserved names are:

    CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9

You can probably work around these issues by prepending some string to the filenames that can never result in one of these cases, and stripping invalid characters.
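
The special cases above could be handled in a small wrapper. This is only a minimal sketch, not a complete solution: the name safe_filename, the "file_" prefix, and the whitelist (taken from the question) are assumptions chosen for illustration. It also strips trailing spaces and dots, which Windows does not accept at the end of a name and which conveniently takes care of "." and "..".

```python
import string

# Assumed whitelist: letters, digits, and the extra characters from the question.
ALLOWED = set(string.ascii_letters + string.digits + "_-.() ")

# Windows device names listed above; they are reserved with any extension.
RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{dev}{n}" for dev in ("COM", "LPT") for n in range(1, 10)
}


def safe_filename(name, prefix="file_"):
    """Strip disallowed characters, then guard against the special cases."""
    cleaned = "".join(ch for ch in name if ch in ALLOWED)
    # Trailing spaces and dots are dropped; "." and ".." become empty here.
    cleaned = cleaned.rstrip(" .")
    # The part before the first dot decides whether a device name is hit.
    stem = cleaned.split(".", 1)[0].strip().upper()
    if not cleaned or stem in RESERVED:
        # Hypothetical prefix that can never produce one of the bad cases.
        return prefix + cleaned
    return cleaned


print(safe_filename("nul.txt"))
print(safe_filename(".."))
print(safe_filename("My Song (live).mp3"))
```

Ordinary names pass through unchanged; only empty results and reserved device names get the prefix.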

放我归山
#4 · 2019-01-03 08:15

You can use a generator expression together with the string methods.

>>> s
'foo-bar#baz?qux@127/\\9]'
>>> "".join(x for x in s if x.isalnum())
'foobarbazqux1279'
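
Note that isalnum() alone also drops the extra characters the question wanted to keep, and it accepts non-ASCII letters and digits too. A small variation, assuming the set "_-.() " from the question (the name keep is purely illustrative):

```python
# Hypothetical whitelist of extra characters, taken from the question.
keep = set("_-.() ")

s = 'foo-bar#baz?qux@127/\\9].mp3'
# Keep alphanumerics plus anything in the whitelist; drop everything else.
result = "".join(x for x in s if x.isalnum() or x in keep)
print(result)  # foo-barbazqux1279.mp3
```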
家丑人穷心不美
#5 · 2019-01-03 08:15

In one line:

import re
valid_file_name = re.sub(r'[^\w.)( -]', '', any_string)

You can also substitute a '_' character for the invalid ones instead of removing them, which keeps the result more readable (when replacing slashes, for example).
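
A sketch of that replacement variant, with a made-up input string. Collapsing each run of invalid characters into a single underscore (the trailing + in the pattern) is an assumption added here, not part of the one-liner above.

```python
import re

# Hypothetical example title containing characters to be replaced.
any_string = "AC/DC: Back in Black?.mp3"
# Replace each run of disallowed characters with a single underscore.
valid_file_name = re.sub(r'[^\w.)( -]+', '_', any_string)
print(valid_file_name)  # AC_DC_ Back in Black_.mp3
```

In Python 3, \w also matches non-ASCII letters and digits; passing flags=re.ASCII restricts it to [a-zA-Z0-9_] if a stricter result is wanted.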

forever°为你锁心
#6 · 2019-01-03 08:16

Keep in mind, there are actually no restrictions on filenames on Unix systems other than

  • It may not contain \0
  • It may not contain /

Everything else is fair game.

$ touch "
> even multiline
> haha
> ^[[31m red ^[[0m
> evil"
$ ls -la 
-rw-r--r--       0 Nov 17 23:39 ?even multiline?haha??[31m red ?[0m?evil
$ ls -lab
-rw-r--r--       0 Nov 17 23:39 \neven\ multiline\nhaha\n\033[31m\ red\ \033[0m\nevil
$ perl -e 'for my $i ( glob(q{./*even*}) ){ print $i; } '
./
even multiline
haha
 red 
evil

Yes, I just stored ANSI colour codes in a filename and had them take effect.

For entertainment, put a BEL character in a directory name and watch the fun that ensues when you cd into it ;)
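
Expressed in Python, the rule for a single Unix filename comes down to a two-character check. A minimal sketch: is_valid_unix_name is a hypothetical helper, the non-empty requirement is an added assumption, and length limits are deliberately ignored.

```python
def is_valid_unix_name(name):
    """Return True if name is non-empty and contains neither NUL nor '/'."""
    return bool(name) and "\0" not in name and "/" not in name


# A name with newlines and ANSI escape codes, like the one created above.
evil = "\neven multiline\nhaha\n\x1b[31m red \x1b[0m\nevil"
print(is_valid_unix_name(evil))         # True
print(is_valid_unix_name("a/b"))        # False
print(is_valid_unix_name("bad\0name"))  # False
```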

戒情不戒烟
#7 · 2019-01-03 08:16

I'm sure this isn't a great answer, since it modifies the string it's looping over, but it seems to work alright:

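import string
# Characters that survive: ASCII letters, digits, and the underscore
# that spaces are turned into.
allowed = string.ascii_letters + string.digits + '_'
for ch in your_string:  # iterates over the original string object
    if ch == ' ':
        # Spaces become underscores.
        your_string = your_string.replace(' ', '_')
    elif ch not in allowed:
        # Drop everything else; the original "or" test was always true.
        your_string = your_string.replace(ch, '')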