I am looking for the best way to "slugify" a string (that is, turn it into a URL-friendly "slug"), and my current solution is based on an existing recipe.
I have changed it a little bit to:
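import re
import unicodedata

s = 'String to slugify'
slug = unicodedata.normalize('NFKD', s)
# decode back to str so the regex calls below also work on Python 3
slug = slug.encode('ascii', 'ignore').decode('ascii').lower()
slug = re.sub(r'[^a-z0-9]+', '-', slug).strip('-')
slug = re.sub(r'[-]+', '-', slug)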
Anyone see any problems with this code? It is working fine, but maybe I am missing something or you know a better way?
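The problem is with the ASCII normalization line, slug = unicodedata.normalize('NFKD', s).
What it performs is Unicode normalization, which does not decompose many characters into ASCII equivalents. Any character without an ASCII decomposition (Cyrillic or Chinese text, for example) is then simply stripped by the encode('ascii', 'ignore') step.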
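To make that concrete, here is a small sketch of the question's approach wrapped in a function. The name naive_slugify and the two sample strings are my own, chosen only for illustration.

```python
import re
import unicodedata


def naive_slugify(text):
    """Slugify using only NFKD normalization, as in the question."""
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop every character that is still non-ASCII after decomposition.
    ascii_text = decomposed.encode('ascii', 'ignore').decode('ascii')
    return re.sub(r'[^a-z0-9]+', '-', ascii_text.lower()).strip('-')


# Accented Latin letters decompose into base letter + combining mark,
# so the base letters survive.
print(naive_slugify("C'est déjà l'été."))   # c-est-deja-l-ete

# Cyrillic letters have no ASCII decomposition, so nothing is left.
print(repr(naive_slugify("Компьютер")))     # ''
```

So the code works for accented Western European text but silently produces an empty slug for anything written in another script.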
A better way to do it is to use the unidecode module, which tries to transliterate strings to ASCII. If you replace the normalization line with a call to unidecode.unidecode(s), you get better results for such strings, and for many Greek and Russian characters too.
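A rough sketch of that replacement follows. It is not the exact code from this answer: the helper names to_ascii and slugify are mine, and the fallback to plain NFKD stripping when the third-party Unidecode package is not installed is my own addition so the snippet runs either way.

```python
import re
import unicodedata

try:
    # Third-party package, installed with: pip install Unidecode
    from unidecode import unidecode
except ImportError:
    unidecode = None  # assumption: fall back to NFKD stripping below


def to_ascii(text):
    """Transliterate to ASCII with unidecode if available."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback only: loses characters that have no ASCII decomposition.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def slugify(text):
    """Lowercase, transliterate, and join alphanumeric runs with '-'."""
    slug = to_ascii(text).lower()
    return re.sub(r'[^a-z0-9]+', '-', slug).strip('-')


print(slugify("C'est déjà l'été."))
print(slugify("Компьютер"))  # non-empty only when unidecode is installed
```

The rest of the pipeline stays the same as in the question; only the step that produces ASCII text changes.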
Another option is boltons.strutils.slugify. Boltons has quite a few other useful functions as well, and is distributed under a BSD license.
Install unidecode for Unicode support.
There is a Python package named python-slugify, which does a pretty good job of slugifying. You use it by importing slugify from the slugify module and calling it on a string. See the project page for more examples.
This package does a bit more than what you posted (take a look at the source, it's just one file). The project is still active: it was updated 2 days before I originally answered, and over four years later (last checked 2017-04-26) it still gets updated.
Careful: there is a second package around, named slugify. If you have both of them installed, you might run into problems, as they share the same import name. The one just named slugify didn't handle everything in my quick check: "Ich heiße" became "ich-heie" (should be "ich-heisse"), so be sure to pick the right one when using pip or easy_install.
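If you are not sure which of the two is installed, something along these lines can tell you. It is only a sketch: the function name is made up, it assumes Python 3.8+ for importlib.metadata, and it assumes the distributions are registered under the names python-slugify and slugify as given above.

```python
from importlib import metadata


def installed_slugify_distributions(candidates=("python-slugify", "slugify")):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; skip it.
            continue
    return found


installed = installed_slugify_distributions()
if len(installed) > 1:
    print("Both packages are installed; 'import slugify' is ambiguous:", installed)
else:
    print("Installed:", installed)
```

If both names show up, uninstalling the one you do not want is the safest fix, since only one of them can own the slugify import.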
It works well in Django, so I don't see why it wouldn't be a good general-purpose slugify function.
Are you having any problems with it?
A couple of options on GitHub:
Each supports slightly different parameters for its API, so you'll need to look through to figure out what you prefer.
In particular, pay attention to the different options they provide for dealing with non-ASCII characters. Pydanny wrote a very helpful blog post illustrating some of the Unicode handling differences in these slugifying libraries: http://www.pydanny.com/awesome-slugify-human-readable-url-slugs-from-any-string.html This blog post is slightly outdated because Mozilla's unicode-slugify is no longer Django-specific.
Also note that awesome-slugify is currently GPLv3, though there's an open issue where the author says they'd prefer to release as MIT/BSD, just not sure of the legality: https://github.com/dimka665/awesome-slugify/issues/24