I know this is a common problem and that several answered SO questions already cover it (1, 2, 3), but even after following the recommendations there, I still get this error from the line below:
uri_name = u"%s_%s" % (name[1].encode('utf-8').strip(), name[0].encode('utf-8').strip())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
I am trying to build a URL from a list of artist names, many of which contain accents and other European characters, like so (each name is followed by the same name as printed via repr):
Auberjonois, René -> Auberjonois, Ren\xc3\xa9
Bäumer, Eduard -> B\xc3\xa4umer, Eduard
Baur-Nütten, Gisela -> Baur-N\xc3\xbctten, Gisela
Bösken, Lorenz -> B\xc3\xb6sken, Lorenz
Čapek, Josef -> \xc4\x8capek, Josef
Großmann, Rudolf -> Gro\xc3\x9fmann, Rudolf
The block I am trying to run is:
def create_uri(artist_name):
    artist_name = artist_name
    name = artist_name.split(",")
    uri_name = u"%s_%s" % (name[1].encode('utf-8').strip(), name[0].encode('utf-8').strip())
    uri = 'http://example.com/' + uri_name
    print uri

create_uri('Name, Non_Accent')
create_uri('Auberjonois, René')
The first call works and produces http://example.com/Non_Accent_Name, but the second fails with the error above.
I have added # coding=utf-8 to the top of my script and have tried encoding the artist_name string at every point along the way, only to get the same error each time.
If it matters, I am using Atom as a text editor, and when I open the .csv file that these names come from, the accents all display correctly.
What else can I do to ensure that the script interprets UTF-8 as UTF-8 and not ASCII?
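For reference, here is a minimal sketch of what seems to be going on, assuming this is Python 2 (the print statement suggests so). There, a plain 'René' literal is already a UTF-8 byte string, and calling .encode('utf-8') on a byte string first decodes it implicitly with the ASCII codec, which would explain a UnicodeDecodeError coming from an encode call. The helper to_text and this variant of create_uri are hypothetical illustrations, not code from my script. The sketch decodes once at the boundary and works with text from there on.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return text, decoding byte strings as UTF-8."""
    if isinstance(value, bytes):  # in Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value


def create_uri(artist_name):
    """Build 'http://example.com/First_Last' from a 'Last, First' name."""
    # Decode first, then split and strip on text rather than on bytes.
    text = to_text(artist_name)
    last, first = [part.strip() for part in text.split(u',', 1)]
    return u'http://example.com/' + u'%s_%s' % (first, last)


# The implicit step Python 2 performs when .encode() is called on a
# byte string: an ASCII decode, which fails on the first non-ASCII byte.
try:
    b'Ren\xc3\xa9'.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)
```

With this sketch, create_uri(b'Auberjonois, Ren\xc3\xa9') returns the text string http://example.com/René_Auberjonois. It would still need encoding (or percent-escaping) at the point where it is written out or sent over the network.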