I am working with Scrapy. I scraped some sites and stored the items from the scraped pages in JSON files, but some of the strings come out in the following format.
l = ["Holding it Together",
"Fowler RV Trip",
"S\u00e9n\u00e9gal - Mali - Niger","H\u00eatres et \u00e9tang",
"Coll\u00e8ge marsan","N\u00b0one",
"Lines through the days 1 (Arabic) \u0633\u0637\u0648\u0631 \u0639\u0628\u0631 \u0627\u0644\u0623\u064a\u0627\u0645 1",
"\u00cdndia, Tail\u00e2ndia & Cingapura"]
I expect the list to contain strings in different formats, but I want to convert them and store the strings in the list with their original names, like below:
l = ["Holding it Together",
"Fowler RV Trip",
"Lines through the days 1 (Arabic) سطور عبر الأيام 1 | شمس الدين خ | Blogs" ,
"Índia, Tailândia & Cingapura "]
Thanks in advance.
When you serialise to JSON, there may be a flag that lets you turn off the escaping of non-ASCII characters to \u sequences. If you are using the standard library json module, that flag is ensure_ascii.
However, be aware that with that safety measure taken away you now have to handle non-ASCII characters correctly yourself, or you'll get a bunch of UnicodeErrors. For example, if you are writing the JSON to a file, you must explicitly encode the Unicode string to the charset you want (for example UTF-8).
You have byte strings containing unicode escapes. You can convert them to unicode with the unicode_escape codec, and you can then encode the result back to byte strings.
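The original snippets for these two suggestions are not preserved here, so the following is only a minimal sketch of what they might look like. It assumes Python 3 (where the byte-string type is bytes and text is str), a made-up sample value, and UTF-8 as the target charset.

```python
import json

# Hypothetical sample: raw bytes that still hold literal backslash-u escapes.
raw = b"S\\u00e9n\\u00e9gal - Mali - Niger"

# The unicode_escape codec turns the escape sequences into real characters.
text = raw.decode("unicode_escape")

# Encode back to a byte string; UTF-8 is an assumed choice of charset.
utf8_bytes = text.encode("utf-8")

# By default json escapes non-ASCII characters to \u sequences...
escaped = json.dumps([text])

# ...while ensure_ascii=False keeps the characters as they are.
readable = json.dumps([text], ensure_ascii=False)
```

When writing the readable form to a file in Python 3, opening the file with an explicit encoding such as open(path, "w", encoding="utf-8") is one way to do the explicit encoding step mentioned above.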
You can filter the list and decode only the non-unicode strings, that is, the byte strings that still contain escapes.
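A rough sketch of that filtering step, assuming Python 3 and a list that mixes already-decoded str items with bytes items holding escapes. The helper name decode_escapes and the sample list are hypothetical, not taken from the original answer.

```python
def decode_escapes(items):
    """Decode bytes items with unicode_escape; leave str items untouched."""
    return [
        item.decode("unicode_escape") if isinstance(item, bytes) else item
        for item in items
    ]

# Hypothetical mixed input: one plain string, two escaped byte strings.
mixed = ["Holding it Together", b"Coll\\u00e8ge marsan", b"N\\u00b0one"]
titles = decode_escapes(mixed)
```

Checking the type first matters because items that are already text have no decode step to apply, and only the escaped byte strings need converting.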