I am working on an HTTP client in Haskell (it is my first non-exercise project).
There is an API that returns JSON in which all non-ASCII text is written as Unicode escapes, something like
\u041e\u043d\u0430 \u043f\u0440\u0438\u0432\u0435\u0434\u0435\u0442 \u0432\u0430\u0441 \u0432 \u0434\u043b\u0438\u043d\u043d\u044b\u0439 \u0441\u043f\u0438\u0441\u043e\u043a
I want to decode this JSON to UTF-8 so that I can print some data from the JSON message.
I searched for existing libraries but found nothing for this purpose.
So I wrote a function to convert the data (I am using lazy ByteStrings because that is the type I get from the wreq library):
ununicode :: BL.ByteString -> BL.ByteString
ununicode s = replace s where
  replace :: BL.ByteString -> BL.ByteString
  replace str = case (Map.lookup (BL.take 6 str) table) of
    (Just x) -> BL.append x (replace $ BL.drop 6 str)
    (Nothing) -> BL.cons (BL.head str) (replace $ BL.tail str)
  table = Map.fromList $ zip letters rus
  rus = ["Ё", "ё", "А", "Б", "В", "Г", "Д", "Е", "Ж", "З", "И", "Й", "К", "Л", "М",
         "Н", "О", "П", "Р", "С", "Т", "У", "Ф", "Х", "Ц", "Ч", "Ш", "Щ", "Ъ", "Ы",
         "Ь", "Э", "Ю", "Я", "а", "б", "в", "г", "д", "е", "ж", "з", "и", "й", "к",
         "л", "м", "н", "о", "п", "р", "с", "т", "у", "ф", "х", "ц", "ч", "ш", "щ",
         "ъ", "ы", "ь", "э", "ю", "я"]
  letters = ["\\u0401", "\\u0451", "\\u0410", "\\u0411", "\\u0412", "\\u0413",
             "\\u0414", "\\u0415", "\\u0416", "\\u0417", "\\u0418", "\\u0419",
             "\\u041a", "\\u041b", "\\u041c", "\\u041d", "\\u041e", "\\u041f",
             "\\u0420", "\\u0421", "\\u0422", "\\u0423", "\\u0424", "\\u0425",
             "\\u0426", "\\u0427", "\\u0428", "\\u0429", "\\u042a", "\\u042b",
             "\\u042c", "\\u042d", "\\u042e", "\\u042f", "\\u0430", "\\u0431",
             "\\u0432", "\\u0433", "\\u0434", "\\u0435", "\\u0436", "\\u0437",
             "\\u0438", "\\u0439", "\\u043a", "\\u043b", "\\u043c", "\\u043d",
             "\\u043e", "\\u043f", "\\u0440", "\\u0441", "\\u0442", "\\u0443",
             "\\u0444", "\\u0445", "\\u0446", "\\u0447", "\\u0448", "\\u0449",
             "\\u044a", "\\u044b", "\\u044c", "\\u044d", "\\u044e", "\\u044f"]
But it doesn't work as I expected. It replaces the text, but instead of Cyrillic letters I get something like 345 ?C1;8:C5< 8=B5@2LN A @4=52=8:>2F0<8 8=B5@5A=KE ?@>D5AA89 8 E>118
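A possible explanation for the garbled output, sketched under the assumption that the string literals in `rus` become ByteStrings through OverloadedStrings: that conversion behaves like `Data.ByteString.Lazy.Char8.pack`, which keeps only the low 8 bits of each `Char`. The names `truncated` and `encoded` below are hypothetical and only illustrate the difference from a real UTF-8 encoding.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- Char8.pack truncates every Char to its low 8 bits,
-- so U+041E becomes the single byte 0x1E.
truncated :: BL.ByteString
truncated = BLC.pack "Она"

-- Going through Text and encodeUtf8 yields two bytes per Cyrillic letter.
encoded :: BL.ByteString
encoded = TLE.encodeUtf8 (TL.pack "Она")

main :: IO ()
main = do
  print (BL.unpack truncated)  -- [30,61,48]
  print (BL.unpack encoded)    -- [208,158,208,189,208,176]
```

The truncated bytes 61 and 48 are the ASCII characters `=` and `0`, which is the same kind of output as the sample above.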
The second problem is that I can't debug my function.
When I call it directly with a custom string, I get the error Data.ByteString.Lazy.head: empty ByteString
I have no idea why it is empty.
It works fine during normal program execution:
umailGet env params = do
  r <- apiGet env (("method", "umail.get"):params)
  x <- return $ case r of
    (Right a) -> a
    (Left a) -> ""
  return $ ununicode $ x
and then in Main
r2 <- umailGet client []
print $ r2
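One thing that may explain the `head: empty ByteString` error: the recursion in `replace` never checks for the end of the input, so once the whole string has been consumed the `Nothing` branch calls `BL.head` on an empty ByteString. A minimal sketch of that branch with an explicit base case follows; `copyBytes` is a hypothetical stand-in, not the real function.

```haskell
import qualified Data.ByteString.Lazy as BL

-- Hypothetical stand-in for the Nothing branch of replace:
-- copy the input byte by byte, but stop when nothing is left.
copyBytes :: BL.ByteString -> BL.ByteString
copyBytes str
  | BL.null str = BL.empty  -- base case: BL.head is never called on empty input
  | otherwise   = BL.cons (BL.head str) (copyBytes (BL.tail str))

main :: IO ()
main = print (BL.unpack (copyBytes (BL.pack [1, 2, 3])))
```

Adding the same `BL.null` check as the first case of `replace` should be enough to avoid the exception, although it does not address the encoding problem.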
The last problem is that the API can return any Unicode symbol, so a fixed lookup table is bad by design.
Of course the function implementation seems bad too, so after solving the main problem I am going to rewrite it using foldr.
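A table-free version could parse the four hex digits directly instead of looking them up. The sketch below works on `String` for brevity. `unescape` is a hypothetical name, and it assumes code points in the Basic Multilingual Plane only: surrogate pairs are not combined, and other JSON escapes (including an escaped backslash in front of `u`) are not treated specially.

```haskell
import Data.Char (chr, isHexDigit)
import Numeric (readHex)

-- Replace every \uXXXX escape with the character it names.
-- Anything that is not a well-formed escape is copied unchanged.
unescape :: String -> String
unescape ('\\' : 'u' : rest)
  | length hex == 4 && all isHexDigit hex
  , [(code, "")] <- readHex hex
  = chr code : unescape after
  where (hex, after) = splitAt 4 rest
unescape (c : cs) = c : unescape cs
unescape []       = []

main :: IO ()
main = putStrLn (unescape "\\u041e\\u043d\\u0430 ok")
```

For a lazy ByteString response, the result would still have to be encoded as UTF-8 (for example through `Data.Text.Lazy.Encoding.encodeUtf8`) rather than packed byte by byte.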
UPDATED: It seems I did not describe the problem clearly enough.
I am sending a request via the wreq library and get a JSON answer. For example:
{"result":"12","error":"\u041d\u0435\u0432\u0435\u0440\u043d\u044b\u0439 \u0438\u0434\u0435\u043d\u0442\u0438\u0444\u0438\u043a\u0430\u0442\u043e\u0440 \u0441\u0435\u0441\u0441\u0438\u0438"}
That is not how Haskell shows the result; those are real ASCII symbols. I get the same text using curl or Firefox: 190 bytes, 190 ASCII symbols.
Using this site, for example, http://unicode.online-toolz.com/tools/text-unicode-entities-convertor.php I can convert it to Cyrillic text: {"result":"12","error":"Неверный идентификатор сессии"}
I need to implement something like this service in Haskell (or find a package where it has already been implemented), where a response like this has the type lazy ByteString.
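Since the `\uXXXX` sequences are ordinary JSON string escapes, a JSON parser should already undo them. Below is a minimal sketch using `decode` from the aeson package, assuming every value in the object is a string, as in the example response. `sampleBody` and `errorField` are hypothetical names, and the sample is shortened to the first word of the message.

```haskell
import           Data.Aeson (decode)
import qualified Data.ByteString.Lazy.Char8 as BLC
import qualified Data.Map.Strict as Map
import           Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical sample body: plain ASCII bytes containing \uXXXX escapes.
sampleBody :: BLC.ByteString
sampleBody = BLC.pack
  "{\"result\":\"12\",\"error\":\"\\u041d\\u0435\\u0432\\u0435\\u0440\\u043d\\u044b\\u0439\"}"

-- Decode into a map of text fields; the parser turns the escapes
-- into real characters while reading the strings.
errorField :: BLC.ByteString -> Maybe Text
errorField body = do
  fields <- decode body :: Maybe (Map.Map Text Text)
  Map.lookup (T.pack "error") fields

main :: IO ()
main = maybe (putStrLn "could not decode") TIO.putStrLn (errorField sampleBody)
```

Printing through `Data.Text.IO.putStrLn` instead of `print` matters here, because `print` would show the escaped form of the characters again.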
I also tried changing the types to use Text instead of ByteString (both lazy and strict), and changed the first line to ununicode s = encodeUtf8 $ replace $ L.toStrict $ LE.decodeUtf8 s
With that new implementation I get an error when executing my program:
Data.Text.Internal.Fusion.Common.head: Empty stream
So it looks like I have an error in my replacing function; maybe fixing it will also fix the main problem.