Dealing with the Cyrillic encoding in Node.Js / Ex

In my app, a user submits text through a form's textarea. The app then passes that text to the jsesc library, which escapes JavaScript strings.

The problem is that when I enter text in Russian, such as

 нам #интересны наши #идеи

what I get is

 '\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438'

I then need to pass this data to FlowDock to extract hashtags, and FlowDock just does not recognize them.

Can someone please tell me

1) What is the need for converting it into that representation?

2) Does it make sense to convert it back to Cyrillic for FlowDock and for the database, or should I keep the escaped Unicode form and try to make FlowDock work with it?

Thanks!

UPDATE

The complete script is:

result = getField(req, field);
result = S(result).trim().collapseWhitespace().s;

// at this point result = "нам #интересны наши #идеи"
result = jsesc(result, {
    'quotes': 'double'
});

// now I end up with the escaped Unicode shown above (\u....)

var hashtags = FlowdockText.extractHashtags(result);

FlowDock receives the result which is

\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438

And doesn't extract hashtags from it...

Tags: javascript node.js unicode encoding cyrillic

3 Answers

SAY GOODBYE

Reply #2 · 2019-08-31 01:47

These are 2 representations of the same string:

'нам #интересны наши #идеи' ===  '\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438'

It looks like flowdock-text doesn't work well with non-ASCII characters.

UPD: I tried it, and it actually works well:

fdt.extractHashtags('\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438');

You shouldn't have escaped it in the first place: escaping gives you the source text of a string literal (suitable for eval, etc.), not the string itself.

UPD2: I've reduced your code to the following:
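To make the distinction concrete, here is a minimal sketch using a shortened sample string. The variable names are made up for illustration. It contrasts a string literal written with \u escapes (the same value as the Cyrillic text) with a string that actually contains backslash characters, which is roughly what an escaper hands back.

```javascript
// Two source-code spellings of the same string value.
const plain = 'нам #идеи';
const escapedLiteral = '\u043D\u0430\u043C #\u0438\u0434\u0435\u0438';
console.log(plain === escapedLiteral); // true

// A string whose characters are a backslash, "u", and hex digits.
// This is the kind of text an escaping step produces.
const escapedText = '\\u043D\\u0430\\u043C #\\u0438\\u0434\\u0435\\u0438';
console.log(plain === escapedText); // false
console.log(plain.length, escapedText.length); // 9 44
```

In the second case the hashtag extractor sees "#" followed by a backslash, so there is no word for it to pick up.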

var jsesc = require('jsesc');
var fdt = require('flowdock-text');

var result = 'нам #интересны наши #идеи';

result = jsesc(result, {
    'quotes': 'double'
});

var hashtags = fdt.extractHashtags(result);

console.log(hashtags);

As I said, the problem is with jsesc: you don't need it. It returns a JavaScript-escaped string, i.e. the text of a string literal. You need that only when you build code by concatenation and eval it, to protect against code injection, or something similar. For example, if you add result = eval('"' + result + '"');, it will work.

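If the text has already been escaped somewhere upstream, eval is not the only way back. The sketch below is self-contained and does not use jsesc or flowdock-text: escapeNonAscii and extractHashtagsSketch are hypothetical stand-ins written for this example, so their behavior only approximates the real libraries. It decodes with JSON.parse, which works here because the sample contains only \uXXXX escapes and no quotes or other backslashes.

```javascript
// Hypothetical stand-in for an escaper: turns every non-ASCII
// UTF-16 code unit into the six characters \uXXXX.
function escapeNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Hypothetical stand-in for a hashtag extractor: "#" followed by
// Unicode letters, digits, or underscores.
function extractHashtagsSketch(text) {
  const matches = text.match(/#[\p{L}\p{N}_]+/gu) || [];
  return matches.map((tag) => tag.slice(1));
}

const original = 'нам #интересны наши #идеи';
const escaped = escapeNonAscii(original);

// Extracting from the unescaped text finds both tags.
const fromOriginal = extractHashtagsSketch(original);
console.log(fromOriginal); // [ 'интересны', 'идеи' ]

// Extracting from the escaped text finds nothing.
const fromEscaped = extractHashtagsSketch(escaped);
console.log(fromEscaped); // []

// Decoding without eval: wrap in double quotes and parse as JSON.
const decoded = JSON.parse('"' + escaped + '"');
console.log(decoded === original); // true
```

The simplest fix is still to skip the escaping step and hand the original string to the extractor.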

姐就是有狂的资本

Reply #3 · 2019-08-31 01:49

What is the need for converting it into that representation?

jsesc is a JavaScript library for escaping JavaScript strings while generating the shortest possible valid ASCII-only output. Here’s an online demo.

This can be used to avoid mojibake and other encoding issues, or even to avoid errors when passing JSON-formatted data (which may contain U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR, or lone surrogates) to a JavaScript parser or a UTF-8 encoder, respectively.

It sounds like in this case you don't need jsesc at all.

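As a rough illustration of the U+2028 case: JSON.stringify leaves U+2028 and U+2029 unescaped, and before ES2019 those characters counted as line terminators inside JavaScript string literals, so pasting such JSON into script source could fail to parse. The helper below, escapeLineSeparators, is a hypothetical name for this sketch and is not part of jsesc.

```javascript
const payload = { note: 'line one\u2028line two' };
const json = JSON.stringify(payload);

// The raw separator character survives serialization.
console.log(json.includes('\u2028')); // true

// Hypothetical helper: rewrite the two separators as ASCII escapes.
function escapeLineSeparators(jsonText) {
  return jsonText
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const safe = escapeLineSeparators(json);
console.log(safe.includes('\u2028')); // false

// The escaped form is still valid JSON and parses to the same value.
console.log(JSON.parse(safe).note === payload.note); // true
```

That is a use case for ASCII-only output; extracting hashtags from user input is not.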

地球回转人心会变

Reply #4 · 2019-08-31 01:49

Try this:

decodeURIComponent("\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438");
