JavaScript - regex - how to remove words with a specific length

In my case, the word length is 2, and I am using this regex:

text = text.replace(/\b[a-zA-ZΆ-ώἀ-ῼ]{2}\b/g, '');

but I cannot make it work with Greek characters. For your convenience, here is a demo:

text = 'English: the on in to of \n Greek: πως θα το πω';
text = text.replace(/\b[0-9a-zA-ZΆ-ώἀ-ῼ]{2}\b/g, '');
console.log(text);

For the Greek characters, I tried to use ranges covering two Unicode blocks: "Greek and Coptic" and "Greek Extended" (as listed on unicode-table.com).
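A quick check suggests the character ranges are not the culprit. This is a sketch with the ranges written as `\u` escapes so the code points are explicit: the class does match Greek letters, but `\b` only recognizes a boundary next to a `\w` character, which in JavaScript means `[A-Za-z0-9_]`. A space followed by a Greek letter is therefore "non-word" on both sides, so `\b...\b` never matches around a Greek word.

```javascript
// The question's Greek ranges as escapes:
//   Ά-ώ = U+0386..U+03CE (inside the "Greek and Coptic" block)
//   ἀ-ῼ = U+1F00..U+1FFC (inside the "Greek Extended" block)
const greekLetter = /^[\u0386-\u03CE\u1F00-\u1FFC]$/;

// The character class itself matches Greek letters...
const classMatchesGreek = ['π', 'θ', 'ώ', 'ἀ'].every(ch => greekLetter.test(ch));

// ...but \w (and therefore \b) only knows about [A-Za-z0-9_].
const wordCharIsGreek = /\w/.test('π');
const boundaryBeforeGreek = /\bθα\b/.test(' θα '); // no boundary between ' ' and 'θ'
const boundaryBeforeLatin = /\bto\b/.test(' to ');

console.log(classMatchesGreek, wordCharIsGreek, boundaryBeforeGreek, boundaryBeforeLatin);
// true false false true
```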

Tags: javascript regex words

4 answers

等我变得足够好

Reply #2 · 2019-08-20 02:21

JavaScript has limited Unicode support in regular expressions. To make this work, I'd suggest using the XRegExp library, which has stable Unicode support.

More: http://xregexp.com/plugins/#unicode
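For reference, engines implementing ES2018 or later support Unicode property escapes natively (`\p{L}` for any letter, `\p{N}` for any number, both requiring the `u` flag), which covers what XRegExp's Unicode plugin was suggested for here. The sketch below swaps in that native feature rather than XRegExp: it matches every run of letters/digits and drops the runs of length `n`. `removeWordsOfLength` is a hypothetical helper name; it assumes precomposed (NFC) Greek text, since decomposed accents are combining marks (`\p{M}`) and would split a word, and `word.length` counts UTF-16 code units.

```javascript
// Native alternative (ES2018+): with the u flag, \p{L} matches any Unicode letter
// and \p{N} any Unicode number, so Greek words are matched as whole runs.
function removeWordsOfLength(text, n) {
  // Replace each letter/digit run with '' if it is exactly n code units long.
  return text.replace(/[\p{L}\p{N}]+/gu, word => (word.length === n ? '' : word));
}

const sample = 'English: the on in to of \n Greek: πως θα το πω';
const result = removeWordsOfLength(sample, 2);
console.log(JSON.stringify(result));
```

Because digits are part of the run, tokens such as "2TB" or "mp3" count as three characters and are kept.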


(account suspended)

Reply #3 · 2019-08-20 02:28

Why use a regex? I think your problem can be solved without one.

Check the example below; it should give you a hint on how to start.

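var text = 'English: the on in to of \n Greek: πως θα το πω';
// Split on any run of whitespace, then keep only tokens that are not 2 characters long
var tokens = text.split(/\s+/);
var result = tokens.filter(function (token) { return token.length !== 2; }).join(' ');
console.log(result);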
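One limitation of splitting on `/\s+/` and joining with `' '` is that line breaks are flattened into spaces. A variant of the same split/filter idea that keeps the original whitespace is sketched below: when the separator regex contains a capturing group, `split()` also returns the separators, so they can be joined back unchanged. `dropTokensOfLength` is a hypothetical name; note that punctuation stays attached to tokens, so "to," would count as three characters.

```javascript
// Keep the separators: the capturing group in /(\s+)/ makes split() include them.
function dropTokensOfLength(text, n) {
  return text
    .split(/(\s+)/)
    // Keep whitespace separators as-is; drop non-whitespace tokens of length n.
    .filter(token => /^\s+$/.test(token) || token.length !== n)
    .join('');
}

const input = 'English: the on in to of \n Greek: πως θα το πω';
console.log(JSON.stringify(dropTokensOfLength(input, 2)));
```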


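chillily

Reply #4 · 2019-08-20 02:34

The problem with Greek characters is caused by \b: JavaScript only recognizes a word boundary next to a \w character ([A-Za-z0-9_]), so it finds no boundary between a space and a Greek letter. Take a look at the question "Javascript - regex - word boundary (\b) issue", where @Casimir et Hippolyte proposes the following solution: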

Since Javascript doesn't have the lookbehind feature and since word boundaries work only with members of the \w character class, the only way is to use groups (and capturing groups if you want to make a replacement):

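// example to remove 2-letter words:
// '$1' puts back the character captured before the word (JavaScript uses $1, not \1, in replacement strings)
txt = txt.replace(/(^|[^a-zA-ZΆΈ-ώἀ-ῼ\n])([a-zA-ZΆΈ-ώἀ-ῼ]{2})(?![a-zA-ZΆΈ-ώἀ-ῼ])/gm, '$1');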

I also added 0-9 inside the first and third character classes, because otherwise the pattern stripped the letters out of words like "2TB" or "mp3".
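The 0-9 change described above is not shown as code, so here is a sketch of what the resulting pattern could look like. Assumptions: the letter ranges are the ones from the quoted answer (ΆΈ-ώἀ-ῼ, written as `\u` escapes so that U+0387, the Greek ano teleia punctuation mark, is visibly skipped), and `removeTwoLetterWords` is a hypothetical helper name.

```javascript
// ASCII letters plus Ά (U+0386), Έ..ώ (U+0388..U+03CE) and ἀ..ῼ (U+1F00..U+1FFC).
const LETTERS = 'a-zA-Z\\u0386\\u0388-\\u03CE\\u1F00-\\u1FFC';

const twoLetterWord = new RegExp(
  '(^|[^0-9' + LETTERS + '\\n])' + // group 1: line start or a char that is neither letter nor digit (put back)
    '([' + LETTERS + ']{2})' + // group 2: exactly two letters (removed)
    '(?![0-9' + LETTERS + '])', // must not be followed by another letter or a digit
  'gm'
);

function removeTwoLetterWords(txt) {
  // '$1' restores group 1; group 2 (the two-letter word) is dropped.
  return txt.replace(twoLetterWord, '$1');
}

console.log(removeTwoLetterWords('a 2TB disk and an mp3')); // "a 2TB disk and  mp3"
console.log(JSON.stringify(removeTwoLetterWords('English: the on in to of \n Greek: πως θα το πω')));
```

Engines that implement ES2018 also support lookbehind (`(?<!...)`), which would let the preceding character be checked without capturing and restoring it.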


劳资没心，怎么记你

Reply #5 · 2019-08-20 02:47

Try this:

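var text = 'English: the on in to of \n Greek: πως θα το πω';
// Step 1: the \b-based regex removes 2-letter Latin words (it cannot see Greek word boundaries)
text = text.replace(/\b[0-9a-zA-ZΆ-ώἀ-ῼ]{2}\b/g, '');
console.log(text);
// Step 2: split on spaces and drop any remaining 2-character tokens (this catches the Greek ones)
var words = text.split(' ');
text = words.filter(function (word) { return word.length !== 2; }).join(' ');
console.log(text);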

Edit:

Try this:

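var text = 'English: the on in to of \n Greek: πως θα το πω';
// Surround every line feed and tab with spaces so they become separate tokens
// (replace() returns a new string, so the result must be assigned back)
text = text.replace(/\n/g, ' \n ').replace(/\t/g, ' \t ');
var words = text.split(' ');
text = words.filter(function (word) { return word.length !== 2; }).join(' ');
console.log(text);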

This keeps the \t and \n characters and also removes 2-letter words that are separated from their neighbors by a tab or a line feed instead of a space.
