OK, so this is what I have (special thanks to Tushar Gupta for fixing the code):
HTML
<input type='checkbox' value='2' name='v'>STS
<input type='checkbox' value='4' name='v'>NTV
js
$(function () {
    var wordCounts = {}; // word count per text input, keyed by the input's id
    $("input[type='text']:not(:disabled)").keyup(function () {
        // every word contributes two \b boundaries (start and end)
        var matches = this.value.match(/\b/g);
        wordCounts[this.id] = matches ? matches.length / 2 : 0;
        var finalCount = 0;
        var x = 0;
        // sum the values of the checked checkboxes to get the multiplier
        $('input:checkbox:checked').each(function () {
            x += parseInt(this.value, 10);
        });
        x = (x === 0) ? 1 : x; // nothing checked: multiply by 1
        $.each(wordCounts, function (k, v) {
            finalCount += v * x;
        });
        $('#finalcount').val(finalCount);
    }).keyup();
    // recalculate whenever a checkbox is toggled
    $('input:checkbox').change(function () {
        $('input[type="text"]:not(:disabled)').trigger('keyup');
    });
});
I want it to be able to count Russian words, e.g. "Привет как дела". So far it only works with English input.
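The problem is in your regex: \b doesn't match word boundaries in non-ASCII text, because JavaScript only treats ASCII letters, digits, and the underscore as word characters.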
Try changing this:
var matches = this.value.match(/\b/g);
To this:
var matches = this.value.match(/[^\s\.\!\?]+/g);
and see if that gives a result for Cyrillic input. If it works then you no longer need to divide by 2 to get the word count.
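To check the suggestion outside the page, here is a minimal sketch that can be run in Node or a browser console. The helper name countBySeparators is made up for illustration and is not part of the original code. It assumes words are separated only by whitespace, full stops, exclamation marks, or question marks.

```javascript
// Hypothetical helper: counts runs of characters that are not
// whitespace, '.', '!' or '?'.
function countBySeparators(text) {
  var matches = text.match(/[^\s\.\!\?]+/g);
  return matches ? matches.length : 0;
}

// \b needs an ASCII word character on one side, so purely Cyrillic
// text contains no boundaries and match() returns null.
var boundaryMatches = "Привет как дела".match(/\b/g);

console.log(boundaryMatches);                      // null
console.log(countBySeparators("Привет как дела")); // 3
console.log(countBySeparators("Hello, world!"));   // 2
```

Note that other punctuation is not treated as a separator here, so a comma surrounded by spaces would be counted as a word of its own.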
The \b notation is defined in terms of “word boundaries”, but with “word” meaning a sequence of ASCII letters, digits, and underscores, so it cannot be used for Russian texts. A simple approach is to count sequences of Cyrillic letters, and the range from U+0400 to U+0481 covers the Cyrillic letters used in Russian. Replace the lines
var matches = this.value.match(/\b/g);
wordCounts[this.id] = matches ? matches.length / 2 : 0;
by the lines
var matches = this.value.match(/[\u0400-\u0481]+/g);
wordCounts[this.id] = matches ? matches.length : 0;
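As a standalone sketch of the same idea, the function below wraps the Cyrillic-range regex so it can be tried without the page. The names countCyrillicWords and countLetterWords are illustrative only. The second function is an alternative that is not part of the answer above: it assumes an engine with ES2018 Unicode property escapes and counts runs of letters from any script.

```javascript
// Counts runs of characters in the range U+0400 to U+0481.
function countCyrillicWords(text) {
  var matches = text.match(/[\u0400-\u0481]+/g);
  return matches ? matches.length : 0;
}

// Alternative (assumes ES2018 support): \p{L} matches a letter in any
// script, so mixed English and Russian input is counted as well.
function countLetterWords(text) {
  var matches = text.match(/\p{L}+/gu);
  return matches ? matches.length : 0;
}

console.log(countCyrillicWords("Привет как дела")); // 3
console.log(countCyrillicWords("Hello мир"));       // 1 (Latin word ignored)
console.log(countLetterWords("Hello мир"));         // 2
```

The Cyrillic-only version silently skips Latin words, which may or may not be what you want when input mixes languages.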
You should perhaps treat a hyphen as a letter (and therefore add \- inside the brackets), so that a hyphenated compound is counted as one word, but this is debatable (is e.g. “жили-были” two words or one?).
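The difference the hyphen makes can be seen in a short sketch. The function name and the allowHyphen parameter are hypothetical, chosen only to compare the two variants side by side.

```javascript
// Hypothetical helper: counts Cyrillic words, optionally treating
// the hyphen as part of a word.
function countRussianWords(text, allowHyphen) {
  var pattern = allowHyphen ? /[\u0400-\u0481\-]+/g : /[\u0400-\u0481]+/g;
  var matches = text.match(pattern);
  return matches ? matches.length : 0;
}

console.log(countRussianWords("жили-были", false)); // 2
console.log(countRussianWords("жили-были", true));  // 1
console.log(countRussianWords("да - нет", true));   // 3
```

The last call shows a side effect of the hyphen variant: a hyphen standing alone between spaces is counted as a word, so input that uses a hyphen as a dash would need extra handling.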