This is a follow-up of Find multiple keywords within a dictionary.
My questions are...
First: I believe this matches partial words. For example, if "short" is in my dictionary, it also matches inside the word "shortly". How would I stop this?
Second, less important but nice to have: how would I make each key match only once per content area, so that "short" doesn't get defined twice within the same content area?
Thanks!
I have implemented the following additional requirements:
- Do not match "shortly" when looking for "short" (because "shortly" is a different word).
- Use keys in the dictionary only once.
  Example input: key=foo, replacement=bar, content=foo foo.
  Output: bar foo (only the first foo is replaced).
Demo: http://jsfiddle.net/bhGE3/3/
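The two requirements above can be illustrated in isolation with plain regular expressions. This is only a minimal sketch of the underlying ideas, not the full function; the sample sentences are made up for illustration.

```javascript
// Requirement 1: \b marks a word boundary, so "short" is not found inside "shortly".
var withoutBoundary = /short/i;
var withBoundary = /\bshort\b/i;

console.log(withoutBoundary.test("He left shortly after.")); // true
console.log(withBoundary.test("He left shortly after."));    // false
console.log(withBoundary.test("A short story."));            // true

// Requirement 2: without the "g" flag, replace() only touches the first match.
var replacedOnce = "foo foo".replace(/\bfoo\b/i, "bar");
console.log(replacedOnce); // "bar foo"
```

Note that \b in JavaScript is defined in terms of \w (ASCII letters, digits and underscore), so keys that start or end with accented letters or punctuation may not behave as expected.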
Usage:
- Define a dictionary. Each key will be used only once.
- Define content. A new string will be created, based on this string.
- Optionally, define a replacehandler function. This function is called at each match. The return value will be used to replace the matched phrase.
  The default replacehandler will return the dictionary's matching phrase. The function should take two arguments: key and dictionary.
- Call replaceOnceUsingDictionary(dictionary, content, replacehandler).
- Process the output, e.g. show content to the user.
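As an illustration of the optional replacehandler step, here is a sketch of a handler that keeps the word and attaches its definition as a tooltip. The names definitionHandler and escapeHtml, and the choice of an abbr element, are assumptions made for this example and are not part of the answer's code.

```javascript
// Hypothetical helper (not in the original answer): escape text for use in HTML.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Hypothetical replacehandler: wraps the key in an <abbr> whose title is the definition.
// "key" is the lower-cased dictionary key, "dictionary" maps keys to phrases.
function definitionHandler(key, dictionary) {
    return '<abbr title="' + escapeHtml(dictionary[key]) + '">' + escapeHtml(key) + '</abbr>';
}

// Calling the handler directly, the way the replace function would at a match:
console.log(definitionHandler('short', { short: 'Not long' }));
// <abbr title="Not long">short</abbr>
```

Because the handler receives the lower-cased key rather than the matched text, the original capitalization of the word in the content is not available to it.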
Code:
var dictionary = {
    "history": "war . ",
    "no": "in a",
    "nothing": "",
    "oops": "",
    "time": "while",
    "there": "We",
    "upon": "in",
    "was": "get involved"
};
var content = "Once upon a time... There was no history. Nothing. Oops";
content = replaceOnceUsingDictionary(dictionary, content, function(key, dictionary){
    return '_' + dictionary[key] + '_';
});
alert(content);
// End of implementation
/*
 * @name        replaceOnceUsingDictionary
 * @author      Rob W http://stackoverflow.com/users/938089/rob-w
 * @description Replaces phrases in a string, based on keys in a given dictionary.
 *              Each key is used only once, and the replacements are case-insensitive
 * @param       Object dictionary  {key: phrase, ...}
 * @param       String content
 * @param       Function replacehandler
 * @returns     Modified string
 */
function replaceOnceUsingDictionary(dictionary, content, replacehandler) {
    if (typeof replacehandler != "function") {
        // Default replacehandler function.
        replacehandler = function(key, dictionary){
            return dictionary[key];
        };
    }

    var patterns = [], // \b is used to mark boundaries: "foo" doesn't match "food"
        patternHash = {},
        oldkey, key, index = 0,
        output = [];
    for (key in dictionary) {
        // Case-insensitivity: also store the phrase under the lower-cased key
        // (note that this adds entries to the dictionary that was passed in)
        key = (oldkey = key).toLowerCase();
        dictionary[key] = dictionary[oldkey];

        // Sanitize the key (escape regex metacharacters, including the backslash),
        // and push it in the list
        patterns.push('\\b(?:' + key.replace(/([\\[\]^$.|?*+(){}])/g, '\\$1') + ')\\b');

        // Add entry to hash variable, so the pattern can be looked up by key in the next loop
        patternHash[key] = index++;
    }

    // Nothing to replace: an empty pattern would match the empty string forever
    if (patterns.length === 0) return content;

    var pattern = new RegExp(patterns.join('|'), 'gi'),
        lastIndex = 0;

    // exec() returns a match array, or null once no remaining key matches
    while (key = pattern.exec(content)) {
        // Case-insensitivity
        key = key[0].toLowerCase();

        // Add to output buffer
        output.push(content.substring(lastIndex, pattern.lastIndex - key.length));
        // The next line is the actual replacement method
        output.push(replacehandler(key, dictionary));

        // Update lastIndex variable
        lastIndex = pattern.lastIndex;

        // Don't match again: replace the used key's pattern by '^' (which cannot
        // match past the start of the string) and create a new pattern
        patterns[patternHash[key]] = '^';
        pattern = new RegExp(patterns.join('|'), 'gi');

        // IMPORTANT: Update lastIndex property. Otherwise, enjoy an infinite loop
        pattern.lastIndex = lastIndex;
    }
    output.push(content.substring(lastIndex, content.length));
    return output.join('');
}
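
For comparison, the same "whole words, each key once" idea can be sketched with a single global replace() and a callback that remembers which keys were already used. This is an alternative sketch, not the implementation above: the name replaceOnceSketch is made up for this example, it has no replacehandler parameter, and it leaves the input dictionary untouched.

```javascript
// Alternative sketch (hypothetical name, not the answer's function).
function replaceOnceSketch(dictionary, content) {
    var lookup = Object.create(null); // lower-cased key -> phrase
    var alternatives = [];
    Object.keys(dictionary).forEach(function (key) {
        if (key === '') return; // ignore empty keys, which would match everywhere
        var lower = key.toLowerCase();
        if (!(lower in lookup)) {
            // Escape regex metacharacters so the key is matched literally
            alternatives.push(lower.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
        }
        lookup[lower] = dictionary[key];
    });
    if (alternatives.length === 0) return content;

    var used = Object.create(null); // keys that have been replaced already
    var pattern = new RegExp('\\b(?:' + alternatives.join('|') + ')\\b', 'gi');
    return content.replace(pattern, function (match) {
        var key = match.toLowerCase();
        if (used[key]) return match; // key already used: keep the text as-is
        used[key] = true;
        return lookup[key];
    });
}

console.log(replaceOnceSketch({ foo: 'bar' }, 'foo foo')); // "bar foo"
```

The trade-off is that the pattern is built only once, so a used key still consumes its later matches. When one key is a prefix of another (for example "foo" and "foo bar"), the result can therefore differ from the function above, which rebuilds the pattern after every replacement.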