I'm building an entity highlighter so I can upload a text file, view the contents on the screen, then highlight words that are in an array. This array is populated by the user when they manually highlight a selection, e.g.:
const entities = ['John Smith', 'Apple', 'some other word'];
This is my text document that is displayed on the screen. It contains a lot of text, and some of this text needs to be visually highlighted to the user once they manually highlight some text, like the name John Smith, Apple and some other word
Now I want to visually highlight all instances of the entity in the text by wrapping it in some markup, and doing something like this works perfectly:
getFormattedText() {
  const paragraphs = this.props.text.split(/\n/);
  const { entities } = this.props;
  return paragraphs.map((p) => {
    let entityWrapped = p;
    entities.forEach((text) => {
      const re = new RegExp(`${text}`, 'g');
      entityWrapped =
        entityWrapped.replace(re, `<em>${text}</em>`);
    });
    return `<p>${entityWrapped}</p>`;
  }).toString().replace(/<\/p>,/g, '</p>');
}
...however(!), this just gives me a big string, so I have to dangerously set the inner HTML, and therefore I can't attach an onClick event 'the React way' to any of these highlighted entities, which is something I need to do.
The React way of doing this would be to return an array that looks something like this:
['This is my text document that is displayed on the screen. It contains a lot of text, and some of this text needs to be visually highlighted to the user, like the name', {}, {}, {}]
Where the {} are the React objects containing the JSX stuff.
I've had a stab at this with a few nested loops, but it's buggy as hell, difficult to read and as I'm incrementally adding more entities the performance takes a huge hit.
So, my question is... what's the best way to solve this issue? Ensuring code is simple and readable, and that we don't get huge performance issues, as I'm potentially dealing with documents which are very long. Is this the time that I let go of my React morals and dangerouslySetInnerHTML, along with events bound directly to the DOM?
Update
@AndriciCezar's answer below does a perfect job of formatting the array of strings and objects ready for React to render; however, it's not very performant once the array of entities is large (>100) and the body of text is also large (>100kb). We're looking at about 10x longer to render this as an array vs. a string.
Does anyone know a better way to do this that gives the speed of rendering a large string but the flexibility of being able to attach React events on the elements? Or is dangerouslySetInnerHTML the best solution in this scenario?
Here's a solution that uses a regex to split the string on each keyword. You could make this simpler if you don't need it to be case insensitive or highlight keywords that are multiple words.
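The answer's original code is not shown here, so the following is only a minimal sketch of the regex-split idea, not the answerer's implementation. The names splitOnEntities and escapeRegExp and the { entity } marker shape are assumptions made for illustration. It builds one case-insensitive regex from all keywords and relies on the fact that String.prototype.split keeps text matched by a capturing group:

```javascript
// Escape characters that have special meaning inside a regular expression,
// so a keyword such as 'C++' is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: split `text` into plain strings and { entity } markers.
function splitOnEntities(text, entities) {
  if (entities.length === 0) return [text];
  // Longest keywords first, so 'John Smith' wins over 'John' if both are listed.
  const pattern = [...entities]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join('|');
  // The capturing group makes split() keep the matched keywords in the result;
  // they always land on the odd indices.
  const re = new RegExp(`(${pattern})`, 'gi');
  return text
    .split(re)
    .map((part, i) => (i % 2 === 1 ? { entity: part } : part))
    .filter((part) => part !== '');
}

const parts = splitOnEntities('Ask John Smith about Apple.', ['John Smith', 'Apple']);
// parts is ['Ask ', { entity: 'John Smith' }, ' about ', { entity: 'Apple' }, '.']
```

In a component, each { entity } marker could be mapped to an element such as an <em> with its own onClick handler and key, while the plain strings are rendered as they are.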
Have you tried something like this?
The complexity is number of paragraphs * number of keywords. For a paragraph of 22,273 words (121,104 characters) and 3 keywords, it takes 44ms on my PC to generate the array.
UPDATE: I think this is the clearest and most efficient way to highlight the keywords. I used James Brierley's answer to optimize it.
I tested it on 320kb of data with 500 keywords and it loads pretty slowly. Another idea would be to render the paragraphs progressively: render the first 10 paragraphs, then render the rest on scroll or after some time.
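The progressive rendering idea could look roughly like the sketch below. The helper names visibleParagraphs and hasMoreParagraphs are hypothetical, and the batch size of 10 is taken from the suggestion above. A component would keep batchesShown in its state and increase it from a scroll handler or a timer:

```javascript
// Hypothetical helper: return only the paragraphs that should be rendered
// once `batchesShown` batches of `batchSize` paragraphs have been revealed.
function visibleParagraphs(paragraphs, batchesShown, batchSize = 10) {
  return paragraphs.slice(0, batchesShown * batchSize);
}

// Hypothetical helper: true while some paragraphs are still hidden.
function hasMoreParagraphs(paragraphs, batchesShown, batchSize = 10) {
  return batchesShown * batchSize < paragraphs.length;
}

// Example with 25 dummy paragraphs.
const allParagraphs = Array.from({ length: 25 }, (_, i) => `Paragraph ${i + 1}`);
const firstBatch = visibleParagraphs(allParagraphs, 1); // first 10 paragraphs
const everything = visibleParagraphs(allParagraphs, 3); // all 25 paragraphs
```

This only delays the work rather than reducing it, so the keyword splitting for the later paragraphs still has to happen when they are revealed.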
And a JS Fiddle with your example: https://jsfiddle.net/69z2wepo/79047/
The first thing I did was split the paragraph into an array of words.
const words = paragraph.split( ' ' );
Then I mapped the words array to a bunch of <span> tags. This allows me to attach onDoubleClick events to each word.
So if a word is double-clicked, I fire the this.highlightSelected() function, and then I conditionally render the word based on whether or not it is highlighted.
All I am doing here is either removing the word from, or pushing it to, an array in my component's state.
checkHighlighted() will just check if the word being rendered exists in that array.
And finally, the formatWord() function simply removes any periods or commas and makes everything lower case.
Hope this helps!
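The helpers described in this answer are not shown, so here is a sketch of what they might look like as plain functions. formatWord and checkHighlighted are named in the answer, but their bodies here are guesses based on the description, and toggleHighlighted is a hypothetical helper that this.highlightSelected() could call before updating state:

```javascript
// Strip periods and commas and lower-case the word,
// so 'Apple,' and 'apple' compare as equal.
function formatWord(word) {
  return word.replace(/[.,]/g, '').toLowerCase();
}

// True if the word being rendered is in the array of highlighted words.
function checkHighlighted(highlighted, word) {
  return highlighted.includes(formatWord(word));
}

// Hypothetical helper: return a new array with the word added if it was
// missing or removed if it was present. The input array is not mutated,
// which suits React state updates.
function toggleHighlighted(highlighted, word) {
  const key = formatWord(word);
  return highlighted.includes(key)
    ? highlighted.filter((w) => w !== key)
    : [...highlighted, key];
}

const afterFirstClick = toggleHighlighted([], 'Apple,');
const afterSecondClick = toggleHighlighted(afterFirstClick, 'apple');
```

Note that because the text is split on single spaces, this approach works per word, so a multi-word entity such as 'John Smith' would be highlighted as two separate words.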