I'm struggling to figure out a reasonable solution to this. I need to replace the following characters: ⁰¹²³⁴⁵⁶⁷⁸⁹ using a regex replace. I would think that you would just do this:
item = item.replace(/[⁰¹²³⁴⁵⁶⁷⁸⁹]/g, '');
However, when I try to do that, Notepad++ converts symbols 5-9 into regular-script numbers. I realize this probably relates to the encoding I am using, which I see is set to ANSI.
I've never really understood the difference between the various encoding formats. But I'm wondering if there is any easy fix for this issue?
Here is the simple regex for finding all superscript numbers
Breakdown:
\p{No}: matches a superscript or subscript digit, or a number that is not a digit [0-9]
u modifier: unicode. Pattern strings are treated as UTF-16. Also causes escape sequences to match unicode characters
g modifier: global. All matches (don't return on first match)
https://regex101.com/r/zA8sJ4/1
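For illustration, here is a minimal sketch of how \p{No} with the u and g modifiers could be used in JavaScript. It assumes an engine that supports Unicode property escapes (added in ES2018); the function name stripOtherNumbers is made up for this example. Keep in mind that the No category is wider than superscript digits, so this would also remove characters such as vulgar fractions and circled numbers.

```javascript
// Sketch only: removes every character in Unicode category No ("Number, other").
// stripOtherNumbers is a hypothetical name, not part of any library.
function stripOtherNumbers(text) {
  // u enables \p{...} property escapes; g replaces all matches, not just the first.
  return text.replace(/\p{No}/gu, '');
}

// Superscripts are written as \u escapes so the example does not depend on file encoding.
console.log(stripOtherNumbers('a\u00B2 + b\u00B3 = 42'));
```

Ordinary digits 0-9 are in category Nd, not No, so they are left alone.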
Now, not every JavaScript engine has built-in support for Unicode property escapes such as \p{No} in regex (they were only added in ES2018). For older browsers I would recommend using the xregexp library: http://xregexp.com/
HTML Solution
HTML has a <sup> tag for representing superscript text. If there are superscript numbers, the HTML markup almost surely has the sup tag.
I'd suggest trying the following regex:
Code will look like
result will contain after successful run:
See demo here
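As a rough sketch of the <sup> idea, assuming the goal is to drop <sup> elements that contain nothing but digits: the pattern and the function name stripSupDigits below are illustrative assumptions, not the regex from this answer. A regex like this only copes with simple, non-nested markup; for arbitrary HTML a real parser would be safer.

```javascript
// Hypothetical helper: remove <sup>...</sup> elements whose content is only digits.
function stripSupDigits(html) {
  // <sup\b[^>]*>  opening tag, with optional attributes
  // \s*\d+\s*     one or more digits, optionally padded with whitespace
  // <\/sup>       closing tag; the i flag also accepts <SUP>
  return html.replace(/<sup\b[^>]*>\s*\d+\s*<\/sup>/gi, '');
}

console.log(stripSupDigits('E = mc<sup>2</sup>'));
```

A <sup> that holds anything other than digits, for example <sup>n</sup>, is left untouched.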
Use UTF-8. If for some reason you can't, a workaround is escaping the characters as \uXXXX sequences, so the source file contains only ASCII.
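A minimal sketch of the escaping workaround, with each superscript digit written as a \u escape so the file survives being saved as ANSI. The names SUPERSCRIPT_DIGITS and stripSuperscriptDigits are made up for this example.

```javascript
// Superscript digits written as escapes, so the source file stays pure ASCII:
//   U+2070                  superscript zero
//   U+00B9, U+00B2, U+00B3  superscript one, two, three
//   U+2074 to U+2079        superscript four to nine
const SUPERSCRIPT_DIGITS = /[\u2070\u00B9\u00B2\u00B3\u2074-\u2079]/g;

// Hypothetical helper wrapping the replace call from the question.
function stripSuperscriptDigits(item) {
  return item.replace(SUPERSCRIPT_DIGITS, '');
}

console.log(stripSuperscriptDigits('x\u2075 + y\u00B2'));
```

Unlike \p{No}, this character class matches only the ten superscript digits and works in engines without Unicode property escapes.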