
Regex to replace all superscript numbers

Posted 2019-02-26 12:03

Question:

I'm struggling to figure out a reasonable solution to this. I need to replace the following characters: ⁰¹²³⁴⁵⁶⁷⁸⁹ using a regex replace. I would think that you would just do this:

item = item.replace(/[⁰¹²³⁴⁵⁶⁷⁸⁹]/g, '');

However, when I try to do that, Notepad++ converts symbols 5-9 into regular digits. I realize this probably relates to the encoding I am using, which I see is set to ANSI.

I've never really understood the difference between the various encodings, but I'm wondering if there is an easy fix for this issue?

Answer 1:

Here is a simple regex for finding all superscript numbers:

/\p{No}/gu

Breakdown:

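  • \p{No} matches any character in the Unicode category "Other Number" (No): superscript and subscript digits, fractions such as ½, and other numeric characters that are not the decimal digits [0-9]
  • u modifier: unicode. The pattern is interpreted as a sequence of Unicode code points, and Unicode escapes such as \p{...} are enabled
  • g modifier: global. Match all occurrences (don't stop at the first match)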
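
To make the breakdown concrete, here is a minimal sketch of the pattern used in a replace. It assumes an engine that implements ES2018 Unicode property escapes; the helper name stripOtherNumbers is hypothetical. Note that \p{No} is broader than the ten superscript digits, so fractions are removed as well.

```javascript
// Hypothetical helper: removes every character in Unicode category No
// (superscripts, subscripts, vulgar fractions, circled numbers, ...).
// Assumes an engine with ES2018 Unicode property escapes.
function stripOtherNumbers(text) {
  return text.replace(/\p{No}/gu, "");
}

console.log(stripOtherNumbers("x² + y³ = z⁴")); // "x + y = z"
console.log(stripOtherNumbers("½ cup"));        // " cup" (fractions are No too)
```

If only ⁰-⁹ should be removed, an explicit character class is the safer choice.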

https://regex101.com/r/zA8sJ4/1

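When this answer was written, most browsers had no built-in support for Unicode property escapes such as \p{No} in regular expressions (they were standardized in ES2018). For engines without that support, I would recommend the XRegExp library: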

XRegExp provides augmented (and extensible) JavaScript regular expressions. You get new modern syntax and flags beyond what browsers support natively. XRegExp is also a regex utility belt with tools to make your client-side grepping and parsing easier, while freeing you from worrying about pesky aspects of JavaScript regexes like cross-browser inconsistencies or manually manipulating lastIndex.

http://xregexp.com/
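
If adding a library is not an option, one possible approach is feature detection: try to build the \p{No} pattern and fall back to an explicit character class when the engine rejects it. This is only a sketch; makeSuperscriptRegex is a hypothetical name, and the two branches are not equivalent (the fallback matches only the ten superscript digits).

```javascript
// Hypothetical helper: returns a global regex for superscript digits.
// Prefers \p{No} where supported; otherwise uses an explicit class.
// Caveat: \p{No} also matches subscripts and fractions, the fallback does not.
function makeSuperscriptRegex() {
  try {
    // The RegExp constructor throws a SyntaxError on engines that do not
    // understand the u flag or \p{...} property escapes.
    return new RegExp("\\p{No}", "gu");
  } catch (e) {
    return /[\u2070\u00b9\u00b2\u00b3\u2074-\u2079]/g;
  }
}

var superscripts = makeSuperscriptRegex();
console.log("a\u00b9b\u00b2".replace(superscripts, "")); // "ab"
```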

HTML Solution

HTML has a <sup> tag for representing superscript text.

The <sup> tag defines superscript text. Superscript text appears half a character above the normal line, and is sometimes rendered in a smaller font. Superscript text can be used for footnotes, like WWW[1].

If there are superscript numbers, the HTML markup almost surely uses the <sup> tag.

var math = document.getElementById("math");

// \d+ matches one or more digits, so multi-digit exponents are removed too
math.innerHTML = math.innerHTML.replace(/<sup>\d+<\/sup>/g, "");
<p id="math">4<sup>2</sup>+ 3<sup>2</sup></p>
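
The same idea can be sketched as a function on an HTML string, which makes it easy to try outside the browser. The name stripNumericSup is hypothetical, and the pattern assumes plain <sup> tags without attributes; a tag such as <sup class="x"> would not match.

```javascript
// Hypothetical helper: removes <sup> elements whose content is only digits.
// Works on an HTML string, so no DOM is required.
function stripNumericSup(html) {
  return html.replace(/<sup>\d+<\/sup>/g, "");
}

console.log(stripNumericSup("4<sup>2</sup>+ 3<sup>2</sup>")); // "4+ 3"
console.log(stripNumericSup("x<sup>n</sup>"));                // unchanged
```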



Answer 2:

Use UTF-8. If for some reason you can't, a workaround is to escape the characters:

var rg = new RegExp(
  "[\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079]",
  "g"
);
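
As a usage sketch, the escaped pattern can be built from a single ASCII-only string, so the source file survives being saved in a non-Unicode encoding such as ANSI. The helper names below are hypothetical, and the second helper assumes the goal is to convert superscripts to regular digits rather than delete them.

```javascript
// Superscript digits 0-9 in order, written only with ASCII escapes.
var SUPERSCRIPT_DIGITS =
  "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079";
var rg = new RegExp("[" + SUPERSCRIPT_DIGITS + "]", "g");

// Hypothetical helper: delete superscript digits.
function removeSuperscripts(text) {
  return text.replace(rg, "");
}

// Hypothetical helper: convert each superscript digit to its ASCII digit,
// using its position in SUPERSCRIPT_DIGITS as the digit value.
function superscriptsToDigits(text) {
  return text.replace(rg, function (ch) {
    return String(SUPERSCRIPT_DIGITS.indexOf(ch));
  });
}

console.log(removeSuperscripts("E = mc\u00b2"));     // "E = mc"
console.log(superscriptsToDigits("10\u00b2\u00b3")); // "1023"
```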


Answer 3:

I'd suggest trying the following regex:

/[\u2070-\u209f\u00b0-\u00be]+/g

The code will look like this:

var re = /[\u2070-\u209f\u00b0-\u00be]+/g;
var str = '⁰¹²³⁴⁵⁶⁷⁸⁹2sometext';
var subst = '';

var result = str.replace(re, subst);

After a successful run, result will contain:

2sometext

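
One caveat with the ranges in this answer: \u00b0-\u00be and \u2070-\u209f cover more than superscript digits, including the degree sign, ±, µ, the fractions ¼ ½ ¾, and subscript digits. The sketch below compares that pattern with a narrower class; the variable names and the sample string are made up for illustration.

```javascript
// Pattern from the answer above: two broad ranges.
var broad = /[\u2070-\u209f\u00b0-\u00be]+/g;
// Narrower alternative: only the ten superscript digits.
var narrow = /[\u2070\u00b9\u00b2\u00b3\u2074-\u2079]+/g;

var sample = "25\u00b0C, x\u00b2, H\u2082O"; // "25°C, x², H₂O"

console.log(sample.replace(broad, ""));  // "25C, x, HO"
console.log(sample.replace(narrow, "")); // "25°C, x, H₂O"
```

Which one is appropriate depends on whether characters such as ° and ₂ can appear in the input and should be kept.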