Need regular expression to remove <a> tags

Posted 2020-02-29 03:48

I need a regular expression to remove the <a> tag from the following HTML link, <a href="http://example.com">Name</a>, so that only the string "Name" is output. I am using C# (.NET).

Any help is appreciated.

Tags: c# asp.net regex
5 answers
混吃等死
Answer #2 · 2020-02-29 04:19

You should look at the Html Agility Pack. A regex works in most cases, but it fails on some basic constructs and on broken HTML, because the grammar of HTML is not regular. Html Agility Pack parses the markup instead, so it copes with those cases much better.

If you only need to handle this particular anchor-tag case once, any of the regexes posted here will work for you, but Html Agility Pack is the solid long-term solution for stripping HTML tags.

Ref: Using C# regular expressions to remove HTML tags

chillily
Answer #3 · 2020-02-29 04:21

Agree with Priyank that using a parser is a safer bet. If you do go the route of using a regex, consider how you want to handle edge cases. It's easy to transform the simple case you mentioned in your question, and if that is indeed the only form the markup will take, a simple regex can handle it. But if the markup is, for example, user generated or from a third-party source, consider cases such as these:

<a>foo</a> --> foo # a bare anchor tag, with no attributes
                   # the regexes listed above wouldn't handle this

<a href="blah"><b>boldness</b></a> --> <b>boldness</b>
                   # stripping out only the anchor tag

<A onClick="javascript:alert('foo')">Upper\ncase</A> --> Upper\ncase
                   # and obviously the regex should be case insensitive and
                   # apply to the entire string, not just one line at a time.

<a href="javascript:alert('<b>boom</b>')"><b>bold</b>bar</a> --> <b>bold</b>bar
                   # cases such as this tend to break a lot of regexes,
                   # if the markup in question is user generated, you're leaving
                   # yourself open to the risk of XSS
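A minimal C# sketch of these edge cases, assuming one possible anchor-only pattern, `</?a\b[^>]*>` with `RegexOptions.IgnoreCase`. This is an illustrative choice, not one of the regexes posted in this thread, and the class and member names are hypothetical. It strips only the anchor tags themselves, so it handles the first three inputs; on the fourth, `[^>]*` stops at the `>` inside the attribute value, which is exactly the kind of breakage described above.

```csharp
using System.Text.RegularExpressions;

public static class AnchorEdgeCases
{
    // Illustrative pattern (not from the thread): an opening or closing <a> tag.
    // \b keeps it from matching <abbr>, <area>, ...; IgnoreCase covers <A>.
    private static readonly Regex AnchorTag =
        new Regex(@"</?a\b[^>]*>", RegexOptions.IgnoreCase);

    // Removes the anchor tags themselves and keeps whatever was between them.
    public static string StripAnchors(string html) => AnchorTag.Replace(html, "");

    // The four inputs from the list above; "\n" is a real line break here.
    public static readonly string[] Samples =
    {
        "<a>foo</a>",
        "<a href=\"blah\"><b>boldness</b></a>",
        "<A onClick=\"javascript:alert('foo')\">Upper\ncase</A>",
        // [^>]* stops at the '>' inside the attribute value, so this one breaks.
        "<a href=\"javascript:alert('<b>boom</b>')\"><b>bold</b>bar</a>",
    };
}
```

For example, `AnchorEdgeCases.StripAnchors(AnchorEdgeCases.Samples[1])` returns `<b>boldness</b>`, while `Samples[3]` comes out as `boom</b>')"><b>bold</b>bar` rather than `<b>bold</b>bar`.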
聊天终结者
Answer #4 · 2020-02-29 04:27

The following works for me:

Regex.Replace(inputvalue, @"\<[\/]*a[^\>]*\>", "")
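In C# this pattern has to be written as a verbatim `@"..."` string, because `\<` is not a valid escape sequence in a regular string literal and the code would not compile. The sketch below (class and method names are hypothetical) also shows a side effect of the pattern: nothing forces the tag name to end after the `a`, so it strips `<abbr>`, `<area>` and similar tags as well. Adding `\b` after the `a` is one possible fix, offered here as an assumption rather than the answerer's version.

```csharp
using System.Text.RegularExpressions;

public static class Answer4Pattern
{
    // Pattern from the answer, written as a verbatim string so that
    // \<, \/ and \> reach the regex engine instead of the C# compiler.
    public static string StripAsPosted(string input) =>
        Regex.Replace(input, @"\<[\/]*a[^\>]*\>", "");

    // Hypothetical variant: \b after the "a" means only a tag named exactly "a"
    // matches, so <abbr>, <area>, <audio> ... are left alone.
    public static string StripWithBoundary(string input) =>
        Regex.Replace(input, @"\<[\/]*a\b[^\>]*\>", "");
}
```

`StripAsPosted("<abbr title=\"x\">HTML</abbr>")` returns `HTML`, whereas `StripWithBoundary` leaves that input unchanged; both reduce the question's example to `Name`.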
何必那么认真
Answer #5 · 2020-02-29 04:32

This will do a pretty good job:

str = Regex.Replace(str, @"<a\b[^>]+>([^<]*(?:(?!</a)<[^<]*)*)</a>", "$1");
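A sketch of how this answer's pattern behaves, assuming plain `Regex.Replace` calls (class and method names are hypothetical). Group 1 keeps nested markup such as `<b>...</b>`, but `[^>]+` requires at least one character after `<a`, so a bare `<a>foo</a>` is left untouched, and the match is case-sensitive. The `Relaxed` variant (`[^>]*` plus `RegexOptions.IgnoreCase`) is an illustrative tweak, not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class Answer5Pattern
{
    // Pattern from the answer: group 1 captures the anchor's content, including
    // nested tags, and stops at the first "</a"; "$1" writes it back without the anchor.
    private const string Posted = @"<a\b[^>]+>([^<]*(?:(?!</a)<[^<]*)*)</a>";

    // Hypothetical tweak: [^>]* (instead of [^>]+) also accepts a bare <a>.
    private const string Relaxed = @"<a\b[^>]*>([^<]*(?:(?!</a)<[^<]*)*)</a>";

    public static string UnwrapAsPosted(string html) =>
        Regex.Replace(html, Posted, "$1");

    // IgnoreCase also lets <A ...>...</A> match (the lookahead included).
    public static string UnwrapRelaxed(string html) =>
        Regex.Replace(html, Relaxed, "$1", RegexOptions.IgnoreCase);
}
```

With these, `UnwrapAsPosted("<a href=\"blah\"><b>boldness</b></a>")` gives `<b>boldness</b>`, `UnwrapAsPosted("<a>foo</a>")` returns the input unchanged, and `UnwrapRelaxed("<a>foo</a>")` gives `foo`.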
家丑人穷心不美
Answer #6 · 2020-02-29 04:40

You can try this one. It has not been tested under all conditions, but it returns the correct value for your example.

\<[^\>]+\>(.[^\<]+)</[^\>]+\>

Here's a version that matches only <a> tags.

\<a\s[^\>]+\>(.[^\<]+)</a\>

I tested it on the following HTML and it returned Name and Value only.

<a href="http://xx.com">Name</a><label>This is a label</label> <a href="http://xx.com">Value</a> 
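A sketch that runs both patterns through `Regex.Matches` and collects group 1 (the `Answer6Patterns` class and `Extract` helper are hypothetical names). On the sample HTML above, the anchor-only pattern yields `Name` and `Value`, and the first pattern also picks up `This is a label`. Note that `(.[^\<]+)` needs at least two characters of text, so a one-character link such as `<a href="x">A</a>` is not matched at all.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Answer6Patterns
{
    // First pattern from the answer: any tag pair with text in between.
    public const string AnyTag = @"\<[^\>]+\>(.[^\<]+)</[^\>]+\>";

    // Second pattern from the answer: only <a ...>...</a> (needs whitespace after "a").
    public const string AnchorOnly = @"\<a\s[^\>]+\>(.[^\<]+)</a\>";

    // Returns the text captured by group 1 for every match, in order.
    public static List<string> Extract(string html, string pattern)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern))
            result.Add(m.Groups[1].Value);
        return result;
    }
}
```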