I'm writing a program that is supposed to strip HTML tags out of a string. I've been trying to replace every substring that starts with "<" and ends with ">". This (obviously, since I'm here asking) has not worked so far. Here's what I've tried:
StrippedContent = Regex.Replace(StrippedContent, "\<.*\>", "")
That just returns what seems like a random part of the original string. I've also tried
For Each StringMatch As Match In Regex.Matches(StrippedContent, "\<.*\>")
    StrippedContent = StrippedContent.Replace(StringMatch.Value, "")
Next
Which did the same thing (returns what seems like a random part of the original string). Is there a better way to do this? By better I mean a way that works.
Description
This expression will:
- find and replace all tags with nothing
- avoid problematic edge cases
Regex: <(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>
Replace with: nothing
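To make the expression easier to read, here is a rough sketch that assembles the same pattern from named pieces. The constant names, the module name, and the sample string are made up for illustration; only the resulting pattern comes from the expression above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternBreakdown
    ' Hypothetical names; concatenated they give the expression above.
    Const PlainChar As String = "[^>=]"           ' any character other than > or =
    Const SingleQuoted As String = "='[^']*'"     ' ='...' value, may contain >
    Const DoubleQuoted As String = "=""[^""]*"""  ' ="..." value, may contain >
    Const Unquoted As String = "=[^'""][^\s>]*"   ' unquoted value, ends at whitespace or >

    Sub Main()
        ' < followed by any mix of the four pieces, then the closing >
        Dim tagPattern As String = "<(?:" & PlainChar & "|" & SingleQuoted & "|" & DoubleQuoted & "|" & Unquoted & ")*>"

        ' Made-up input: the > inside the quoted title must not end the tag.
        Dim html As String = "<a title='a > b' href=page.aspx>link</a> text"
        Console.WriteLine(Regex.Replace(html, tagPattern, ""))   ' link text
    End Sub
End Module
```

Because a quoted attribute value is consumed as one unit, a ">" inside it cannot be mistaken for the end of the tag.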
Example
Sample Text
Note the difficult edge case in the onmouseover attribute: its value contains a ">" character and a decoy href.
these are <a onmouseover=' href="NotYourHref" ; if (6/a>3) { funRotator(href) } ; ' href=abc.aspx?filter=3&prefix=&num=11&suffix=>the droids</a> you are looking for.
Code
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim sourcestring As String = "replace with your source string"
        Dim replacementstring As String = ""
        ' Double quotes inside a VB string literal are escaped by doubling them.
        Dim matchpattern As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"
        Console.WriteLine(Regex.Replace(sourcestring, matchpattern, replacementstring, RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace Or RegexOptions.Multiline Or RegexOptions.Singleline))
    End Sub
End Module
String after replacement
these are the droids you are looking for.
Well, this proves that you should always search Google for an answer. Here's a method I got from http://www.dotnetperls.com/remove-html-tags-vbnet
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim html As String = "<p>There was a <b>.NET</b> programmer " &
            "and he stripped the <i>HTML</i> tags.</p>"
        Dim tagless As String = StripTags(html)
        Console.WriteLine(tagless)
    End Sub
    Function StripTags(ByVal html As String) As String
        Return Regex.Replace(html, "<.*?>", "")
    End Function
End Module
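The difference from the pattern in the question is the lazy quantifier: ".*" is greedy and runs to the last ">" on the line, while ".*?" stops at the first one. A minimal sketch of that difference follows; the module name and input strings are invented for the demo.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyVersusLazyDemo
    Sub Main()
        Dim html As String = "Keep <b>this</b> and <i>that</i> too."

        ' Greedy: one match spans from the first "<" to the last ">",
        ' so the text between the tags is removed as well.
        Console.WriteLine(Regex.Replace(html, "<.*>", ""))    ' Keep  too.

        ' Lazy: each match ends at the nearest ">", so tags go one at a time.
        Console.WriteLine(Regex.Replace(html, "<.*?>", ""))   ' Keep this and that too.

        ' Lazy still ends at the first ">", even inside a quoted attribute value.
        Console.WriteLine(Regex.Replace("<a title='a > b'>x</a>", "<.*?>", ""))   '  b'>x
    End Sub
End Module
```

The greedy result explains why the leftover text in the question looked random: everything between the first "<" and the last ">" on each line was removed in a single match. The last call shows the case the lazy pattern does not cover.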
Here's a simple function using the regex pattern that Ro Yo Mi posted.
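' Must be declared in a Module, with Imports System.Runtime.CompilerServices
' and Imports System.Text.RegularExpressions at the top of the file.
<Extension()> Public Function RemoveHtmlTags(value As String) As String
    Return Regex.Replace(value, "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>", "")
End Function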
Demonstration:
Dim html As String = "This <i>is</i> just a <b>demo</b>.".RemoveHtmlTags()
Console.WriteLine(html)