How would you write a regular expression to convert Markdown into HTML? For example, you would type in the following:
This would be *italicized* text and this would be **bold** text
This would then need to be converted to:
This would be <em>italicized</em> text and this would be <strong>bold</strong> text
This is very similar to the Markdown editor control used by Stack Overflow.
Clarification
For what it's worth, I am using C#. Also, bold and italics are the only Markdown constructs I want to allow, and the text being converted will be under roughly 300 characters.
The best way is to find a version of the Markdown library ported to whatever language you are using (you did not specify in your question).
Now that you have clarified that you only want <strong> and <em> to be processed, and that you are using C#, I recommend you take a look at Markdown.NET to see how those tags are implemented. As you can see, it is in fact just two expressions, one for bold and one for italics, applied in that order. Here is the code:
using System.Text.RegularExpressions;

private string DoItalicsAndBold (string text)
{
    // <strong> must go first; otherwise the italics pattern
    // would consume the ** delimiters as single *s:
    text = Regex.Replace (text, @"(\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1",
        new MatchEvaluator (BoldEvaluator),
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);
    // Then <em>:
    text = Regex.Replace (text, @"(\*|_) (?=\S) (.+?) (?<=\S) \1",
        new MatchEvaluator (ItalicsEvaluator),
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);
    return text;
}

// Group 2 holds the text between the delimiters.
private string ItalicsEvaluator (Match match)
{
    return string.Format ("<em>{0}</em>", match.Groups[2].Value);
}

private string BoldEvaluator (Match match)
{
    return string.Format ("<strong>{0}</strong>", match.Groups[2].Value);
}
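If you just want something you can drop into a project, here is a minimal, self-contained sketch of the same two-pass approach using the Markdown.NET patterns quoted above. The names `SimpleMarkdown` and `ToHtml` are hypothetical, and the precompiled `Regex` fields and lambdas are a stylistic choice standing in for the named evaluator methods; the matching logic is unchanged.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; not part of Markdown.NET.
public static class SimpleMarkdown
{
    // Bold: **text** or __text__, with a non-space character just inside each delimiter.
    private static readonly Regex Bold = new Regex(
        @"(\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Italics: *text* or _text_, same non-space rule.
    private static readonly Regex Italic = new Regex(
        @"(\*|_) (?=\S) (.+?) (?<=\S) \1",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    public static string ToHtml(string text)
    {
        // Bold first, so "**" is not consumed by the single-delimiter pattern.
        text = Bold.Replace(text, m => "<strong>" + m.Groups[2].Value + "</strong>");
        text = Italic.Replace(text, m => "<em>" + m.Groups[2].Value + "</em>");
        return text;
    }
}
```

With the sample sentence from the question, `SimpleMarkdown.ToHtml` returns `This would be <em>italicized</em> text and this would be <strong>bold</strong> text`. Two things to keep in mind: `_` is also treated as a delimiter, so `my_var_name` becomes `my<em>var</em>name`, and the input is not HTML-encoded. If the text comes from users, encoding it first (for example with `System.Net.WebUtility.HtmlEncode`, which leaves `*` and `_` alone) is probably wise.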
A single regex won't do. Each markup construct needs its own HTML translation. You would be better off looking at how the existing converters are implemented to get an idea of how it works.
http://en.wikipedia.org/wiki/Markdown#See_also
I don't know about C# specifically, but in Perl it would be:
s/\*\*(.*?)\*\*/<strong>$1<\/strong>/g;
s/\*(.*?)\*/<em>$1<\/em>/g;
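For comparison, the two Perl substitutions translate almost directly to C# using `Regex.Replace` with `$1` in the replacement string. This is a sketch: the class name `PerlStyleMarkdown` is hypothetical, and it emits `<strong>` and `<em>` as the question asks.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: a direct port of the two Perl s///g substitutions.
public static class PerlStyleMarkdown
{
    public static string ToHtml(string text)
    {
        // Bold first; otherwise the single-* pattern would match inside the ** delimiters.
        // Regex.Replace replaces every match, like Perl's /g flag.
        text = Regex.Replace(text, @"\*\*(.*?)\*\*", "<strong>$1</strong>");
        // Then italics; $1 refers to the lazily captured text between the asterisks.
        text = Regex.Replace(text, @"\*(.*?)\*", "<em>$1</em>");
        return text;
    }
}
```

Unlike the Markdown.NET patterns, these do not require a non-space character next to the delimiters, so `a * b * c` becomes `a <em> b </em> c`. Also, because `.` does not match a newline unless `RegexOptions.Singleline` is set, emphasis cannot span lines.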
I came across the following post that recommends against doing this. In my case, though, I want to keep it simple, so I'm posting this per jop's recommendation in case someone else wants to do the same.