Why isn't MarkdownSharp encoding my HTML?

Posted 2019-04-05 15:27

Question:

In my mind, one of the bigger goals of Markdown is to prevent the user from typing potentially malformed HTML directly.

Well, that isn't exactly working for me in MarkdownSharp.

This example works properly when you have the extra line break immediately after "abc"...

But when that line break isn't there, I think it should still be HtmlEncoded, but that isn't happening here...

Behind the scenes, the rendered markup is coming from an iframe. And this is the code behind it...

<% 
var md = new MarkdownSharp.Markdown();
%>
<%= md.Transform(Request.Form[0]) %>

Surely I must be missing something. Oh, and I am using v1.13 (the latest version as of this writing).


EDIT (this is a test for StackOverflow's implementation)

abc

this shouldn't be red

Answer 1:

For those not wanting to use Steve Wortham's customized solution, I have submitted an issue and a proposed fix to the MarkdownSharp guys: http://code.google.com/p/markdownsharp/issues/detail?id=43

If you download my attached Markdown.cs file you will find a new option that you can set. It will stop MarkdownSharp from re-encoding text within the code blocks.

Just don't forget to HTML encode your input BEFORE you pass it into markdown, NOT after.
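
As a minimal sketch of that ordering (this is not the patched Markdown.cs from the issue), the input can be encoded with `System.Net.WebUtility.HtmlEncode` before it reaches the renderer. `SafeMarkdown`, `Render`, and the `transform` parameter are hypothetical names used for illustration; `transform` stands in for something like `new MarkdownSharp.Markdown().Transform`.

```csharp
using System;
using System.Net;

public static class SafeMarkdown
{
    // Encodes the raw user input first, then hands the encoded text to the
    // Markdown renderer. `transform` is a stand-in for a renderer call such
    // as new MarkdownSharp.Markdown().Transform (an assumption of this sketch).
    public static string Render(string rawInput, Func<string, string> transform)
    {
        if (transform == null) throw new ArgumentNullException(nameof(transform));

        // HtmlEncode turns <, >, & and quotes into entities, so any markup
        // typed by the user arrives at the renderer as plain text.
        string encoded = WebUtility.HtmlEncode(rawInput ?? string.Empty);

        return transform(encoded);
    }
}
```

Usage would look something like `SafeMarkdown.Render(Request.Form[0], new MarkdownSharp.Markdown().Transform)`. Note that encoding first also turns a leading `>` into `&gt;`, so blockquote syntax would no longer be recognized, and without an option like the one described above, text inside code blocks may end up encoded twice.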

Another solution is to white-list HTML tags like Stack Overflow does. You would do this AFTER you pass your content to markdown.
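
A rough sketch of the white-list idea, run on the HTML that comes out of the Markdown step. This is not Stack Overflow's actual sanitizer; the class name `TagWhitelist`, the method `Sanitize`, and the particular list of allowed tags are assumptions made for illustration. It keeps only attribute-free tags from the list and drops every other tag while leaving the text between tags in place.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagWhitelist
{
    // Matches anything that looks like a tag, including an unterminated
    // one at the very end of the input.
    private static readonly Regex AnyTag =
        new Regex(@"<[^>]*(>|$)", RegexOptions.Compiled);

    // Hypothetical white-list: attribute-free opening/closing tags,
    // plus self-closing <br> and <hr>.
    private static readonly Regex Allowed = new Regex(
        @"^</?(p|em|strong|code|pre|blockquote|ul|ol|li|h[1-6])>$|^<(br|hr)\s?/?>$",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Removes every tag that is not on the white-list. Text content
    // between tags is kept as-is.
    public static string Sanitize(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;
        return AnyTag.Replace(html,
            m => Allowed.IsMatch(m.Value) ? m.Value : string.Empty);
    }
}
```

Because tags with attributes never match the list, something like `<div style="background-color: red">` is removed entirely. A regex filter of this kind is only a starting point; a production sanitizer would likely need to handle links, images, and tag balancing as well.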

See this for more information: http://www.CodeTunnel.com/blog/post/24/mardownsharp-and-encoded-html



Answer 2:

Since it became clear that the StackOverflow implementation contains quite a few customizations that could be time-consuming to test and figure out, I decided to go in another direction.

I created my own simplified markup language that's a subset of Markdown. The open-source project is at http://ultralight.codeplex.com/ and you can see a working example at http://www.bucketsoft.com/ultralight/

The project is a complete ASP.NET MVC solution with a JavaScript editor. And unlike MarkdownSharp, safe HTML is guaranteed. The JavaScript parser is used both client-side and server-side to guarantee consistent markup (special thanks to the Jurassic JavaScript compiler). It's a beautiful thing to only have to maintain one codebase for that parser.

Although the project is still in beta, I'm using it on my own site already and it seems to be working well so far.



Answer 3:

Maybe I'm not understanding? If you are starting a new code block in Markdown, in all its varieties, you do need a double linebreak and four-space indentation -- a single newline won't do in any of the renderers I have to hand.

abc -- Here comes a code block:

    <div style="background-color: red"> This is code</div>

yielding:

abc -- Here comes a code block:

<div style="background-color: red"> This is code</div>

From what you are saying it seems that MarkdownSharp does fine with this rule, so with just one newline (but indentation):

 abc -- Here comes a code block:
     <div style="background-color: red"> This should be code</div>

we get a mess, not a code block:

abc -- Here comes a code block: This should be code
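
The rule at work in the two examples above can be sketched roughly as follows. This is a simplified illustration, not MarkdownSharp's actual logic; `IndentedCode` and `MarkCodeLines` are hypothetical names. A line counts as code only if it is indented by four spaces (or a tab) and either follows a blank line or continues an existing code block.

```csharp
using System;

public static class IndentedCode
{
    // Returns, for each line, whether it would be treated as part of an
    // indented code block under the simplified rule described above.
    public static bool[] MarkCodeLines(string[] lines)
    {
        var isCode = new bool[lines.Length];
        bool prevBlank = true;   // the start of the document acts like a blank line
        bool prevCode = false;

        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i];
            bool blank = line.Trim().Length == 0;
            bool indented = line.StartsWith("    ", StringComparison.Ordinal)
                         || line.StartsWith("\t", StringComparison.Ordinal);

            // An indented line cannot interrupt a paragraph: it needs a
            // blank line (or a preceding code line) before it.
            isCode[i] = !blank && indented && (prevBlank || prevCode);

            prevCode = isCode[i];
            prevBlank = blank;
        }
        return isCode;
    }
}
```

With the blank line present, the indented `<div>` line is marked as code; with only a single newline after the "abc" line, it is treated as a continuation of the paragraph, which is why the raw tag reaches the output.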

I assume StackOverflow is stripping the <div> tags because they think comments shouldn't have divisions and suchlike things. (In general they have to do a lot of other processing anyway, don't they, e.g. to get syntax highlighting and so on?)

EDIT: I think people are expecting the wrong thing of a Markdown implementation. For example, as I say below, there is no such thing as 'invalid markdown'. It isn't a programming language or anything like one. I have verified that all three Markdown implementations I have available from the command line will indifferently 'convert' random .js and .c files, whether on their own or inserted into otherwise sensible Markdown (and also interpolated zip files and other nonsense), into valid HTML that browsers don't mind displaying at all, chicken scratches though it is. If you want to exclude something, e.g. in a wiki program, you do something further, of course, as most Markdown-employing wiki programs do.