I'm looking at using WMD in my project instead of my existing RadEditor. I have been reading a few posts on how to store and retrieve the data, and I want to make sure I have the concept correct before proceeding.
If my research is correct, here is what I should be doing.
- I should store the editor data twice (once as HTML and once as Markdown).
- I should run the HTML through a whitelist before storing it.
- I should run the HTML through AntiXSS on the way out (before displaying it).
- I should use the Markdown data ONLY to repopulate the editor for editing.
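The whitelist step in the plan above could look roughly like the following. This is a minimal Python sketch built on the standard library's `html.parser`, not AntiXSS or Jeff Atwood's Sanitize HTML. The names `ALLOWED`, `WhitelistSanitizer` and `sanitize_html`, and the particular tag and attribute list, are hypothetical choices for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag -> attributes allowed on that tag.
ALLOWED = {
    "p": set(), "em": set(), "strong": set(), "code": set(), "pre": set(),
    "blockquote": set(), "ul": set(), "ol": set(), "li": set(), "br": set(),
    "a": {"href", "title"},
}
VOID = {"br"}                    # elements that never get a closing tag
DROP_CONTENT = {"script", "style"}  # drop these elements AND their text
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    def __init__(self) -> None:
        # convert_charrefs=True decodes entities before we re-escape text
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            # Only allow links with an explicitly safe scheme.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED or tag in VOID:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))
    # Comments, doctypes and processing instructions use the default
    # no-op handlers, so they are silently dropped.


def sanitize_html(html_text: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize_html('<p onclick="x()">Hi <script>alert(1)</script></p>')` keeps the paragraph and its text but drops the `onclick` attribute and the whole `script` element. The sketch does not rebalance unclosed tags, which a production sanitizer would also need to handle.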
Can anyone confirm or deny whether this is correct, and add any other useful input on the subject?
References:
- Reformat my code: Sanitize HTML
- Stack Overflow: How do you store the Markdown using WMD in ASP.NET?
- Stack Overflow: Sanitize HTML before storing in the DB or before rendering (AntiXSS library)
- Stack Overflow: Store HTML entities in database or convert when retrieved?
I'm implementing Markdown in a blog engine I'm writing (who doesn't write blog engines?), and I've also implemented Markdown in a number of customized CMSs I've written for clients.
I do it much the same way the Stack Overflow team does:
- I use wmd.js as the client-side editor.
- I use MarkdownSharp for server-side processing.
- I use Jeff Atwood's Sanitize HTML to handle any raw HTML.
Here are some resources that talk about Markdown:
- Introducing MarkdownSharp
- Three Markdown Gotchas
- Markdown, One Year Later
- Reverse Engineering the Markdown Editor
- WMD Editor Reverse Engineered
Bottom line:
- I store the post in the form it was submitted in; it's converted for display using MarkdownSharp.
- I sanitize the HTML using Jeff Atwood's approach (on output, not on input).
- I utilize ASP.NET MVC 'best practices' (a highly subjective term) to deal with XSS and XSRF.
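The "store as submitted, convert and sanitize on output" flow can be sketched like this in Python. `Post`, `PostStore`, `paragraphs_only` and `render_for_display` are hypothetical names. `paragraphs_only` is a deliberately tiny stand-in for MarkdownSharp that only handles paragraphs, and `sanitize` is where a whitelist sanitizer such as Sanitize HTML would be plugged in.

```python
import html
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Post:
    post_id: int
    markdown: str  # kept exactly as submitted; also used to repopulate the editor


class PostStore:
    """Hypothetical in-memory stand-in for the posts table."""

    def __init__(self) -> None:
        self._rows: Dict[int, Post] = {}

    def save(self, post_id: int, submitted: str) -> None:
        # No conversion or sanitising on input: store the raw submission.
        self._rows[post_id] = Post(post_id, submitted)

    def get(self, post_id: int) -> Post:
        return self._rows[post_id]


def paragraphs_only(markdown: str) -> str:
    """Toy stand-in for a real Markdown processor such as MarkdownSharp.
    It only wraps blank-line-separated blocks in <p> and HTML-escapes their text."""
    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    return "".join(f"<p>{html.escape(b)}</p>" for b in blocks)


def render_for_display(post: Post,
                       to_html: Callable[[str], str],
                       sanitize: Callable[[str], str]) -> str:
    # Convert, then sanitize, at output time: a fixed or upgraded
    # to_html/sanitize applies to every stored post on its next render.
    return sanitize(to_html(post.markdown))
```

Because the database only ever holds the original Markdown, the edit screen reads `store.get(post_id).markdown` directly, and there is no second stored HTML copy that could drift out of sync with it.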
So one of the ideas behind Markdown is that it will produce "safe" HTML: there should be no need for separate encoding.
More generally, I would recommend storing "raw" data in the database, without transforming or sanitising it. You should always sanitise or transform as close to the rendering point as possible. It gives greater flexibility (oh, suddenly I need to render as RSS. Or JSON. Damn, I can't, because I pre-formatted for HTML), and, should the sanitiser or renderer be updated, you see the effects of the update on every piece of data.
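To make the RSS/JSON point concrete, here is a small Python sketch. The function names are hypothetical. The same raw stored text is encoded differently for each output target, and each encoding happens only at render time.

```python
import html
import json
from xml.sax.saxutils import escape as xml_escape

RAW = 'Use <b> tags & "quotes"'  # what the database holds, untouched


def as_html(raw: str) -> str:
    # HTML output: escape <, >, &, and quotes for an HTML page.
    return f"<p>{html.escape(raw)}</p>"


def as_json(raw: str) -> str:
    # JSON output: json.dumps does its own escaping; no HTML entities.
    return json.dumps({"body": raw})


def as_rss_description(raw: str) -> str:
    # XML/RSS output: escape &, <, > for an XML text node.
    return f"<description>{xml_escape(raw)}</description>"
```

If `html.escape(RAW)` had been stored instead, a JSON client would receive `&lt;b&gt;` entities rather than the original text, and `as_html` would double-escape it into `&amp;lt;b&amp;gt;`.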
I would say store the Markdown text in the database and convert it when you want it rendered, using the Markdown library, which in theory should only emit safe HTML built from its safe list of tags and attributes.