How do I remove Word markup crap when inserting to

Posted 2020-04-08 12:08

Question:

I'm building a CMS in PHP, and one thing I dread is that users will have to fill in the data from existing Word (and Excel, but never mind that) documents. I've seen what happens when they carelessly copy and paste from Word into a textarea: the database gets filled with crap markup.

Now, I could certainly strip all the markup myself, but I'd have to start learning about it first. So I ask you: have you tested any functionality or plugins for the usual suspects (TinyMCE, FCKeditor, etc.) that help here? Bonus for the least intrusive solution.

Answer 1:

Sadly, most of the HTML editor controls I've used do one of two things:

  1. Have a button to strip out various kinds of markup (Word, HTML, script, etc.)
  2. Strip out all markup on paste via JavaScript.

If you leave it to a button, then generally the non-technical users will forget to press it because they don't (some would say "shouldn't have to") care about it :(

With a bit of playing around with regular expressions (now you have another problem ;)) you could do something similar to option 2, but only for Word XML.
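As a rough idea of what that regex pass might look like, here is a minimal sketch in Python. The function name `strip_word_markup` and the pattern list are my own assumptions about what Word typically pastes (conditional comments, `<o:p>`-style namespaced tags, `Mso` classes, inline `mso-` styles); it is not what any of the editors mentioned here actually do, and it is certainly not exhaustive. The same patterns should carry over to PHP's `preg_replace` with little change.

```python
import re

# Illustrative patterns only: assumptions about common Word paste debris,
# applied in order. Each entry is (compiled pattern, replacement).
_WORD_PATTERNS = [
    # HTML comments, including conditional ones such as
    # <!--[if gte mso 9]> ... <![endif]-->
    (re.compile(r"<!--.*?-->", re.DOTALL), ""),
    # Whole <xml>, <style> and <script> blocks, content included
    (re.compile(r"<(xml|style|script)\b[^>]*>.*?</\1\s*>",
                re.DOTALL | re.IGNORECASE), ""),
    # Stray <meta> and <link> tags from the pasted document head
    (re.compile(r"<(meta|link)\b[^>]*>", re.IGNORECASE), ""),
    # Namespaced tags such as <o:p>, </o:p>, <w:WordDocument>
    (re.compile(r"</?\w+:\w+\b[^>]*>", re.IGNORECASE), ""),
    # class/style/lang attributes. This drops ALL of them, not only the
    # Word-specific ones, which is a deliberate simplification.
    (re.compile(r"""\s+(class|style|lang)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""",
                re.IGNORECASE), ""),
    # Unwrap <span> tags, keeping their text content
    (re.compile(r"</?span\b[^>]*>", re.IGNORECASE), ""),
]


def strip_word_markup(html: str) -> str:
    """Remove common Word paste debris while keeping basic tags like <p> and <b>."""
    for pattern, replacement in _WORD_PATTERNS:
        html = pattern.sub(replacement, html)
    return html.strip()


if __name__ == "__main__":
    pasted = (
        '<!--[if gte mso 9]><xml><o:OfficeDocumentSettings/></xml><![endif]-->'
        '<p class="MsoNormal" style="mso-margin-top-alt:auto">'
        '<span lang="EN-US">Hello <b>world</b><o:p></o:p></span></p>'
    )
    print(strip_word_markup(pasted))  # <p>Hello <b>world</b></p>
```

Bear in mind that regexes only tidy up the markup. They are not a security filter, and the attribute pattern can also hit ordinary text that happens to look like `style=...`, so anything stored in the database should still go through a proper HTML sanitizer on the server.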



Answer 2:

I have found that FCKeditor handles text yanked from Word documents and thrown at it much better than TinyMCE does.



Answer 3:

OK, I found a plugin for TinyMCE that apparently does what I wanted. Still, it requires users to press a button to paste, which is a bit less than ideal. Anything better?



Answer 4:

ASP.NET? Telerik's RadEditor has worked very well for me.