How do I encode/decode HTML entities in Ruby?

Posted 2019-01-01 07:52

Question:

I am trying to decode some HTML entities, such as '&lt;' becoming '<'.

I have an old gem (html_helpers) but it seems to have been abandoned twice.

Any recommendations? I will need to use it in a model.

Answer 1:

HTMLEntities can do it:

: jmglov@laurana; sudo gem install htmlentities
Successfully installed htmlentities-4.2.4
: jmglov@laurana;  irb
irb(main):001:0> require 'htmlentities'
=> []
irb(main):002:0> HTMLEntities.new.decode "&iexcl;I&#39;m highly&nbsp;annoyed with character references!"
=> "¡I'm highly annoyed with character references!"


Answer 2:

To encode the characters, you can use CGI.escapeHTML:

string = CGI.escapeHTML('test "escaping" <characters>')

To decode them, there is CGI.unescapeHTML:

CGI.unescapeHTML("test &quot;unescaping&quot; &lt;characters&gt;")

Of course, before that you need to require the CGI library:

require 'cgi'
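Put together, the CGI calls above can be sketched as a round trip using only the standard library. The wrapper names `encode_html` and `decode_html` are hypothetical, chosen for this example. The last line shows a limitation worth knowing: CGI decodes only the basic entities and numeric references, so a named entity such as `&nbsp;` passes through unchanged.

```ruby
require 'cgi'

# Hypothetical thin wrappers around the standard-library calls.
def encode_html(text)
  CGI.escapeHTML(text)
end

def decode_html(text)
  CGI.unescapeHTML(text)
end

original = 'test "escaping" <characters>'
encoded  = encode_html(original)

puts encoded                            # test &quot;escaping&quot; &lt;characters&gt;
puts decode_html(encoded) == original   # true

# Named entities outside the basic set are left untouched by CGI.
puts decode_html('foo&nbsp;bar')        # foo&nbsp;bar
```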

And if you're in Rails, you don't need to use CGI to encode the string. There's the h method, and Rails 3 and later escape view output by default.

<%= h 'escaping <html>' %>
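Outside Rails, a similar `h` helper is available from the standard library's `ERB::Util`. The sketch below renders a small template with plain ERB as a stand-in for a Rails view; the method name `render_escaped` is made up for the example.

```ruby
require 'erb'

# ERB::Util provides html_escape and its short alias h.
include ERB::Util

# Hypothetical helper: render a one-line template that escapes `text` with h.
def render_escaped(text)
  ERB.new('<%= h text %>').result(binding)
end

puts render_escaped('escaping <html>')   # escaping &lt;html&gt;
```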


Answer 3:

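To output a string in a Rails view without escaping it, use raw. Note that raw does not decode entities itself; it passes the string through unchanged and the browser decodes it when rendering:

<%= raw '<html>' %>

So,

<%= raw '&lt;br&gt;' %>

would be displayed in the browser as

<br>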


Answer 4:

I think the Nokogiri gem is also a good choice. It is very stable and has a huge contributing community.

Samples:

require 'nokogiri'

a = Nokogiri::HTML.parse "foo&nbsp;b&auml;r"
a.text
=> "foo bär"

or

a = Nokogiri::HTML.parse "&iexcl;I&#39;m highly&nbsp;annoyed with character references!"
a.text
=> "¡I'm highly annoyed with character references!"


Answer 5:

If you don't want to add a new dependency just to do this (like HTMLEntities) and you're already using Hpricot, it can both escape and unescape for you. It handles much more than CGI:

Hpricot.uxs "foo&nbsp;b&auml;r"
=> "foo bär"


Answer 6:

You can use the htmlascii gem:

Htmlascii.convert string


Answer 7:

<% str="<h1> Test </h1>" %>

result: &lt; h1 &gt; Test &lt; /h1 &gt;

<%= CGI.unescapeHTML(str).html_safe %>


Tags: html ruby