Ruby url encoding string

2019-01-10 03:07发布

How do I URI::encode a string like:

\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a

To get it in a format like:

%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A

(as per RFC 1738)

Here's what I've tried:

irb(main):123:0> URI::encode "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
    from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `gsub'
    from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `escape'
    from /usr/local/lib/ruby/1.9.1/uri/common.rb:505:in `escape'
    from (irb):123
    from /usr/local/bin/irb:12:in `<main>'

Also,

irb(main):126:0> CGI::escape "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
    from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `gsub'
    from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `escape'
    from (irb):126
    from /usr/local/bin/irb:12:in `<main>'

I've looked all about the internet and haven't found (or more likely missed) a way to do this, although I am almost positive that the other day I did this without any trouble at all.

Thanks!

6条回答
啃猪蹄的小仙女
2楼-- · 2019-01-10 03:33

Nowadays, you should use ERB::Util.url_encode or CGI.escape. The primary difference between them is their handling of spaces:

>> ERB::Util.url_encode("foo/bar? baz&")
=> "foo%2Fbar%3F%20baz%26"

>> CGI.escape("foo/bar? baz&")
=> "foo%2Fbar%3F+baz%26"

CGI.escape follows the CGI/HTML forms spec and gives you an application/x-www-form-urlencoded string, which requires spaces be escaped to +, whereas ERB::Util.url_encode follows RFC 3986, which requires them to be encoded as %20.

See this answer for more discussion.

查看更多
时光不老,我们不散
3楼-- · 2019-01-10 03:36

I created a gem to make uri encoding stuff cleaner to use in your code. It takes care of binary encoding for you (added some of the example stuff in the code above).

Run gem install uri-handler.

require 'uri-handler'

str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".to_uri
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"

It adds the uri conversion functionality into the String class. You can also pass it an argument with the optional encoding string you would like to use (by default sets to encoding 'binary' if the straight UTF-8 encoding fails).

查看更多
三岁会撩人
4楼-- · 2019-01-10 03:48

You can use Addressable::URI gem for that:

require 'addressable/uri'   
string = '\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a'
Addressable::URI.encode_component(string, Addressable::URI::CharacterClasses::QUERY)
# "%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a%5Cxbc%5Cxde%5Cxf1%5Cx23%5Cx45%5Cx67%5Cx89%5Cxab%5Cxcd%5Cxef%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a" 

It uses more modern format, than CGI.escape, for example, it properly encodes space as %20 and not as + sign, you can read more in wikipedia article

2.1.2 :008 > CGI.escape('Hello, this is me')
 => "Hello%2C+this+is+me" 
2.1.2 :009 > Addressable::URI.encode_component('Hello, this is me', Addressable::URI::CharacterClasses::QUERY)
 => "Hello,%20this%20is%20me" 
查看更多
对你真心纯属浪费
5楼-- · 2019-01-10 03:55
require 'uri'
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".force_encoding('ASCII-8BIT')
puts URI::encode(str)

UPDATE: see the comment below Ruby url encoding string

查看更多
走好不送
6楼-- · 2019-01-10 03:55
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
require 'cgi'
CGI.escape(str)
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"

Taken from @J-Rou's comment

查看更多
我想做一个坏孩纸
7楼-- · 2019-01-10 03:55

I was originally trying to escape special characters on file name only (not on path) from full url string. ERB::Util.url_encode didn't work for my use.

helper.send(:url_encode, "http://example.com/?a=\11\15")
# => "http%3A%2F%2Fexample.com%2F%3Fa%3D%09%0D"

Based on 2 answers of different SO question, it looks like URI::RFC2396_Parser#escape is better than using URI::Escape#escape. However, they both are behaving the same to me.

URI.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"
URI::Parser.new.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"
查看更多
登录 后发表回答