Question:
I am trying to write a bash script for testing that takes a parameter and sends it through curl to a web site. I need to URL-encode the value to make sure that special characters are processed properly. What is the best way to do this?
Here is my basic script so far:
#!/bin/bash
host=${1:?'bad host'}
value=$2
shift
shift
curl -v -d "param=${value}" "http://${host}/somepath" "$@"
Answer 1:
Use curl --data-urlencode; from man curl:
This posts data, similar to the other --data options with the exception that this performs URL-encoding. To be CGI-compliant, the <data> part should begin with a name followed by a separator and a content specification.
Example usage:
curl \
--data-urlencode "paramName=value" \
--data-urlencode "secondParam=value" \
http://example.com
See the man page for more info.
This requires curl 7.18.0 or newer (released January 2008). Use curl -V to check which version you have.
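Applied to the script from the question, this option replaces the hand-built -d "param=..." argument. The sketch below wraps the question's logic in a function so it can be sourced and tried; the name post_param is hypothetical, and the host/path layout is simply copied from the question. It assumes curl 7.18.0 or newer, as noted above.

```shell
#!/bin/bash
# Hypothetical helper based on the question's script:
#   $1 = host, $2 = value to send; any further arguments are passed to curl.
post_param() {
  local host=${1:?'bad host'}
  local value=$2
  # Drop host and value, shifting only as many arguments as actually exist.
  shift $(( $# < 2 ? $# : 2 ))
  # With "name=content", curl URL-encodes only the content after the first '=',
  # so the parameter name is sent as-is and the value is percent-encoded.
  curl -v --data-urlencode "param=${value}" "$@" "http://${host}/somepath"
}
```

In the name=content form the name part is not encoded, so it must already be URL-safe.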
Answer 2:
Here is the pure Bash answer.
rawurlencode() {
local string="${1}"
local strlen=${#string}
local encoded=""
local pos c o
for (( pos=0 ; pos<strlen ; pos++ )); do
c=${string:$pos:1}
case "$c" in
[-_.~a-zA-Z0-9] ) o="${c}" ;;
* ) printf -v o '%%%02x' "'$c"
esac
encoded+="${o}"
done
echo "${encoded}" # You can either set a return variable (FASTER)
REPLY="${encoded}" #+or echo the result (EASIER)... or both... :p
}
You can use it in two ways:
easier: echo http://url/q?=$( rawurlencode "$args" )
faster: rawurlencode "$args"; echo http://url/q?${REPLY}
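Because the encoder handles one value at a time, a query string has to be assembled pair by pair. A sketch of that, assuming bash 4.2 or newer; build_query is a hypothetical helper, and the copy of rawurlencode below makes two assumed tweaks: local LC_ALL=C (the multibyte fix suggested in another answer below) and upper-case hex digits, which RFC 3986 recommends for producers.

```shell
#!/bin/bash
# Variant of rawurlencode above (hypothetical tweaks): byte-wise, upper-case hex.
# Like the original's "faster" mode, the result is returned in REPLY.
rawurlencode() {
  local LC_ALL=C                 # count and slice the string in bytes, not characters
  local string=$1 encoded='' pos c o
  for (( pos = 0; pos < ${#string}; pos++ )); do
    c=${string:pos:1}
    case $c in
      [-_.~a-zA-Z0-9]) o=$c ;;            # RFC 3986 unreserved characters: keep
      *) printf -v o '%%%02X' "'$c" ;;    # anything else: %XX of its code
    esac
    encoded+=$o
  done
  REPLY=$encoded
}

# Hypothetical helper: build_query key1 value1 [key2 value2 ...]
# Encodes each key and value and joins the pairs with '&'.
build_query() {
  local query='' key
  while (( $# >= 2 )); do
    rawurlencode "$1"; key=$REPLY
    rawurlencode "$2"
    query+="${query:+&}$key=$REPLY"
    shift 2
  done
  printf '%s\n' "$query"
}
```

For the question's script this could be used as curl -v -d "$(build_query param "$value")" "http://${host}/somepath".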
[edited]
Here's the matching rawurldecode() function, which, with all modesty, is awesome.
# Returns a string in which the sequences with percent (%) signs followed by
# two hex digits have been replaced with literal characters.
rawurldecode() {
# This is perhaps a risky gambit, but since all escape characters must be
# encoded, we can replace %NN with \xNN and pass the lot to printf '%b', which
# will decode hex for us
printf -v REPLY '%b' "${1//%/\\x}" # You can either set a return variable (FASTER)
echo "${REPLY}" #+or echo the result (EASIER)... or both... :p
}
With the matching set, we can now perform some simple tests:
$ diff rawurlencode.inc.sh \
<( rawurldecode "$( rawurlencode "$( cat rawurlencode.inc.sh )" )" ) \
&& echo Matched
Output: Matched
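The decoder hands the whole string to printf '%b', so a literal backslash sequence that was never percent-encoded (for example a \n typed into a query string) is interpreted as well, and a '+' is left as-is. For input that is known to be application/x-www-form-urlencoded, a hedged sketch of a variant that protects backslashes first and maps '+' to a space; urldecode is a hypothetical name:

```shell
#!/bin/bash
# Variant of rawurldecode for form-encoded input ('+' means space).
urldecode() {
  local bs='\'                   # a single literal backslash
  local s="${1//+/ }"            # form encoding: '+' stands for a space
  s=${s//"$bs"/"$bs$bs"}         # double literal backslashes so printf %b keeps them
  s=${s//%/"${bs}x"}             # %41 -> \x41, which printf %b turns into "A"
  printf '%b\n' "$s"
}
```

For raw (RFC 3986) input the '+' line should be dropped, since '+' is then a literal plus.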
And if you really really feel that you need an external tool (well, it will go a lot faster, and might do binary files and such...) I found this on my OpenWRT router...
replace_value=$(echo "$replace_value" | sed -f /usr/lib/ddns/url_escape.sed)
Where url_escape.sed was a file that contained these rules:
# sed url escaping
s:%:%25:g
s: :%20:g
s:<:%3C:g
s:>:%3E:g
s:#:%23:g
s:{:%7B:g
s:}:%7D:g
s:|:%7C:g
s:\\:%5C:g
s:\^:%5E:g
s:~:%7E:g
s:\[:%5B:g
s:\]:%5D:g
s:`:%60:g
s:;:%3B:g
s:/:%2F:g
s:?:%3F:g
s^:^%3A^g
s:@:%40:g
s:=:%3D:g
s:&:%26:g
s:\$:%24:g
s:\!:%21:g
s:\*:%2A:g
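The order of these rules matters: '%' has to be rewritten first, otherwise the '%' signs produced by the later rules would be escaped a second time (%20 would become %2520). A reduced sketch of the same sed technique with only a handful of the rules, under a hypothetical function name, to show that ordering:

```shell
#!/bin/bash
# Reduced version of the url_escape.sed idea (only a few of its rules).
sed_urlencode() {
  # '%' goes first; if it ran later it would also rewrite the '%' characters
  # introduced by the other rules.
  printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/&/%26/g' -e 's/=/%3D/g'
}
```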
Answer 3:
Use Perl's URI::Escape module and uri_escape function in the second line of your bash script:
...
value=\"$(perl -MURI::Escape -e \'print uri_escape($ARGV[0]);\' \"$2\")\"
...
Edit: Fix quoting problems, as suggested by Chris Johnsen in the comments. Thanks!
Answer 4:
For the sake of completeness: many solutions using sed or awk only translate a special set of characters, so they are quite large in code size and still don't translate other special characters that should be encoded.
A safe way to urlencode is to encode every single byte, even those that would have been allowed.
echo -ne 'some random\nbytes' | xxd -plain | tr -d '\n' | sed 's/\(..\)/%\1/g'
xxd takes care here that the input is handled as bytes and not characters.
Edit:
xxd comes with the vim-common package in Debian, and I was just on a system where it was not installed and I didn't want to install it. The alternative is to use hexdump from the bsdmainutils package in Debian. According to the following graph, bsdmainutils and vim-common should be about equally likely to be installed:
http://qa.debian.org/popcon-png.php?packages=vim-common%2Cbsdmainutils&show_installed=1&want_legend=1&want_ticks=1
Nevertheless, here is a version which uses hexdump instead of xxd and avoids the tr call:
echo -ne 'some random\nbytes' | hexdump -v -e '/1 "%02x"' | sed 's/\(..\)/%\1/g'
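Where neither xxd nor hexdump is installed, od (specified by POSIX and shipped with coreutils) can produce the same byte dump. A sketch under that assumption, with a hypothetical function name; the -v flag matters because without it od replaces repeated 16-byte lines with '*', which would corrupt long runs of identical bytes:

```shell
#!/bin/bash
# Percent-encode every byte of $1, including characters that would be allowed.
encode_all_bytes() {
  # -An: no offsets; -v: no '*' for repeated lines; -tx1: one hex pair per byte
  printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n' | sed 's/../%&/g'
}
```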
Answer 5:
I find it more readable in Python:
encoded_value=$(python -c "import urllib; print urllib.quote('''$value''')")
The triple ' ensures that single quotes in value won't hurt. urllib is in the standard library. It works, for example, for this crazy (real-world) URL:
"http://www.rai.it/dl/audio/" "1264165523944Ho servito il re d'Inghilterra - Puntata 7
Answer 6:
One variant; it may be ugly, but it is simple:
urlencode() {
local data
if [[ $# != 1 ]]; then
echo \"Usage: $0 string-to-urlencode\"
return 1
fi
data=\"$(curl -s -o /dev/null -w %{url_effective} --get --data-urlencode \"$1\" \"\")\"
if [[ $? != 3 ]]; then
echo \"Unexpected error\" 1>&2
return 2
fi
echo \"${data##/?}\"
return 0
}
Here is the one-liner version for example (as suggested by Bruno):
date | curl -Gso /dev/null -w %{url_effective} --data-urlencode @- "" | cut -c 3-
Answer 7:
I've found the following snippet useful to stick into a chain of program calls where URI::Escape might not be installed:
perl -p -e 's/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg'
(source)
Answer 8:
If you wish to run a GET request and use pure curl, just add --get to @Jacob's solution.
Here is an example:
curl -v --get --data-urlencode "access_token=$(cat .fb_access_token)" https://graph.facebook.com/me/feed
Answer 9:
Another option is to use jq:
jq -s -R -r @uri
-s (--slurp) reads input lines into an array, and -s -R (--slurp --raw-input) reads the input into a single string. -r (--raw-output) outputs the contents of strings instead of JSON string literals.
Or this percent-encodes all bytes:
xxd -p|tr -d \\n|sed 's/../%&/g'
Answer 10:
Direct link to the awk version: http://www.shelldorado.com/scripts/cmds/urlencode
I have used it for years and it works like a charm:
##########################################################################
# Title : urlencode - encode URL data
# Author : Heiner Steven (heiner.steven@odn.de)
# Date : 2000-03-15
# Requires : awk
# Categories : File Conversion, WWW, CGI
# SCCS-Id. : @(#) urlencode 1.4 06/10/29
##########################################################################
# Description
# Encode data according to
# RFC 1738: \"Uniform Resource Locators (URL)\" and
# RFC 1866: \"Hypertext Markup Language - 2.0\" (HTML)
#
# This encoding is used i.e. for the MIME type
# \"application/x-www-form-urlencoded\"
#
# Notes
# o The default behaviour is not to encode the line endings. This
# may not be what was intended, because the result will be
# multiple lines of output (which cannot be used in an URL or a
# HTTP \"POST\" request). If the desired output should be one
# line, use the \"-l\" option.
#
# o The \"-l\" option assumes, that the end-of-line is denoted by
# the character LF (ASCII 10). This is not true for Windows or
# Mac systems, where the end of a line is denoted by the two
# characters CR LF (ASCII 13 10).
# We use this for symmetry; data processed in the following way:
# cat | urlencode -l | urldecode -l
# should (and will) result in the original data
#
# o Large lines (or binary files) will break many AWK
# implementations. If you get the message
# awk: record `...' too long
# record number xxx
# consider using GNU AWK (gawk).
#
# o urlencode will always terminate its output with an EOL
# character
#
# Thanks to Stefan Brozinski for pointing out a bug related to non-standard
# locales.
#
# See also
# urldecode
##########################################################################
PN=`basename "$0"` # Program name
VER='1.4'
: ${AWK=awk}
Usage () {
echo >&2 "$PN - encode URL data, $VER
usage: $PN [-l] [file ...]
-l: encode line endings (result will be one line of output)
The default is to encode each input line on its own."
exit 1
}
Msg () {
for MsgLine
do echo "$PN: $MsgLine" >&2
done
}
Fatal () { Msg "$@"; exit 1; }
set -- `getopt hl "$@" 2>/dev/null` || Usage
[ $# -lt 1 ] && Usage # "getopt" detected an error
EncodeEOL=no
while [ $# -gt 0 ]
do
case "$1" in
-l) EncodeEOL=yes;;
--) shift; break;;
-h) Usage;;
-*) Usage;;
*) break;; # First file name
esac
shift
done
LANG=C export LANG
$AWK '
BEGIN {
# We assume an awk implementation that is just plain dumb.
# We will convert a character to its ASCII value with the
# table ord[], and produce two-digit hexadecimal output
# without the printf("%02X") feature.
EOL = "%0A" # "end of line" string (encoded)
split ("1 2 3 4 5 6 7 8 9 A B C D E F", hextab, " ")
hextab [0] = 0
for ( i=1; i<=255; ++i ) ord [ sprintf ("%c", i) "" ] = i + 0
if ("'"$EncodeEOL"'" == "yes") EncodeEOL = 1; else EncodeEOL = 0
}
{
encoded = ""
for ( i=1; i<=length ($0); ++i ) {
c = substr ($0, i, 1)
if ( c ~ /[a-zA-Z0-9.-]/ ) {
encoded = encoded c # safe character
} else if ( c == " " ) {
encoded = encoded "+" # special handling
} else {
# unsafe character, encode it as a two-digit hex-number
lo = ord [c] % 16
hi = int (ord [c] / 16);
encoded = encoded "%" hextab [hi] hextab [lo]
}
}
if ( EncodeEOL ) {
printf ("%s", encoded EOL)
} else {
print encoded
}
}
END {
#if ( EncodeEOL ) print ""
}
' "$@"
Answer 11:
This may be the best one:
after=$(printf '%s' "$before" | od -An -v -tx1 | tr ' ' % | xargs printf "%s")
Answer 12:
url=$(echo "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/!/%21/g' -e 's/"/%22/g' -e 's/#/%23/g' -e 's/\$/%24/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' -e 's/(/%28/g' -e 's/)/%29/g' -e 's/\*/%2a/g' -e 's/+/%2b/g' -e 's/,/%2c/g' -e 's/-/%2d/g' -e 's/\./%2e/g' -e 's/\//%2f/g' -e 's/:/%3a/g' -e 's/;/%3b/g' -e 's/</%3c/g' -e 's/=/%3d/g' -e 's/>/%3e/g' -e 's/?/%3f/g' -e 's/@/%40/g' -e 's/\[/%5b/g' -e 's/\\/%5c/g' -e 's/\]/%5d/g' -e 's/\^/%5e/g' -e 's/_/%5f/g' -e 's/`/%60/g' -e 's/{/%7b/g' -e 's/|/%7c/g' -e 's/}/%7d/g' -e 's/~/%7e/g')
This will encode the string in $1 and store the result in $url, although you don't have to put it in a variable if you don't want to. BTW, I didn't include a sed rule for tab because I thought it would get turned into spaces.
Answer 13:
For those of you looking for a solution that doesn't need perl, here is one that only needs hexdump and awk:
url_encode() {
[ $# -lt 1 ] && { return; }
encodedurl="$1";
# make sure hexdump exists, if not, just give back the url
[ ! -x "/usr/bin/hexdump" ] && { return; }
encodedurl=`
echo "$encodedurl" | hexdump -v -e '1/1 "%02x\t"' -e '1/1 "%_c\n"' |
LANG=C awk '
$1 == "20" { printf("%s", "+"); next } # space becomes plus
$1 ~ /0[adAD]/ { next } # strip newlines
$2 ~ /^[a-zA-Z0-9.*()\/-]$/ { printf("%s", $2); next } # pass through what we can
{ printf("%%%s", $1) } # take hex value of everything else
'`
}
Stitched together from a couple of places across the net and some local trial and error. It works great! Note that the function returns its result in the global variable encodedurl rather than printing it.
Answer 14:
Using php from a shell script:
value="http://www.google.com"
encoded=$(php -r "echo rawurlencode('$value');")
# encoded = "http%3A%2F%2Fwww.google.com"
echo $(php -r "echo rawurldecode('$encoded');")
# returns: "http://www.google.com"
- http://www.php.net/manual/en/function.rawurlencode.php
- http://www.php.net/manual/en/function.rawurldecode.php
Answer 15:
uni2ascii is very handy:
$ echo -ne '你好世界' | uni2ascii -aJ
%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C
Answer 16:
If you don't want to depend on Perl you can also use sed. It's a bit messy, as each character has to be escaped individually. Make a file with the following contents and call it urlencode.sed:
s/%/%25/g
s/ /%20/g
s/\t/%09/g
s/!/%21/g
s/"/%22/g
s/#/%23/g
s/\$/%24/g
s/\&/%26/g
s/'/%27/g
s/(/%28/g
s/)/%29/g
s/\*/%2a/g
s/+/%2b/g
s/,/%2c/g
s/-/%2d/g
s/\./%2e/g
s/\//%2f/g
s/:/%3a/g
s/;/%3b/g
s/</%3c/g
s/=/%3d/g
s/>/%3e/g
s/?/%3f/g
s/@/%40/g
s/\[/%5b/g
s/\\/%5c/g
s/\]/%5d/g
s/\^/%5e/g
s/_/%5f/g
s/`/%60/g
s/{/%7b/g
s/|/%7c/g
s/}/%7d/g
s/~/%7e/g
To use it, do the following:
STR1=$(echo "https://www.example.com/change&$ ^this to?%checkthe@-functionality" | cut -d\? -f1)
STR2=$(echo "https://www.example.com/change&$ ^this to?%checkthe@-functionality" | cut -d\? -f2)
OUT2=$(echo "$STR2" | sed -f urlencode.sed)
echo "$STR1?$OUT2"
This splits the string into the part that is fine and the part that needs encoding, encodes the part that needs it, then stitches them back together.
You can put that into a sh script for convenience, maybe have it take a parameter to encode, put it on your path, and then you can just call:
urlencode https://www.example.com?isThisFun=HellNo
source
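One caveat with the cut approach: cut -d\? -f2 returns only the text between the first and second '?', so a query string that itself contains a '?' is silently truncated. Splitting with Bash parameter expansion at the first '?' avoids that. A sketch, with the example URL extended by a hypothetical second '?'; the variable names are illustrative, and if the URL contains no '?' at all, query ends up equal to the whole URL, so check for that first:

```shell
#!/bin/bash
# Split at the FIRST '?' only (variable names are illustrative).
url='https://www.example.com/change&$ ^this to?%checkthe@-functionality?x=1'
base=${url%%\?*}     # remove the longest suffix starting with '?': text before the first '?'
query=${url#*\?}     # remove the shortest prefix ending with '?': text after the first '?'
printf '%s\n' "$base" "$query"
```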
Answer 17:
The question is about doing this in bash, and there's no need for python or perl, as there is in fact a single command that does exactly what you want: "urlencode".
value=$(urlencode "${2}")
This is also much better because the Perl answer above, for example, doesn't encode all characters correctly. Try it with the long dash you get from Word and you get the wrong encoding.
Note: you need "gridsite-clients" installed to provide this command.
Answer 18:
You can emulate JavaScript's encodeURIComponent in Perl. Here's the command:
perl -pe 's/([^a-zA-Z0-9_.!~*()'\''-])/sprintf("%%%02X", ord($1))/ge'
You could set this as a bash alias in .bash_profile:
alias encodeURIComponent='perl -pe '\''s/([^a-zA-Z0-9_.!~*()'\''\'\'''\''-])/sprintf("%%%02X",ord($1))/ge'\'
Now you can pipe into encodeURIComponent:
$ echo -n 'hèllo wôrld!' | encodeURIComponent
h%C3%A8llo%20w%C3%B4rld!
Answer 19:
Simple PHP option:
echo 'part-that-needs-encoding' | php -R 'echo urlencode($argn);'
Answer 20:
Here's a Bash solution which doesn't invoke any external programs:
uriencode() {
s="${1//'%'/%25}"
s="${s//' '/%20}"
s="${s//'"'/%22}"
s="${s//'#'/%23}"
s="${s//'$'/%24}"
s="${s//'&'/%26}"
s="${s//'+'/%2B}"
s="${s//','/%2C}"
s="${s//'/'/%2F}"
s="${s//':'/%3A}"
s="${s//';'/%3B}"
s="${s//'='/%3D}"
s="${s//'?'/%3F}"
s="${s//'@'/%40}"
s="${s//'['/%5B}"
s="${s//']'/%5D}"
printf %s "$s"
}
Answer 21:
Another php approach:
echo "encode me" | php -r "echo urlencode(file_get_contents('php://stdin'));"
Answer 22:
Here's the node version:
uriencode() {
node -p "encodeURIComponent('${1//\'/\\\'}')"
}
Answer 23:
Ruby, for completeness:
value="$(ruby -r cgi -e 'puts CGI.escape(ARGV[0])' "$2")"
Answer 24:
Here is my version for the BusyBox ash shell on an embedded system; I originally adopted Orwellophile's variant:
urlencode()
{
local S="${1}"
local encoded=""
local ch
local o
for i in $(seq 0 $((${#S} - 1)) )
do
ch=${S:$i:1}
case "${ch}" in
[-_.~a-zA-Z0-9])
o="${ch}"
;;
*)
o=$(printf '%%%02x' "'$ch")
;;
esac
encoded="${encoded}${o}"
done
echo "${encoded}"
}
urldecode()
{
# urldecode <string>
local url_encoded="${1//+/ }"
printf '%b' "${url_encoded//%/\\x}"
}
Answer 25:
Here is a POSIX function to do that:
encodeURIComponent() {
awk 'BEGIN {while (y++ < 125) z[sprintf("%c", y)] = y
while (y = substr(ARGV[1], ++j, 1))
q = y ~ /[[:alnum:]_.!~*\47()-]/ ? q y : q sprintf("%%%02X", z[y])
print q}' "$1"
}
Example:
value=$(encodeURIComponent "$2")
Source
Answer 26:
Here's a one-line conversion using Lua, similar to blueyed's answer except with all the RFC 3986 Unreserved Characters left unencoded (like this answer):
url=$(echo 'print((arg[1]:gsub("([^%w%-%.%_%~])",function(c)return("%%%02X"):format(c:byte())end)))' | lua - "$1")
Additionally, you may need to ensure that newlines in your string are converted from LF to CRLF, in which case you can insert a gsub("\r?\n", "\r\n") in the chain before the percent-encoding.
Here's a variant that, in the non-standard style of application/x-www-form-urlencoded, does that newline normalization, as well as encoding spaces as '+' instead of '%20' (which could probably be added to the Perl snippet using a similar technique).
url=$(echo 'print((arg[1]:gsub("\r?\n", "\r\n"):gsub("([^%w%-%.%_%~ ])",function(c)return("%%%02X"):format(c:byte())end):gsub(" ","+")))' | lua - "$1")
Answer 27:
With PHP installed, I use this:
URL_ENCODED_DATA=`php -r "echo urlencode('$DATA');"`
Answer 28:
This is the ksh version of orwellophile's answer containing the rawurlencode and rawurldecode functions (link: How to urlencode data for curl command?). I don't have enough rep to post a comment, hence the new post.
#!/bin/ksh93
function rawurlencode
{
typeset string="${1}"
typeset strlen=${#string}
typeset encoded=""
for (( pos=0 ; pos<strlen ; pos++ )); do
c=${string:$pos:1}
case "$c" in
[-_.~a-zA-Z0-9] ) o="${c}" ;;
* ) o=$(printf '%%%02x' "'$c")
esac
encoded+="${o}"
done
print "${encoded}"
}
function rawurldecode
{
printf '%b\n' "${1//%/\\x}"
}
print $(rawurlencode "C++") # --> C%2b%2b
print $(rawurldecode "C%2b%2b") # --> C++
Answer 29:
What would parse URLs better than JavaScript?
node -p "encodeURIComponent('$url')"
Answer 30:
The following is based on Orwellophile's answer, but solves the multibyte bug mentioned in the comments by setting LC_ALL=C (a trick from vte.sh). I've written it in the form of a function suitable for PROMPT_COMMAND, because that's how I use it.
print_path_url() {
local LC_ALL=C
local string="$PWD"
local strlen=${#string}
local encoded=""
local pos c o
for (( pos=0 ; pos<strlen ; pos++ )); do
c=${string:$pos:1}
case "$c" in
[-_.~a-zA-Z0-9/] ) o="${c}" ;;
* ) printf -v o '%%%02x' "'$c"
esac
encoded+="${o}"
done
printf "\033]7;file://%s%s\007" "${HOSTNAME:-}" "${encoded}"
}
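Setting LC_ALL=C works because bash then measures and slices the string in bytes. An alternative that does not depend on how printf "'c" reports bytes above 0x7F in the current locale is to let od split the input into bytes and decide per byte. A sketch assuming bash 4 or newer (for ${hex^^}) and an od that prints lower-case hex, as GNU od does; urlencode_bytes is a hypothetical name:

```shell
#!/bin/bash
# Byte-wise URL encoder: od does the byte splitting, so the locale does not matter.
urlencode_bytes() {
  local hex byte out=''
  # od -An -v -tx1 prints every byte of the input as a two-digit hex number.
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case $hex in
      # RFC 3986 unreserved bytes: 0-9, A-Z, a-z, '-', '.', '_', '~'
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        printf -v byte "\\x$hex"   # turn the hex pair back into the character
        out+=$byte ;;
      *)
        out+="%${hex^^}" ;;        # everything else becomes %XX (upper case)
    esac
  done
  printf '%s\n' "$out"
}
```

To keep '/' unencoded for a file:// path, as print_path_url does, 2f could be added to the first case pattern.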