I am trying to build a URL so that I can send a GET request to it using the urllib module.
Let's suppose my final_url should be:
url = "www.example.com/find.php?data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value"
To achieve this, I tried the following:
>>> import urllib
>>> initial_url = "http://www.stackoverflow.com"
>>> search = "Generate+value"
>>> params = {"data":initial_url,"search":search}
>>> query_string = urllib.urlencode(params)
>>> query_string
'search=Generate%2Bvalue&data=http%3A%2F%2Fwww.stackoverflow.com'
If you compare my query_string with the format of final_url, you can observe two things:
1) The order of the params is reversed: instead of data=()&search= it is search=()&data=
2) urlencode also encoded the + in Generate+value
I believe the first change is due to the arbitrary ordering of dictionaries. So I thought of using OrderedDict to keep the keys in the order I want. As I am using Python 2.6.5, I did:
pip install ordereddict
But I cannot use it in my code. When I try:
>>> od = OrderedDict((('a', 'first'), ('b', 'second')))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'OrderedDict' is not defined
So my question is: what is the correct way to use OrderedDict in Python 2.6.5, and how do I make urlencode ignore the + in Generate+value?
Also, is this the correct approach to building a URL?
You shouldn't worry about encoding the +. It should be restored on the server after the URL is unescaped. The order of named parameters shouldn't matter either.
As for OrderedDict, it is not a built-in name. You have to import it from collections:
from collections import OrderedDict
If your Python is too old to have OrderedDict in the collections module (it was added in Python 2.7), use the backport you installed instead:
from ordereddict import OrderedDict
Anyway, the order of parameters should not matter.
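A guarded import can combine the two imports, so one script runs both on 2.6 (with the backport) and on newer versions. This sketch assumes the pip package installs a module named ordereddict. It applies the same guard to urlencode, which moved to urllib.parse in Python 3. Passing the OrderedDict to urlencode fixes the order of the parameters, although the + is still escaped:

```python
# Guarded imports so the same sketch runs on Python 2.6 (with the
# "ordereddict" backport from pip) and on later Python 2/3 releases.
try:
    from collections import OrderedDict   # Python 2.7+ and 3.x
except ImportError:
    from ordereddict import OrderedDict   # assumed backport module name
try:
    from urllib.parse import urlencode    # Python 3
except ImportError:
    from urllib import urlencode          # Python 2

initial_url = "http://www.stackoverflow.com"
search = "Generate+value"

# Keys are emitted in insertion order, so "data" now comes before "search"
params = OrderedDict([("data", initial_url), ("search", search)])
query_string = urlencode(params)
print(query_string)
# data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate%2Bvalue
```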
Note the safe parameter of quote. It prevents + from being escaped, but it means the server will interpret Generate+value as Generate value. You can manually escape + by writing %2B and marking % as a safe character:

First, the order of parameters in an HTTP request should be completely irrelevant. If it isn't, then the parsing library on the other side is doing something wrong.
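The following check uses Python 3's urllib.parse.parse_qs as a stand-in for a typical server-side query parser. That is an assumption; real servers use their own parsers. Both orderings decode to the same dictionary:

```python
from urllib.parse import parse_qs

# The same two parameters, in both orders
a = "data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value"
b = "search=Generate+value&data=http%3A%2F%2Fwww.stackoverflow.com"

# parse_qs builds a dict of lists keyed by name, so order is not significant
print(parse_qs(a) == parse_qs(b))  # True
print(parse_qs(a))
# {'data': ['http://www.stackoverflow.com'], 'search': ['Generate value']}
```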
Second, of course the + is encoded. A + is used as a placeholder for a space in an encoded URL, so if your raw string contains a +, it has to be escaped. urlencode expects an unencoded string; you can't pass it a string that is already encoded.

Some comments on the question and other answers:
To control the order of parameters with urllib.urlencode, submit an ordered sequence of key/value pairs instead of a mapping (dict). When you pass in a dict, urlencode just calls items() on it to get an iterable sequence.
# urllib.urlencode accepts a mapping or a sequence
# the output of this can vary, because `items()` is called on the dict
urllib.urlencode({"data": initial_url, "search": search})
# the output of this will not vary
urllib.urlencode((("data", initial_url), ("search", search)))
You can also pass in a second argument, doseq, to adjust how iterable values are handled.

The order of parameters is not irrelevant. Take these two URLs, for example:
https://example.com?foo=bar&bar=foo
https://example.com?bar=foo&foo=bar
An HTTP server should consider the order of these parameters irrelevant, but a function designed to compare URLs would not. To compare URLs safely, these parameters would need to be sorted.
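One way to do that sorting is sketched below in Python 3. canonical_url is a hypothetical helper, not a library function. It sorts by parameter name only, and because Python's sort is stable, repeated keys keep their original relative order. Decoding and re-encoding also normalizes the percent-encoding, which may or may not be what a given comparison needs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url):
    """Hypothetical helper: return url with query parameters sorted by name.

    Sorting uses the key only; the sort is stable, so duplicates such as
    foo=3&foo=2 stay in their original relative order.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(sorted(pairs, key=lambda kv: kv[0]))
    return urlunsplit(parts._replace(query=query))


u1 = "https://example.com?foo=bar&bar=foo"
u2 = "https://example.com?bar=foo&foo=bar"
print(u1 == u2)                                # False
print(canonical_url(u1) == canonical_url(u2))  # True
print(canonical_url(u1))                       # https://example.com?bar=foo&foo=bar
```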
However, consider duplicate keys:
https://example.com?foo=3&foo=2&foo=1
The URI specs support duplicate keys, but don't address precedence or ordering.
In a given application, these could each trigger different results and be valid as well:
+ is a reserved character that represents a space in a urlencoded form (vs. %20 in the path). urllib.urlencode escapes values using urllib.quote_plus(), not urllib.quote(). The OP most likely wanted to just do this:
initial_url = "http://www.stackoverflow.com"
search = "Generate value"
urllib.urlencode((("data", initial_url), ("search", search)))
This produces the following output:
data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value
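Below is a Python 3 sketch of that answer; in Python 3, urlencode, quote and quote_plus live in urllib.parse. The sketch checks the round trip with parse_qsl. It also shows the quote_via parameter (Python 3.5+), which selects %20 instead of + for spaces. The www.example.com/find.php prefix is taken from the question's target URL. A real request would also need a scheme such as http://.

```python
from urllib.parse import parse_qsl, quote, quote_plus, urlencode

initial_url = "http://www.stackoverflow.com"
search = "Generate value"  # raw, unencoded value
params = [("data", initial_url), ("search", search)]

# Default (quote_via=quote_plus): spaces are written as "+"
query = urlencode(params)
print(query)  # data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value

# quote_via=quote (Python 3.5+): spaces are written as "%20" instead
print(urlencode(params, quote_via=quote))
# data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate%20value

# Decoding recovers the raw values, so the server sees "Generate value"
assert parse_qsl(query) == params

# A literal "+" in a raw value is escaped, so it survives the round trip
assert quote_plus("Generate+value") == "Generate%2Bvalue"
assert parse_qsl("search=Generate%2Bvalue") == [("search", "Generate+value")]

# safe="+" leaves "+" alone, but a form decoder then reads it as a space
assert quote("Generate+value", safe="+") == "Generate+value"
assert parse_qsl("search=Generate+value") == [("search", "Generate value")]

# Prefixing the question's target path reproduces its final_url exactly
final_url = "www.example.com/find.php?" + query
```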