I'm having an issue where URL reversal in the template is escaping colon and parenthesis characters. I want these characters to remain unescaped in the anchor tag's href attribute. It behaved this way when I was on Django 1.3, but after upgrading to 1.6 the characters are escaped.
What I have:
surt = 'http://(gov/'
browse_domain = 'gov'
... in template ...
<a href="{% url 'nomination.views.url_surt' project.project_slug surt %}">{{ browse_domain }}</a>
This yields:
<a href="/nomination/eth2008/surt/http%3A//%28gov/">gov</a>
As you can see, the colon : and left parenthesis ( characters are being escaped in the href attribute. I don't want that.
What I want:
surt = 'http://(gov/'
browse_domain = 'gov'
... in template ...
<a href="{% url 'nomination.views.url_surt' project.project_slug surt %}">{{ browse_domain }}</a>
This should yield:
<a href="/nomination/eth2008/surt/http://(gov/">gov</a>
Anyone know how to keep these characters from escaping when I'm reversing URLs in my anchor tag?
NOTE: The answer below is wrong. urllib.quote(safe=':()') will indeed keep those safe characters unescaped. Something else is happening in Django to cause this problem, and I still don't know where it is.
In Django 1.6, any URL reversal in the template will first pass through iri_to_uri() before it is rendered to HTML. There is no override for this in the template call to URL reverse, {% url %}, as-is.
Notice this bit of italicized text detailing the change.
This is iri_to_uri()
def iri_to_uri(iri):
    """
    Convert an Internationalized Resource Identifier (IRI) portion to a URI
    portion that is suitable for inclusion in a URL.
    This is the algorithm from section 3.1 of RFC 3987. However, since we are
    assuming input is either UTF-8 or unicode already, we can simplify things a
    little from the full method.
    Returns an ASCII string containing the encoded result.
    """
    # The list of safe characters here is constructed from the "reserved" and
    # "unreserved" characters specified in sections 2.2 and 2.3 of RFC 3986:
    #     reserved    = gen-delims / sub-delims
    #     gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    #     sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
    #                   / "*" / "+" / "," / ";" / "="
    #     unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
    # Of the unreserved characters, urllib.quote already considers all but
    # the ~ safe.
    # The % character is also added to the list of safe characters here, as the
    # end of section 3.1 of RFC 3987 specifically mentions that % must not be
    # converted.
    if iri is None:
        return iri
    return urllib.quote(smart_str(iri), safe="/#%[]=:;$&()+,!?*@'~")
At first glance, this might look like :, (, and ) are safe from escaped hex-encoding because they are passed as 'safe' to urllib.quote():
_safe_map = {}
for i, c in zip(xrange(256), str(bytearray(xrange(256)))):
    _safe_map[c] = c if (i < 128 and c in always_safe) else '%{:02X}'.format(i)
_safe_quoters = {}

def quote(s, safe='/'):
    # fastpath
    if not s:
        if s is None:
            raise TypeError('None object cannot be quoted')
        return s
    cachekey = (safe, always_safe)
    try:
        (quoter, safe) = _safe_quoters[cachekey]
    except KeyError:
        safe_map = _safe_map.copy()
        safe_map.update([(c, c) for c in safe])
        quoter = safe_map.__getitem__
        safe = always_safe + safe
        _safe_quoters[cachekey] = (quoter, safe)
    if not s.rstrip(safe):
        return s
    return ''.join(map(quoter, s))
If you step through the actual urllib.quote() method as shown above, 'safe' actually means that those characters will be escaped/quoted. Initially, I thought 'safe' meant 'safe-from-quoting'. It caused me a great deal of confusion. I guess they instead mean 'safe' as 'safe-in-terms-of-sections-2.2-and-2.3-of-RFC-3986'. Perhaps a more elaborately named keyword argument would be prudent, but then again, there's a whole cornucopia of things I find awkward regarding urllib. ಠ_ಠ
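In line with the NOTE at the top of this answer, a quick check suggests that 'safe' really does mean 'left unescaped'. The sketch below is an assumption-laden stand-in: it uses Python 3's urllib.parse.quote (where Python 2's urllib.quote now lives) and calls quote() directly rather than going through Django. The name IRI_SAFE is made up for illustration; its value is the safe list from iri_to_uri() above.

```python
from urllib.parse import quote  # Python 3 location of Python 2's urllib.quote

surt = "http://(gov/"

# Assumption: the same safe list that iri_to_uri() passes to quote().
IRI_SAFE = "/#%[]=:;$&()+,!?*@'~"

# With the default safe="/", the colon and parenthesis are percent-encoded.
default_quoted = quote(surt)

# With the iri_to_uri() safe list, they pass through untouched.
iri_quoted = quote(surt, safe=IRI_SAFE)

print(default_quoted)  # http%3A//%28gov/
print(iri_quoted)      # http://(gov/
```

The default-argument result matches the escaping seen in the question, which hints that the quoting is being applied with the default safe='/' somewhere other than iri_to_uri().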
After much research, and because we don't want to modify Django core methods, our team decided to do some hacky URL construction in the template (the very kind the Django docs strongly eschew). It's not perfect, but it works for our use case.
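Our template hack isn't reproduced here. As a different approach (not what we did), one could post-process the reversed URL and decode only the escapes one cares about. This is a minimal sketch in plain Python; unescape_selected is a hypothetical helper, not a Django function, and wiring it into a template (for example as a custom filter) is left as an assumption.

```python
import re


def unescape_selected(url, chars=":()"):
    """Decode percent-escapes only for the characters listed in `chars`.

    Hypothetical helper for illustration; every other escape is left alone.
    """
    if not chars:
        return url
    # Map each escape sequence (e.g. "%3A") to its literal character.
    table = {"%{:02X}".format(ord(c)): c for c in chars}
    # Match any of those sequences, accepting lowercase hex digits too.
    pattern = re.compile("|".join(re.escape(k) for k in table), re.IGNORECASE)
    return pattern.sub(lambda m: table[m.group(0).upper()], url)


reversed_url = "/nomination/eth2008/surt/http%3A//%28gov/"
print(unescape_selected(reversed_url))  # /nomination/eth2008/surt/http://(gov/
```

Only the listed characters are restored, so something like %20 stays encoded. Whether the resulting href still resolves to the intended view depends on the URL pattern, so treat this as a starting point rather than a drop-in fix.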