template url reversal escaping surt arguments

2019-07-10 13:41发布

问题:

I'm having an issue where the template url reversal is escaping colon and parenthetical characters. I want these characters to remain unescaped in the anchor tag's href attribute. It used to behave this way when I was in django 1.3, but upgrading to 1.6, I notice that this does not behave as I want.

What I have:

surt = 'http://(gov/'
browse_domain = 'gov'
... in template ...
<a href="{% url 'nomination.views.url_surt' project.project_slug surt %}">{{ browse_domain }}</a>

This yields:

<a href="/nomination/eth2008/surt/http%3A//%28gov/">gov</a>

As you can see, the colon : and left parenthetical ( characters are being escaped in the url href attribute. I don't want that.

What I want:

surt = 'http://(gov/'
browse_domain = 'Gov'
... in template ...
<a href="{% url 'nomination.views.url_surt' project.project_slug surt %}">{{ browse_domain }}</a>

This yields:

<a href="/nomination/eth2008/surt/http://(gov/">gov</a>

Anyone know how to keep these characters from escaping when I'm reversing URLs in my anchor tag?

回答1:

NOTE: The below answer is wrong. urllib.quote(safe=':()') will indeed keep those safe characters unescaped. Something else is happening in django to cause this problem and I still don't know where it is.

In Django 1.6, any url reversal in the template will first pass through iri_to_uri() before it is rendered to HTML. There is no override for this in the template call to url reverse {% url %} as-is.

Notice this bit of italicized text detailing the change.

This is iri_to_uri()

def iri_to_uri(iri):
    """
    Convert an Internationalized Resource Identifier (IRI) portion to a URI
    portion that is suitable for inclusion in a URL.

    This is the algorithm from section 3.1 of RFC 3987.  However, since we are
    assuming input is either UTF-8 or unicode already, we can simplify things a
    little from the full method.

    Returns an ASCII string containing the encoded result.
    """
    # The list of safe characters here is constructed from the "reserved" and
    # "unreserved" characters specified in sections 2.2 and 2.3 of RFC 3986:
    #     reserved    = gen-delims / sub-delims
    #     gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    #     sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
    #                   / "*" / "+" / "," / ";" / "="
    #     unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
    # Of the unreserved characters, urllib.quote already considers all but
    # the ~ safe.
    # The % character is also added to the list of safe characters here, as the
    # end of section 3.1 of RFC 3987 specifically mentions that % must not be
    # converted.
    if iri is None:
        return iri
    return urllib.quote(smart_str(iri), safe="/#%[]=:;$&()+,!?*@'~")

At first glance, this might look like :, (, and ) are safe from escaped hex-encoding because they are passed as 'safe' to urllib.quote():

_safe_map = {}
for i, c in zip(xrange(256), str(bytearray(xrange(256)))):
    _safe_map[c] = c if (i < 128 and c in always_safe) else '%{:02X}'.format(i)
_safe_quoters = {}

def quote(s, safe='/'):
    # fastpath
    if not s:
        if s is None:
            raise TypeError('None object cannot be quoted')
        return s
    cachekey = (safe, always_safe)
    try:
        (quoter, safe) = _safe_quoters[cachekey]
    except KeyError:
        safe_map = _safe_map.copy()
        safe_map.update([(c, c) for c in safe])
        quoter = safe_map.__getitem__
        safe = always_safe + safe
        _safe_quoters[cachekey] = (quoter, safe)
    if not s.rstrip(safe):
        return s
    return ''.join(map(quoter, s))

If you step through the actual urllib.quote() method as shown above, 'safe' actually means that those characters will be escaped/quoted. Initially, I thought 'safe' meant 'safe-from-quoting'. It caused me a great deal of confusion. I guess they instead mean, 'safe' as 'safe-in-terms-of-sections-2.2-and-2.3-of-RFC-3986'. Perhaps a more elaborately named keyword argument would be prudent, but then again, there's a whole cornucopia of things I find awkward regarding urllib. ‎ಠ_ಠ

After much research, and due to the fact that we don't want to modify Django core methods, our team decided to do some hacky url-construction in the template (the very kind Django docs strongly eschew). It's not perfect, but it works for our use case.