Is there an HTTP header to say what base URL to use?

Posted 2019-02-22 03:03

I am retrieving a page from another host, and then initializing the form with data from a database before sending it on to the user.

I need to make the URLs in href and src attributes absolute, so that the browsers load them from the right place.

Can I set an HTTP header to cause this to happen without modifying the HTML?

Tags: html http
3 answers
冷血范
#2 · 2019-02-22 03:41

YES or NO, depending on which HTTP spec you use.

Per HTML and URLs on W3C:

User agents should calculate the base URL for resolving relative URLs according to the [RFC1808]. The following is a summary of how [RFC1808] applies to HTML. User agents should calculate the base URL according to the following precedences (highest priority to lowest):

  1. The base URL is set by the BASE element.
  2. The base URL is given by an HTTP header (see [RFC2068]).
  3. By default, the base URL is that of the current document.

Additionally, the OBJECT and APPLET elements define attributes that take precedence over the value set by the BASE element. Please consult the definitions of these elements for more information about URL issues specific to them.
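The precedence quoted above can be sketched in a few lines of Python. This is only an illustration, not how any particular browser implements it: the function name `pick_base_url`, its parameters, and the example hosts are assumptions made for this sketch, and `urllib.parse.urljoin` stands in for the relative-URL resolution rules.

```python
from urllib.parse import urljoin


def pick_base_url(document_url, header_base=None, base_element_href=None):
    """Pick a base URL using the HTML 4 precedence (highest priority first)."""
    if base_element_href:
        # 1. A BASE element in the document wins.
        return urljoin(document_url, base_element_href)
    if header_base:
        # 2. Otherwise, a base URL supplied by an HTTP header.
        return urljoin(document_url, header_base)
    # 3. Otherwise, the URL of the document itself.
    return document_url


# Hypothetical hosts: the page is served by a proxy, but its assets live on the origin.
base = pick_base_url(
    "http://proxy.example/page.html",
    header_base="http://origin.example/dir/",
)
resolved = urljoin(base, "img/logo.png")
print(resolved)
```

With only the header-supplied base present, the relative reference `img/logo.png` is resolved against the origin host rather than the proxy.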

RFC 2068 is the original spec for HTTP 1.1. It defined Content-Base and Content-Location headers for the purpose of specifying an entity's base URL used for resolving relative URLs within the entity:

14.11 Content-Base

   The Content-Base entity-header field may be used to specify the base
   URI for resolving relative URLs within the entity. This header field
   is described as Base in RFC 1808, which is expected to be revised.

          Content-Base      = "Content-Base" ":" absoluteURI

   If no Content-Base field is present, the base URI of an entity is
   defined either by its Content-Location (if that Content-Location URI
   is an absolute URI) or the URI used to initiate the request, in that
   order of precedence. Note, however, that the base URI of the contents
   within the entity-body may be redefined within that entity-body.

14.15 Content-Location

   The Content-Location entity-header field may be used to supply the
   resource location for the entity enclosed in the message. In the case
   where a resource has multiple entities associated with it, and those
   entities actually have separate locations by which they might be
   individually accessed, the server should provide a Content-Location
   for the particular variant which is returned. In addition, a server
   SHOULD provide a Content-Location for the resource corresponding to
   the response entity.

          Content-Location = "Content-Location" ":"
                            ( absoluteURI | relativeURI )

   If no Content-Base header field is present, the value of Content-
   Location also defines the base URL for the entity (see section
   14.11).

   The Content-Location value is not a replacement for the original
   requested URI; it is only a statement of the location of the resource
   corresponding to this particular entity at the time of the request.
   Future requests MAY use the Content-Location URI if the desire is to
   identify the source of that particular entity.

   A cache cannot assume that an entity with a Content-Location
   different from the URI used to retrieve it can be used to respond to
   later requests on that Content-Location URI. However, the Content-
   Location can be used to differentiate between multiple entities
   retrieved from a single requested resource, as described in section
   13.6.

   If the Content-Location is a relative URI, the URI is interpreted
   relative to any Content-Base URI provided in the response. If no
   Content-Base is provided, the relative URI is interpreted relative to
   the Request-URI.

RFC 2068 is obsolete; it was replaced by RFC 2616, which is currently the HTTP/1.1 spec that most web servers implement. RFC 2616 removes the Content-Base header from HTTP/1.1 entirely and slightly redefines the semantics of Content-Location:

14.14 Content-Location

   The Content-Location entity-header field MAY be used to supply the
   resource location for the entity enclosed in the message when that
   entity is accessible from a location separate from the requested
   resource's URI. A server SHOULD provide a Content-Location for the
   variant corresponding to the response entity; especially in the case
   where a resource has multiple entities associated with it, and those
   entities actually have separate locations by which they might be
   individually accessed, the server SHOULD provide a Content-Location
   for the particular variant which is returned.

       Content-Location = "Content-Location" ":"
                         ( absoluteURI | relativeURI )

   The value of Content-Location also defines the base URI for the
   entity.

   The Content-Location value is not a replacement for the original
   requested URI; it is only a statement of the location of the resource
   corresponding to this particular entity at the time of the request.
   Future requests MAY specify the Content-Location URI as the request-
   URI if the desire is to identify the source of that particular
   entity.

   A cache cannot assume that an entity with a Content-Location
   different from the URI used to retrieve it can be used to respond to
   later requests on that Content-Location URI. However, the Content-
   Location can be used to differentiate between multiple entities
   retrieved from a single requested resource, as described in section
   13.6.

   If the Content-Location is a relative URI, the relative URI is
   interpreted relative to the Request-URI.

   The meaning of the Content-Location header in PUT or POST requests is
   undefined; servers are free to ignore it in those cases.

It is important to note that "The value of Content-Location also defines the base URI for the entity" still applies at this point.

Moving forward, RFC 2616 has been obsoleted by RFCs 7230-7235 (which are not widely implemented yet). In particular, RFC 7231 completely redefines the semantics of Content-Location:

3.1.4.2.  Content-Location

   The "Content-Location" header field references a URI that can be used
   as an identifier for a specific resource corresponding to the
   representation in this message's payload.  In other words, if one
   were to perform a GET request on this URI at the time of this
   message's generation, then a 200 (OK) response would contain the same
   representation that is enclosed as payload in this message.

     Content-Location = absolute-URI / partial-URI

   The Content-Location value is not a replacement for the effective
   Request URI (Section 5.5 of [RFC7230]).  It is representation
   metadata.  It has the same syntax and semantics as the header field
   of the same name defined for MIME body parts in Section 4 of
   [RFC2557].  However, its appearance in an HTTP message has some
   special implications for HTTP recipients.

   If Content-Location is included in a 2xx (Successful) response
   message and its value refers (after conversion to absolute form) to a
   URI that is the same as the effective request URI, then the recipient
   MAY consider the payload to be a current representation of that
   resource at the time indicated by the message origination date.  For
   a GET (Section 4.3.1) or HEAD (Section 4.3.2) request, this is the
   same as the default semantics when no Content-Location is provided by
   the server.  For a state-changing request like PUT (Section 4.3.4) or
   POST (Section 4.3.3), it implies that the server's response contains
   the new representation of that resource, thereby distinguishing it
   from representations that might only report about the action (e.g.,
   "It worked!").  This allows authoring applications to update their
   local copies without the need for a subsequent GET request.

   If Content-Location is included in a 2xx (Successful) response
   message and its field-value refers to a URI that differs from the
   effective request URI, then the origin server claims that the URI is
   an identifier for a different resource corresponding to the enclosed
   representation.  Such a claim can only be trusted if both identifiers
   share the same resource owner, which cannot be programmatically
   determined via HTTP.

   o  For a response to a GET or HEAD request, this is an indication
      that the effective request URI refers to a resource that is
      subject to content negotiation and the Content-Location
      field-value is a more specific identifier for the selected
      representation.

   o  For a 201 (Created) response to a state-changing method, a
      Content-Location field-value that is identical to the Location
      field-value indicates that this payload is a current
      representation of the newly created resource.

   o  Otherwise, such a Content-Location indicates that this payload is
      a representation reporting on the requested action's status and
      that the same report is available (for future access with GET) at
      the given URI.  For example, a purchase transaction made via a
      POST request might include a receipt document as the payload of
      the 200 (OK) response; the Content-Location field-value provides
      an identifier for retrieving a copy of that same receipt in the
      future.

   A user agent that sends Content-Location in a request message is
   stating that its value refers to where the user agent originally
   obtained the content of the enclosed representation (prior to any
   modifications made by that user agent).  In other words, the user
   agent is providing a back link to the source of the original
   representation.

   An origin server that receives a Content-Location field in a request
   message MUST treat the information as transitory request context
   rather than as metadata to be saved verbatim as part of the
   representation.  An origin server MAY use that context to guide in
   processing the request or to save it for other uses, such as within
   source links or versioning metadata.  However, an origin server MUST
   NOT use such context information to alter the request semantics.

   For example, if a client makes a PUT request on a negotiated resource
   and the origin server accepts that PUT (without redirection), then
   the new state of that resource is expected to be consistent with the
   one representation supplied in that PUT; the Content-Location cannot
   be used as a form of reverse content selection identifier to update
   only one of the negotiated representations.  If the user agent had
   wanted the latter semantics, it would have applied the PUT directly
   to the Content-Location URI.

Most importantly, RFC 7231 also states:

Appendix B.  Changes from RFC 2616

   ...

   The definition of Content-Location has been changed to no longer
   affect the base URI for resolving relative URI references, due to
   poor implementation support and the undesirable effect of potentially
   breaking relative links in content-negotiated resources.
   (Section 3.1.4.2)

   ...

So, in answer to the question that was asked:

  • as of RFC 2616, the answer is YES, Content-Location exists to specify an entity's base URL at the HTTP level.

  • as of RFC 7231, the answer is NO, Content-Location can no longer be used to specify an entity's base URL.

AFAIK, as of RFC 7231, no new or existing HTTP header has been defined to restore the base URL behavior. So there is no longer an HTTP header available for specifying a base URL. It can only be specified by the entity itself, if it needs to be different than the entity's request URL.
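To make the difference concrete, here is a rough Python sketch of how a client following each spec would compute the base URI. The function `base_uri`, the `spec` parameter, and the example URLs are hypothetical names chosen for this sketch; real user agents do not expose such a switch.

```python
from urllib.parse import urljoin


def base_uri(request_uri, content_location=None, spec="rfc7231"):
    """Return the base URI a client would use under the given spec."""
    if spec == "rfc2616" and content_location:
        # RFC 2616: Content-Location defines the base URI; a relative
        # value is itself interpreted relative to the Request-URI.
        return urljoin(request_uri, content_location)
    # RFC 7231: Content-Location no longer affects the base URI.
    return request_uri


request_uri = "http://example.com/docs/page"
content_location = "en/page.html"  # e.g. a content-negotiated variant

old_base = base_uri(request_uri, content_location, spec="rfc2616")
new_base = base_uri(request_uri, content_location, spec="rfc7231")

# The same relative link resolves differently under the two specs.
old_link = urljoin(old_base, "style.css")
new_link = urljoin(new_base, "style.css")
print(old_link)
print(new_link)
```

The two results differ, which is the "breaking relative links in content-negotiated resources" effect that RFC 7231 cites as a reason for the change.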

爷的心禁止访问
#3 · 2019-02-22 03:45

There is no such header in HTTP, but you can set the base URL with HTML’s BASE element, like this:

<base href="http://example.com/">
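If the page is being fetched and rewritten server-side anyway, as in the question, the BASE element could be injected at that point. The following is a minimal sketch under that assumption; `add_base_element` is a hypothetical helper, and the regex approach is deliberately simplistic (a real HTML parser would be more robust against unusual markup).

```python
import re
from html import escape


def add_base_element(html_text, base_url):
    """Insert <base href=...> after the opening <head> tag, if none exists yet."""
    # Leave the page alone if it already declares a base.
    if re.search(r"<base\b", html_text, flags=re.IGNORECASE):
        return html_text

    tag = '<base href="%s">' % escape(base_url, quote=True)

    # Insert right after the first opening <head ...> tag.
    new_text, count = re.subn(
        r"(<head\b[^>]*>)",
        lambda m: m.group(1) + tag,
        html_text,
        count=1,
        flags=re.IGNORECASE,
    )
    if count:
        return new_text
    # Crude fallback for pages without a <head> tag: prepend the element.
    return tag + html_text


page = "<html><head><title>t</title></head><body></body></html>"
print(add_base_element(page, "http://example.com/"))
```

Placing the element first inside the head matters because a BASE element only affects URLs that appear after it in the document.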
三岁会撩人
#4 · 2019-02-22 03:58

No. The only way to do that would be a <base> element in the HTML output.

See docs here: HTML <base> Tag

Alternative idea

If you can't touch the HTML, you should be able to put something together using mod_rewrite: build 301 redirect rules for your image resources that point to the remote server. The only condition is that your image requests follow a fixed pattern (e.g. /images/xyz.jpg) that you can translate into a RewriteRule.

Check out this tutorial to get you started.
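The redirect idea is not tied to mod_rewrite. As a rough illustration of the same mapping logic, here it is in Python rather than as an Apache rule; the remote host name, the file extensions, and the function `redirect_for` are all assumptions made up for this sketch.

```python
import re

# Hypothetical remote host that actually serves the images.
REMOTE_ORIGIN = "http://remote.example"

# Fixed pattern the image requests are assumed to follow, e.g. /images/xyz.jpg
IMAGE_PATTERN = re.compile(r"^/images/[^/]+\.(?:jpg|png|gif)$")


def redirect_for(path):
    """Return (status, location) for a matching image path, else None."""
    if IMAGE_PATTERN.match(path):
        # Permanent redirect to the same path on the remote host.
        return 301, REMOTE_ORIGIN + path
    return None


print(redirect_for("/images/xyz.jpg"))
print(redirect_for("/index.html"))
```

Anything that does not match the pattern is left for the local server to handle, which mirrors how a rewrite rule only fires on matching requests.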
