I am retrieving a page from another host, and then initializing the form with data from a database before sending it on to the user.
I need to make the URLs in href and src attributes absolute, so that browsers load them from the right place.
Can I set an HTTP header to cause this to happen without modifying the HTML?
There is no such header in HTTP. But you can set the base URL with HTML's BASE element, like:
<base href="http://example.com/">
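Since the page is already being fetched and modified on the server before it goes to the user, the BASE element can be injected at that point. The following is only a minimal sketch: the function name `add_base_tag` is made up for illustration, and it uses a regular expression rather than a real HTML parser, so it assumes the fetched page has an ordinary `<head>` tag.

```python
import re


def add_base_tag(html: str, base_url: str) -> str:
    """Insert <base href="..."> right after the opening <head> tag.

    Hypothetical helper for illustration; a real HTML parser would be
    more robust than this regular-expression approach.
    """
    # Leave the page alone if it already declares its own base URL.
    if re.search(r"<base\b", html, flags=re.IGNORECASE):
        return html

    base_tag = '<base href="%s">' % base_url.replace('"', "&quot;")

    # Insert after the first <head ...> tag only (count=1).
    # \b keeps this from matching <header>.
    new_html, replaced = re.subn(
        r"(<head\b[^>]*>)",
        lambda m: m.group(1) + base_tag,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    if replaced == 0:
        raise ValueError("no <head> element found")
    return new_html


page = '<html><head><title>t</title></head><body><img src="a.png"></body></html>'
print(add_base_tag(page, "http://example.com/"))
```

With the BASE element in place, the browser resolves `a.png` against `http://example.com/` instead of the URL the page was served from.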
YES or NO, depending on which HTTP spec you use.
Per HTML and URLs on W3C:
User agents should calculate the base URL for resolving relative URLs according to the [RFC1808]. The following is a summary of how [RFC1808] applies to HTML. User agents should calculate the base URL according to the following precedences (highest priority to lowest):
- The base URL is set by the BASE element.
- The base URL is given by an HTTP header (see [RFC2068]).
- By default, the base URL is that of the current document.
Additionally, the OBJECT and APPLET elements define attributes that take precedence over the value set by the BASE element. Please consult the definitions of these elements for more information about URL issues specific to them.
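The precedence list quoted above can be sketched roughly as follows. This is an illustration, not how any particular browser is implemented: the function names and the `proxy.example` host are hypothetical, and Python's `urljoin` follows later URL-resolution rules than RFC 1808, although the two should agree for simple cases like this one.

```python
from typing import Optional
from urllib.parse import urljoin


def html4_base_url(
    document_url: str,
    base_element_href: Optional[str] = None,
    http_header_base: Optional[str] = None,
) -> str:
    """Pick the base URL using the precedence quoted above (highest first)."""
    if base_element_href:       # 1. the BASE element
        return base_element_href
    if http_header_base:        # 2. a base URL given by an HTTP header
        return http_header_base
    return document_url         # 3. the URL of the current document


def resolve(relative: str, base: str) -> str:
    """Resolve a relative href/src value against the chosen base URL."""
    return urljoin(base, relative)


# Hypothetical URL the page is served from after being re-sent to the user.
served_from = "http://proxy.example/app/form.html"

# Without a BASE element, relative URLs resolve against the serving host.
print(resolve("img/logo.png", html4_base_url(served_from)))

# With a BASE element, they resolve against the original host.
print(resolve("img/logo.png", html4_base_url(served_from, "http://example.com/")))
```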
RFC 2068 is the original spec for HTTP 1.1. It defined the Content-Base and Content-Location headers for the purpose of specifying an entity's base URL, used for resolving relative URLs within the entity:
14.11 Content-Base
The Content-Base entity-header field may be used to specify the base
URI for resolving relative URLs within the entity. This header field
is described as Base in RFC 1808, which is expected to be revised.
Content-Base = "Content-Base" ":" absoluteURI
If no Content-Base field is present, the base URI of an entity is
defined either by its Content-Location (if that Content-Location URI
is an absolute URI) or the URI used to initiate the request, in that
order of precedence. Note, however, that the base URI of the contents
within the entity-body may be redefined within that entity-body.
14.15 Content-Location
The Content-Location entity-header field may be used to supply the
resource location for the entity enclosed in the message. In the case
where a resource has multiple entities associated with it, and those
entities actually have separate locations by which they might be
individually accessed, the server should provide a Content-Location
for the particular variant which is returned. In addition, a server
SHOULD provide a Content-Location for the resource corresponding to
the response entity.
Content-Location = "Content-Location" ":"
( absoluteURI | relativeURI )
If no Content-Base header field is present, the value of Content-
Location also defines the base URL for the entity (see section
14.11).
The Content-Location value is not a replacement for the original
requested URI; it is only a statement of the location of the resource
corresponding to this particular entity at the time of the request.
Future requests MAY use the Content-Location URI if the desire is to
identify the source of that particular entity.
A cache cannot assume that an entity with a Content-Location
different from the URI used to retrieve it can be used to respond to
later requests on that Content-Location URI. However, the Content-
Location can be used to differentiate between multiple entities
retrieved from a single requested resource, as described in section
13.6.
If the Content-Location is a relative URI, the URI is interpreted
relative to any Content-Base URI provided in the response. If no
Content-Base is provided, the relative URI is interpreted relative to
the Request-URI.
RFC 2068 has been obsoleted by RFC 2616, which is currently the HTTP 1.1 spec that most web servers implement. RFC 2616 removes the Content-Base header from HTTP 1.1 entirely and slightly redefines the semantics of Content-Location:
14.14 Content-Location
The Content-Location entity-header field MAY be used to supply the
resource location for the entity enclosed in the message when that
entity is accessible from a location separate from the requested
resource's URI. A server SHOULD provide a Content-Location for the
variant corresponding to the response entity; especially in the case
where a resource has multiple entities associated with it, and those
entities actually have separate locations by which they might be
individually accessed, the server SHOULD provide a Content-Location
for the particular variant which is returned.
Content-Location = "Content-Location" ":"
( absoluteURI | relativeURI )
The value of Content-Location also defines the base URI for the
entity.
The Content-Location value is not a replacement for the original
requested URI; it is only a statement of the location of the resource
corresponding to this particular entity at the time of the request.
Future requests MAY specify the Content-Location URI as the request-
URI if the desire is to identify the source of that particular
entity.
A cache cannot assume that an entity with a Content-Location
different from the URI used to retrieve it can be used to respond to
later requests on that Content-Location URI. However, the Content-
Location can be used to differentiate between multiple entities
retrieved from a single requested resource, as described in section
13.6.
If the Content-Location is a relative URI, the relative URI is
interpreted relative to the Request-URI.
The meaning of the Content-Location header in PUT or POST requests is
undefined; servers are free to ignore it in those cases.
It is important to note that "The value of Content-Location also defines the base URI for the entity" still applies at this point.
Moving forward, RFC 2616 has been obsoleted by RFCs 7230-7235 (which are not widely implemented yet). In particular, RFC 7231 completely redefines the semantics of Content-Location:
3.1.4.2. Content-Location
The "Content-Location" header field references a URI that can be used
as an identifier for a specific resource corresponding to the
representation in this message's payload. In other words, if one
were to perform a GET request on this URI at the time of this
message's generation, then a 200 (OK) response would contain the same
representation that is enclosed as payload in this message.
Content-Location = absolute-URI / partial-URI
The Content-Location value is not a replacement for the effective
Request URI (Section 5.5 of [RFC7230]). It is representation
metadata. It has the same syntax and semantics as the header field
of the same name defined for MIME body parts in Section 4 of
[RFC2557]. However, its appearance in an HTTP message has some
special implications for HTTP recipients.
If Content-Location is included in a 2xx (Successful) response
message and its value refers (after conversion to absolute form) to a
URI that is the same as the effective request URI, then the recipient
MAY consider the payload to be a current representation of that
resource at the time indicated by the message origination date. For
a GET (Section 4.3.1) or HEAD (Section 4.3.2) request, this is the
same as the default semantics when no Content-Location is provided by
the server. For a state-changing request like PUT (Section 4.3.4) or
POST (Section 4.3.3), it implies that the server's response contains
the new representation of that resource, thereby distinguishing it
from representations that might only report about the action (e.g.,
"It worked!"). This allows authoring applications to update their
local copies without the need for a subsequent GET request.
If Content-Location is included in a 2xx (Successful) response
message and its field-value refers to a URI that differs from the
effective request URI, then the origin server claims that the URI is
an identifier for a different resource corresponding to the enclosed
representation. Such a claim can only be trusted if both identifiers
share the same resource owner, which cannot be programmatically
determined via HTTP.
o For a response to a GET or HEAD request, this is an indication
that the effective request URI refers to a resource that is
subject to content negotiation and the Content-Location
field-value is a more specific identifier for the selected
representation.
o For a 201 (Created) response to a state-changing method, a
Content-Location field-value that is identical to the Location
field-value indicates that this payload is a current
representation of the newly created resource.
o Otherwise, such a Content-Location indicates that this payload is
a representation reporting on the requested action's status and
that the same report is available (for future access with GET) at
the given URI. For example, a purchase transaction made via a
POST request might include a receipt document as the payload of
the 200 (OK) response; the Content-Location field-value provides
an identifier for retrieving a copy of that same receipt in the
future.
A user agent that sends Content-Location in a request message is
stating that its value refers to where the user agent originally
obtained the content of the enclosed representation (prior to any
modifications made by that user agent). In other words, the user
agent is providing a back link to the source of the original
representation.
An origin server that receives a Content-Location field in a request
message MUST treat the information as transitory request context
rather than as metadata to be saved verbatim as part of the
representation. An origin server MAY use that context to guide in
processing the request or to save it for other uses, such as within
source links or versioning metadata. However, an origin server MUST
NOT use such context information to alter the request semantics.
For example, if a client makes a PUT request on a negotiated resource
and the origin server accepts that PUT (without redirection), then
the new state of that resource is expected to be consistent with the
one representation supplied in that PUT; the Content-Location cannot
be used as a form of reverse content selection identifier to update
only one of the negotiated representations. If the user agent had
wanted the latter semantics, it would have applied the PUT directly
to the Content-Location URI.
Most importantly, RFC 7231 also states:
Appendix B. Changes from RFC 2616
...
The definition of Content-Location has been changed to no longer
affect the base URI for resolving relative URI references, due to
poor implementation support and the undesirable effect of potentially
breaking relative links in content-negotiated resources.
(Section 3.1.4.2)
...
So, in answer to the question that was asked:
- As of RFC 2616, the answer is YES: Content-Location exists to specify an entity's base URL at the HTTP level.
- As of RFC 7231, the answer is NO: Content-Location can no longer be used to specify an entity's base URL.
AFAIK, as of RFC 7231, no new or existing HTTP header has been defined to restore the base URL behavior. So there is no longer an HTTP header available for specifying a base URL. It can only be specified by the entity itself, if it needs to be different than the entity's request URL.
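To make the difference concrete, here is a rough sketch of how a client might pick the base URL from the HTTP layer under each spec. The function name, the `spec` parameter, and the `proxy.example` host are made up for illustration, and the headers are assumed to be a plain dict with the exact key `Content-Location`. A BASE element inside the entity would still override the result either way.

```python
from typing import Mapping
from urllib.parse import urljoin


def http_level_base_url(
    request_url: str,
    headers: Mapping[str, str],
    spec: str = "rfc7231",
) -> str:
    """Return the base URL implied by the HTTP response alone.

    Hypothetical helper: `spec` selects which RFC's semantics to apply.
    """
    content_location = headers.get("Content-Location")
    if spec == "rfc2616" and content_location:
        # RFC 2616: Content-Location also defines the base URI; a relative
        # value is interpreted relative to the Request-URI.
        return urljoin(request_url, content_location)
    # RFC 7231: Content-Location no longer affects the base URI.
    return request_url


headers = {"Content-Location": "http://example.com/dir/page.html"}
print(http_level_base_url("http://proxy.example/form", headers, spec="rfc2616"))
print(http_level_base_url("http://proxy.example/form", headers, spec="rfc7231"))
```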
No. The only way to do that would be a <base> element in the HTML output.
See docs here: HTML <base> Tag
Alternative idea
If you can't touch the HTML, you should be able to put something together using mod_rewrite. You would build 301 redirect rules for your image resources that point to the remote server. The only condition is that your image requests follow a fixed pattern (e.g. /images/xyz.jpg) that you can translate into a RewriteRule.
Check out this tutorial to get you started.
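The path-to-redirect mapping behind that idea can be sketched as below. This is not mod_rewrite configuration, only an illustration of the translation such a rule would perform: the remote origin `http://example.com`, the `/images/` pattern, and the function name are all assumptions.

```python
import re
from typing import Optional

# Assumption: the host that actually holds the images.
REMOTE_ORIGIN = "http://example.com"

# Assumption: image requests follow the fixed pattern /images/<file>.
IMAGE_PATH = re.compile(r"/images/[^/?#]+")


def redirect_target(request_path: str) -> Optional[str]:
    """Return the URL to send in a 301 Location header, or None if the
    request does not match the image pattern."""
    if IMAGE_PATH.fullmatch(request_path):
        return REMOTE_ORIGIN + request_path
    return None


print(redirect_target("/images/xyz.jpg"))
print(redirect_target("/index.html"))
```

Each relative image request then costs the browser one extra round trip for the redirect, which is the trade-off for leaving the HTML untouched.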