Should a Web site also be a Web resource?

Published 2019-04-08 07:10

Posted by 再贱就再见

Question:

Every web application - every web site - is a service. (...) The features that make a web site easy for a web surfer to use also make a web service API easy for a programmer to use.

Richardson and Ruby, "RESTful Web Services"

As I understand it, a Web site that is also a Web service provides multiple representations of its resources, depending on what the user agent requests. The API, so to speak, is the Web site itself and is not provided separately.

This isn't the case for many popular "REST APIs" out in the wild. Twitter's API, for example, is located at http://api.twitter.com/1/, the '1' in the URI being the version of the API itself. Socialcast also provides a REST API at https://demo.socialcast.com/api/, where the third-level domain name is the name of the network it addresses.

This seems wrong to me. If I have my blog at http://www.example.com/blog, I shouldn't need to provide an API at a different location, serving JSON just for robots. Instead of having two different URIs, http://www.example.com/blog/posts/ and http://api.example.com/blog/posts, I should have just the former, with multiple representations available, including application/json for the JSON API I wish to provide to my users.

Example 1: a browser asks for the posts on my blog.

Request:

curl -i \
 -H "Accept: text/html" \
 -X GET \
 http://www.example.org/blog/posts/

Response:

 200 OK
 Content-Type: text/html; charset=utf-8

 <html><body><h1>Posts</h1><ol><li><h2>My first post ...

Example 2: the same URI, but this time a robot makes the request.

Request:

curl -i \
 -H "Accept: application/json" \
 -X GET \
 http://www.example.org/blog/posts/

Response:

 200 OK
 Content-Type: application/json

 {
    "posts": [
        {
            "id": 1,
            "title": "My first post" ...
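
A minimal sketch of the server side of these two examples, using only Python's standard library. Everything here is an assumption for illustration: the in-memory `POSTS` list, the `AVAILABLE` preference order, and the choice to fall back to HTML when nothing in the Accept header matches (a server could instead answer 406 Not Acceptable). The same URI returns HTML or JSON depending on the Accept header's q-values, and `Vary: Accept` tells caches that the representation depends on that header.

```python
import html
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data standing in for the blog's storage.
POSTS = [{"id": 1, "title": "My first post"}]

# Representations /blog/posts/ can produce, in server preference order.
AVAILABLE = ["text/html", "application/json"]


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    return ranges


def quality(candidate, ranges):
    """q of the most specific range matching candidate (0.0 = not acceptable)."""
    specificity_of = {candidate: 2, candidate.split("/")[0] + "/*": 1, "*/*": 0}
    best_specificity, q = -1, 0.0
    for media_range, range_q in ranges:
        specificity = specificity_of.get(media_range)
        if specificity is not None and specificity > best_specificity:
            best_specificity, q = specificity, range_q
    return q


def negotiate(header, available=AVAILABLE, default="text/html"):
    """Pick the available type with the highest q; fall back to default."""
    if not header:
        return default
    ranges = parse_accept(header)
    best, best_q = default, 0.0
    for candidate in available:
        q = quality(candidate, ranges)
        if q > best_q:  # on ties the earlier, server-preferred type wins
            best, best_q = candidate, q
    return best


def render(media_type):
    """Serialise the same resource in the negotiated format."""
    if media_type == "application/json":
        return json.dumps({"posts": POSTS}).encode("utf-8")
    items = "".join(f"<li><h2>{html.escape(p['title'])}</h2></li>" for p in POSTS)
    return f"<html><body><h1>Posts</h1><ol>{items}</ol></body></html>".encode("utf-8")


class BlogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/blog/posts/":
            self.send_error(404)
            return
        media_type = negotiate(self.headers.get("Accept"))
        body = render(media_type)
        content_type = media_type
        if media_type.startswith("text/"):
            content_type += "; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Vary", "Accept")  # caches must key on Accept too
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocks forever, serving http://localhost:<port>/blog/posts/."""
    HTTPServer(("localhost", port), BlogHandler).serve_forever()
```

Calling `serve()` and pointing the two curl commands at http://localhost:8000/blog/posts/ should return the HTML and JSON representations respectively.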

Version numbers for APIs should be encoded in the "Accept" field of the request headers and, above all, URIs should not be strongly typed the way Twitter's are ("statuses/show.json?id=210462857140252672" or "statuses/show/210462857140252672.json").
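
One way Accept-based versioning could look, as a sketch. The vendor media type `application/vnd.example.blog.v2+json`, the `version=` media-type parameter and `SUPPORTED_VERSIONS` are hypothetical conventions, not something Twitter or the question defines. The URI stays the same across versions; only the requested media type changes.

```python
import re

# Hypothetical vendor media type: application/vnd.example.blog.v<N>+json
VENDOR_TYPE = re.compile(r"application/vnd\.example\.blog\.v(\d+)\+json")
SUPPORTED_VERSIONS = {1, 2}


def requested_version(accept, default=1):
    """Read the API version from the Accept header, not from the URI.

    Recognises two hypothetical conventions:
      application/vnd.example.blog.v2+json   (vendor media type)
      application/json; version=2            (media-type parameter)
    """
    for part in (accept or "").split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        match = VENDOR_TYPE.fullmatch(media_type)
        if match:
            return int(match.group(1))
        if media_type == "application/json":
            for param in fields[1:]:
                name, _, value = param.partition("=")
                if name.strip().lower() == "version" and value.strip().isdigit():
                    return int(value.strip())
    return default


def check_version(accept):
    """Return (status, version): 406 Not Acceptable for unknown versions."""
    version = requested_version(accept)
    if version not in SUPPORTED_VERSIONS:
        return 406, None
    return 200, version
```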

I could lose some flexibility by going for the unified approach (but shouldn't Cool URIs never change anyway?), but I think adhering to REST (or at least my interpretation of it) would provide more benefit.

Which is the more correct approach: separating the API and the Web site, or unifying them?

Answer 1:

There is no right or wrong here. Following REST and RFCs too closely may prove to be difficult when your API development is driven by specific client requirements.

In reality, human users have different behaviour patterns compared to API clients, and therefore require different treatment. The most vivid distinction comes from the fact that many APIs are very data intensive, designed for batch operations and data dumping, whereas applications for human users are more "reactive" and often do things step by step, request by request. As a consequence, in many projects API URL design is optimised to avoid wasting client and server resources on multiple network round trips and repeated storage calls.
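
To illustrate the round-trip point, here is a hypothetical batch endpoint, `GET /api/posts?ids=1,2,3`, sketched in Python with an in-memory SQLite table. A client that needs three posts makes one request and the server makes one storage call, instead of three of each as a step-by-step UI might. The URL shape and schema are assumptions, not taken from any particular API.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit


def make_db():
    """Hypothetical posts table in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    db.executemany("INSERT INTO posts VALUES (?, ?)",
                   [(1, "First"), (2, "Second"), (3, "Third")])
    return db


def batch_get_posts(db, url):
    """Serve GET /api/posts?ids=1,2,3: one request, one storage call."""
    raw = parse_qs(urlsplit(url).query).get("ids", [""])[0]
    ids = [int(i) for i in raw.split(",") if i.strip().isdigit()]
    if not ids:
        return []
    # Only '?' placeholders are interpolated; the ids are bound as parameters.
    placeholders = ",".join("?" * len(ids))
    rows = db.execute(
        f"SELECT id, title FROM posts WHERE id IN ({placeholders}) ORDER BY id", ids
    ).fetchall()
    return [{"id": post_id, "title": title} for post_id, title in rows]
```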

Under the hood, API implementations often have a different design from the core application, optimised for the kind of operations the API provides. For example, an API implementation may use a separate caching strategy. Now, if you split the code out, you may want to create a cluster of hosts that only handle the API calls. That is where placing the API on another domain becomes beneficial for load management: a separate domain allows for simpler load balancing on high-load sites. In comparison, when you use an /api URL prefix on the same domain name (but have separate clusters), you need a smart (L7-aware) load balancer to split the request flow between the API and web front-end clusters, and such load balancers are more expensive.
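
A rough sketch of the routing decision in each scheme. The hostnames and cluster names are placeholders. With a separate API host, the decision depends only on the hostname, so the two names can simply resolve to different addresses; with an /api prefix on one host, something has to read the HTTP request path (layer 7) before it can choose a cluster.

```python
from urllib.parse import urlsplit

# Hypothetical hostnames and cluster names, for illustration only.
API_HOST = "api.example.com"
WEB_CLUSTER = "web-frontend"
API_CLUSTER = "api-backend"


def route_by_host(host):
    """Separate-domain scheme: the hostname alone picks the cluster."""
    return API_CLUSTER if host.lower() == API_HOST else WEB_CLUSTER


def route_by_path(request_target):
    """Same-domain scheme: the balancer must parse the request path (L7)."""
    path = urlsplit(request_target).path
    if path == "/api" or path.startswith("/api/"):
        return API_CLUSTER
    return WEB_CLUSTER
```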

So there may be very good technical reasons why the likes of Twitter separate out the API, but references to other implementations may not apply to YOUR project. If you are at early stages of design, you may want to start with a unified URL scheme on the same domain, but eventually you may find that there are good real-life use cases that make you change the approach, and then ... refactoring.

P.S. There is a lengthy discussion of versioning here: "Best practices for API versioning?"

P.P.S. I find strongly typed URLs helpful for quick debugging. You can simply put a URL ending in .json into the browser and quickly get the result without switching to the command line. But I agree with you that the "Accept" header is the preferred method.

P.P.P.S. SEO for APIs? I can see how a good URL design can be beneficial, but for a search engine it's probably irrelevant whether your service provides multiple output formats on the same path or domain name. At the end of the day, search engines are built for human users, and human users don't consume XML and JSON.

Answer 2:

The Web and a RESTful API may behave in different ways.

In theory, how would a server handling a request like http://mysite.com/blog/1 distinguish whether it needs to return an HTML page or just the data (JSON, XML...)? I would vote for using the Accept HTTP header:

Accept: text/html <-- Web browsers
Accept: application/json <-- Applications/Relying parties consuming data or performing actions

Why don't Twitter, Facebook, or other sites serve both web browsers and relying parties from the same URLs? Honestly, I would argue that it is an arbitrary decision.

Perhaps I can offer one possible reason: URLs for web browsers and search engine robots should be friendly URLs, because these work better for SEO. For that reason, SEO-ready URLs may not be very semantic in terms of REST, but they are meant for search engines or even human users!

Finally, which is better (in my opinion)?

If you need SEO, use separate URLs.
If you don't need SEO, unify URLs in the same domain and format.

Answer 3:

I disagree with the other answer that this decision should have anything to do with SEO or how 'friendly' a URL is (robots are [written by] people too!). But my intuition tells me that better SEO results would come from unifying the URIs since that also unifies pagerank in the (unlikely) event that your API URIs would get linked to from the world wild web.

What this decision should rest on is what your server and clients are capable of. If they can set Accept request headers, and your server is smart enough to do transparent content negotiation, then by all means unify the URIs. This is what I do (though my only JSON client is myself: AJAX requests issued from other HTML parts of my web app, where I do control the Accept header).
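
On the client side, asking for a particular representation only means setting the Accept header. The answer's own client is in-browser AJAX; this sketch swaps in Python's `urllib.request` and reuses the blog URI from the question, so both requests target the same resource and differ only in that header.

```python
from urllib.request import Request

# The blog URI from the question; both requests target the same resource.
URI = "http://www.example.org/blog/posts/"


def build_request(media_type):
    """Build a GET request for URI that asks for one representation."""
    return Request(URI, headers={"Accept": media_type}, method="GET")


html_request = build_request("text/html")
json_request = build_request("application/json")

# Sending one needs a server that negotiates, e.g.:
#   from urllib.request import urlopen
#   with urlopen(json_request) as response:
#       print(response.headers.get("Content-Type"))
```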

If a client is not able to set request headers, such as a web user wanting to get the JSON response, they will end up with the default (presumably text/html). For this reason you may want to allow non-negotiated responses to occur under unique URIs (/foo.txt, /foo.rtf). Conventionally this is done by appending the format to the URI, separated by a dot, as if it were a filename extension (but it usually isn't; mod_rewrite does the juggling), so that old clients on platforms that need filename extensions can save the file with a meaningful one.
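
A sketch of the suffix convention in Python, standing in for what a mod_rewrite rule would do. The `SUFFIX_TYPES` table and `resolve` are hypothetical; the type negotiated from the Accept header is assumed to be computed elsewhere and passed in as `accept_choice`. A recognised suffix wins over negotiation; anything else falls through unchanged.

```python
import posixpath

# Hypothetical suffix table; extend with whatever formats the site supports.
SUFFIX_TYPES = {
    ".html": "text/html",
    ".json": "application/json",
    ".xml": "application/xml",
    ".txt": "text/plain",
    ".rtf": "application/rtf",
    ".csv": "text/csv",
}


def resolve(path, accept_choice):
    """Map a request path to (resource_path, media_type).

    /foo.json forces JSON for resource /foo; a path without a known
    suffix keeps the type negotiated from the Accept header.
    """
    root, ext = posixpath.splitext(path)
    media_type = SUFFIX_TYPES.get(ext.lower())
    if media_type:
        return root, media_type
    return path, accept_choice
```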

Most pages on my site work something like this:

1. Determine the SQL query from the request URL (e.g. /cars?colour=black => SELECT * FROM cars WHERE colour='black').
2. Issue the SQL query.
3. Determine the acceptable response type from the list supported by this file. This is usually HTML and HAL (i.e. JSON), though sometimes XML too. Fall back to text/html if nothing else is Acceptable.
4. if(HTML) spit out <HEAD> and <NAV> (considering the parameters: <h1>Black Cars</h1>)
5. Spit out the results using the most acceptable response type.
   This function knows how to take a SQL result object and turn it into HTTP Link headers, a stream of HTML <LINK> elements, HAL's _links key, a stream of XLink elements, an HTML <TABLE> element (with cells containing <A> elements), or a CSV file. The SQL query may return 0 rows, in which case a user-friendly message is written instead of an HTML table if that output was being used.
6. if(HTML) spit out <FOOTER>
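
A minimal sketch of that outline in Python, assuming a hypothetical `cars` table in an in-memory SQLite database. Two deliberate departures from the description: the literal `colour='black'` interpolation is replaced by a parameterised query with an allow-list of filterable columns, and negotiation is simplified (q-values are ignored). Only HTML and HAL output are shown; Link headers, XLink and CSV are left out.

```python
import html
import json
import sqlite3
from urllib.parse import parse_qsl, urlsplit

FILTERABLE = {"colour", "make"}  # allow-list: the per-resource parameter validation
SUPPORTED = ["text/html", "application/hal+json"]


def make_db():
    """Hypothetical 'cars' collection in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cars (id INTEGER PRIMARY KEY, make TEXT, colour TEXT)")
    db.executemany("INSERT INTO cars (make, colour) VALUES (?, ?)",
                   [("Saab", "black"), ("Volvo", "red"), ("Fiat", "black")])
    return db


def build_query(url):
    """Determine the SQL query: /cars?colour=black -> parameterised SELECT."""
    params = [(k, v) for k, v in parse_qsl(urlsplit(url).query) if k in FILTERABLE]
    where = " AND ".join(f"{k} = ?" for k, _ in params)  # k is allow-listed
    sql = "SELECT id, make, colour FROM cars" + (f" WHERE {where}" if where else "")
    return sql + " ORDER BY id", [v for _, v in params]


def pick_type(accept):
    """Simplified negotiation (ignores q-values): first supported type listed wins."""
    for part in (accept or "").split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in SUPPORTED:
            return media_type
    return "text/html"  # fall back to HTML if nothing else is acceptable


def render_collection(url, accept, db):
    sql, args = build_query(url)             # determine the query
    rows = db.execute(sql, args).fetchall()  # issue it
    media_type = pick_type(accept)           # choose the response type
    if media_type == "application/hal+json":
        cars = [{"_links": {"self": {"href": f"/cars/{r[0]}"}}, "make": r[1], "colour": r[2]}
                for r in rows]
        doc = {"_links": {"self": {"href": url}}, "_embedded": {"cars": cars}}
        return media_type, json.dumps(doc)
    out = ["<html><head><title>Cars</title></head><body><nav></nav>"]  # HTML head/nav
    if rows:
        out.append("<table>")
        for r in rows:
            out.append(f'<tr><td><a href="/cars/{r[0]}">{html.escape(r[1])}</a></td>'
                       f"<td>{html.escape(r[2])}</td></tr>")
        out.append("</table>")
    else:
        out.append("<p>No cars match your search.</p>")  # user-friendly empty result
    out.append("<footer></footer></body></html>")       # HTML footer
    return media_type, "".join(out)
```

The per-resource part is confined to `FILTERABLE` and `build_query`; `pick_type` and the rendering branches are the kind of generic code the answer describes sharing across collections.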

This basic outline handles about 30 different resource collections in my web app, though each one has a different set of options the request URI may invoke, so the start of each differs in terms of parameter validation.

So, now I have explained all that, you can see how it might be useful to have all the specifics of each resource handled in one place, and the generics of outputting in format X or format Y handled by a common library function. It's an implementation detail which eases my life and helps me adhere to the Don't Repeat Yourself maxim.

Answer 4:

I definitely don't agree with the web-site == web-service approach.

Simply put, the web site should be treated as a client, just a client, that consumes the web service and renders the data in a form appropriate for the web. Likewise, a mobile application is a client, just a client, that consumes the same web service and renders the data in a form appropriate for mobile use.

The web service is the service provider. All the others are just clients: the web site, the Android app, the iPhone app, etc.
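
A sketch of that architecture, with every name hypothetical: `BlogServiceClient` talks to the JSON web service through an injected transport (a stub here; an HTTP call in practice), and `render_posts_page` is the web site's only job, turning that data into HTML. A mobile app would reuse the same service and render the data its own way.

```python
import html
import json
from typing import Callable

# A transport maps a service path to a JSON body. In production it would make
# an HTTP request to the web service; here it is injected so it can be stubbed.
Transport = Callable[[str], str]


class BlogServiceClient:
    """Thin, hypothetical client for the blog's JSON web service."""

    def __init__(self, transport: Transport):
        self._transport = transport

    def list_posts(self) -> list:
        return json.loads(self._transport("/blog/posts/"))["posts"]


def render_posts_page(client: BlogServiceClient) -> str:
    """The web site as 'just a client': fetch data, render it as HTML."""
    items = "".join(f"<li>{html.escape(p['title'])}</li>" for p in client.list_posts())
    return f"<html><body><h1>Posts</h1><ol>{items}</ol></body></html>"


def fake_service(path: str) -> str:
    """Stub standing in for the real web service."""
    if path != "/blog/posts/":
        raise KeyError(path)
    return json.dumps({"posts": [{"id": 1, "title": "My first post"}]})
```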