I am writing a REST API for a service that will accept user-contributed data. I would like to keep all operations completely asynchronous; this includes PUT, POST, DELETE and perhaps even GET requests. My idea is to receive the request, process it enough to ensure it is a valid request, and then return an HTTP 202 Accepted response along with a URL where the data will eventually be available and a token so that subsequent requests can be matched to the processed data. If the request is invalid, I will send an HTTP 400.
The client will then be responsible for checking the URL I provided at some time in the future and passing along the token. If the data is available, I return a normal 200 or 201, but if I am still processing the request I will send another 202 indicating that processing hasn't completed. In case of errors processing the data, I will send a 4xx or 5xx status as necessary.
The reason I want to do this is so I can dump all valid requests into a request queue and have workers pull from the queue and process requests as they become available. Since I don't know the queue size or the number of workers available, I can't be certain that I can get to requests fast enough to satisfy the 30-second request limit of Google App Engine.
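A minimal in-process sketch of the flow described above may help make it concrete. It assumes a standard-library queue.Queue and a worker thread as stand-ins for a real task queue and worker pool; the names accept, poll, is_valid and the /results/{token} URL are hypothetical, and the "processing" is a placeholder upper-casing step.

```python
import queue
import threading
import uuid

# Hypothetical in-memory stand-ins for the request queue and the result store.
request_queue: "queue.Queue[tuple[str, dict]]" = queue.Queue()
results: dict[str, dict] = {}
results_lock = threading.Lock()


def is_valid(payload: dict) -> bool:
    # Placeholder validation: require a non-empty "data" field.
    return bool(payload.get("data"))


def accept(payload: dict) -> tuple[int, dict]:
    """Validate just enough, enqueue, and answer immediately."""
    if not is_valid(payload):
        return 400, {"error": "invalid request"}
    token = uuid.uuid4().hex
    request_queue.put((token, payload))
    # 202: accepted, with the URL and token the client should poll later.
    return 202, {"url": f"/results/{token}", "token": token}


def worker() -> None:
    """Pull requests off the queue and store each processed result."""
    while True:
        token, payload = request_queue.get()
        processed = {"data": payload["data"].upper()}  # stand-in for real work
        with results_lock:
            results[token] = processed
        request_queue.task_done()


def poll(token: str) -> tuple[int, dict]:
    """Status check: 200 with the data once processed, otherwise 202."""
    with results_lock:
        if token in results:
            return 200, results[token]
    return 202, {"status": "processing"}
```

Until a worker has handled the request, poll(token) keeps answering 202; afterwards it returns 200 with the processed data.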
My question is: am I perverting REST by processing requests in this manner? Browsers, for instance, seem to require immediate responses to requests. For my HTML pages I plan to respond with the structured page and then use AJAX to process the data requests.
I'm mostly interested in any opinions or experience in processing data using REST in this manner.
I think that your solution is fine: HTTP status 202 is the proper response to use in this specific case, indicating that the request has been accepted for processing but the processing has not been completed.
What I would change slightly in your workflow is the HTTP status of the subsequent requests.
As you said, the 202 response should return a Location header specifying the URL that the client should use to monitor the status of its previous request.
When the client calls this check-the-status-of-my-process URL, instead of returning a 202 while the process is pending, I would return:
- 200 OK when the requested process is still pending. The response should describe the pending status of the process.
- 201 Created when the processing has been completed. In the case of GET/PUT/POST, the response should contain the Location of the requested/created/updated resource.
Adding my two cents to an old question. My idea is similar to systempuntoout's and Avi Flax's suggestions.
I agree that an HTTP 202 response is appropriate for the initial request, with a redirect to another resource via a Location header.
I think the Location URL should probably include the token you mentioned, to conform to common expectations of a Location redirect. For example, Location: /queue?token={unique_token} or Location: /task/{unique_token}.
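The two Location styles could be produced and parsed roughly like this. This is a sketch: new_token and the other helper names are hypothetical, the /queue and /task paths come from the example above, and secrets.token_urlsafe is used because its output is already safe to put in a URL.

```python
import secrets
from urllib.parse import parse_qs, urlsplit


def new_token() -> str:
    # URL-safe random token (characters A-Z, a-z, 0-9, "-" and "_").
    return secrets.token_urlsafe(16)


def location_query_style(token: str) -> str:
    return f"/queue?token={token}"


def location_path_style(token: str) -> str:
    return f"/task/{token}"


def token_from_location(location: str) -> str:
    """Recover the token from either URL style."""
    parts = urlsplit(location)
    if parts.path == "/queue":
        return parse_qs(parts.query)["token"][0]
    if parts.path.startswith("/task/"):
        return parts.path[len("/task/"):]
    raise ValueError(f"unrecognised status URL: {location}")


def accepted_response(token: str) -> tuple[int, dict[str, str]]:
    # 202 Accepted plus a Location header that embeds the token.
    return 202, {"Location": location_path_style(token)}
```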
I also think the resource used to check the status of the process should return an HTTP 200 response when the action of "checking the status" is successful (not an HTTP 202, because that implies the current request was "accepted").
However, I think that once the new entity has been created, "checking the status" should return an HTTP 303 (See Other) response with a Location header pointing to the new entity. This is more appropriate than sending an HTTP 201, because nothing was created by the GET request just performed to check the status.
I also think the resource used to check the status should return error codes appropriately: whenever "checking the status" itself is performed successfully, an appropriate success code should be returned, and errors in the processing can be handled at the application level (by checking the response body).
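A sketch of such a status-check resource, assuming hypothetical Task records stored in a dict keyed by the token. It returns 404 only when the check itself fails (unknown token), 200 with the state in the body while the task is pending or has failed, and 303 See Other with a Location header once the entity exists:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    PENDING = "pending"
    FAILED = "failed"
    DONE = "done"


@dataclass
class Task:
    # Hypothetical task record keyed by the token from the Location URL.
    state: State
    entity_url: Optional[str] = None
    error: Optional[str] = None


def check_status(token: str, tasks: dict[str, Task]) -> tuple[int, dict[str, str], dict]:
    task = tasks.get(token)
    if task is None:
        # The check itself failed: no such task, so use an HTTP error code.
        return 404, {}, {"error": "unknown task"}
    if task.state is State.DONE and task.entity_url is not None:
        # 303 See Other: this GET created nothing; redirect to the new entity.
        return 303, {"Location": task.entity_url}, {}
    # 200 OK: the check succeeded. A processing failure is reported in the
    # body for the application to inspect, not as an HTTP error code.
    body = {"status": task.state.value}
    if task.state is State.FAILED:
        body["error"] = task.error or "unknown error"
    return 200, {}, body
```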
This is a really old question, but I would like to offer a slightly different view of it, which I do not claim is correct; it is just my view.
From the client perspective
Let's start with the initial HTTP request. First and foremost, the request should be a POST: you are sending a message to the server asking it to create a resource. GET and PUT are not valid in this case because:
- A GET is not valid in this context, because a GET is meant to obtain the resource at a specific location.
- A PUT is not valid because you are not creating the resource yourself at a known location; you are asking the server to create it.
From the service perspective
So now you are sending a POST to the server to process a request. The server really has three possible responses (not counting 4xx and 5xx errors):
- "201 Created" indicates that the service got the request and was able to process it immediately, or within an acceptable time period. What counts as acceptable is entirely up to the service design and the service developer.
- "202 Accepted" indicates that the service got the request and is processing it. This is used when the service knows something is going to take a while. Another consideration is that if the service relies on some other asynchronous operation whose outcome it has no way of determining, then it should return "202 Accepted". Finally, some service designers may simply always return "202 Accepted", regardless of how quickly the work can be done.
- In some cases, you would get a "302 Found". This is usually when the service can identify a request as generating a resource that already exists (and is still valid and not in an error state) and reusing the existing resource is acceptable. Not all services work like this: posting a comment to a thread should always create a new resource. Others do: posting the same set of criteria to get a list of doctors produces the same list of doctors. If this information can be reused, then reuse it.
- With all of these responses, the "Location" HTTP header is returned to the client, containing where the resource can be found. This is important, and it is where some people tend to diverge in approach, as you will see later. If the resource can be reused by other requests, the "Location" should really be generated so that the same request always produces the same URL. This allows for a good deal of caching and reuse.
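A sketch of how a service might choose between these three responses while generating deterministic Location URLs. It assumes the request body is JSON-serialisable search criteria (as in the doctor-list example) and uses a SHA-256 hash of the canonical JSON as the URL key; the fast flag stands in for the service's own decision about whether it can finish in time, and the stores and the /doctor-lists/ path are hypothetical.

```python
import hashlib
import json

# Hypothetical stores: results that already exist, and keys being processed.
existing: dict[str, dict] = {}
in_progress: set[str] = set()


def location_for(criteria: dict) -> str:
    """Same criteria -> same URL: hash a canonical JSON form of the request."""
    canonical = json.dumps(criteria, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"/doctor-lists/{digest}"


def handle_post(criteria: dict, fast: bool) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a POST of search criteria.

    `fast` stands in for the service's judgement of whether it can
    finish within an acceptable time.
    """
    location = location_for(criteria)
    headers = {"Location": location}
    if location in existing:
        # 302 Found: a valid resource for this request already exists.
        return 302, headers
    if fast:
        existing[location] = {"criteria": criteria, "doctors": []}
        # 201 Created: processed immediately.
        return 201, headers
    in_progress.add(location)
    # 202 Accepted: work queued; the resource will appear at Location.
    return 202, headers
```

Because the JSON is serialised with sort_keys=True, criteria that differ only in key order map to the same URL, which is what makes the 302 reuse and caching possible.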
When the service has completed the request successfully, it will create the resource at the location that was returned to the client.
Now this is where I start seeing things a little differently from the responses above.
If the service fails to complete the request, it should still create a resource at the location that was returned to the client. This resource should indicate the reason for the failure. It is much more flexible to have a resource provide failure information than to try to shoehorn it into the HTTP protocol.
If the service gets a request for this resource before the work is completed, it should return a "404 Not Found". I believe it should be a "404 Not Found" because the resource really does not exist yet. The HTTP specification does not say that "404 Not Found" can only be used when a resource is never going to exist, just that it doesn't exist there now. In my opinion, this response is completely correct for an asynchronous polling flow.
There is also the scenario where a resource is supposed to exist only for a fixed time. For example, it may be data based on a source that is refreshed nightly. In these cases, the resource should be removed, but the service should keep an indicator so that it knows to return a "410 Gone" status code. This tells the client that the resource was here but is no longer available (i.e., it may have expired). The typical action for the client would be to resubmit the request.
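The service side of this polling flow might look roughly like the following sketch. It assumes in-memory stores and an explicit expiry time per resource (the names resources, expired, publish and handle_get are hypothetical). A finished resource is served with 200 whether it describes success or failure, a missing one gets 404, and an expired one gets 410 because the service remembers that it once existed.

```python
import time
from typing import Optional

# Hypothetical stores: finished results (success or failure) with an expiry
# time, plus a record of URLs that used to exist but have expired.
resources: dict[str, dict] = {}
expired: set[str] = set()


def publish(url: str, body: dict, ttl_seconds: float, now: Optional[float] = None) -> None:
    """Called by a worker when processing ends, successfully or not."""
    now = time.time() if now is None else now
    resources[url] = {"body": body, "expires_at": now + ttl_seconds}


def handle_get(url: str, now: Optional[float] = None) -> tuple[int, Optional[dict]]:
    now = time.time() if now is None else now
    entry = resources.get(url)
    if entry is not None and now >= entry["expires_at"]:
        # Remove the resource but keep an indicator that it once existed.
        del resources[url]
        expired.add(url)
        entry = None
    if entry is not None:
        # 200 OK: the body itself says whether processing succeeded or failed.
        return 200, entry["body"]
    if url in expired:
        # 410 Gone: it was here but no longer is; the client should resubmit.
        return 410, None
    # 404 Not Found: not there (yet); the client should try again later.
    return 404, None
```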
From the client perspective again
When the client gets the response to its initial POST, it takes the "Location" and requests that URL from the service using a GET (again, not a POST). The service will generally respond with one of these values:
- "200 OK" indicates that the request has completed. The result of the request is returned in the content body, in the format specified by the Accept HTTP header.
- "404 Not Found" tells the client that the request has not completed yet: the resource is not there yet, and in this case the client should basically try again later.
- "410 Gone" is returned in cases where the client attempts to get the resource after a long period of time and it is no longer there. In this case, the client should simply resubmit the original query.
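A client-side polling loop following these rules could be sketched like this. The get and resubmit callables are assumptions standing in for a real HTTP client and for re-sending the original POST (resubmit is assumed to return the new Location); interval and max_attempts are illustrative defaults.

```python
import time
from typing import Callable, Optional

# Hypothetical transport: takes a URL and returns (status, body).
Getter = Callable[[str], tuple[int, Optional[dict]]]


def wait_for_result(get: Getter, location: str, resubmit: Callable[[], str],
                    interval: float = 1.0, max_attempts: int = 30) -> dict:
    """Poll `location` until the resource appears, following the rules above."""
    for _ in range(max_attempts):
        status, body = get(location)
        if status == 200 and body is not None:
            return body  # may describe success or failure; the caller inspects it
        if status == 410:
            location = resubmit()  # gone: resubmit and poll the new Location
        elif status != 404:
            raise RuntimeError(f"unexpected status {status} for {location}")
        time.sleep(interval)  # 404 (or just resubmitted): try again later
    raise TimeoutError(f"no result at {location} after {max_attempts} attempts")
```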
The one thing that needs to be pointed out is that the resource that is returned is generally in a format that can define success and failure responses. The client should be able to determine from this resource if there was an error, what it was, and be able to respond accordingly.
Also, the service developer may have the service expire and delete the error resource after a short period of time.
So those are my thoughts on this question. I'm very late to the party, but hopefully future readers will benefit from a slightly different view on a commonly asked question.
FWIW, Microsoft Flow uses a pattern like this.
First call returns 202 w/ Location header.
Followup calls return either:
1. If still processing --> 202 w/ a location header. The loc header can be different, which provides a way to pass state between calls (and potentially make the server stateless!).
2. If done --> 200.
Details at: https://github.com/jeffhollan/LogicAppsAsyncResponseSample
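To illustrate the point about the Location header changing between calls, here is a sketch of a client that always follows the latest Location from each 202, paired with a toy server that keeps all of its progress in the URL query string rather than in server-side storage. Both functions, the /status?step=N URL scheme and the three-step job are hypothetical and are not taken from Microsoft Flow or the linked sample.

```python
from typing import Callable, Optional

# Hypothetical transport: takes a URL, returns (status, headers, body).
Getter = Callable[[str], tuple[int, dict[str, str], Optional[dict]]]


def follow_until_done(get: Getter, first_location: str, max_calls: int = 50) -> Optional[dict]:
    """Keep polling while the server says 202, always using the latest
    Location header, since the server may change it between calls."""
    location = first_location
    for _ in range(max_calls):
        status, headers, body = get(location)
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        location = headers.get("Location", location)
        # (a real client would also wait here, e.g. honouring Retry-After)
    raise TimeoutError("still processing after max_calls attempts")


def fake_stateless_server(url: str) -> tuple[int, dict[str, str], Optional[dict]]:
    # All state lives in the URL (/status?step=N); nothing is stored server-side.
    step = int(url.split("step=")[1])
    if step < 3:
        return 202, {"Location": f"/status?step={step + 1}"}, None
    return 200, {}, {"result": "done", "steps": step}
```

Starting from /status?step=0, the client makes four calls and ends with the 200 body {"result": "done", "steps": 3}.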