RESTful API and bulk operations

2019-04-08 07:50发布

问题:

I have a middle tier which performs CRUD operations on a shared database. When I converted the product to .NET Core I thought I'd also look at using REST for the API as CRUD is supposed to be what it does well. It seems like REST is a great solution for single record operations, but what happens when I want to delete, say, 1,000 records?

Every professional multi-user application is going to have some concept of Optimistic Concurrency checking: you can't have one user wipe out the work of another user without some feedback. As I understand it, REST handles this with the HTTP ETag header record. If the ETag send by the client doesn't match the server's tag, then you issue a 412 Precondition Failed. So far, so good. But what do I use when I want to delete 1,000 records? The back-and-forth time for 1,000 individual calls is considerable, so how would REST handle a batch operation that involved Optimistic Concurrency?

回答1:

RESTs focus is on resources and the decoupling of clients from servers, it is though not a simple CRUD architecture or protocol. While CRUD and REST seem to be very similar, managing resources through REST principles can often also have sideeffects. Therefore, describing REST as simple CRUD thing is an oversimplification.

In regards to batch-processing of REST resources, the underlying protocol (most often HTTP) does define the capabilities that can be used. HTTP defines a couple of operations that can be used to modify multiple resources.

POST is the all-purpose, swiss-army knife of the protocol and can be used to literally manage resources to your likings. As the semantics are defined by the developer you can use it to create, update or delete multiple resources at once.

PUT has the semantics of replacing the state of a resource obtainable at a given URI with the payload body of the request. If you send a PUT request to a "list"-resource and the payload defines a list of entries, you can achieve a batch operation as well.

The fundamental difference between the POST and PUT methods is highlighted by the different intent for the enclosed representation. The target resource in a POST request is intended to handle the enclosed representation according to the resource's own semantics, whereas the enclosed representation in a PUT request is defined as replacing the state of the target resource.

...

A PUT request applied to the target resource can have side effects on other resources. For example, an article might have a URI for identifying "the current version" (a resource) that is separate from the URIs identifying each particular version (different resources that at one point shared the same state as the current version resource). A successful PUT request on "the current version" URI might therefore create a new version resource in addition to changing the state of the target resource, and might also cause links to be added between the related resources. (Source)

PATCH (RFC 5789) is not yet included in the HTTP protocol, though supported by plenty frameworks. It is primarily used to alter multiple resources at once or to perform partial updates on resources, which PUT is also able to achieve if the updated part is a sub-resource of some other resource; in that case it has the effect of a partial update on the outer resource.

It is important to know that a PATCH request contains the necessary steps a server has to fulfill to transform a resource to its intended state. A client therefore has to grab the current state and calculate the necessary steps needed for the transformation beforehand. A very informative blog post on this topic is Don't patch like an idiot. Here JSON Patch (RFC) is a JSON based media type that visualizes the PATCH concept clearly. A patch request has to be applied either fully (each operation defined in the patch request) or applied not at all. It therefore requires a transaction scoped handling and a roll back in case any of the operations failed.

Conditional requests like ETag and IfModifiedSince headers are defined in RFC 7232 and can be used in HTTP requests to perform the modifications only if the request is applied on the most recent version of resource and therefore correlates to an optimistic locking in (distributed) databases.

So far, so good. But what do I use when I want to delete 1,000 records?

This depends on what framework you'll use. If it supports PATCH I clearly vote for PATCH. In case it does not, you are probably safer to use POST than PUT as of the very restrictive semantics PUT has, as the semantics are clearly defined by you then. In case of a batch-delete, PUT can also be used by targeting the collection resource with an empty body which has the result of removing any items in the collection and therefore clearing the whole collection. If some of the items should remain in the collection though, PATCH or POST are probably more easy to use.



回答2:

If I understand correctly, you want optimistic concurrency for each record individually. That is, each record is to be deleted only if its state matches the client’s expectation. (If you only want to assert the entire collection’s state, then If-Match and 412 are sufficient.)

Roman Vottner’s answer does an excellent job of explaining the HTTP methods involved, but I’ll try to fill in some details.

Caveat emptor

When we talk about “how would REST handle” this or that, you understand that technically you can use HTTP as a transport for any operation in any way that suits you.

So when you’re asking about REST, I’m assuming you’re interested in a uniform interface — an approach that could theoretically be used by a range of various clients and servers.

But the key word there is “theoretically”. For example, once you define your own media type (your own JSON structure), a lot of the uniformity goes down the drain, because a client would have to be coded against your specific API anyway, at which point you can ask it to jump through any hoops you want.

But if you’re still interested in salvaging as much of the uniformity as possible, then read on.

All or nothing

If you want an all-or-nothing operation, which fails entirely if any of the individual preconditions fails, then, as Roman suggests, you can use PATCH with the JSON Patch format. For this, you need a conceptual representation of your collection as a single JSON object, to which the patch is to be applied.

For example, suppose you have resources like /my/collection/1, /my/collection/4, and so on. You could represent /my/collection/ as:

{
    "resources": {
        "1": {
            "href": "1",
            "etag": "\"BRkDVtYw\"",
            "name": "Foo Bar",
            "price": 1234.5,
            ...
        },
        "4": {
            "href": "4",
            "etag": "\"RCi8knuN\"",
            "name": "Baz Qux",
            "price": 2345.6,
            ...
        },
        ...
    }
}

Here, "1" and "4" are URLs relative to /my/collection/. You could use domain-specific IDs instead, but proper REST operates in terms of opaque URLs.

The standards don’t require you to actually serve this representation on GET /my/collection/, but if you do support such a request, then you should use that representation. Anyway, to this structure you can apply the following JSON patch:

PATCH /my/collection/ HTTP/1.1
Content-Type: application/json-patch+json

[
    {"op": "test", "path": "/resources/1/etag", "value": "\"BRkDVtYw\""},
    {"op": "remove", "path": "/resources/1"},
    {"op": "test", "path": "/resources/4/etag", "value": "\"RCi8knuN\""},
    {"op": "remove", "path": "/resources/4"},
    ...
]

Here, path is not a URL path, it’s a JSON pointer into the above representation.

If all patch operations succeed, then you respond with a successful status code like 204 (No Content) or 200 (OK).

If any of the ETag test operations fails, you respond with 409 (Conflict). You should not respond with 412 (Precondition Failed) in this case, because there is no precondition (like If-Match) on the request itself.

If anything else goes wrong, you respond with other appropriate status codes: see RFC 5789 § 2.2 and RFC 7231 § 6.6.

Mixed result

If you don’t want “all-or-nothing” semantics, then I’m not aware of any standardized solution. As Roman notes, you cannot use the PATCH method in this case, but you can use POST with a custom media type (RFC 6838 § 3.4). It could look like this:

POST /my/collection/ HTTP/1.1
Content-Type: application/x.my-patch+json
Accept: application/x.my-patch-results+json

{
    "delete": [
        {"href": "1", "if-match": "\"BRkDVtYw\""},
        {"href": "4", "if-match": "\"RCi8knuN\""},
        ...
    ]
}

You can respond to such a request with 200 (OK), regardless of whether any of the individual deletes succeeded. Another option would be 207 (Multi-Status), but I don’t see any benefits to it in this case, and it’s not widely used outside of WebDAV, so Postel’s law would suggest not going there.

HTTP/1.1 200 OK
Content-Type: application/x.my-patch-results+json

{
    "delete": [
        {"href": "1", "success": true},
        {"href": "4", "success": false, "error": {...}},
        ...
    ]
}

Of course, if the patch was invalid in the first place, you should instead respond with 415 (Unsupported Media Type) or 422 (Unprocessable Entity) as appropriate.

Another angle

The back-and-forth time for 1,000 individual calls is considerable

It is in HTTP/1.1. But if you can use HTTP/2 — which has much better support for concurrent requests, as well as much smaller network overhead per request — then 1000 individual requests might work out just fine for you.