RESTful undelete

Published 2019-03-08 18:09

叛逆

Question:

It is a fairly common requirement to support undeletes or delayed/batched deletions for data services. What I'm wondering is how to implement this in a RESTful way. I'm torn between a few different options (none of which seems terribly attractive to me). Common across these different options, I believe, is the need for an API which returns all resources marked as deleted for a particular resource type.

Here are some options I've thought about and some of their pros/cons:

Options to mark resource as deleted:

Use HTTP DELETE to mark the resource as deleted.
Use HTTP PUT/POST to update deleted flag. This doesn't feel right since it maps what is essentially a deletion away from the HTTP DELETE method and into other HTTP methods.

Options when GET-ing resource marked for deletion:

Return HTTP Status 404 for a resource marked as deleted. Clean & transparent, but how do we tell the difference between a resource that is really deleted and one just marked as deleted?
Return HTTP Status 410. Provides a way to tell the difference, but 410 technically says it "is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval." There may be enough wiggle room in the words "expected" and "SHOULD" here. Not sure how well 410 is supported/understood out there in clients.
Return HTTP Status 200 and include a flag field indicating the resource is deleted. This seems weird, since the idea of deleting it in the first place was that you actually wanted it to not appear. This pushes responsibility for filtering out deleted resources down to clients.

Options for responses which include this deleted resource:

Omit the resources marked as deleted. Clean & simple. But what if you actually want to know about deleted resources?
Include them along with field indicating that they are deleted. This pushes responsibility for filtering out deleted resources down to clients. It makes pagination tricky if you want to only page through active or deleted resources.

Options when updating a resource marked for deletion:

Use HTTP Status 404. The resource is gone, right? But how can you tell the difference between a resource marked as deleted and one actually deleted? The HTTP body in the 404 response could disambiguate here, but then clients are left with parsing/interpreting your body to disambiguate. Maybe a response header might help here? Which one? A custom header?
Use HTTP Status 409 with message about how resource must first be undeleted.

Options to undelete resource marked for deletion:

Use HTTP PUT/POST for the update operation of the resource and mark it as active again. This only works as long as you're not returning an HTTP 404 for the GET operation for the resource, since it doesn't make sense to PUT/POST to a resource that is "Not found" (404).
Use HTTP PUT/POST for the creation operation of the resource. The problem here is which data takes precedence: the data sent up in the create operation, or the data that is being undeleted? One approach is to filter the deleted resource out of any other queries that would have returned it. Then, treat the HTTP PUT/POST that creates the resource as an undelete if the resource identifier points to a resource marked as deleted.
Separate REST path dedicated to undelete resources marked for deletion.

This is by no means an exhaustive list. I just wanted to enumerate some of the options that have been bouncing around in my head.

I know the answer to how to do this is, as usual, "it depends". What I'm curious about is what qualifications/requirements would you use to make your decision? How have you seen this implemented or implemented it yourself?

Answer 1:

Going by the book, RFC 2616, section 9.7:

  The DELETE method requests that the origin server delete the resource
  identified by the Request-URI. This method MAY be overridden by human
  intervention (or other means) on the origin server. The client cannot
  be guaranteed that the operation has been carried out, even if the
  status code returned from the origin server indicates that the action
  has been completed successfully. However, the server SHOULD NOT
  indicate success unless, at the time the response is given, it intends
  to delete the resource or move it to an inaccessible location.

When you DELETE a resource, the server should mark the resource for deletion on its side. It doesn't really have to delete the resource; it just can't give any guarantee that the operation has been carried out. Even so, the server shouldn't say it's been deleted when it hasn't.

  A successful response SHOULD be 200 (OK) if the response includes an entity
  describing the status, 202 (Accepted) if the action has not yet been enacted,
  or 204 (No Content) if the action has been enacted but the response does not
  include an entity.

If the operation is delayed, send a 202 and an entity body describing the result of the action. (Think of a poll-able "task" representing the server's deferred deletion of the resource; it could theoretically leave it in that state forever.) All the server has to do is prevent the client from retrieving the resource again in its original form. In the meantime, use a 410 as the response code for requests to the resource, and when the "task" finishes or the server otherwise deletes the resource, return a 404.

However, if a DELETE's semantics don't make sense for the resource in question, perhaps it's not a deletion you're looking for, but an additional state transition that alters the resource state but keeps it accessible? In that case, use a PUT/PATCH to update the resource and be done.
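The status-code sequence described above (202 on DELETE, 410 while the deletion is pending, 404 once it is enacted) can be sketched roughly as follows. This is only an in-memory illustration, not a real HTTP server; the class name, the purge() method, and the "/tasks/..." URL are assumptions made up for the example.

```python
from http import HTTPStatus


class DeferredDeleteStore:
    """In-memory stand-in for a service that enacts deletions later."""

    def __init__(self):
        self._items = {}       # resource id -> representation
        self._pending = set()  # ids accepted for deletion but not yet purged

    def create(self, rid, body):
        self._items[rid] = body

    def delete(self, rid):
        """DELETE: accept the request, but defer the actual removal."""
        if rid not in self._items:
            return HTTPStatus.NOT_FOUND, None
        self._pending.add(rid)
        # 202 Accepted plus an entity describing the (hypothetical) task.
        return HTTPStatus.ACCEPTED, {"task": f"/tasks/delete-{rid}", "state": "pending"}

    def get(self, rid):
        """GET: 410 while deletion is pending, 404 once it is gone."""
        if rid in self._pending:
            return HTTPStatus.GONE, None
        if rid not in self._items:
            return HTTPStatus.NOT_FOUND, None
        return HTTPStatus.OK, self._items[rid]

    def purge(self, rid):
        """Run the deferred task: actually remove the resource."""
        self._pending.discard(rid)
        self._items.pop(rid, None)
```

A client that polls the task can treat the change from 410 to 404 as the signal that the deletion has actually been carried out.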

Answer 2:

I think the most RESTful way to solve this is to use HTTP PUT to mark the resource for deletion (and to undelete it) and then use HTTP DELETE to permanently delete the resource. To get a list of resources marked for deletion, I would use a parameter in the HTTP GET request, e.g. ?state=markedForDeletion. If you request a resource marked for deletion without the parameter, I think you should return a "404 Not Found" status.
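A minimal sketch of that scheme, assuming a plain dict as the backing store and a boolean flag on each item; the function names and the "marked" field are invented for illustration, and only the ?state=markedForDeletion parameter comes from the answer.

```python
from http import HTTPStatus

MARKED = "markedForDeletion"  # value of the ?state= query parameter


def put_resource(store, rid, data, marked=False):
    """PUT: create or replace; `marked` sets or clears the soft-delete flag."""
    store[rid] = {"data": data, "marked": marked}
    return HTTPStatus.OK


def get_resource(store, rid, state=None):
    """GET one resource; marked items are hidden unless state is given."""
    item = store.get(rid)
    if item is None:
        return HTTPStatus.NOT_FOUND, None
    if item["marked"] and state != MARKED:
        return HTTPStatus.NOT_FOUND, None
    return HTTPStatus.OK, item["data"]


def list_resources(store, state=None):
    """GET the collection: active ids by default, marked ids on request."""
    want_marked = state == MARKED
    return sorted(rid for rid, item in store.items() if item["marked"] == want_marked)


def delete_resource(store, rid):
    """DELETE: permanent removal, whether or not the item was marked."""
    if store.pop(rid, None) is None:
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.NO_CONTENT
```

Undeleting is then just another PUT with the flag cleared, which works because the marked resource is still reachable through the state parameter.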

Answer 3:

The Short Version

You cannot RESTfully undelete a resource using any method on its original URI. It's illogical, because any operation attempted on a resource that has been deleted should return either a 404 or a 410. While this is not explicitly stated in the spec, it's strongly implied in the definition of the DELETE method:

In effect, this method is similar to the rm command in UNIX: it expresses a deletion operation on the URI mapping of the origin server rather than an expectation that the previously associated information be deleted.

In other words, when you've DELETEd a resource, the server no longer maps that URI to that data. So you can't PUT or POST to it to make an update like "mark this as undeleted" etc. (Remember that a resource is defined as a mapping between a URI and some underlying data).

Some Solutions

Since it's explicitly stated that the underlying data is not necessarily deleted, it doesn't preclude the server making a new URI mapping as part of the DELETE implementation, thereby effectively making a backup copy that can be restored later.

You could have a "/deleted/" collection that contains all the deleted items, but how would you actually perform the undelete? Perhaps the simplest RESTful way is to have the client retrieve the item with GET, and then POST it to the desired URL.

What if you need to be able to restore the deleted item to its original location? If you're using a media type that supports it, you could include the original URI in the response to a GET from the /deleted/ collection. The client could then use it to POST. Such a response might look like this in JSON:

{
    "original-url": "/some/place/this/was/deleted/from",
    "body": "<base64 encoded body>"
}

The client could then POST that body to that URI to perform an undelete.
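As a rough sketch of that round trip, the server side could wrap the deleted item in such an envelope and the client could unpack it into the POST it needs to send. The helper names are hypothetical, and no request is actually sent here; the second function only works out the method, URL, and body.

```python
import base64
import json


def make_trash_entry(original_url, body):
    """Server side: wrap a deleted item's bytes in the JSON envelope."""
    return json.dumps({
        "original-url": original_url,
        "body": base64.b64encode(body).decode("ascii"),
    })


def build_restore_request(entry_json):
    """Client side: turn a /deleted/ entry into the POST that restores it."""
    entry = json.loads(entry_json)
    return "POST", entry["original-url"], base64.b64decode(entry["body"])
```

Base64 keeps the envelope valid JSON even when the deleted representation is binary, at the cost of some extra payload size.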

Alternatively, if your resource definition allows the concept of moving (by updating a "location" property or something like that), then you can do a partial update and avoid the round trip of the entire object. Or do what most people do and implement an RPC-like operation to tell the server to move the resource! UnRESTful, yes, but it will probably work fine in most situations.

How You Decide These Things

Regarding the question of how you decide these things: you have to consider what delete means in the context of your application, and why you want it. In a lot of applications, nothing ever gets deleted, and "delete" really just means "exclude this item from all further queries/listings etc. unless I explicitly undelete it". So, it's really just a piece of metadata, or a move operation. In that case, why bother with HTTP DELETE? One reason might be if you want a 2-tiered delete - a soft or temporary version that's undoable, and a hard/permanent version that's, well...not.

Absent any specific application context, I'd be inclined to implement them like this:

I don't want to see this resource any longer, for my convenience: POST a partial update to mark the resource as "temporarily deleted"

I don't want anyone to be able to reach this resource any longer because it's embarrassing/incriminating/costs me money/etc: HTTP DELETE

The next question to consider is: should the permanent delete only unmap the URI permanently, so that no one can link to it any longer, or is it necessary to purge the underlying data too? Obviously if you keep the data, then an administrator could restore even a "permanently" deleted resource (not through any RESTful interface however). The downside of this is that if the owner of the data really wants it purged, then an admin has to do that outside the REST interface.

Answer 4:

"Deleted" (trashed) items may also be considered a resource, right? Then we can access this resource in one of these ways (e.g. for a deleted user):

PATCH deleted_users/{id}
PATCH trash/users/{id}
PATCH deleted/users/{id}

Or, as some people may consider more RESTful:

PATCH deleted/{id}?type=users

and the payload would be something like this:

{ "deleted_at": null }
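One way a handler for such a PATCH could behave, sketched against an in-memory dict. The function name and the choice of 400 for any other payload are assumptions for illustration, not something the answer specifies.

```python
from datetime import datetime, timezone
from http import HTTPStatus


def patch_deleted_user(users, user_id, payload):
    """Handle PATCH deleted/users/{id}: clearing deleted_at restores the user."""
    user = users.get(user_id)
    # Only trashed users live under the deleted/ collection.
    if user is None or user["deleted_at"] is None:
        return HTTPStatus.NOT_FOUND
    # This sketch accepts exactly one change: setting deleted_at to null.
    if payload != {"deleted_at": None}:
        return HTTPStatus.BAD_REQUEST
    user["deleted_at"] = None
    return HTTPStatus.NO_CONTENT


users = {
    7: {"name": "alice", "deleted_at": datetime(2019, 3, 8, tzinfo=timezone.utc)},
    8: {"name": "bob", "deleted_at": None},
}
```

After a successful call the user no longer belongs to the deleted collection, so repeating the same PATCH yields a 404.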

Answer 5:

I'm also running into this problem, and I've been looking on the Internet for what feels like the best solution. Since none of the main answers I can find seem correct to me, here are my own research results.

Others are right that DELETE is the way to go. You could include a flag to determine whether it's immediately a permanent DELETE or a move to the trashcan (and probably only administrators can do an immediate permanent DELETE).

DELETE /api/1/books/33
DELETE /api/1/books/33?permanent

The backend can then mark the book as deleted. Assuming you have an SQL database, it could be something such as:

UPDATE books SET status = 'deleted' WHERE book_id = 33;

As mentioned by others, once the DELETE is done, a GET of the collection does not return that item. In terms of SQL, this means you must make sure not to return an item with a status of deleted.

SELECT * FROM books WHERE status <> 'deleted';

Also, when you do a GET /api/1/books/33, you must return a 404 or 410. One problem with 410 is that it means Gone Forever (at least that's my understanding of that error code), so I would return 404 as long as the item exists but is marked as 'deleted', and 410 once it has been permanently removed.
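Putting the SQL and the status codes above together, a small sketch with an in-memory SQLite database might look like this. One assumption goes beyond the answer: to be able to answer 410 after a permanent delete, the row is kept as a tombstone with status 'purged' and its data wiped, and listings therefore select status = 'active' rather than status <> 'deleted'. The function names are made up for the example.

```python
import sqlite3
from http import HTTPStatus


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE books ("
        " book_id INTEGER PRIMARY KEY,"
        " title TEXT,"
        " status TEXT NOT NULL DEFAULT 'active')"
    )
    return db


def delete_book(db, book_id, permanent=False):
    """DELETE /api/1/books/{id} with an optional ?permanent flag."""
    if permanent:
        # Assumption: keep a tombstone row so later GETs can answer 410.
        cur = db.execute(
            "UPDATE books SET title = NULL, status = 'purged' WHERE book_id = ?",
            (book_id,),
        )
    else:
        cur = db.execute(
            "UPDATE books SET status = 'deleted'"
            " WHERE book_id = ? AND status = 'active'",
            (book_id,),
        )
    return HTTPStatus.NO_CONTENT if cur.rowcount else HTTPStatus.NOT_FOUND


def list_books(db):
    """GET the collection: only active books are listed."""
    rows = db.execute(
        "SELECT book_id FROM books WHERE status = 'active' ORDER BY book_id"
    )
    return [row[0] for row in rows]


def get_book_status(db, book_id):
    """Status code for GET /api/1/books/{id}."""
    row = db.execute(
        "SELECT status FROM books WHERE book_id = ?", (book_id,)
    ).fetchone()
    if row is None or row[0] == "deleted":
        return HTTPStatus.NOT_FOUND  # unknown, or sitting in the trashcan
    if row[0] == "purged":
        return HTTPStatus.GONE       # permanently removed
    return HTTPStatus.OK
```

Without some kind of tombstone, a permanently removed book would be indistinguishable from one that never existed, and the 410 case could not be served.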

Now, to undelete, the correct way is to PATCH. Contrary to a PUT, which is used to update an item, a PATCH is expected to be an operation on an item. From what I can see, the operation is expected to be in the payload. For that to work, the resource needs to be accessible in some way. As someone else suggested, you can provide a trashcan area where the book would appear once deleted. Something like this would work to list books that were put in the trashcan:

GET /api/1/trashcan/books

[{"path":"/api/1/trashcan/books/33"}]

So, the resulting list would now include book number 33, which you can then PATCH with an operation such as:

PATCH /api/1/trashcan/books/33

{
    "operation": "undelete"
}

If you'd like to make the operation more versatile, you could use something such as:

PATCH /api/1/trashcan/books/33

{
    "operation": "move",
    "new-path": "/api/1/books/33"
}

Then the "move" could be used for other changes of URL wherever possible in your interface. (I am working on a CMS where the path to a page is in one table called tree, and each page is in another table called page and has an identifier. I can change the path of a page by moving it between paths in my tree table! This is where a PATCH is very useful.)
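The two PATCH operations above could be dispatched roughly like this, with a dict standing in for the tree table that maps paths to page identifiers. The function name, the status codes chosen for the error cases, and the rule that "undelete" restores to the path with the /trashcan/ segment removed are assumptions made for the sketch.

```python
from http import HTTPStatus


def apply_patch(tree, path, patch):
    """Apply an "undelete" or "move" operation to the item at `path`.

    `tree` maps URL paths to item ids, like the tree table described above.
    """
    if path not in tree:
        return HTTPStatus.NOT_FOUND
    operation = patch.get("operation")
    if operation == "undelete":
        # Assumption: trashcan paths mirror live ones, so dropping the
        # segment gives /api/1/trashcan/books/33 -> /api/1/books/33.
        new_path = path.replace("/trashcan/", "/", 1)
    elif operation == "move":
        new_path = patch.get("new-path")
    else:
        return HTTPStatus.BAD_REQUEST
    if not new_path or new_path == path:
        return HTTPStatus.BAD_REQUEST
    if new_path in tree:
        return HTTPStatus.CONFLICT  # something already lives at the target
    # The move itself is a single re-keying, so it cannot end up half done.
    tree[new_path] = tree.pop(path)
    return HTTPStatus.NO_CONTENT
```

Because "undelete" is just a "move" with a computed target, the server only needs one code path for both operations.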

Unfortunately, the RFCs do not prescribe a single format for the PATCH payload (RFC 5789 leaves that to the media type), only that it is to be used with an operation as shown above, as opposed to a PUT, which accepts a payload representing a new version, possibly partial, of the targeted item:

PUT /api/1/books/33

{
    "title": "New Title Here"
}

Whereas the corresponding PATCH (if you were to support both) would be:

PATCH /api/1/books/33

{
    "operation": "replace",
    "field": "title",
    "value": "New Title Here"
}

I think that supporting that many PATCH operations would be crazy. But I think that a few good examples give a better idea of why PATCH is the correct solution.
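For comparison, applying the "replace" operation in the ad hoc format shown above takes only a few lines. The function name and the decision to reject unknown fields are assumptions made for this sketch.

```python
def apply_replace(item, patch):
    """Return a copy of `item` with one field replaced, per the format above."""
    if patch.get("operation") != "replace":
        raise ValueError("only the 'replace' operation is handled in this sketch")
    field = patch["field"]
    if field not in item:
        # Replacing a field that does not exist is treated as an error here.
        raise KeyError(field)
    updated = dict(item)  # leave the caller's dict untouched
    updated[field] = patch["value"]
    return updated


book = {"title": "Old Title", "author": "Someone"}
patched = apply_replace(
    book, {"operation": "replace", "field": "title", "value": "New Title Here"}
)
```

JSON Patch (RFC 6902) standardizes a very similar shape, an array of objects with "op", "path", and "value" members, so adopting it instead of a home-grown format may be worth considering.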

You can think of it this way: PATCH is for changing a virtual field or running a complex operation such as a move, which would otherwise require a GET, a POST, and a DELETE (and that assumes the DELETE is immediate; you could still get errors and end up with a partial move). In a way, PATCH is similar to having any number of methods. An UNDELETE or MOVE method would work in a similar way, but the RFC clearly says there is a set of standardized methods, and you should certainly stick to them; PATCH gives you plenty of room so that you do not have to add your own. That said, I did not see anything in the specs saying you should not add your own methods. If you do, make sure to document them clearly.