Transactional batch processing with OData

Posted 2019-02-03 10:45

Question:

Working with Web API OData, I have $batch processing working; however, the persistence to the database is not transactional. If I include multiple requests in a changeset in my request and one of those items fails, the others still complete, because each separate call to the controller has its own DbContext.

For example, if I submit a batch with two change sets:

Batch 1
- ChangeSet 1
- - Patch valid object
- - Patch invalid object
- End ChangeSet 1
- ChangeSet 2
- - Insert valid object
- End ChangeSet 2
End Batch

I would expect that the first valid patch would be rolled back, as the change set could not be completed in its entirety, however, since each call gets its own DbContext, the first Patch is committed, the second is not, and the insert is committed.

Is there a standard way to support transactions through a $batch request with OData?

Answer 1:

The following link shows the Web API OData implementation that is required to process the changeset in transactions. You are correct that the default batch handler does not do this for you:

http://aspnet.codeplex.com/SourceControl/latest#Samples/WebApi/OData/v3/ODataEFBatchSample/

UPDATE The original link seems to have gone away - the following link includes similar logic (and for v4) for transactional processing:

https://damienbod.com/2014/08/14/web-api-odata-v4-batching-part-10/



Answer 2:

  1. The theory: let's make sure we're talking about the same thing.
  2. In practice: addressing the problem as far as I could.
  3. In practice, really (update): a canonical way of implementing the backend-specific parts.
  4. Wait, does it solve my problem?: let's not forget that the implementation (3) is bound by the specification (1).
  5. Alternatively: the usual "do you really need it?" (no definitive answer).

The theory

For the record, here is what the OData spec has to say about it (emphasis mine):

All operations in a change set represent a single change unit so a service MUST successfully process and apply all the requests in the change set or else apply none of them. It is up to the service implementation to define rollback semantics to undo any requests within a change set that may have been applied before another request in that same change set failed and thereby apply this all-or-nothing requirement. The service MAY execute the requests within a change set in any order and MAY return the responses to the individual requests in any order. (...)

http://docs.oasis-open.org/odata/odata/v4.0/cos01/part1-protocol/odata-v4.0-cos01-part1-protocol.html#_Toc372793753

This is V4, which barely updates V3 regarding Batch Requests, so the same considerations apply for V3 services AFAIK.

To understand this, you need a tiny bit of background:

  • Batch requests are sets of ordered requests and change sets.
  • Change Sets themselves are atomic units of work consisting of sets of unordered requests, though these requests can only be Data Modification requests (POST, PUT, PATCH, DELETE, but not GET) or Action Invocation requests.

You might raise an eyebrow at the fact that requests within change sets are unordered, and quite frankly I don't have proper rationale to offer. The examples in the spec clearly show requests referencing each other, which would imply that an order in which to process them must be deduced. In reality, my guess is that change sets must really be thought of as single requests themselves (hence the atomic requirement) that are parsed together and are possibly collapsed into a single backend operation (depending on the backend of course). In most SQL databases, we can reasonably start a transaction and apply each subrequest in a certain order defined by their interdependencies, but for some other backends, it may be required that these subrequests be mangled and made sense of together before sending any change to the platters. This would explain why they aren't required to be applied in order (the very notion might not make sense to some backends).

An implication of that interpretation is that all your change sets must be logically consistent on their own; for example, you can't have a PUT and a PATCH that touch the same properties in the same change set. This would be ambiguous. It's thus the responsibility of the client to merge operations as efficiently as possible before sending the requests to the server. This should always be possible.

I'm now fairly confident that this is the correct interpretation.

While this may seem like an obvious good practice, it's not generally how people think of batch processing. I stress again that all of this applies to requests within change sets, not requests and change sets within batch requests (which are ordered and work pretty much as you'd expect, minus their non-atomic/non-transactional nature).

In practice

To come back to your question, which is specific to ASP.NET Web API, it seems they claim full support of OData batch requests. More information here. It also seems true that, as you say, a new controller instance is created for each subrequest (well, I take your word for it), which in turn brings a new context and breaks the atomicity requirement. So, who's right?

Well, as you rightly point out too, if you're going to have SaveChanges calls in your handlers, no amount of framework hackery will help much. It looks like you're supposed to handle these subrequests yourself with the considerations I outlined above (looking out for inconsistent change sets). Quite obviously, you'd need to (1) detect that you're processing a subrequest that is part of a changeset (so that you can conditionally commit) and (2) keep state between invocations.

Update: See next section for how to do (2) while keeping controllers oblivious to the functionality (no need for (1)). The next two paragraphs may still be of interest if you want more context on the problems that are solved by the HttpMessageHandler solution.

I don't know if you can detect whether you're in a changeset or not (1) with the current APIs they provide. I don't know if you can force ASP.NET to keep the controller alive for (2) either. What you could do for the latter, however (if you can't keep it alive), is to keep a reference to the context elsewhere (for example in Request.Properties) and reuse it conditionally (update: or unconditionally if you manage the transaction at a higher level, see below). I realize this is probably not as helpful as you might have hoped, but at least now you should have the right questions to direct to the developers/documentation writers of your implementation.

Dangerously rambling: Instead of conditionally calling SaveChanges, you could conditionally create and terminate a TransactionScope for every changeset. This doesn't remove the need for (1) or (2), just another way of doing things. It sort of follows that the framework could technically implement this automatically (as long as the same controller instance can be reused), but without knowing the internals enough I wouldn't revisit my statement that the framework doesn't have enough to go with to do everything itself just yet. After all, the semantics of TransactionScope might be too specific, irrelevant or even undesired for certain backends.
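To make the TransactionScope idea a bit more concrete, here is a minimal, framework-free sketch. The class name ChangeSetTransaction and the delegate shape (each sub-request reports success as a bool) are assumptions made for illustration; they are not part of Web API. It only shows the commit/rollback decision for one changeset, not how you detect the changeset or share state.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Transactions;

// Hypothetical helper: runs all sub-requests of ONE changeset in a single ambient transaction.
public static class ChangeSetTransaction
{
    // Each delegate stands for one sub-request and returns true on success.
    public static async Task<bool> RunAsync(IEnumerable<Func<Task<bool>>> subRequests)
    {
        // AsyncFlowOption.Enabled lets the ambient transaction flow across awaits.
        using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            foreach (var subRequest in subRequests)
            {
                if (!await subRequest())
                {
                    // Leaving the scope without Complete() rolls everything back.
                    return false;
                }
            }

            // Only reached when every sub-request succeeded: vote to commit.
            scope.Complete();
            return true;
        }
    }
}
```

Anything that enlists in the ambient transaction (for example a database connection opened inside a delegate) would then be committed or rolled back together with the rest of the changeset.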

Update: This is indeed what the proper way of doing things looks like. The next section shows a sample implementation that uses the Entity Framework explicit transaction API instead of TransactionScope, but this has the same end result. Although I feel there are ways to make a generic Entity Framework implementation, currently ASP.NET doesn't provide any EF-specific functionality, so you're required to implement this yourself. If you ever extract your code to make it reusable, please do share it outside of the ASP.NET project if you can (or convince the ASP.NET team that they should include it in their tree).

In practice, really (update)

See snow_FFFFFF's helpful answer, which references a sample project.

To put it in the context of this answer, it shows how to use an HttpMessageHandler to implement requirement #2 which I outlined above (keeping state between invocations of the controllers within a single request). This works by hooking in at a higher level than the controllers and splitting the request into multiple "subrequests", all the while keeping some state hidden from the controllers (the transactions) and exposing other state to them (the Entity Framework context, in this case via HttpRequestMessage.Properties). The controllers happily process each subrequest without knowing if they are normal requests, part of a batch request, or even part of a changeset. All they need to do is use the Entity Framework context in the properties of the request instead of using their own.
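As a rough illustration of the state-sharing part, the helpers below stash one context object in HttpRequestMessage.Properties and read it back. This is a sketch, not the sample project's code: the class name, method names and key string are hypothetical, and the context is typed generically so the snippet compiles without Entity Framework.

```csharp
using System.Net.Http;

// Hypothetical helpers (not part of Web API): they share one context object between
// the batch handler and the controllers through HttpRequestMessage.Properties.
public static class SharedContextExtensions
{
    // Hypothetical key name, chosen for this sketch only.
    private const string ContextKey = "Sketch.SharedContext";

#pragma warning disable CS0618 // Properties is marked obsolete on newer .NET versions
    // Called by the batch handler on every subrequest of a changeset.
    public static void SetSharedContext<T>(this HttpRequestMessage request, T context) where T : class
    {
        request.Properties[ContextKey] = context;
    }

    // Called by a controller; returns null when nothing was attached to the request.
    public static T GetSharedContext<T>(this HttpRequestMessage request) where T : class
    {
        return request.Properties.TryGetValue(ContextKey, out var value) ? value as T : null;
    }
#pragma warning restore CS0618
}
```

A controller would then call something like Request.GetSharedContext<MyDbContext>() (MyDbContext being your own context type) and fall back to creating its own context when the result is null.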

Note that you actually have a lot of built-in support to achieve this. This implementation builds on top of the DefaultODataBatchHandler, which builds on top of the ODataBatchHandler code, which builds on top of the HttpBatchHandler code, which is an HttpMessageHandler. The relevant requests are explicitly routed to that handler using Routes.MapODataServiceRoute.

How does this implementation map to the theory? Quite well, actually. You can see that each subrequest is either sent to be processed as-is by the relevant controller if it is an "operation" (normal request), or handled by more specific code if it is a changeset. At this level, they are processed in order, but not atomically.

The changeset handling code, however, does indeed wrap each of its own subrequests in a transaction (one transaction for each changeset). While the code could at this point try to figure out the order in which to execute statements in the transaction by looking at the Content-ID headers of each subrequest to build a dependency graph, this implementation takes the more straightforward approach of requiring the client to arrange these subrequests in the right order itself, which is fair enough.
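For the curious, the dependency-graph alternative could look roughly like the sketch below. It is not what the sample does. The SubRequest type is hypothetical, and the sketch assumes that a subrequest refers to another one by putting $<Content-ID> in its URL (for example $1/Orders); it then orders the subrequests depth-first so that every referenced subrequest runs before the one that refers to it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical, simplified view of one subrequest in a changeset.
public sealed class SubRequest
{
    public string ContentId { get; }
    public string Url { get; }
    public SubRequest(string contentId, string url) { ContentId = contentId; Url = url; }
}

public static class ChangeSetOrdering
{
    // Returns the subrequests so that each one comes after the ones it references.
    public static List<SubRequest> Order(IReadOnlyList<SubRequest> requests)
    {
        var byId = requests.ToDictionary(r => r.ContentId);
        var ordered = new List<SubRequest>();
        var state = new Dictionary<string, int>(); // 1 = being visited, 2 = done

        void Visit(SubRequest request)
        {
            if (state.TryGetValue(request.ContentId, out int s))
            {
                if (s == 1) throw new InvalidOperationException("Cyclic Content-ID reference");
                return; // already placed
            }
            state[request.ContentId] = 1;

            // Tokens such as $filter are ignored because they match no Content-ID.
            foreach (Match m in Regex.Matches(request.Url, @"\$(\w+)"))
            {
                if (byId.TryGetValue(m.Groups[1].Value, out var dependency)) Visit(dependency);
            }

            state[request.ContentId] = 2;
            ordered.Add(request);
        }

        foreach (var request in requests) Visit(request);
        return ordered;
    }
}
```

Requiring the client to send the subrequests in a usable order, as the sample does, avoids all of this at the cost of a little client-side discipline.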

Wait, does it solve my problem?

If you can wrap all your operations in a single changeset, then yes, the request will be transactional. If you can't, then you must modify this implementation so that it wraps the entire batch in a single transaction. While the specification supposedly doesn't preclude this, there are obvious performance considerations to take into account. You could also add a non-standard HTTP header to flag whether you want the batch request to be transactional or not, and have your implementation act accordingly.

In any case, this wouldn't be standard, and you couldn't count on it if you ever wanted to use other OData servers in an interoperable manner. To fix this, you need to argue for optional atomic batch requests to the OData committee at OASIS.

Alternatively

If you can't find a way to branch code when processing a changeset, or you can't convince the developers to provide you with a way to do so, or you can't keep changeset-specific state in any satisfactory way, then you may alternatively want to expose a brand new HTTP resource with semantics specific to the operation you need to perform.

You probably know this, and this is likely what you're trying to avoid, but this involves using DTOs (Data Transfer Objects) to populate with the data in the requests. You then interpret these DTOs to manipulate your entities within a single controller action and hence with full control over the atomicity of the resulting operations.
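Here is a deliberately tiny sketch of that idea. Everything in it is hypothetical: the DTO shape, the names, and the in-memory dictionary that stands in for the database. The point is only that a single handler receives the whole operation and can apply all of it or none of it; with Entity Framework the same effect would typically come from making all changes on one context and saving once.

```csharp
using System.Collections.Generic;

// Hypothetical DTO for one domain-specific operation: rename some items and create others.
public sealed class RenameAndCreateDto
{
    public Dictionary<int, string> Renames { get; } = new Dictionary<int, string>();
    public List<string> NewNames { get; } = new List<string>();
}

// In-memory stand-in for the database, used only to keep the sketch self-contained.
public sealed class ItemStore
{
    private Dictionary<int, string> _items = new Dictionary<int, string>();
    private int _nextId = 1;

    public IReadOnlyDictionary<int, string> Items => _items;

    // Applies the whole DTO or nothing: work on a copy, publish it only on success.
    public bool Apply(RenameAndCreateDto dto)
    {
        var working = new Dictionary<int, string>(_items);
        int nextId = _nextId;

        foreach (var rename in dto.Renames)
        {
            if (!working.ContainsKey(rename.Key) || string.IsNullOrWhiteSpace(rename.Value)) return false;
            working[rename.Key] = rename.Value;
        }

        foreach (var name in dto.NewNames)
        {
            if (string.IsNullOrWhiteSpace(name)) return false;
            working[nextId++] = name;
        }

        _items = working;   // "commit"
        _nextId = nextId;
        return true;
    }
}
```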

Note that some people actually prefer this approach (more process-oriented, less data-oriented), though it can be very difficult to model. There's no right answer; it always depends on the domain and use cases, and it's easy to fall into traps that would make your API not very RESTful. It's the art of API design. Unrelated: The same remarks can be made about data modeling, which some people actually find harder. YMMV.

Summary

There are a few approaches to explore, a canonical implementation technique to use, an opportunity to create a generic Entity Framework implementation, and a non-generic alternative.

It'd be nice if you could update this thread as you gather answers elsewhere (well, if you feel motivated enough) and with what you eventually decide to do, as it seems like something many people would love to have some kind of definitive guidance for.

Good luck ;).



Answer 3:

There should only be one DbContext for the OData batch request. Both WCF Data Services and HTTP Web API support OData batch scenario and handle it in a transactional manner. You can check this example: http://blogs.msdn.com/b/webdev/archive/2013/11/01/introducing-batch-support-in-web-api-and-web-api-odata.aspx



Answer 4:

I used the same approach from V3 of the OData samples. I saw that my transaction.rollback was called, but the data did not roll back. Something is lacking, but I can't work out what. This may be an issue with each OData call using SaveChanges: do they actually see the transaction as in scope? We may need a guru from the Entity Framework team to help solve this one.



Answer 5:

I am somewhat new to using OData and Web API. I am on a path to learning myself, so take my response for what it's worth to you.

Edit - Ever so true. I just learned about the TransactionScope class and decided much of what I posted is wrong. So, I am updating in favor of a better solution.

This question is rather old too, and since then ASP.Net Core has come along, so some changes will be necessary depending on your target. I'm only posting a response for future Googlers who landed here like myself :-)

A few points I'd like to make before moving on:

  • The original question posited that each controller called received its own DbContext. This isn't true. The DbContext lifetime is scoped to the entire request. Review Dependency lifetime in ASP.NET Core for further details. I suspect that the original poster is experiencing issues because each sub-request in the batch is invoking its assigned controller method, and each method is calling DbContext.SaveChanges() individually - causing that unit of work to be committed.
  • The original question also asked if there is a "standard". I have no idea if what I am about to propose is anything like what someone would consider a "standard", but it works for me.
  • I am making assumptions about the original question that forced me to downvote tne's response as not useful. My understanding of the question comes from the basis of performing database transactions, i.e. (pseudo-code for SQL expected):

    BEGIN TRAN
        DO SOMETHING
        DO MORE THINGS
        DO EVEN MORE THINGS
    IF FAILURES OCCURRED ROLLBACK EVERYTHING.  OTHERWISE, COMMIT EVERYTHING.
    

    This is a reasonable request that I would expect OData to be able to perform with a single POST operation to [base URL]/odata/$batch.

Batch Execution Order Concerns

For our purposes, we may or may not necessarily care what order work is done against the DbContext. We definitely care though that the work being performed is being done as part of a batch. We want it to all succeed or all be rolled back in the database(s) being updated.

If you are using old school Web API (in other words, prior to ASP.Net Core), then your batch handler class is likely the DefaultHttpBatchHandler class. According to Microsoft's blog post "Introducing batch support in Web API and Web API OData", batch requests using the DefaultHttpBatchHandler are executed sequentially by default. It has an ExecutionOrder property that can be set to change this behavior so that operations are performed concurrently.

If you are using ASP.Net Core, it appears we have two options:

  • If your batch operation is using the "old school" payload format, it appears that batch operations are performed sequentially by default (assuming I am interpreting the source code correctly).
  • ASP.Net Core provides a new option though. A new DefaultODataBatchHandler has replaced the old DefaultHttpBatchHandler class. Support for ExecutionOrder has been dropped in favor of adopting a model where metadata in the payload communicates which batch operations should happen in order and which can be executed concurrently. To utilize this feature, the request payload Content-Type is changed to application/json and the payload itself is in JSON format (see below). Flow control is established within the payload by adding dependency and group directives to control execution order, so that batch requests can be split into multiple groups of individual requests that can be executed asynchronously and in parallel where no dependencies exist, or in order where dependencies do exist. We can take advantage of this fact and simply create "id", "atomicityGroup", and "dependsOn" properties in our payload to ensure operations are performed in the appropriate order.

Transaction Control

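As stated previously, your code is likely using either the DefaultHttpBatchHandler class or the DefaultODataBatchHandler class. In either case, these classes aren't sealed and we can easily derive from them to wrap the work being done in a TransactionScope. A TransactionScope only commits if Complete() has been called before it is disposed; otherwise it is rolled back. In the handler below, Complete() is reached only when no exception escapes the batch processing:

using System.Threading.Tasks;
using System.Transactions;
using Microsoft.AspNet.OData.Batch; // namespace as of Microsoft.AspNetCore.OData 7.x
using Microsoft.AspNetCore.Http;

/// <summary>
/// An OData Batch Handler derived from <see cref="DefaultODataBatchHandler"/> that wraps the work being done 
/// in a <see cref="TransactionScope"/> so that if any errors occur, the entire unit of work is rolled back.
/// </summary>
public class TransactionedODataBatchHandler : DefaultODataBatchHandler
{
    public override async Task ProcessBatchAsync(HttpContext context, RequestDelegate nextHandler)
    {
        // AsyncFlowOption.Enabled lets the ambient transaction flow across awaits.
        using (TransactionScope scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            await base.ProcessBatchAsync(context, nextHandler);

            // Only reached if no exception escaped; without Complete(), Dispose() rolls back.
            // Note: a sub-request that fails by returning an error status (no exception) does not stop this commit.
            scope.Complete();
        }
    }
}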

Just replace your default class with an instance of this one and you are good to go!

routeBuilder.MapODataServiceRoute("ODataRoutes", "odata", 
  modelBuilder.GetEdmModel(app.ApplicationServices),
  new TransactionedODataBatchHandler());

Controlling Execution Order in the ASP.Net Core POST to Batch Payload

The payload for the ASP.Net Core batch handler uses "id", "atomicityGroup", and "dependsOn" properties to control execution order of the sub-requests. We also gain a benefit in that the boundary parameter on the Content-Type header is not necessary as it was in prior versions:

    HEADER

    Content-Type: application/json

    BODY

    {
        "requests": [
            {
                "method": "POST",
                "id": "PIG1",
                "url": "http://localhost:50548/odata/DoSomeWork",
                "headers": {
                    "content-type": "application/json; odata.metadata=minimal; odata.streaming=true",
                    "odata-version": "4.0"
                },
                "body": { "message": "Went to market and had roast beef" }
            },
            {
                "method": "POST",
                "id": "PIG2",
                "dependsOn": [ "PIG1" ],
                "url": "http://localhost:50548/odata/DoSomeWork",
                "headers": {
                    "content-type": "application/json; odata.metadata=minimal; odata.streaming=true",
                    "odata-version": "4.0"
                },
                "body": { "message": "Stayed home, stared longingly at the roast beef, and remained famished" }
            },
            {
                "method": "POST",
                "id": "PIG3",
                "dependsOn": [ "PIG2" ],
                "url": "http://localhost:50548/odata/DoSomeWork",
                "headers": {
                    "content-type": "application/json; odata.metadata=minimal; odata.streaming=true",
                    "odata-version": "4.0"
                },
                "body": { "message": "Did not play nice with the others and did his own thing" }
            },
            {
                "method": "POST",
                "id": "TEnd",
                "dependsOn": [ "PIG1", "PIG2", "PIG3" ],
                "url": "http://localhost:50548/odata/HuffAndPuff",
                "headers": {
                    "content-type": "application/json; odata.metadata=minimal; odata.streaming=true",
                    "odata-version": "4.0"
                }
            }
        ]
    }

And that's pretty much it. With the batch operations being wrapped in a TransactionScope, a failure that surfaces as an exception causes it all to be rolled back.
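If you want to produce a payload like the one above from a .NET client, a minimal sketch could look like this. BatchItem and JsonBatchClient are hypothetical names made up for this example, every sub-request is assumed to be a POST as in the payload shown, and the code only builds the request; sending it with HttpClient is left out.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Hypothetical description of one sub-request in the JSON batch.
public sealed class BatchItem
{
    public string Id { get; set; }
    public string Url { get; set; }
    public string[] DependsOn { get; set; } = new string[0];
    public object Body { get; set; }
}

public static class JsonBatchClient
{
    // Builds a POST to {serviceRoot}/$batch with Content-Type application/json.
    public static HttpRequestMessage Build(string serviceRoot, IEnumerable<BatchItem> items)
    {
        var requests = new List<Dictionary<string, object>>();
        foreach (var item in items)
        {
            var entry = new Dictionary<string, object>
            {
                ["method"] = "POST",
                ["id"] = item.Id,
                ["url"] = item.Url,
                ["headers"] = new Dictionary<string, string>
                {
                    ["content-type"] = "application/json",
                    ["odata-version"] = "4.0"
                }
            };
            // Optional members are only written when they carry a value.
            if (item.DependsOn.Length > 0) entry["dependsOn"] = item.DependsOn;
            if (item.Body != null) entry["body"] = item.Body;
            requests.Add(entry);
        }

        string json = JsonSerializer.Serialize(new Dictionary<string, object> { ["requests"] = requests });
        return new HttpRequestMessage(HttpMethod.Post, serviceRoot.TrimEnd('/') + "/$batch")
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
    }
}
```

Calling JsonBatchClient.Build("http://localhost:50548/odata", items) with items whose Id values are PIG1, PIG2 and so on would give a request equivalent in structure to the payload above.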