WCF data serialization: can it go faster?

Published 2020-03-31 04:14

Question:

This question is a sequel of sorts to that previous question.

When we build a WCF service that works with some kind of data, we naturally want it to be fast and efficient. To achieve that, we have to make sure every segment of the data's round trip works as fast as it can, from the data storage back-end (such as SQL Server) to the WCF client that requested the data.

While seeking an answer to that previous question, we learned, thanks to Slauma and others who contributed through comments, that the time-consuming part of Entity Framework's first large query is object materialization and attaching entities to the context when the result returns from the database. We have seen that everything works much faster on subsequent queries.

Assuming those large queries are used as read-only operations, we came to the conclusion that we could set the EF MergeOption to NoTracking, yielding better first-query performance. What NoTracking does is tell EF to create a separate object for each record retrieved from the database, even when records share the same key. This causes additional processing if the query has .Include() statements, which leads to data of a much larger size being returned.
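For reference, a minimal sketch of what enabling NoTracking looks like on an EF 4.x ObjectContext (the context and entity set names here are assumptions, not from the original question):

    using System.Data.Objects; // MergeOption lives here in EF 4.x
    using System.Linq;

    using (var context = new MyEntities())
    {
        // NoTracking skips change tracking and identity resolution:
        // materialization is faster, but every row returned (including
        // rows duplicated by .Include() joins) becomes a separate object.
        context.Persons.MergeOption = MergeOption.NoTracking;

        var persons = context.Persons
                             .Include("District.City.State")
                             .ToList();
    }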

The data may be so big that we could easily ask ourselves: did we really help our cause by using the NoTracking option, even if it made the (first) query faster? With multiple .Include() statements, NoTracking causes many more objects to be created when data returns from the server, because identity resolution is skipped; subsequent queries without the NoTracking option can therefore end up faster than NoTracking ones.

The biggest problem is how to efficiently serialize this amount of data, and how to deserialize it on the client. With serialization already as slow as it is (I am using DataContractSerializer with PreserveObjectReferences set to true, because I am sending EF 4.x generated POCOs to my client and vice versa), do we really want to generate even more data (thanks to NoTracking)? To be honest, I have not yet seen the data originating from a NoTracking query over ~11,000 objects (not counting the navigation properties obtained via .Include()) arrive on the client side at all. The last time I tried to pull this off, the 00:10:00 timeout was triggered (!)
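For context, PreserveObjectReferences cannot be switched on in plain configuration; the usual approach is to substitute a custom serializer behavior, roughly like this sketch (the class name is mine; the DataContractSerializer constructor arguments are spelled out in comments):

    using System;
    using System.Collections.Generic;
    using System.Runtime.Serialization;
    using System.ServiceModel.Description;

    public class PreserveReferencesOperationBehavior : DataContractSerializerOperationBehavior
    {
        public PreserveReferencesOperationBehavior(OperationDescription operation)
            : base(operation) { }

        public override XmlObjectSerializer CreateSerializer(
            Type type, string name, string ns, IList<Type> knownTypes)
        {
            // preserveObjectReferences = true makes the serializer emit
            // ref/id attributes, so shared instances and cycles in the EF
            // object graph survive the round trip.
            return new DataContractSerializer(type, name, ns, knownTypes,
                int.MaxValue, // maxItemsInObjectGraph
                false,        // ignoreExtensionDataObject
                true,         // preserveObjectReferences
                null);        // dataContractSurrogate
        }
    }

This behavior then replaces the default DataContractSerializerOperationBehavior on each operation, typically via a custom attribute or contract behavior.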

So if you are still reading this wall of text, tell me: how do I solve this situation? Which serializer should I use to achieve acceptable results? Currently, if I don't use the NoTracking option, the serialization, transport, and deserialization of ~11,000 objects via a wsHttpBinding-like custom binding on the local machine take ~5 seconds. What's scary to me is that this large table will most likely eventually contain ~500,000 records.

Answer 1:

Have you considered creating a view model for your object and doing a projection in the select statement? That should be a lot faster:

    // Note: EF ignores .Include() when the query ends in a projection;
    // only the columns used in the select are actually fetched.
    var result = from person in DB.Entities.Persons
                         .Include("District")
                         .Include("District.City")
                         .Include("District.City.State")
                         .Include("Nationality")
                 select new PersonViewModel()
                 {
                     Name = person.Name,
                     City = person.District.City.Name,
                     State = person.District.City.State.Name,
                     Nationality = person.Nationality.Name
                 };

This requires you to create a PersonViewModel class to hold the flattened data.
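A hypothetical shape for that class, matching the projection above:

    // Illustrative flattened DTO: small, cycle-free, and cheap to
    // serialize compared to a full EF entity graph.
    public class PersonViewModel
    {
        public string Name { get; set; }
        public string City { get; set; }
        public string State { get; set; }
        public string Nationality { get; set; }
    }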

You might be able to speed things up further by creating a database view and letting Entity Framework select directly from that.

If you really want the front-end to populate a grid with 500,000 records, then I'd remove the web service layer altogether and use a DataReader to speed up the process. Entity Framework and WCF aren't suited to transforming that much data with acceptable performance. What you're basically doing here is:

Database -> TDS -> .NET objects -> XML -> Plain text -> XML -> .NET objects -> UI

While this could easily be reduced to:

Database -> TDS -> UI

Then use Entity Framework to handle the changes to the entities in your business logic. This is in line with the Command/Query Separation pattern: use a technology suited to high-performance querying of data and link that directly to your app, then use a command strategy to implement your business logic.

OData services might also provide a better way to link your UI directly to the data, since OData can be queried efficiently, allowing you to implement fast filtering without the user really noticing.

If security settings prohibit direct querying through OData or direct access to the SQL database, consider materializing the objects yourself. Select the data directly from a view or a query and use an IDataReader to populate your ViewModel directly. That will probably give you the highest performance.
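A minimal sketch of that approach, assuming the flattened PersonViewModel above and an illustrative view name:

    using System.Collections.Generic;
    using System.Data.SqlClient;

    public static IEnumerable<PersonViewModel> ReadPersons(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Name, City, State, Nationality FROM dbo.PersonView", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                // Stream rows straight into the view model: no entity
                // graph, no change tracking, minimal allocations.
                while (reader.Read())
                {
                    yield return new PersonViewModel
                    {
                        Name = reader.GetString(0),
                        City = reader.GetString(1),
                        State = reader.GetString(2),
                        Nationality = reader.GetString(3)
                    };
                }
            }
        }
    }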

There are a lot of alternatives to Entity Framework, created especially because EF isn't cut out for large datasets. See FluentData, DapperDotNet, Massive, or PetaPoco. You might want to use these side by side with Entity Framework to handle your large, flat data queries.
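With Dapper, for instance, the same flat query could look roughly like this (a sketch; Query<T> is Dapper's extension method on IDbConnection, and the view name is again illustrative):

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Linq;
    using Dapper;

    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        // Dapper materializes rows directly into the flat DTO,
        // with no change tracking or identity resolution.
        List<PersonViewModel> persons = connection.Query<PersonViewModel>(
            "SELECT Name, City, State, Nationality FROM dbo.PersonView").ToList();
    }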



Answer 2:

I use Json.NET's implementation of BSON in my RIA application. More info here.

I yield return an IEnumerable as I read from the database, serializing the rows as they stream. I find the speed acceptable, and I return entities with roughly 20 properties each. This approach should minimize how much memory the server holds at any one time.
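A rough sketch of that pattern with Json.NET (the method and DTO are illustrative; in newer Json.NET versions BsonWriter is obsolete and BsonDataWriter from the Newtonsoft.Json.Bson package takes its place):

    using System.Collections.Generic;
    using System.IO;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Bson;

    public static void WriteBson(Stream output, IEnumerable<PersonViewModel> rows)
    {
        using (var writer = new BsonWriter(output))
        {
            // JsonSerializer walks the IEnumerable lazily, so each row is
            // serialized as it is read from the database rather than
            // buffering the whole result set in memory first.
            new JsonSerializer().Serialize(writer, rows);
        }
    }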



Answer 3:

Based on what I have gathered from various reviews and performance benchmarks, I would choose protobuf-net as the serializer. It's just a matter of design whether it can be plugged into my service configuration. More info about that here.
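For a sense of the API, protobuf-net's core serialization looks roughly like this (the DTO is hypothetical; hooking it into WCF goes through protobuf-net's ServiceModel integration, which the link above covers):

    using System.IO;
    using ProtoBuf;

    [ProtoContract]
    public class PersonDto
    {
        [ProtoMember(1)] public string Name { get; set; }
        [ProtoMember(2)] public string City { get; set; }
        [ProtoMember(3)] public string State { get; set; }
        [ProtoMember(4)] public string Nationality { get; set; }
    }

    // Compact binary round trip via protobuf-net's static Serializer.
    using (var stream = new MemoryStream())
    {
        Serializer.Serialize(stream, new PersonDto { Name = "example" });
        stream.Position = 0;
        PersonDto copy = Serializer.Deserialize<PersonDto>(stream);
    }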

Although this doesn't completely answer the question, jessehouwing's answer was the best one, and I am marking it as accepted.