Question:
We have a .NET Web Role hosted on Windows Azure that serves only a REST API with a handful of web methods.
The API is used rather aggressively by other cloud-hosted apps (not browsers). Each method is stateless, which enables direct scaling out, and typically interacts with Blob or Table Storage.
Contrary to most classical APIs, the amount of data uploaded to the API is typically much larger than the data downloaded from it, and the average message size is also quite big (above 100kB).
So far, we are using WCF on top of ASP.NET Forms with POX messages (Plain Old XML). The front-end performance is not very good; the culprits are:
- XML is verbose ==> bandwidth limitation.
- ASP.NET + WCF + WcfRestContrib slow to parse/serialize messages ==> CPU limitation.
I am wondering what the best strategy is to achieve the highest possible front-end performance, in order to reduce the number of VMs needed to support the workload.
Possible strategies that I am considering:
- Discard XML in favor of ProtoBuf.
- Add upstream GZip compression (classical HTTP compression only applies downstream); a client-side sketch follows this list.
- Discard WCF entirely in favor of raw HttpHandlers.
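To make the upstream-compression point concrete, here is a rough sketch of what it could look like on the client side (the helper name and payload handling are illustrative, not what we currently run): the client gzips the request body and flags it with a Content-Encoding: gzip header, so the server can decompress it before parsing.
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

static class CompressedUpload   // illustrative helper, not production code
{
    // POST a gzip-compressed payload; the server must know how to decompress it
    // (e.g. via a custom HttpModule on the receiving side).
    public static void Post(string url, string payload)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "text/xml";
        request.Headers["Content-Encoding"] = "gzip";

        using (var requestStream = request.GetRequestStream())
        using (var gzip = new GZipStream(requestStream, CompressionMode.Compress))
        {
            var bytes = Encoding.UTF8.GetBytes(payload);
            gzip.Write(bytes, 0, bytes.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // Inspect response.StatusCode / body as needed.
        }
    }
}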
Has anyone benchmarked the various alternatives to get the most out of each Azure VM for such usage?
PS: I am implicitly referring to the Lokad Forecasting API, but I have tried to phrase the question in a more general way.
Answer 1:
In your POCs, I think you can remove Azure from the equation as you test through some of the scenarios.
If this is truly bandwidth, compression is certainly an option, but it can be problematic if this web service will be opened up to the "public" rather than simply used by applications under your control. This is especially true in a heterogeneous environment.
A less verbose format is an option, as long as you have a good means of RESTfully communicating failures due to bad formatting. XML makes this very easy. I lack experience with ProtoBuf, but it does appear to have some safety in this area, so it could be a very good option if bandwidth is your problem, and it may also solve the parsing-speed issue. I would POC it outside of Azure first and then put it in.
I would only go the raw HttpHandler direction if you have evidence that WCF overhead is an issue. Azure is hard enough to debug, with so much living in configuration, that I am not convinced adding the additional complexity of raw HttpHandlers is the proper direction to go.
Answer 2:
Is your XML being serialized via reflection (i.e. using attributes and so forth)? If so, then protobuf-net stands to be much, much faster.
In fact, though, even if your XML serialization is customized using explicit getter and setter Func<>s, you can still see some significant gain with protobuf-net. In our case, depending on the size and content of the objects being serialized, we saw 5-15% speed increases in serialization times.
Using protobuf-net will also provide a bump to available bandwidth, though that will depend on your content to a large extent.
Our system sounds pretty different from yours, but FWIW we find that WCF itself has an almost imperceptibly low overhead compared to the rest of the flow. A profiler like dotTrace might help identify just where you can save once you've switched to protobufs.
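For illustration, here is a minimal sketch of what the protobuf-net side looks like (the DTO name and members are hypothetical; only the [ProtoContract]/[ProtoMember] attributes and the Serializer calls are the library's actual API):
using System.IO;
using ProtoBuf;

[ProtoContract]
public class ForecastRequest          // hypothetical DTO
{
    [ProtoMember(1)]
    public string SeriesId { get; set; }

    [ProtoMember(2)]
    public double[] Values { get; set; }
}

public static class ForecastRequestSerializer
{
    // Serialize to a compact binary payload.
    public static byte[] Serialize(ForecastRequest request)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, request);
            return stream.ToArray();
        }
    }

    // And back again.
    public static ForecastRequest Deserialize(byte[] payload)
    {
        using (var stream = new MemoryStream(payload))
        {
            return Serializer.Deserialize<ForecastRequest>(stream);
        }
    }
}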
Answer 3:
Is the size of the messages your service receives so big because there is a large amount of data in them, or because they contain files?
If it is the first case, then ProtoBuf does indeed seem like a very good option.
If the message size is big because the messages embed files, then one strategy I have been using with success is to create two different kinds of service methods: ones that upload and download files, and ones that only send and receive messages.
The file-related methods will simply transmit the files inside the body of the HTTP request, in binary form, without any transformation or encoding. The rest of the parameters will be sent using the request URL.
For file uploads, in WCF REST services, you will have to declare the parameter representing the file as being of type Stream in the service method. For example:
[OperationContract]
[WebInvoke(Method = "POST", UriTemplate = "uploadProjectDocument?projectId={projectId}")]
void UploadProjectDocument(Guid projectId, Stream document);
When encountering Stream parameters, WCF will simply take their content directly from the body of the request without doing any processing on it. You can only have one parameter of type Stream on a service method (which makes sense because each HTTP request has only one body).
The downside to the above approach is that, besides the parameter representing the file, all the other parameters need to be of basic types (like strings, numbers, GUIDs). You cannot pass any complex object. If you need to do that, you will have to create a separate method for it, so you might end up having two methods (which will translate into two calls at runtime) where at the moment you have only one. However, uploading files directly in the body of the request should be much more efficient than serializing them, so even with the extra call things should improve.
For downloading files from the service, you will need to declare the WCF methods as returning Stream and simply write the file to the returned stream. As with Stream parameters, WCF will copy the content of the Stream directly into the body of the response without any transformation.
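As an illustration of both directions (a sketch only: the container name, the configuration setting name and the use of the v1.x Microsoft.WindowsAzure.StorageClient library are assumptions, not part of the original answer), the Stream parameter can be pushed straight into Blob Storage, and the download counterpart can return the blob's read stream:
using System;
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

// Inside the service implementation class:
public void UploadProjectDocument(Guid projectId, Stream document)
{
    var account = CloudStorageAccount.Parse(
        RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));
    var container = account.CreateCloudBlobClient().GetContainerReference("project-documents");
    container.CreateIfNotExist();

    // Stream the request body straight into Blob Storage without buffering it in memory.
    container.GetBlobReference(projectId.ToString()).UploadFromStream(document);
}

public Stream DownloadProjectDocument(Guid projectId)
{
    var account = CloudStorageAccount.Parse(
        RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));
    var container = account.CreateCloudBlobClient().GetContainerReference("project-documents");

    // WCF copies the returned Stream directly into the response body.
    return container.GetBlobReference(projectId.ToString()).OpenRead();
}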
Answer 4:
This thread covers performance issues with Azure: http://social.msdn.microsoft.com/Forums/en-US/windowsazuredata/thread/d84ba34b-b0e0-4961-a167-bbe7618beb83
Azure roles by default only run in a single thread, which is very inefficient on the servers. There are some very nice design patterns out there that show you how to implement multithreaded Azure roles; I personally follow this one: http://www.31a2ba2a-b718-11dc-8314-0800200c9a66.com/2010/12/running-multiple-threads-on-windows.html . With this, your roles can serialize objects in parallel.
I use JSON as an interchange format instead of XML; it has a much smaller byte size and is well supported in .NET 4. I currently use DataContractJsonSerializer, but you could also look into JavaScriptSerializer or JSON.NET. If it is serialization performance you are after, I would suggest you compare these.
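A rough sketch of such a comparison could look like this (the PostMessage type is the interchange entity defined further down; the iteration count is arbitrary):
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization.Json;
using System.Web.Script.Serialization;
using Interchange;

static void CompareJsonSerializers(PostMessage sample, int iterations)
{
    // Time DataContractJsonSerializer (System.Runtime.Serialization.Json).
    var stopwatch = Stopwatch.StartNew();
    var dataContractSerializer = new DataContractJsonSerializer(typeof(PostMessage));
    for (var i = 0; i < iterations; i++)
    {
        using (var stream = new MemoryStream())
        {
            dataContractSerializer.WriteObject(stream, sample);
        }
    }
    Console.WriteLine("DataContractJsonSerializer: {0} ms", stopwatch.ElapsedMilliseconds);

    // Time JavaScriptSerializer (System.Web.Extensions).
    stopwatch = Stopwatch.StartNew();
    var javaScriptSerializer = new JavaScriptSerializer();
    for (var i = 0; i < iterations; i++)
    {
        javaScriptSerializer.Serialize(sample);
    }
    Console.WriteLine("JavaScriptSerializer: {0} ms", stopwatch.ElapsedMilliseconds);
}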
WCF services are single-threaded by default (source: http://msdn.microsoft.com/query/dev10.query?appId=Dev10IDEF1&l=EN-US&k=k(SYSTEM.SERVICEMODEL.SERVICEBEHAVIORATTRIBUTE.CONCURRENCYMODE);k(TargetFrameworkMoniker-%22.NETFRAMEWORK%2cVERSION%3dV4.0%22);k(DevLang-CSHARP)&rd=true ). Here is a code sample that will make your RESTful API multi-threaded:
ExampleService.svc.cs
[ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Multiple, InstanceContextMode = InstanceContextMode.PerCall,
IncludeExceptionDetailInFaults = false, MaxItemsInObjectGraph = Int32.MaxValue)]
public class ExampleService : IExample
web.config
<system.serviceModel>
<protocolMapping>
<add scheme="http" binding="webHttpBinding" bindingConfiguration="" />
</protocolMapping>
<behaviors>
<endpointBehaviors>
<behavior name="">
<webHttp defaultOutgoingResponseFormat="Json" />
</behavior>
</endpointBehaviors>
<serviceBehaviors>
<behavior name="">
<serviceMetadata httpGetEnabled="true" />
<serviceDebug includeExceptionDetailInFaults="false" />
</behavior>
</serviceBehaviors>
</behaviors>
<serviceHostingEnvironment multipleSiteBindingsEnabled="true" />
</system.serviceModel>
ExampleService.svc
<%@ ServiceHost Language="C#" Debug="true" Service="WebPages.Interfaces.ExampleService" CodeBehind="ExampleService.svc.cs" %>
Also, ASP.NET by default only allows for two concurrent HTTP connections (source: http://social.msdn.microsoft.com/Forums/en-US/windowsazuredata/thread/d84ba34b-b0e0-4961-a167-bbe7618beb83 ). These settings will allow for up to 48 concurrent HTTP connections:
web.config
<system.net>
<connectionManagement>
<!-- See http://social.msdn.microsoft.com/Forums/en-US/windowsazuredata/thread/d84ba34b-b0e0-4961-a167-bbe7618beb83 -->
<add address="*" maxconnection="48" />
</connectionManagement>
</system.net>
If your HTTP POST body messages are usually smaller than 1460 bytes, you should turn off Nagling to improve performance (source: http://social.msdn.microsoft.com/Forums/en-US/windowsazuredata/thread/d84ba34b-b0e0-4961-a167-bbe7618beb83 ). Here are the settings that do this:
web.config
<system.net>
<settings>
<!-- See http://social.msdn.microsoft.com/Forums/en-US/windowsazuredata/thread/d84ba34b-b0e0-4961-a167-bbe7618beb83 -->
<servicePointManager expect100Continue="false" useNagleAlgorithm="false" />
</settings>
</system.net>
Define your JSON APIs something like this:
using System.ServiceModel;
using System.ServiceModel.Web;
using Interchange;
namespace WebPages.Interfaces
{
[ServiceContract]
public interface IExample
{
[OperationContract]
[WebInvoke(Method = "POST",
BodyStyle = WebMessageBodyStyle.Bare,
RequestFormat = WebMessageFormat.Json,
ResponseFormat = WebMessageFormat.Json)]
string GetUpdates(RequestUpdates name);
[OperationContract]
[WebInvoke(Method = "POST",
BodyStyle = WebMessageBodyStyle.Bare,
RequestFormat = WebMessageFormat.Json,
ResponseFormat = WebMessageFormat.Json)]
string PostMessage(PostMessage message);
}
}
You can serialize to JSON in .NET 4 like this:
using System.IO;
using System.Runtime.Serialization.Json;
using System.Text;
string SerializeData(object data)
{
    var serializer = new DataContractJsonSerializer(data.GetType());
    using (var memoryStream = new MemoryStream())
    {
        serializer.WriteObject(memoryStream, data);
        // DataContractJsonSerializer emits UTF-8, so decode with UTF-8 rather than Encoding.Default.
        return Encoding.UTF8.GetString(memoryStream.ToArray());
    }
}
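The reverse direction is the natural counterpart (a small addition to the sample above, assuming UTF-8 encoded JSON):
T DeserializeData<T>(string json)
{
    var serializer = new DataContractJsonSerializer(typeof(T));
    using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
    {
        return (T)serializer.ReadObject(memoryStream);
    }
}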
A typical interchange entity can be defined as usual:
using System.Collections.Generic;
using System.Runtime.Serialization;
namespace Interchange
{
[DataContract]
public class PostMessage
{
[DataMember]
public string Text { get; set; }
[DataMember]
public List<string> Tags { get; set; }
[DataMember]
public string AspNetSessionId { get; set; }
}
}
You could write your own HTTPModule for upstream GZip compression, but I would try the stuff above first.
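For reference, a minimal sketch of such a module might look like this (the module name is made up; it wraps Request.Filter with a GZipStream when the request arrives with Content-Encoding: gzip, and whether WCF sees the decompressed body depends on running in ASP.NET compatibility mode):
using System;
using System.IO.Compression;
using System.Web;

public class GzipRequestModule : IHttpModule   // hypothetical name
{
    public void Init(HttpApplication application)
    {
        application.BeginRequest += (sender, e) =>
        {
            var request = ((HttpApplication)sender).Request;
            if (string.Equals(request.Headers["Content-Encoding"], "gzip",
                              StringComparison.OrdinalIgnoreCase))
            {
                // Request.Filter filters the incoming entity body as it is read.
                request.Filter = new GZipStream(request.Filter, CompressionMode.Decompress);
            }
        };
    }

    public void Dispose() { }
}
The module would then be registered under system.webServer/modules (or system.web/httpModules) in web.config.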
Finally, make sure that your table storage is in the same data center as the services that consume it.
Answer 5:
I've had a very pleasant experience with ServiceStack:
http://www.servicestack.net.
It's basically your last option: a fairly thin layer on top of HttpHandlers, with fast XML and JSON serialization, that exposes a REST API.
The JSV serialization it also offers is about half the speed of protobuf-net, I believe, and support for ProtoBuf is planned.
I don't know for sure if it runs on Azure, but I can't think of a reason why not as it simply integrates into any ASP.NET application.
Answer 6:
Here are benchmarks for different .NET serialization options.
Out of all the JSON serializers I've benchmarked, my ServiceStack JSON serializer performs the best, around 3x faster than JSON.NET. Here are a couple of external benchmarks showing this:
- http://daniel.wertheim.se/2011/02/07/json-net-vs-servicestack/
- http://theburningmonk.com/2011/08/performance-test-json-serializers/
ServiceStack (an open-source alternative to WCF) comes pre-configured with .NET's fastest JSV and JSON text serializers out of the box.
I see someone included lengthy configuration on how you can bend WCF into using the slower JSON serializer shipped with .NET. In ServiceStack, every web service is automatically available via JSON, XML and SOAP (as well as JSV, CSV and HTML) without any config required, so you get to choose the most appropriate endpoint without any additional effort.
The equivalent code and configuration for the WCF example above in ServiceStack is just:
public class PostMessage
{
public string Text { get; set; }
public List<string> Tags { get; set; }
public string AspNetSessionId { get; set; }
}
public class GetUpdates { }   // minimal request DTO so the sample is self-contained
public class GetUpdatesService : IService<GetUpdates>
{
public object Execute(GetUpdates request){ ... }
}
public class PostMessageService : IService<PostMessage>
{
public object Execute(PostMessage request){ ... }
}
Note: decorating your DTOs with [DataContract] is optional.
The ServiceStack Hello World example shows all the links (different formats, metadata pages, XSD schemas and SOAP WSDLs) that are automatically available after you create a web service.
Answer 7:
I found the initialization of blob storage (CreateCloudBlobClient(), GetContainerReference() etc.) to be quite slow. It's a good idea to take this into consideration when designing Azure services.
I have separate services for anything that requires blob access, as it dragged down the response time of the pure DB requests.
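One way to soften that initialization cost (a sketch only, assuming the v1.x StorageClient library and a made-up configuration setting name) is to build the blob client and container references once per role instance and reuse them across requests:
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public static class BlobClients
{
    // Lazy<T> makes the expensive initialization run once per role instance,
    // in a thread-safe way, instead of on every request.
    private static readonly Lazy<CloudBlobContainer> Documents =
        new Lazy<CloudBlobContainer>(() =>
        {
            var account = CloudStorageAccount.Parse(
                RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));
            var container = account.CreateCloudBlobClient().GetContainerReference("documents");
            container.CreateIfNotExist();
            return container;
        });

    public static CloudBlobContainer DocumentsContainer
    {
        get { return Documents.Value; }
    }
}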