Question:

I have a service that transfers messages at quite a high rate.

Currently it is served by akka-tcp and handles 3.5M messages per minute. I decided to give grpc a try. Unfortunately, it resulted in much lower throughput: ~500k messages per minute and even less.

Could you please recommend how to optimize it?

My setup

Hardware: 32 cores, 24 GB heap.

grpc version: 1.25.0

Message format and endpoint

Each message is basically a binary blob. The client streams 100K to 1M or more messages into the same request (asynchronously). The server doesn't respond with anything, and the client uses a no-op observer.

service MyService {
    rpc send (stream MyMessage) returns (stream DummyResponse);
}

message MyMessage {
    int64 someField = 1;
    bytes payload = 2;  //not huge
}

message DummyResponse {
}

Problems: The message rate is low compared to the akka implementation. I observe low CPU usage, so I suspect that the grpc call is actually blocking internally even though it claims otherwise. Calling onNext() indeed doesn't return immediately, but GC may also be a factor.

I tried to spawn more senders to mitigate this issue but didn't get much improvement.

My findings: grpc actually allocates an 8KB byte buffer for each message when it serializes it. See the stack trace:

java.lang.Thread.State: BLOCKED (on object monitor)
    at com.google.common.io.ByteStreams.createBuffer(ByteStreams.java:58)
    at com.google.common.io.ByteStreams.copy(ByteStreams.java:105)
    at io.grpc.internal.MessageFramer.writeToOutputStream(MessageFramer.java:274)
    at io.grpc.internal.MessageFramer.writeKnownLengthUncompressed(MessageFramer.java:230)
    at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:168)
    at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:141)
    at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:53)
    at io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:37)
    at io.grpc.internal.DelayedStream.writeMessage(DelayedStream.java:252)
    at io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:473)
    at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:457)
    at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
    at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
    at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:346)
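
To put that buffer in perspective, here is a rough back-of-the-envelope estimate of the allocation rate it would imply. It assumes "8KB" means 8192 bytes and that every message allocates exactly one fresh buffer. The class and method names are made up for this sketch, and nothing here measures real GC behavior.

```java
/** Back-of-the-envelope estimate; class and method names are hypothetical. */
final class AllocationEstimate {
    private static final double BYTES_PER_MIB = 1024.0 * 1024.0;

    /** MiB per second of short-lived buffers if each message allocates bufferBytes. */
    static double mibPerSecond(long messagesPerMinute, int bufferBytes) {
        double bytesPerSecond = messagesPerMinute * (double) bufferBytes / 60.0;
        return bytesPerSecond / BYTES_PER_MIB;
    }

    public static void main(String[] args) {
        int bufferBytes = 8 * 1024; // "8KB" from the post, assumed to mean 8192 bytes
        // Rates taken from the post: 3.5M/min (akka-tcp) and ~500k/min (grpc).
        System.out.printf("at 3.5M msgs/min: %.1f MiB/s%n", mibPerSecond(3_500_000L, bufferBytes));
        System.out.printf("at 500k msgs/min: %.1f MiB/s%n", mibPerSecond(500_000L, bufferBytes));
    }
}
```

Under those assumptions the buffers alone would account for roughly 456 MiB/s of garbage at the akka-tcp rate and roughly 65 MiB/s at the observed grpc rate, which is why GC is a plausible suspect here.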

Any help with best practices for building high-throughput grpc clients would be appreciated.

Answer 1:

I solved the issue by creating several ManagedChannel instances per destination. Although articles say that a single ManagedChannel can spawn enough connections by itself, so one instance should be enough, that wasn't true in my case.

Performance is now on par with the akka-tcp implementation.
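
A minimal sketch of one way to spread senders over several channels. This is not the exact code from my service. RoundRobinPool is a hypothetical helper, not part of grpc-java, and it is generic so that it runs without gRPC on the classpath. In a real client, T would be io.grpc.ManagedChannel and the factory would build one channel per call, e.g. with ManagedChannelBuilder.forAddress(host, port).usePlaintext().build().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: holds N separately created items and hands them out round-robin. */
final class RoundRobinPool<T> {
    private final List<T> items;
    private final AtomicInteger cursor = new AtomicInteger();

    RoundRobinPool(int size, Supplier<T> factory) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<T> created = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            created.add(factory.get()); // each call is expected to create a separate instance
        }
        this.items = Collections.unmodifiableList(created);
    }

    /** Returns the next item; safe to call from many sender threads. */
    T next() {
        // floorMod keeps the index non-negative after the counter overflows.
        int index = Math.floorMod(cursor.getAndIncrement(), items.size());
        return items.get(index);
    }

    int size() {
        return items.size();
    }

    public static void main(String[] args) {
        // Strings stand in for channels so the sketch runs on its own.
        AtomicInteger ids = new AtomicInteger();
        RoundRobinPool<String> pool =
                new RoundRobinPool<>(3, () -> "channel-" + ids.getAndIncrement());
        for (int i = 0; i < 5; i++) {
            System.out.println(pool.next());
        }
    }
}
```

Each sender would call next() once, create its stub on that channel, and keep its stream on it. The pool size is something to tune by measurement; I don't have a universal number for it.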

Answer 2:

Interesting question. Network packets are encoded using a stack of protocols, each built on top of the specification of the one below it. Hence the performance (throughput) of a protocol is bounded by the performance of the protocol it is built on, since you are adding extra encoding/decoding steps on top of the underlying one.

For instance, gRPC is built on top of HTTP/2, which is a protocol at the Application layer, or L7, and as such its performance is bounded by the performance of HTTP/2. HTTP/2 itself is built on top of TCP, which is at the Transport layer, or L4, so we can deduce that gRPC throughput cannot be higher than that of equivalent code served directly over TCP.

In other words: if your server is able to handle raw TCP packets, how would adding new layers of complexity (gRPC) improve performance?

Answer 3:

I'm quite impressed with how well Akka TCP has performed here :D

Our experience was slightly different. We were working on much smaller instances using Akka Cluster. For Akka remoting, we changed from Akka TCP to UDP using Artery and achieved a much higher rate and a lower, more stable response time. There is even a config in Artery that helps balance CPU consumption against response time from a cold start.

My suggestion is to use a UDP-based framework that also takes care of transmission reliability for you (e.g. Artery UDP), and just serialize using Protobuf instead of using full-fledged gRPC. The HTTP/2 transmission channel is not really meant for high-throughput, low-response-time purposes.