Netty instantiates a new set of request handler objects whenever a connection is opened. This seems fine for something like a WebSocket, where the connection stays open for its lifetime.
When using Netty as an HTTP server that could receive thousands of requests per second, this seems like it would be rather taxing on garbage collection. Every single request instantiates several handler objects (in my case, 10 of them) and then garbage collects them some milliseconds later.
In an HTTP server with a moderate load of ~1000 req/sec, that is ten thousand handler objects to instantiate and garbage collect every second.
It seems we could simply create sharable handlers using ChannelHandler.Sharable, which would eliminate this large GC overhead. They just have to be thread safe.
However, I see that all the very basic HTTP handlers packaged with the library, like HttpServerCodec and HttpObjectAggregator, are not sharable. Also, none of the HTTP handler examples are sharable; 99% of example code and tutorials don't seem to bother with it. There was only one blurb in Norman Maurer's book (Netty author) that gives a reason for using a shared handler:
WHY SHARE A CHANNELHANDLER?
A common reason for installing a single ChannelHandler in multiple ChannelPipelines is to gather statistics across multiple Channels.
No mention of GC load concerns anywhere.
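For illustration, the statistics use case the book mentions maps naturally to a stateless, thread-safe handler. The class name and logic below are my own sketch, not code from the book: one @Sharable instance is installed in every pipeline and counts messages across all channels.

```java
import io.netty.channel.ChannelHandler.Sharable;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.util.concurrent.atomic.AtomicLong;

// One instance shared by every pipeline. It holds no per-connection
// state, and its only shared state (the counter) is thread-safe, so a
// single instance is safe to install on many channels.
@Sharable
public class RequestCounterHandler extends ChannelInboundHandlerAdapter {

    private final AtomicLong messages = new AtomicLong();

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        messages.incrementAndGet(); // thread-safe shared state
        ctx.fireChannelRead(msg);   // pass the message along unchanged
    }

    public long messageCount() {
        return messages.get();
    }
}
```

Without the @Sharable annotation, Netty throws a ChannelPipelineException when the same handler instance is added to a second pipeline.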
Netty has been in regular production use for almost a decade. It is arguably the most used Java library in existence for highly concurrent non-blocking IO.
In other words, it is designed to do much more than my moderate 1000 requests per second.
Is there something I missed that makes the GC load not a problem?
Or, should I try to implement my own Sharable handlers with similar functionality for decoding, encoding, and writing HTTP requests and responses?
While we always aim to produce as little GC as possible in Netty, there are situations where this is just not possible. For example, the HTTP codecs keep per-connection state, so these can't be shared (even if they were thread-safe).
The only way around this would be to pool them, but I think there are other objects which are much more likely to cause GC problems, and for these we try to pool when easily possible.
TL;DR:
If you get to the volume needed to make GC a problem with the default HTTP handlers, it is time to scale out with a proxy server anyway.
After Norman's answer I ended up attempting a very bare-bones sharable HTTP codec/aggregator POC to see whether this was worth pursuing.
My sharable decoder was a long way from RFC 7230 compliance, but it gave me enough of the request for my current project.
I then used httperf and VisualVM to get a sense of the difference in GC load. For my efforts I only saw a 10% decrease in the GC rate. In other words, it really doesn't make much of a difference.
The only real appreciable effect was about 5% fewer errors when running 1000 req/sec with my sharable codec/aggregator versus the packaged unshared HTTP codec + aggregator, and this only occurred when sustaining 1000 req/sec for longer than 10 seconds.
In the end I'm not going to pursue it. The amount of time needed to turn this into a fully HTTP-compliant decoder, for a tiny benefit that can instead be had by using a proxy server, is not worth it at all.
For reference purposes here is the combined sharable decoder/aggregator that I tried:
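(The original POC code is not reproduced here; what follows is an illustrative sketch of the approach, not the actual code.) Netty's ByteToMessageDecoder cannot be @Sharable because it keeps a per-instance cumulation buffer, so a sharable decoder has to move its per-connection state onto the Channel itself, for example via an AttributeKey. This sketch only parses up to the end of the headers and is nowhere near RFC 7230 compliant, mirroring the caveat above:

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandler.Sharable;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.AttributeKey;
import io.netty.util.CharsetUtil;

@Sharable
public class SharableHttpDecoder extends ChannelInboundHandlerAdapter {

    // Per-connection accumulation buffer lives on the channel, not on
    // this (shared) handler instance.
    private static final AttributeKey<ByteBuf> BUFFER =
            AttributeKey.valueOf("sharableHttpDecoder.buffer");

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ByteBuf in = (ByteBuf) msg;
        ByteBuf buf = ctx.channel().attr(BUFFER).get();
        if (buf == null) {
            buf = ctx.alloc().buffer();
            ctx.channel().attr(BUFFER).set(buf);
        }
        buf.writeBytes(in);
        in.release();

        // Naive end-of-headers detection; request bodies are ignored.
        String soFar = buf.toString(CharsetUtil.US_ASCII);
        int end = soFar.indexOf("\r\n\r\n");
        if (end >= 0) {
            buf.clear();
            // Fire the raw request head upstream; the real POC produced
            // a parsed request object rather than a String.
            ctx.fireChannelRead(soFar.substring(0, end));
        }
    }

    @Override
    public void channelInactive(ChannelHandlerContext ctx) {
        // Release the per-connection buffer when the connection dies.
        ByteBuf buf = ctx.channel().attr(BUFFER).getAndSet(null);
        if (buf != null) {
            buf.release();
        }
        ctx.fireChannelInactive();
    }
}
```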
The resultant object created by the decoder for handling on the pipeline:
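(Again, the original object is not shown; this is a hypothetical stand-in, with class and field names that are my assumptions.) The idea is a simple immutable holder for the parsed method, URI, and headers, which the decoder fires down the pipeline:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Immutable value object representing a decoded request head.
public final class SimpleHttpRequest {
    private final String method;
    private final String uri;
    private final Map<String, String> headers;

    public SimpleHttpRequest(String method, String uri, Map<String, String> headers) {
        this.method = method;
        this.uri = uri;
        this.headers = Collections.unmodifiableMap(new LinkedHashMap<>(headers));
    }

    // Parse a raw request head such as "GET /path HTTP/1.1\r\nHost: x".
    public static SimpleHttpRequest parse(String head) {
        String[] lines = head.split("\r\n");
        String[] requestLine = lines[0].split(" ");
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
        return new SimpleHttpRequest(requestLine[0], requestLine[1], headers);
    }

    public String method() { return method; }
    public String uri() { return uri; }
    public String header(String name) { return headers.get(name); }
}
```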
A simple handler for the testing:
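(A hypothetical stand-in for the test handler, which is not shown.) Since it is stateless, a single instance can safely be @Sharable; it answers every decoded request with a fixed 200 response and closes the connection:

```java
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandler.Sharable;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.util.CharsetUtil;

// Stateless, so one instance serves all connections.
@Sharable
public class OkHandler extends SimpleChannelInboundHandler<String> {
    private static final String RESPONSE =
            "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nOK";

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, String request) {
        // Write a canned response and close, regardless of the request.
        ctx.writeAndFlush(Unpooled.copiedBuffer(RESPONSE, CharsetUtil.US_ASCII))
           .addListener(ChannelFutureListener.CLOSE);
    }
}
```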
The full pipeline using these sharable handlers:
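(The original pipeline class is not reproduced; this sketch shows the pattern, with trivial nested stand-ins for the POC's decoder and request handler.) The key difference from a normal Netty server is that the handlers are created once, outside initChannel, and the same instances are installed on every connection, so nothing handler-related is allocated per connection:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelHandler.Sharable;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public final class SharedPipelineServer {

    // Trivial stand-ins for the POC's sharable decoder and handler.
    @Sharable
    static final class SharedDecoder extends ChannelInboundHandlerAdapter {
        @Override
        public void channelRead(ChannelHandlerContext ctx, Object msg) {
            ctx.fireChannelRead(msg); // a real decoder would parse here
        }
    }

    @Sharable
    static final class SharedHandler extends ChannelInboundHandlerAdapter { }

    public static void main(String[] args) throws Exception {
        final SharedDecoder decoder = new SharedDecoder(); // one instance total
        final SharedHandler handler = new SharedHandler(); // one instance total

        EventLoopGroup boss = new NioEventLoopGroup(1);
        EventLoopGroup workers = new NioEventLoopGroup();
        try {
            new ServerBootstrap()
                .group(boss, workers)
                .channel(NioServerSocketChannel.class)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        ch.pipeline().addLast(decoder, handler); // reused, not new
                    }
                })
                .bind(8080).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}
```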
The above was tested against this (more usual) unshared pipeline:
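(Also not the original code; a sketch of the conventional baseline.) Every new connection allocates a fresh codec, aggregator, and handler, because the first two keep per-connection state. The inline 200-OK handler is a stand-in for the real business handler, and the initializer is parameterized on Channel rather than SocketChannel only so it also works in embedded tests:

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.DefaultFullHttpResponse;
import io.netty.handler.codec.http.FullHttpRequest;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpHeaderNames;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpServerCodec;
import io.netty.handler.codec.http.HttpVersion;

public class UnsharedInitializer extends ChannelInitializer<Channel> {
    @Override
    protected void initChannel(Channel ch) {
        ch.pipeline().addLast(
            new HttpServerCodec(),           // stateful: HTTP parse state
            new HttpObjectAggregator(65536), // stateful: accumulates chunks
            new SimpleChannelInboundHandler<FullHttpRequest>() {
                @Override
                protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest req) {
                    // Respond 200 with an empty body and close.
                    FullHttpResponse res = new DefaultFullHttpResponse(
                            HttpVersion.HTTP_1_1, HttpResponseStatus.OK);
                    res.headers().set(HttpHeaderNames.CONTENT_LENGTH, 0);
                    ctx.writeAndFlush(res).addListener(ChannelFutureListener.CLOSE);
                }
            });
    }
}
```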