I have a basic stream processing flow which looks like
master topic -> my processing in a mapper/filter -> output topics
and I am wondering about the best way to handle "bad messages". These could be things like messages that I can't deserialize properly, or cases where the processing/filtering logic fails in some unexpected way (I have no external dependencies, so there should be no transient errors of that sort).
I was considering wrapping all my processing/filtering code in a try catch and if an exception was raised then routing to an "error topic". Then I can study the message and modify it or fix my code as appropriate and then replay it on to master. If I let any exceptions propagate, the stream seems to get jammed and no more messages are picked up.
- Is this approach considered best practice?
- Is there a convenient Kafka Streams way to handle this? I don't think there is a concept of a DLQ...
- What are the alternative ways to stop Kafka jamming on a "bad message"?
- What alternative error handling approaches are there?
For completeness here is my code (pseudo-ish):
class Document {
    // Fields
}

class AnalysedDocument {
    Document document;
    String rawValue;
    Exception exception;
    Analysis analysis;

    // All being well
    AnalysedDocument(Document document, Analysis analysis) {...}

    // Analysis failed
    AnalysedDocument(Document document, Exception exception) {...}

    // Deserialisation failed
    AnalysedDocument(String rawValue, Exception exception) {...}
}
KStreamBuilder builder = new KStreamBuilder();
KStream<String, AnalysedDocument> analysedDocumentStream = builder
    .stream(Serdes.String(), Serdes.String(), "master")
    .mapValues(new ValueMapper<String, AnalysedDocument>() {
        @Override
        public AnalysedDocument apply(String rawValue) {
            Document document;
            try {
                // Deserialise
                document = ...
            } catch (Exception e) {
                return new AnalysedDocument(rawValue, e);
            }
            try {
                // Perform analysis
                Analysis analysis = ...
                return new AnalysedDocument(document, analysis);
            } catch (Exception e) {
                return new AnalysedDocument(document, e);
            }
        }
    });
// Branch on whether the analysis mapping failed, producing errorStream and successStream
errorStream.to(Serdes.String(), customPojoSerde(), "error");
successStream.to(Serdes.String(), customPojoSerde(), "analysed");
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
Any help greatly appreciated.
Update Mar 23, 2018: Kafka 1.0 provides much better and easier handling for bad messages ("poison pills") via KIP-161 than what I described below. See default.deserialization.exception.handler in the Kafka 1.0 docs.
Ok, my answer here focuses on the (de)serialization issues as this might be the most tricky scenario to handle for most users.
The same thinking (for deserialization) can also be applied to failures in the processing logic. Here, most people tend to gravitate towards option 2 below (minus the deserialization part), but YMMV.
Is this approach considered best practice? Yes, at the moment this is the way to go. Essentially, the two most common patterns are (1) skipping corrupted messages or (2) sending corrupted records to a quarantine topic, aka a dead letter queue.

Is there a convenient Kafka Streams way to handle this? Yes, there is a way to handle this, including the use of a dead letter queue. However, it's (at least IMHO) not that convenient yet. If you have any feedback on how the API should allow you to handle this -- e.g. via a new or updated method, or a configuration setting ("if serialization/deserialization fails, send the problematic record to THIS quarantine topic") -- please let us know. :-)

What are the alternative ways to stop Kafka jamming on a "bad message", and what alternative error handling approaches are there? See my examples below.
FWIW, the Kafka community is also discussing the addition of a new CLI tool that allows you to skip over corrupted messages. However, as a user of the Kafka Streams API, I think ideally you want to handle such scenarios directly in your code, and fall back to CLI utilities only as a last resort.
Here are some patterns for the Kafka Streams DSL to handle corrupted records/messages aka "poison pills". This is taken from http://docs.confluent.io/current/streams/faq.html#handling-corrupted-records-and-deserialization-errors-poison-pill-messages
Option 1: Skip corrupted records with flatMap

This is arguably what most users would like to do. We use flatMap because it allows you to output zero, one, or more output records per input record. In the case of a corrupted record we output nothing (zero records), thereby ignoring/skipping the corrupted record. The benefit compared to the other options here is that we need to manually deserialize a record only once.

The drawback is that flatMap "marks" the input stream for potential data re-partitioning, i.e. if you perform a key-based operation such as groupings (groupBy/groupByKey) or joins afterwards, your data will be re-partitioned behind the scenes. Since this might be a costly step we don't want that to happen unnecessarily. If you KNOW that the record keys are always valid OR that you don't need to operate on the keys (thus keeping them as "raw" keys in byte[] format), you can change from flatMap to flatMapValues, which will not result in data re-partitioning even if you join/group/aggregate the stream later.

Code example:
Option 2: Dead letter queue with branch
Compared to option 1 (which ignores corrupted records) option 2 retains corrupted messages by filtering them out of the "main" input stream and writing them to a quarantine topic (think: dead letter queue). The drawback is that, for valid records, we must pay the manual deserialization cost twice.
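A sketch of option 2 under the same assumptions (hypothetical deserializeDocument()/analyse() helpers, plus the customPojoSerde() from the question). Note how valid records are deserialised once in the branch predicate and a second time in the subsequent map -- that is the "pay twice" drawback.

KStream<byte[], byte[]> input =
    builder.stream(Serdes.ByteArray(), Serdes.ByteArray(), "master");

// First predicate matches records that deserialise cleanly; the second catches the rest.
KStream<byte[], byte[]>[] branches = input.branch(
    (key, value) -> {
        try {
            deserializeDocument(value);   // hypothetical helper; result discarded here
            return true;
        } catch (Exception e) {
            return false;
        }
    },
    (key, value) -> true
);

// Corrupted records go to the quarantine topic unchanged (raw bytes).
branches[1].to(Serdes.ByteArray(), Serdes.ByteArray(), "error");

// Valid records have to be deserialised a second time before further processing.
KStream<String, AnalysedDocument> successStream = branches[0].map((key, value) -> {
    Document document = deserializeDocument(value);
    return KeyValue.pair(
        Serdes.String().deserializer().deserialize("master", key),
        new AnalysedDocument(document, analyse(document)));
});
successStream.to(Serdes.String(), customPojoSerde(), "analysed");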
Option 3: Skip corrupted records with filter
I only mention this for completeness. This option looks like a mix of options 1 and 2, but is worse than either of them. Compared to option 1, you must pay the manual deserialization cost for valid records twice (bad!). Compared to option 2, you lose the ability to retain corrupted records in a dead letter queue.
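For reference, a short sketch of option 3 under the same assumptions: the predicate deserialises once just to validate, and mapValues then deserialises the same bytes again.

KStream<byte[], byte[]> input =
    builder.stream(Serdes.ByteArray(), Serdes.ByteArray(), "master");

KStream<byte[], byte[]> validBytes = input.filter((key, value) -> {
    try {
        deserializeDocument(value);   // hypothetical helper; result thrown away
        return true;
    } catch (Exception e) {
        return false;                 // corrupted records are silently dropped
    }
});

// Pay the deserialisation cost a second time for every valid record.
KStream<byte[], Document> documents = validBytes.mapValues(value -> deserializeDocument(value));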
I hope this helps. If so, I'd appreciate your feedback on how we could improve the Kafka Streams API to handle failures/exceptions in a better/more convenient way than today. :-)
Right now, Kafka Streams offers only limited error handling capabilities. There is work in progress to simplify this. For now, your overall approach seems to be a good way to go.
One comment about handling de/serialization errors: handling those errors manually requires you to do the de/serialization "manually" as well. This means you need to configure ByteArraySerdes for the key and value of your Streams app's input/output topics and add a map() that does the de/serialization (i.e., KStream<byte[],byte[]> -> map() -> KStream<keyType,valueType> -- or the other way round if you also want to catch serialization exceptions). Otherwise, you cannot try-catch deserialization exceptions.

With your current approach, you "only" validate that the given string represents a valid document -- but it could be the case that the message itself is corrupted and cannot be converted into a String in the source operator in the first place. Thus, your code doesn't actually cover deserialization exceptions. However, if you are sure a deserialization exception can never happen, your approach would be sufficient, too.
Update

This issue is tackled by KIP-161 and will be included in the next release, 1.0.0. It allows you to register a callback via the parameter default.deserialization.exception.handler. The handler will be invoked every time an exception occurs during deserialization and allows you to return a DeserializationHandlerResponse (CONTINUE -> drop the record and move on, or FAIL, which is the default).
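For illustration, a sketch against the 1.0 API: either plug in the built-in LogAndContinueExceptionHandler, or implement DeserializationExceptionHandler yourself (the class name SkipAndLogHandler below is made up; the handler types live in org.apache.kafka.streams.errors).

// Use the built-in handler that logs corrupted records and keeps going:
Properties config = new Properties();
config.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
           LogAndContinueExceptionHandler.class);

// Or implement your own:
public class SkipAndLogHandler implements DeserializationExceptionHandler {
    @Override
    public DeserializationHandlerResponse handle(ProcessorContext context,
                                                 ConsumerRecord<byte[], byte[]> record,
                                                 Exception exception) {
        System.err.println("Skipping corrupted record from " + record.topic()
                + " at offset " + record.offset() + ": " + exception.getMessage());
        return DeserializationHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(Map<String, ?> configs) { }
}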
Update 2

With KIP-210 (will be part of Kafka 1.1) it is also possible to handle errors on the producer side, similar to the consumer part, by registering a ProductionExceptionHandler via the config default.production.exception.handler; the handler can return CONTINUE (drop the record and keep going) or FAIL (the default).
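A sketch of what that could look like in 1.1; the handler below (continuing only on RecordTooLargeException) is just an illustration, and the class name ContinueOnRecordTooLargeHandler is made up:

public class ContinueOnRecordTooLargeHandler implements ProductionExceptionHandler {
    @Override
    public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record,
                                                     Exception exception) {
        if (exception instanceof RecordTooLargeException) {
            return ProductionExceptionHandlerResponse.CONTINUE;  // drop oversized records
        }
        return ProductionExceptionHandlerResponse.FAIL;          // default behaviour otherwise
    }

    @Override
    public void configure(Map<String, ?> configs) { }
}

// Register it in the Streams configuration:
config.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
           ContinueOnRecordTooLargeHandler.class);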