I've got a table in SQL Server that I'd like to stream to Kafka topic, the structure is as follows:
(UserID, ReportID)
This table is going to be continuously changed (records added, inserted, no updates)
I'd like to transform this into this kind of structure and put into Elasticsearch:
{
"UserID": 1,
"Reports": [1, 2, 3, 4, 5, 6]
}
Examples I've seen so far are logs or click-stream which and do not work in my case.
Is this kind of use case possible at all? I could always just look at UserID
changes and query database, but that seems naive and not the best approach.
Update
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.*;
import java.util.ArrayList;
import java.util.Properties;
public class MyDemo {
public static void main(String... args) {
System.out.println("Hello KTable!");
final Serde<Long> longSerde = Serdes.Long();
KStreamBuilder builder = new KStreamBuilder();
KStream<Long, Long> reportPermission = builder.stream(TOPIC);
KTable<Long, ArrayList<Long>> result = reportPermission
.groupByKey()
.aggregate(
new Initializer<ArrayList<Long>>() {
@Override
public ArrayList<Long> apply() {
return null;
}
},
new Aggregator<Long, Long, ArrayList<Long>>() {
@Override
public ArrayList<Long> apply(Long key, Long value, ArrayList<Long> aggregate) {
aggregate.add(value);
return aggregate;
}
},
new Serde<ArrayList<Long>>() {
@Override
public void configure(Map<String, ?> configs, boolean isKey) {}
@Override
public void close() {}
@Override
public Serializer<ArrayList<Long>> serializer() {
return null;
}
@Override
public Deserializer<ArrayList<Long>> deserializer() {
return null;
}
});
result.to("report-aggregated-topic");
KafkaStreams streams = new KafkaStreams(builder, createStreamProperties());
streams.cleanUp();
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
private static final String TOPIC = "report-permission";
private static final Properties createStreamProperties() {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "report-permission-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
return props;
}
}
I'm actually getting stuck at aggregate stage because I cannot write a proper SerDe for ArrayList<Long>
(not enough skills yet), lambdas seem not to work on aggregator - it doesn't know what's the type of agg
:
KTable<Long, ArrayList<Long>> sample = builder.stream(TOPIC)
.groupByKey()
.aggregate(
() -> new ArrayList<Long>(),
(key, val, agg) -> agg.add(val),
longSerde
);