STEP 1: Run the producer to create sample data.
./bin/kafka-avro-console-producer \
--broker-list localhost:9092 --topic stream-test-topic \
--property schema.registry.url=http://localhost:8081 \
--property value.schema='{"type":"record","name":"dealRecord","fields":[{"name":"DEAL_ID","type":"string"},{"name":"DEAL_EXPENSE_CODE","type":"string"},{"name":"DEAL_BRANCH","type":"string"}]}'
Sample data:
{"DEAL_ID":"deal002", "DEAL_EXPENSE_CODE":"EXP002", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal003", "DEAL_EXPENSE_CODE":"EXP003", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal004", "DEAL_EXPENSE_CODE":"EXP004", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal005", "DEAL_EXPENSE_CODE":"EXP005", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal006", "DEAL_EXPENSE_CODE":"EXP006", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal007", "DEAL_EXPENSE_CODE":"EXP001", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal008", "DEAL_EXPENSE_CODE":"EXP002", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal009", "DEAL_EXPENSE_CODE":"EXP003", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal010", "DEAL_EXPENSE_CODE":"EXP004", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal011", "DEAL_EXPENSE_CODE":"EXP005", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal012", "DEAL_EXPENSE_CODE":"EXP006", "DEAL_BRANCH":"AMSTERDAM"}
STEP 2: Open another terminal and run the consumer to test the data.
./bin/kafka-avro-console-consumer --topic stream-test-topic \
--bootstrap-server localhost:9092 \
--property schema.registry.url=http://localhost:8081 \
--from-beginning
STEP 3: Open another terminal and run the producer for the expense lookup topic, this time producing keyed messages.
./bin/kafka-avro-console-producer \
--broker-list localhost:9092 --topic expense-test-topic \
--property "parse.key=true" \
--property "key.separator=:" \
--property schema.registry.url=http://localhost:8081 \
--property key.schema='"string"' \
--property value.schema='{"type":"record","name":"dealRecord","fields":[{"name":"EXPENSE_CODE","type":"string"},{"name":"EXPENSE_DESC","type":"string"}]}'
Sample data:
"pk1":{"EXPENSE_CODE":"EXP001", "EXPENSE_DESC":"Regulatory Deposit"}
"pk2":{"EXPENSE_CODE":"EXP002", "EXPENSE_DESC":"ABC - Sofia"}
"pk3":{"EXPENSE_CODE":"EXP003", "EXPENSE_DESC":"Apple Corporation"}
"pk4":{"EXPENSE_CODE":"EXP004", "EXPENSE_DESC":"Confluent Europe"}
"pk5":{"EXPENSE_CODE":"EXP005", "EXPENSE_DESC":"Air India"}
"pk6":{"EXPENSE_CODE":"EXP006", "EXPENSE_DESC":"KLM International"}
STEP 4: Open another terminal and run the consumer.
./bin/kafka-avro-console-consumer --topic expense-test-topic \
--bootstrap-server localhost:9092 \
--property "parse.key=true" \
--property "key.separator=:" \
--property schema.registry.url=http://localhost:8081 \
--from-beginning
STEP 5: Log in to the KSQL client.
./bin/ksql http://localhost:8088
Create the following stream and table, then run the join query.
KSQL:
STREAM:
CREATE STREAM SAMPLE_STREAM
(DEAL_ID VARCHAR, DEAL_EXPENSE_CODE varchar, DEAL_BRANCH VARCHAR)
WITH (kafka_topic='stream-test-topic', value_format='AVRO', key='DEAL_ID');
TABLE:
CREATE TABLE SAMPLE_TABLE
(EXPENSE_CODE varchar, EXPENSE_DESC VARCHAR)
WITH (kafka_topic='expense-test-topic', value_format='AVRO', key='EXPENSE_CODE');
The following is the output:
ksql> SELECT STREAM1.DEAL_EXPENSE_CODE, TABLE1.EXPENSE_DESC
from SAMPLE_STREAM STREAM1 LEFT JOIN SAMPLE_TABLE TABLE1
ON STREAM1.DEAL_EXPENSE_CODE = TABLE1.EXPENSE_CODE
WINDOW TUMBLING (SIZE 3 MINUTE)
GROUP BY STREAM1.DEAL_EXPENSE_CODE, TABLE1.EXPENSE_DESC;
EXP001 | null
EXP001 | null
EXP002 | null
EXP003 | null
EXP004 | null
EXP005 | null
EXP006 | null
EXP002 | null
EXP002 | null
tl;dr: Your table data needs to be keyed on the column on which you're joining.
Using the sample data above, here's how to investigate and fix it.
Use KSQL to check the data in the topics (no need for kafka-avro-console-consumer). The format of the output is timestamp, key, value.

Stream:
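A sketch of the check, using KSQL's PRINT statement (the timestamp in the output line is illustrative):

ksql> PRINT 'stream-test-topic' FROM BEGINNING;
-- e.g. <timestamp> , null , {"DEAL_ID": "deal002", "DEAL_EXPENSE_CODE": "EXP002", "DEAL_BRANCH": "AMSTERDAM"}
-- Note the null key: the producer in STEP 1 did not set one.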
Table:
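The same check for the table's topic (again, the output shape is illustrative):

ksql> PRINT 'expense-test-topic' FROM BEGINNING;
-- e.g. <timestamp> , "pk1" , {"EXPENSE_CODE": "EXP001", "EXPENSE_DESC": "Regulatory Deposit"}
-- Here the keys are pk1..pk6.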
At this point, note that the key (pk<x>) does not match the column on which we will be joining.

Register the two topics:
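A sketch of the registration, reusing the DDL from the question (the stream's KEY declaration is optional and omitted here; the table's KEY clause is only a declaration and does not change the keys of the underlying messages):

CREATE STREAM SAMPLE_STREAM
  (DEAL_ID VARCHAR, DEAL_EXPENSE_CODE VARCHAR, DEAL_BRANCH VARCHAR)
  WITH (KAFKA_TOPIC='stream-test-topic', VALUE_FORMAT='AVRO');

CREATE TABLE SAMPLE_TABLE
  (EXPENSE_CODE VARCHAR, EXPENSE_DESC VARCHAR)
  WITH (KAFKA_TOPIC='expense-test-topic', VALUE_FORMAT='AVRO', KEY='EXPENSE_CODE');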
Tell KSQL to query events from the beginning of each topic:
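In KSQL this is done with the auto.offset.reset property:

ksql> SET 'auto.offset.reset' = 'earliest';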
Validate that the table's declared key per the DDL (KEY='EXPENSE_CODE') matches the actual key of the underlying Kafka messages (available through the ROWKEY system column):
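A sketch of the validation; ROWKEY is the system column exposing the Kafka message key (expected output shown as comments, based on the sample data above):

ksql> SELECT ROWKEY, EXPENSE_CODE FROM SAMPLE_TABLE;
-- pk1 | EXP001
-- pk2 | EXP002
-- ...
-- pk6 | EXP006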
The keys don't match. Our join is doomed!
Magic workaround: let's re-key the topic using KSQL!
Register the table's source topic as a KSQL STREAM:
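A sketch, with SAMPLE_TABLE_SOURCE as an assumed name for this intermediate stream:

CREATE STREAM SAMPLE_TABLE_SOURCE
  (EXPENSE_CODE VARCHAR, EXPENSE_DESC VARCHAR)
  WITH (KAFKA_TOPIC='expense-test-topic', VALUE_FORMAT='AVRO');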
Create a derived stream, keyed on the correct column. This is underpinned by a re-keyed Kafka topic:
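A sketch using PARTITION BY, which is what makes KSQL write the re-keyed backing topic (the stream name SAMPLE_REKEYED is an assumption):

CREATE STREAM SAMPLE_REKEYED AS
  SELECT * FROM SAMPLE_TABLE_SOURCE
  PARTITION BY EXPENSE_CODE;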
Re-register the KSQL TABLE on top of the re-keyed topic, then check that the declared key and the actual message key match on the new table (both steps sketched below):
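A sketch of both steps, assuming the derived stream's backing topic takes the stream's name (SAMPLE_REKEYED):

DROP TABLE SAMPLE_TABLE;

CREATE TABLE SAMPLE_TABLE
  (EXPENSE_CODE VARCHAR, EXPENSE_DESC VARCHAR)
  WITH (KAFKA_TOPIC='SAMPLE_REKEYED', VALUE_FORMAT='AVRO', KEY='EXPENSE_CODE');

ksql> SELECT ROWKEY, EXPENSE_CODE FROM SAMPLE_TABLE;
-- EXP001 | EXP001
-- EXP002 | EXP002
-- ...
-- Declared key and message key now agree.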
Successful join:
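A sketch of the working join; it is the query from the question minus the window and grouping, and EXPENSE_DESC now resolves instead of returning null:

ksql> SELECT STREAM1.DEAL_EXPENSE_CODE, TABLE1.EXPENSE_DESC
  FROM SAMPLE_STREAM STREAM1
  LEFT JOIN SAMPLE_TABLE TABLE1
  ON STREAM1.DEAL_EXPENSE_CODE = TABLE1.EXPENSE_CODE;
-- e.g. EXP002 | ABC - Sofia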