Apache Flink: How to count the total number of eve

2019-02-20 00:39发布

问题:

I have two raw streams and I am joining those streams and then I want to count what is the total number of events that have been joined and how much events have not. I am doing this by using map on joinedEventDataStream as shown below

joinedEventDataStream.map(new RichMapFunction<JoinedEvent, Object>() {

            @Override
            public Object map(JoinedEvent joinedEvent) throws Exception {

                number_of_joined_events += 1;

                return null;
            }
        });

Question # 1: Is this the appropriate way to count the number of events in the stream?

Question # 2: I have noticed a wired behavior, which some of you might not believe. The issue is that when I run my Flink program in IntelliJ IDE, it shows me correct value for number_of_joined_events but 0 in the case when I submit this program as jar. So I am getting the initial value of number_of_joined_events when I run the program as a jar file instead of the actual count. Why is this happening only in case of jar file submission and not in IDE?

回答1:

Your approach is not working. The behavior you noticed when executing the program via a JAR file is expected.

I don't know how number_of_joined_events is defined, but I assume its a static variable in your program. When you run the program in your IDE, it runs in a single JVM. Hence, all operators have access to the static variable. When you submit a JAR file to a remote process, the program is executed in a different JVM (possibly multiple JVMs) and the static variable in your client process is never updated.

You can use Flink's metrics or a ReduceFunction that sums 1s to count the number of processed records.