I understand how to get specific data from a file with Java 8 Streams. For example, if we need to get the loaded packages from a log file like this:
2015-01-06 11:33:03 b.s.d.task [INFO] Emitting: eVentToRequestsBolt __ack_ack
2015-01-06 11:33:03 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package com.foo.bar
2015-01-06 11:33:04 b.s.d.executor [INFO] Processing received message source: eventToManageBolt:2, stream: __ack_ack, id: {}, [-6722594615019711369 -1335723027906100557]
2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package co.il.boo
2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package dot.org.biz
we can do
List<String> packageList = Files.lines(Paths.get(args[1]))
        .filter(line -> line.contains("===---> Loaded package"))
        .map(line -> line.split(" "))
        .map(arr -> arr[arr.length - 1])
        .collect(Collectors.toList());
I took (and slightly modified) the code from Parsing File Example.
But what if we also need to get all the dates (and times) for Emitting: events from the same log file? How can we do this while working with the same Stream?
I can only imagine using collect(groupingBy(...)), which groups the lines with Loaded package and the lines with Emitting: before parsing, and then parsing each group (a map entry) separately. But that would create a map holding all the raw data from the log file, which consumes a lot of memory.
Is there a similar way to effectively extract multiple types of data from Java 8 Streams?
You can solve this in a more imperative style, without defining new collectors or using third-party libraries. First, define a class that represents the parsing result. It should have two methods: one that accepts a single input line, and one that combines the object with another partial result:
class Data {
    List<String> packageDates = new ArrayList<>();
    List<String> emittingDates = new ArrayList<>();
    // Consume single input line
    void accept(String line) {
        if (line.contains("===---> Loaded package"))
            packageDates.add(line.substring(0, "XXXX-XX-XX".length()));
        if (line.contains("Emitting"))
            // Emitting lines go to emittingDates, with date and time
            emittingDates.add(line.substring(0, "XXXX-XX-XX XX:XX:XX".length()));
    }
    // Combine two partial results
    void combine(Data other) {
        packageDates.addAll(other.packageDates);
        emittingDates.addAll(other.emittingDates);
    }
}
Now you can collect in quite a straightforward way:
Data result = Files.lines(Paths.get(args[1]))
.collect(Data::new, Data::accept, Data::combine);
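To make the three-argument collect easier to try out, here is a minimal self-contained sketch that runs the same idea on a few of the sample lines from the question instead of a file. The names MutableReductionSketch, LogData and parse are assumptions made for this illustration only, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class MutableReductionSketch {
    // Hypothetical result holder, mirroring the Data class described above
    static class LogData {
        final List<String> packageDates = new ArrayList<>();
        final List<String> emittingDates = new ArrayList<>();

        // Accumulator: inspect one line and record its date if it matches
        void accept(String line) {
            if (line.contains("===---> Loaded package"))
                packageDates.add(line.substring(0, "XXXX-XX-XX".length()));
            if (line.contains("Emitting"))
                emittingDates.add(line.substring(0, "XXXX-XX-XX XX:XX:XX".length()));
        }

        // Combiner: merge another partial result (only used by parallel streams)
        void combine(LogData other) {
            packageDates.addAll(other.packageDates);
            emittingDates.addAll(other.emittingDates);
        }
    }

    // Single pass over the lines; no intermediate map of raw lines is built
    static LogData parse(Stream<String> lines) {
        return lines.collect(LogData::new, LogData::accept, LogData::combine);
    }

    public static void main(String[] args) {
        LogData result = parse(Stream.of(
                "2015-01-06 11:33:03 b.s.d.task [INFO] Emitting: eVentToRequestsBolt __ack_ack",
                "2015-01-06 11:33:03 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package com.foo.bar",
                "2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package co.il.boo"));
        System.out.println(result.packageDates);
        System.out.println(result.emittingDates);
    }
}
```

When reading from a real file, note that the stream returned by Files.lines holds the file open, so it is usually wrapped in a try-with-resources statement.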
You may use the pairing collector which I wrote in this answer and which is available in my StreamEx library. For your concrete problem you will also need a filtering collector, which is available in JDK-9 early access builds and also in my StreamEx library. If you don't want to use a third-party library, you may copy it from this answer.
You will also need to store everything in some data structure. I declared the Data class for this purpose:
class Data {
    List<String> packageDates;
    List<String> emittingDates;
    public Data(List<String> packageDates, List<String> emittingDates) {
        this.packageDates = packageDates;
        this.emittingDates = emittingDates;
    }
}
Putting everything together, you can define a parsingCollector:
Collector<String, ?, List<String>> packageDatesCollector =
        filtering(line -> line.contains("===---> Loaded package"),
                mapping(line -> line.substring(0, "XXXX-XX-XX".length()), toList()));
Collector<String, ?, List<String>> emittingDatesCollector =
        filtering(line -> line.contains("Emitting"),
                mapping(line -> line.substring(0, "XXXX-XX-XX XX:XX:XX".length()), toList()));
Collector<String, ?, Data> parsingCollector = pairing(
        packageDatesCollector, emittingDatesCollector, Data::new);
And use it like this:
Data data = Files.lines(Paths.get(args[1])).collect(parsingCollector);
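On newer JDKs the same shape can be written with standard collectors only: Collectors.filtering has been part of the JDK since Java 9, and Collectors.teeing, added in Java 12, feeds each element to two downstream collectors and merges their results, much like pairing above. The following is a sketch under that assumption (Java 12 or later); the names TeeingSketch, ParsedLog and parsingCollector() are hypothetical and chosen for this illustration.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TeeingSketch {
    // Hypothetical immutable result holder, analogous to the Data class above
    static class ParsedLog {
        final List<String> packageDates;
        final List<String> emittingDates;

        ParsedLog(List<String> packageDates, List<String> emittingDates) {
            this.packageDates = packageDates;
            this.emittingDates = emittingDates;
        }
    }

    static Collector<String, ?, ParsedLog> parsingCollector() {
        // Dates (without time) of "Loaded package" lines
        Collector<String, ?, List<String>> packageDates = Collectors.filtering(
                (String line) -> line.contains("===---> Loaded package"),
                Collectors.mapping(
                        (String line) -> line.substring(0, "XXXX-XX-XX".length()),
                        Collectors.toList()));
        // Dates with times of "Emitting" lines
        Collector<String, ?, List<String>> emittingDates = Collectors.filtering(
                (String line) -> line.contains("Emitting"),
                Collectors.mapping(
                        (String line) -> line.substring(0, "XXXX-XX-XX XX:XX:XX".length()),
                        Collectors.toList()));
        // Every line goes to both collectors; the two lists are merged at the end
        return Collectors.teeing(packageDates, emittingDates, ParsedLog::new);
    }

    public static void main(String[] args) {
        ParsedLog parsed = Stream.of(
                "2015-01-06 11:33:03 b.s.d.task [INFO] Emitting: eVentToRequestsBolt __ack_ack",
                "2015-01-06 11:33:04 c.s.p.d.PackagesProvider [INFO] ===---> Loaded package dot.org.biz")
                .collect(parsingCollector());
        System.out.println(parsed.packageDates);
        System.out.println(parsed.emittingDates);
    }
}
```

The stream is still traversed only once, and only the extracted substrings are retained, not the raw lines.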