Count the number of rows for each file along with

Posted 2019-09-10 03:16

Question:

I have built a job that reads data from a file and, based on the unique values of a particular column, splits the data set into many files.

I am able to achieve this requirement with the job below:

Now, to this job that splits the output into multiple files, I want to add a subjob that gives me two columns.

In the first column I want the names of the files created in my main job, and in the second column the number of rows in each created output file.

To achieve this I used tFlowMeter, and to catch the count I used tFlowMeterCatcher. This gives me the correct row count for each corresponding output file, but it shows the last file name for every count instead of the name of the file each count belongs to.

How can I get the correct file names with their corresponding row counts?

Answer 1:

If you follow the directions below, your job will end up with additional components like so:

Use a tJavaFlex directly after the tFileOutputDelimited in the main flow. It should look like this:

Start Code: int countRows = 0;
Main Code:  countRows = countRows + 1;
End Code:   globalMap.put("rowCount", countRows);
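
To make the three tJavaFlex sections easier to picture, here is a minimal plain-Java sketch of how they relate to the row flow: the Start Code runs once before the first row, the Main Code once per row, and the End Code once after the last row. The class name, the runFlow method and the sample rows are made up for illustration, and globalMap is modelled as an ordinary Map<String, Object>, which is an assumption about how it behaves in the generated job.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JavaFlexCounterSketch {
    // Stand-in for Talend's globalMap (assumed to behave like a Map<String, Object>).
    static final Map<String, Object> globalMap = new HashMap<>();

    static void runFlow(List<String> rows) {
        // Start Code: runs once, before the first row arrives.
        int countRows = 0;

        for (String row : rows) {
            // Main Code: runs once per incoming row.
            countRows = countRows + 1;
        }

        // End Code: runs once, after the last row has passed through.
        globalMap.put("rowCount", countRows);
    }

    public static void main(String[] args) {
        runFlow(Arrays.asList("a;1", "b;2", "c;3"));
        System.out.println(globalMap.get("rowCount")); // prints 3
    }
}
```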

Connect this component with an OnComponentOk trigger to the first component of a new subjob. This subjob holds a tFixedFlowInput, a tJavaRow and a tBufferOutput.

The tFixedFlowInput is only there so that the OnComponentOk can be connected; nothing in it has to be altered. In the tJavaRow, put the following:

output_row.filename = (String)globalMap.get("row7.newColumn");
// or whichever row variable holds the filename in your job

output_row.rowCount = (Integer)globalMap.get("rowCount");
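
The casts are needed because globalMap hands values back as Object. The following sketch shows the same two reads outside Talend, under the assumption that globalMap behaves like a Map<String, Object>. OutputRow is a hypothetical stand-in for the row struct generated from the schema, and the fallback to 0 for a missing counter is an optional addition, not part of the answer above.

```java
import java.util.HashMap;
import java.util.Map;

class JavaRowMappingSketch {
    // Hypothetical stand-in for the row struct Talend generates from the schema.
    static class OutputRow {
        String filename;
        Integer rowCount;
    }

    static OutputRow fillRow(Map<String, Object> globalMap, String filenameKey) {
        OutputRow outputRow = new OutputRow();
        // Values come back as Object, so each read is cast to the schema type.
        outputRow.filename = (String) globalMap.get(filenameKey);
        Integer count = (Integer) globalMap.get("rowCount");
        // Optional safeguard: use 0 if no counter was stored under "rowCount".
        outputRow.rowCount = (count != null) ? count : 0;
        return outputRow;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("row7.newColumn", "fileblerb1.txt");
        globalMap.put("rowCount", 27);

        OutputRow row = fillRow(globalMap, "row7.newColumn");
        System.out.println(row.filename + "|" + row.rowCount); // prints fileblerb1.txt|27
    }
}
```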

In the schema of the tJavaRow, add the elements filename (String) and rowCount (Integer), matching the two assignments in the code.

Now simply add a tBufferOutput at the end of this subjob.

Now create another new subjob with a tBufferInput and whatever components you need to process and store the data. Connect the very first component of your job to the tBufferInput component with an OnSubjobOk trigger. I used a tLogRow to show the result (with my randomly created fake data):

.---------------+--------.
|      LogFileData       |
|=--------------+-------=|
|filename       |rowCount|
|=--------------+-------=|
|fileblerb1.txt |27      |
|fileblerb29.txt|14      |
|fileblerb44.txt|20      |
'---------------+--------'

NOTE: Keep in mind that if you add a header to the file (Include Header checked in tFileOutputDelimited), the job might need to be changed (for example, set int countRows = 1; or whatever you need). I did not test this case.
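
To cross-check the numbers the job reports, the expected counts can be worked out outside Talend. The sketch below is only an illustration under stated assumptions: the split column values arrive as a list of strings, the "file" + key + ".txt" naming scheme is hypothetical, and the includeHeader flag simply adds one line per file to mirror the header case mentioned in the note.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PerFileCountSketch {
    // Returns the expected number of lines per output file, in first-seen order.
    static Map<String, Integer> countPerFile(List<String> keys, boolean includeHeader) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : keys) {
            String filename = "file" + key + ".txt"; // hypothetical naming scheme
            counts.merge(filename, 1, Integer::sum); // one more row for this file
        }
        if (includeHeader) {
            // A header adds one extra line to every file.
            counts.replaceAll((name, n) -> n + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("A", "B", "A");
        System.out.println(countPerFile(keys, false)); // prints {fileA.txt=2, fileB.txt=1}
        System.out.println(countPerFile(keys, true));  // prints {fileA.txt=3, fileB.txt=2}
    }
}
```

Because each count is stored under its own file name, a name can never be overwritten by the one from a later file, which is the symptom described in the question.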



Answer 2:

You can use the tFileProperties component to store the generated file names in an intermediate Excel file after the first subjob, and then use this Excel file in your second subjob.

Thanks!