How to extract a subset from a CSV file using NiFi

Posted 2019-03-04 02:51

Question:

I have a CSV file with 100+ columns, and I want to extract a specific subset of 60 columns (both the column names and their values). I know we can use the ExtractText processor. Can anyone tell me what regular expression to write? For example, from the given snapshot I only want NiFi to extract the columns 'BMS_sw_micro', 'BMU_Dbc_Dbg_Micro', and 'BMU_Dbc_Fia_Micro', i.e. only columns F, L, and O.

Any help is much appreciated!

Answer 1:

As I said in the comment, you can count the number of commas before the text you want to match and use that count in the regex, like this:

/(?<=^([^,]+?,){5})[^,]+/

The regex starts at the left of the string, skips a fixed number of comma-terminated fields, and then matches the text up to the next comma.

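The number in the curly braces defines which column to match, expressed as the number of commas to skip: {5} skips five fields and matches the sixth column (F), so columns L and O would need {11} and {14}.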

Run the regex once for every column you want, changing that number each time.
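
To make the comma counting concrete, here is a minimal Python sketch run outside NiFi. It is not the exact expression above: Python's re module rejects variable-length lookbehind, so this variant skips the leading fields with a non-capturing group and reads the value from capture group 1 instead. It also uses [^,]* rather than [^,]+? so that empty fields still count as columns. The helper names and the sample rows are hypothetical, and quoted fields that contain commas are not handled.

```python
import re


def column_pattern(index):
    """Build a regex whose group 1 captures the field at a 0-based column index."""
    # Skip `index` comma-terminated fields from the start of the line,
    # then capture everything up to the next comma (or the end of the line).
    return re.compile(r"^(?:[^,]*,){%d}([^,]*)" % index)


def extract_columns(line, indexes):
    """Run one pattern per wanted column, as the answer suggests."""
    values = []
    for index in indexes:
        match = column_pattern(index).match(line)
        # A line with too few columns yields None for that column.
        values.append(match.group(1) if match else None)
    return values


# Hypothetical sample data standing in for the CSV from the question.
header = "A,B,C,D,E,F,G,H,I,J,K,L,M,N,O"
row = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15"
wanted = [5, 11, 14]  # columns F, L and O (commas to skip)

print(extract_columns(header, wanted))  # ['F', 'L', 'O']
print(extract_columns(row, wanted))     # ['6', '12', '15']
```

With 60 columns to keep, this means 60 separate patterns, so the approach is probably only practical for a handful of columns.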



Answer 2:

See my answer to your related SO question about selecting CSV columns.