As an awk beginner, I am able to split data into one file per unique value with
awk -F, '{print >> $1".csv"; close($1".csv")}' myfile.csv
But I would like to split a large CSV file on an additional condition: the number of unique values seen in a specific column.
Specifically, with input
111,1,0,1
111,1,1,1
222,1,1,1
333,1,0,0
333,1,1,1
444,1,1,1
444,0,0,0
555,1,1,1
666,1,0,0
I would like the output files to be
111,1,0,1
111,1,1,1
222,1,1,1
333,1,0,0
333,1,1,1
and
444,1,1,1
444,0,0,0
555,1,1,1
666,1,0,0
each of which contains three (in this case) unique values in the first column: 111, 222, 333 and 444, 555, 666 respectively.
Any help would be appreciated.
This one-liner would help:
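The one-liner itself was not preserved in this copy; a sketch matching the description that follows (a u=3 knob, output files 1.csv, 2.csv, ...) could look like this. The filename myfile.csv and the setup line recreating the question's input are assumptions made only to keep the example runnable:

```shell
# Assumption: input file is named myfile.csv, as in the question.
printf '%s\n' 111,1,0,1 111,1,1,1 222,1,1,1 333,1,0,0 333,1,1,1 \
              444,1,1,1 444,0,0,0 555,1,1,1 666,1,0,0 > myfile.csv

# Bump counter n on every new value in column 1; each block of u unique
# values goes to the next numbered file (1.csv, 2.csv, ...).
awk -F, -v u=3 '$1 != prev { prev = $1; n++ }
                { print > ((int((n - 1) / u) + 1) ".csv") }' myfile.csv
```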
You can change the
u=3
value to x to get x unique values per file. If you run this line with your input file, you should get
1.csv and 2.csv
Edit (add some test output):
This will do the trick and I find it pretty readable and easy to understand:
We start with our count at 0 and our filename at 1. We then count each unique value we get from the first column, and whenever it's the fourth one, we reset our count and move on to the next filename.
Here's some sample data I used, which is just yours with some additional lines.
And running the awk like so:
We see the following output files and content:
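The script, sample data, and run output did not survive in this copy; a hypothetical reconstruction matching the description above (count starts at 0, filename at 1, reset on the fourth unique value) might be the following. It uses only the question's original rows, since the answer's additional test lines were not preserved:

```shell
# Assumption: input file is named myfile.csv; only the question's rows
# are recreated here.
printf '%s\n' 111,1,0,1 111,1,1,1 222,1,1,1 333,1,0,0 333,1,1,1 \
              444,1,1,1 444,0,0,0 555,1,1,1 666,1,0,0 > myfile.csv

awk -F, '
BEGIN { count = 0; fname = 1 }                 # count from 0, file from 1
$1 != prev {                                   # a new unique value in column 1
    prev = $1
    count++
    if (count == 4) { count = 1; fname++ }     # fourth unique value: next file
}
{ print > (fname ".csv") }
' myfile.csv
```

With this input, 1.csv ends up holding the five 111/222/333 rows and 2.csv the four 444/555/666 rows.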