How to cut columns from CSV files

Posted 2020-05-31 05:48

Question:

I have a set of CSV files (around 250), each with 300 to 500 records. I need to cut 2 or 3 columns from each file and store them in another file. I'm using Ubuntu. Is there a command or utility that can do this?

Answer 1:

If you know that the column delimiter does not occur inside the fields, you can use cut.

$ cat in.csv
foo,bar,baz
qux,quux,quuux
$ cut -d, -f2,3 < in.csv 
bar,baz
quux,quuux

You can use the shell builtin 'for' to loop over all input files.
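
A minimal sketch of such a loop, assuming the inputs live in a directory called csv_in and the results go to csv_out (both names are hypothetical, as are the two sample files created here). The grep check is only a rough hint: a double quote suggests quoted fields, which cut cannot parse correctly.

```shell
# Hypothetical directories and sample data, created only for illustration.
mkdir -p csv_in csv_out
printf 'foo,bar,baz\nqux,quux,quuux\n' > csv_in/a.csv
printf 'one,two,three\n' > csv_in/b.csv

for f in csv_in/*.csv; do
    # A double quote hints at quoted fields, where cut may split wrongly.
    if grep -q '"' "$f"; then
        echo "warning: $f contains quotes; check the result" >&2
    fi
    # Keep columns 2 and 3 and write them to a file of the same name.
    cut -d, -f2,3 < "$f" > "csv_out/$(basename "$f")"
done
```

Writing to a separate directory keeps the original files untouched, so the loop can be rerun safely if the column list turns out to be wrong.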



Answer 2:

If the fields might contain the delimiter, you ought to find a library that can parse CSV files. Typically, general purpose scripting languages will include a CSV module in their standard library.

Ruby:   require 'csv'
Python: import csv
Perl:   use Text::ParseWords;


Answer 3:

If your fields contain commas or newlines, you can use a helper program I wrote to allow cut (and other UNIX text processing tools) to properly work with the data.

https://github.com/dbro/csvquote

This program finds special characters inside quoted fields and temporarily replaces them with nonprinting characters that won't confuse cut. They are restored after cut is done.

lutz's solution (the cut command from Answer 1) would become:

csvquote in.csv | cut -d, -f2,3 | csvquote -u 


Answer 4:

If you used ssconvert to get the CSV you might try:

ssconvert -O 'separator="|"' "file.xls" "file.txt"

Note the .txt extension instead of .csv: this makes ssconvert use the Gnumeric_stf:stf_assistant exporter instead of Gnumeric_stf:stf_csv, which lets you pass options (the -O parameter). Otherwise you'll get a "The file saver does not take options" error. A pipe character is much less likely than a comma to appear in the data, but you might want to check first.
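
One rough way to do that check, sketched here on a hypothetical sample.txt that stands in for the converted file: count the fields per line and list the distinct counts. If more than one count shows up, some row probably has a pipe inside a field. This assumes no field contains a line break.

```shell
# Hypothetical sample standing in for the ssconvert output;
# the last row has a stray "|" inside what should be one field.
printf 'id|name|city\n1|Ann|Oslo\n2|Bob|New|York\n' > sample.txt

# Print each distinct number of "|"-separated fields found in the file.
# A single line of output means every row has the same field count.
awk -F'|' '{ print NF }' sample.txt | sort -u
```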

Then you can rename it and do things like:

cat file.csv | cut -d "|" -f3 | sort | uniq -c | sort -rn | head
  • Other options example: -O 'eol=unix separator=; format=preserve charset=UTF-8 locale=en_US transliterate-mode=transliterate quoting-mode=never'.
  • A solution with AWK v4+.
  • ssconvert man page.


Tags: shell ubuntu csv