Remove header from multiple csv files

Posted 2019-03-06 07:52

Question:

I receive multiple CSV files daily from a different server. These files are huge (over 200 MB). Using a batch file, I have to remove the header from all of these CSV files and replace it with the required column headers.

The code below works fine for removing the column headers, but only from a single file:

@echo off
set "csv=mycsv.csv"
rem Write every line except the first into a temporary copy, then replace the original
> "%csv%.new" (
    for /f skip^=1^ usebackq^ delims^=^ eol^= %%A in ("%csv%") do echo(%%A
)
move /y "%csv%.new" "%csv%" >nul

Answer 1:

Provided that the CSV files contain no TAB characters (the more command used here would replace each of them with a sequence of SPACE characters) and that no file is longer than 65534 lines (beyond that, more waits for user interaction), you could try one of the following:

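  1. The new column header is read from another file, headerfile.csv:

    rem Read the first line of headerfile.csv into HEADER
    < "headerfile.csv" set /P "HEADER="
    for %%F in ("*.csv") do (
        if /I not "%%~F"=="headerfile.csv" (
            rem Write the new header, then append the original file without its first line
            > "%%~F.tmp" echo(%HEADER%
            >>"%%~F.tmp" more +1 "%%~F"
            move /Y "%%~F.tmp" "%%~F"
        )
    )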
    

    If headerfile.csv is not located in the current directory together with the other CSV files, there is no need to exclude it from processing; in that case simply remove the if condition.

  2. The new column header is given as a string constant:

    set "HEADER=new,header,string,here"
    for %%F in ("*.csv") do (
        > "%%~F.tmp" echo(%HEADER%
        >>"%%~F.tmp" more +1 "%%~F"
        move /Y "%%~F.tmp" "%%~F"
    )
    

Update

Here is a way that does not use the more command, so its limitations no longer apply. It also avoids for /F, which would limit the length of each line to 8191 bytes/characters:

  1. The new column header is read from another file, headerfile.csv:

    < "headerfile.csv" set /P "HEADER="
    for %%F in ("*.csv") do (
        if /I not "%%~F"=="headerfile.csv" (
            > "%%~F.tmp" echo(%HEADER%
            rem set /P swallows the old header line, then findstr copies every remaining line
            >>"%%~F.tmp" < "%%~F" (set /P = & findstr "^")
            move /Y "%%~F.tmp" "%%~F"
        )
    )
    
  2. The new column header is given as a string constant:

    set "HEADER=new,header,string,here"
    for %%F in ("*.csv") do (
        > "%%~F.tmp" echo(%HEADER%
        >>"%%~F.tmp" < "%%~F" (set /P = & findstr "^")
        move /Y "%%~F.tmp" "%%~F"
    )
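The redirected block used in these loops relies on both commands reading from the same input handle: set /P consumes the first line, and findstr "^" (a pattern that matches every line) copies whatever follows. Below is a minimal single-file sketch of that mechanism. The file name sample.csv and the variable DUMMY are hypothetical; a named variable is used instead of the empty name in the answer's code only to make the "read and discard" step explicit.

```batch
@echo off
rem Create a small sample file with an old header line and two data rows
rem sample.csv is a hypothetical name used only for this demonstration
> "sample.csv" (
    echo old,header
    echo 1,2
    echo 3,4
)
rem Both commands read from the same redirected input:
rem set /P consumes line 1, then findstr copies every remaining line
> "sample.csv.tmp" < "sample.csv" (set /P "DUMMY=" & findstr "^")
move /Y "sample.csv.tmp" "sample.csv" >nul
type "sample.csv"
```

Afterwards sample.csv should contain only the two data rows, which is exactly the part that gets appended after the new header in the loops above.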
    

Note that the header line itself is still limited to 8191 bytes/characters: it is stored in a variable (to avoid reading the header file repeatedly), and the echo(%HEADER% command line that writes it is subject to the same limit. To overcome this, place only the header into a text file and, within the loop, copy that file to %%~F.tmp before appending the data.
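The workaround described in the note can be sketched as follows. It assumes a hypothetical header file named header.txt in the current directory that contains only the new header line and ends with a line break (otherwise the first data row would be joined onto the header line); its .txt extension keeps it from matching the *.csv wildcard, so no if test is needed. copy /B copies the header byte for byte, so it never passes through a variable or an echo command line. The old first line is skipped with the same set /P and findstr combination, here reading into a throwaway variable DUMMY (also a hypothetical name).

```batch
@echo off
rem Assumption: header.txt is a hypothetical file holding only the new header line,
rem terminated by a line break; its .txt extension keeps it out of the wildcard below
set "HEADERFILE=header.txt"
if not exist "%HEADERFILE%" (
    echo %HEADERFILE% not found
    exit /b 1
)
for %%F in ("*.csv") do (
    rem Start the temporary file with a byte-for-byte copy of the header file
    copy /B /Y "%HEADERFILE%" "%%~F.tmp" >nul
    rem Append the original file without its first line
    >>"%%~F.tmp" < "%%~F" (set /P "DUMMY=" & findstr "^")
    move /Y "%%~F.tmp" "%%~F" >nul
)
```

After it runs, every CSV file in the current directory should start with the contents of header.txt, followed by its original data lines.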



Answer 2:

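rem A plain for loop enumerates the CSV files; for /f then reads each one
for %%F in (*.csv) do for /f "usebackq delims=" %%a in ("%%F") do echo %%a>csv.new&goto mainbody
:mainbody
for %%F in (*.csv) do for /f "usebackq skip=1 delims=" %%a in ("%%F") do echo %%a>>csv.new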

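should do what you want, assuming (with the help of the ubiquitous crystal ball) that "required column headers" means "the column headers from a .csv file". It writes the header line of the first CSV file, followed by the data lines of all CSV files, into a single file named csv.new.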



Answer 3:

You may even get away with using MORE:

For %%A In (*.csv) Do More +1 "%%A" 1>"%%~nA.new"

Note: this method converts any TAB characters to spaces, and more waits for user input on files longer than 65534 lines.