CMD-Batch line length. Possible to remove? If not,

Posted 2019-07-21 06:07

Question:

Still new to cmd batch scripting...

I've got a batch to remove tab characters from a file. This usually works great with this code:

 setlocal DisableDelayedExpansion
 for /f "delims=" %%A in ('"findstr /n ^^ %FILENAME%"') do (
   set "line=%%A"
   setlocal EnableDelayedExpansion

   rem strip the "n:" prefix that findstr /n added
   set "line=!line:*:=!"
   if defined line (
      rem the character between ":" and "=" must be a literal TAB
      set "line=!line:	=!"
      (echo(!line!)>>%TEMPFILE%
   ) ELSE >>%TEMPFILE% echo(
   endlocal
 )

But recently, instead of deleting only the tab characters, it deleted the whole line! I figured out that it must have something to do with the unusual length of the line (more than 9500 characters). If I split the line manually, it works as usual.

Right now I'm looking for a way to either

  1. make the code above work for any line length, or
  2. insert a check for lines that are too long to process, so the batch can stop and display an appropriate message.
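Outside of cmd.exe, both options are easy to state. As a minimal sketch in Python (not batch), option 1 and option 2 could look like this; the 8191-character threshold is an assumption taken from the documented cmd.exe command-line limit, and the function names are illustrative only.

```python
# Hypothetical threshold: 8191 is the documented maximum length of a
# cmd.exe command line; here it is only used as the limit for the check.
CMD_LIMIT = 8191


def long_lines(lines, limit=CMD_LIMIT):
    """Option 2: return the 1-based numbers of lines longer than `limit`."""
    return [n for n, line in enumerate(lines, start=1) if len(line) > limit]


def remove_tabs(lines):
    """Option 1: remove every tab character from every line, whatever its length."""
    return [line.replace("\t", "") for line in lines]


sample = ["a\tb", "x" * 9500 + "\t", ""]
print(long_lines(sample))           # [2]
print(len(remove_tabs(sample)[1]))  # 9500
```

A Python string has no 8 K limit, so the 9500-character line from the question is handled like any other.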

Answer 1:

The problem with long lines in Batch files is that environment variables can only store a maximum of about 8 KB. However, longer lines can be processed in smaller chunks: when the set /P command reads a long line, it reads up to 1022 characters, and the remaining characters are read by the next set /P command. The Batch file below uses this method (combined with findstr /O "^", which reports the byte offset at which each line starts and therefore lets you compute the length of each line) to copy a file with lines of unlimited size:

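@echo off
setlocal EnableDelayedExpansion

rem "last" holds the byte offset of the previous line start; the initial
rem value 1022 makes the first entry, offset 0, produce 0 chunks
set "last=1022"
< input.txt (
   rem findstr /O prints the byte offset of every line start
   for /F "delims=:" %%a in ('findstr /O "^" input.txt') do (
      rem previous line length = offset difference minus the CR+LF pair
      set /A "len=%%a-last-2, last=%%a, chunks=(len-1)/1022+1"
      set "chunk="
      rem each set /P reads at most 1022 characters of the line
      for /L %%i in (1,1,!chunks!) do (
         set /P "chunk="
         set /P "=!chunk!" < NUL
      )
      if !chunks! gtr 0 echo/
   )
   rem the last line has no following offset, so use the file size instead
   for %%a in (input.txt) do set /A "len=%%~Za-last-2, chunks=(len-1)/1022+1"
   set "chunk="
   for /L %%i in (1,1,!chunks!) do (
      set /P "chunk="
      set /P "=!chunk!" < NUL
   )
   echo/
) > output.txt
move /Y output.txt input.txt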
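The arithmetic in the batch file above can be checked outside cmd. The following Python sketch is an illustration, not part of the answer: it assumes a CR+LF file, mimics the offsets that findstr /O "^" reports, and reproduces the set /A formulas, including cmd's integer division that truncates toward zero, to show how many 1022-character set /P reads each line needs.

```python
CHUNK = 1022  # characters returned by one set /P, as stated in the answer


def trunc_div(a, b):
    """Integer division truncating toward zero, like set /A (assumes b > 0)."""
    q = abs(a) // b
    return q if a >= 0 else -q


def line_offsets(data):
    """Byte offset of every line start, as findstr /O "^" reports it for CR+LF data."""
    if not data:
        return []
    offsets = [0]
    pos = data.find(b"\r\n")
    while pos != -1 and pos + 2 < len(data):  # no entry after the final CR+LF
        offsets.append(pos + 2)
        pos = data.find(b"\r\n", pos + 2)
    return offsets


def chunk_counts(data):
    """Number of set /P reads the batch file performs for each line."""
    counts = []
    last = CHUNK  # same trick as the batch file: first entry yields 0 chunks
    for off in line_offsets(data):
        length = off - last - 2  # previous line length without CR+LF
        last = off
        chunks = trunc_div(length - 1, CHUNK) + 1
        if chunks > 0:
            counts.append(chunks)
    # Last line: its length comes from the file size (%%~Za in the batch file)
    length = len(data) - last - 2
    counts.append(trunc_div(length - 1, CHUNK) + 1)
    return counts


sample = b"abc\r\n" + b"x" * 2500 + b"\r\n" + b"\r\n"
print(line_offsets(sample))  # [0, 5, 2507]
print(chunk_counts(sample))  # [1, 3, 1]
```

An empty line still gets one read, because (0-1)/1022 truncates to 0 in cmd; that is why the formula adds 1 rather than rounding up.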

This method requires that the input lines end in CR+LF characters (the Windows standard), and it has the problems inherent to set /P: it may eliminate control characters from the end of the line or from the end of each 1022-character chunk, or spaces from the beginning of the line/chunk; further details are in this post. To eliminate tab characters, you may modify this program by replacing set /P "=!chunk!" < NUL with set /P "=!chunk:<TAB>=!" < NUL, where <TAB> stands for a literal tab character.
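The same chunk-by-chunk idea, with the tab removal applied to each chunk, can be sketched in Python, where readline(size) returns at most size characters of the current line and leaves the rest for the next call, much as set /P does. This is only an analogy: unlike set /P, readline does not strip spaces or control characters, and the function name is hypothetical.

```python
import io

CHUNK = 1022  # mirrors the 1022-character set /P read described above


def strip_tabs_chunked(src, dst, size=CHUNK):
    """Copy src to dst in pieces of at most `size` characters, dropping tabs.

    A tab is a single character, so it can never be split across two pieces.
    """
    while True:
        piece = src.readline(size)  # stops after `size` chars or after a newline
        if not piece:  # empty string only at end of input
            break
        dst.write(piece.replace("\t", ""))


src = io.StringIO("a\tb\n" + "\t".join(["x" * 600] * 5) + "\n")
dst = io.StringIO()
strip_tabs_chunked(src, dst)
print(len(dst.getvalue()))  # 3004
```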



Answer 2:

cmd.exe can only process lines of up to 8191 characters. I also need to process longer lines, and after some research I found that the easiest way is to use an external program. I use sed from UnxUtils.

This sed command should remove all tab characters:

sed -e "s/\t//g" <infile> > <outfile>
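If sed cannot be installed but Python is available, a byte-level filter doing the same job is short. This is a sketch under that assumption: working on binary streams keeps CR+LF endings and the file's encoding untouched, and strip_tabs.py is a hypothetical file name.

```python
import io
import sys


def strip_tabs(src, dst):
    """Copy src to dst, deleting every TAB byte (0x09)."""
    for line in src:  # binary streams yield one line (up to b"\n") at a time
        dst.write(line.replace(b"\t", b""))


def main():
    # Hypothetical entry point: python strip_tabs.py < infile > outfile
    strip_tabs(sys.stdin.buffer, sys.stdout.buffer)


buf = io.BytesIO()
strip_tabs(io.BytesIO(b"a\tb\r\n"), buf)
print(buf.getvalue())  # b'ab\r\n'
```

To use it as a filter, call main() at the bottom of the script. Line length is limited only by available memory.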


Answer 3:

The theoretical maximum length of a VBScript line (string) is about 2,000,000,000 bytes (2^30 characters, at two bytes per character). You'll never get anywhere near that: the actual limit is the largest block of free contiguous memory, which will still allow millions of characters.

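' Reads StdIn, applies a regular-expression replacement to each line, writes StdOut.
' Arg(0) = command (replace), Arg(1) = search options (i|n),
' Arg(2) = expression, Arg(3) = replacement text
Set Arg = WScript.Arguments
Set Inp = WScript.StdIn
Set Outp = WScript.StdOut

' Remove the carets that escape brackets, ampersands and quotes on the command line
Pttn = Replace(Arg(2), "^(", "(")
Pttn = Replace(Pttn, "^)", ")")
Pttn = Replace(Pttn, "^&", "&")
Pttn = Replace(Pttn, "^""", """")

Set regEx1 = New RegExp
' An "i" in the search options makes the match case-insensitive
If InStr(LCase(Arg(1)), "i") > 0 Then
    regEx1.IgnoreCase = True
Else
    regEx1.IgnoreCase = False
End If
' Only the first match on each line is replaced
regEx1.Global = False
regEx1.Pattern = Pttn

' Process one line at a time; a VBScript string is not limited to 8 K like a cmd line
Do Until Inp.AtEndOfStream
    Line = Inp.ReadLine
    Line = regEx1.Replace(Line, Arg(3))
    Outp.WriteLine Line
Loop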
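To make the script's behaviour explicit, here is a Python port sketch of its core logic: caret unescaping, the i|n option, replacement of only the first match per line (Global = False) and $n submatch references. It assumes Python's re syntax agrees with VBScript's RegExp for simple patterns like the examples in this answer, which is not true in general; the function names are illustrative.

```python
import re


def unescape_caret(pattern):
    """Undo the command-line escapes ^( ^) ^& ^" in the same order as the script."""
    for escaped in ("^(", "^)", "^&", '^"'):
        pattern = pattern.replace(escaped, escaped[1])
    return pattern


def expand_submatches(template, match):
    """Replace $1..$99 in the replacement text with the corresponding submatch."""
    def group_text(ref):
        n = int(ref.group(1))
        if n > match.re.groups:
            return ref.group(0)  # no such group: keep the text literally
        return match.group(n) or ""  # a group that did not take part gives ""
    return re.sub(r"\$([1-9][0-9]?)", group_text, template)


def filter_replace(lines, options, expression, replacement):
    """Yield each line with the first match of `expression` replaced."""
    flags = re.IGNORECASE if "i" in options.lower() else 0
    regex = re.compile(unescape_caret(expression), flags)
    for line in lines:
        # count=1 mirrors regEx1.Global = False
        yield regex.sub(lambda m: expand_submatches(replacement, m), line, count=1)


for out in filter_replace(["[fonts]", "x=1"], "i", r"^\[^(.*^)\]", "cat$1"):
    print(out)  # catfonts, then x=1
```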

How to use.

Replace

filter replace {i|n} expression replace
filter repl {i|n} expression replace

Finds and replaces text using regular expressions.

Also used to extract substrings from a file.

Ampersands and brackets in the expression must be escaped with a caret. Do not escape carets. Use the hexadecimal code \x22 for quotes.

SearchOptions

i - ignore case
n - none

Expression

https://msdn.microsoft.com/en-us/library/ae5bf541(v%3Dvs.90).aspx

Replace

The replacement text. Use $1, $2, ..., $n to insert submatches into the replacement string.

Example

filter replace i "=" "No equal sign" < "%systemroot%\win.ini"

This searches for text within square brackets and replaces the line with "cat" followed by the text within the brackets.

Filter replace i "^\[^(.*^)\]" "cat$1" < %windir%\win.ini

This searches for any text and prints from the 11th character to the end of the line.

Filter replace i "^.{10}^(.*^)$" "$1" < %windir%\win.ini

This searches a CSV file and prints the second and fourth fields.

Filter replace i "^.+,^(.+^),.+,^(.+^)$" "$1,$2" < csv.txt
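For reference, this is what the example patterns look like once the carets are gone, tried with Python's re on made-up sample lines. It is only an illustration and assumes Python's regex dialect behaves like VBScript's RegExp for these simple patterns: $n becomes \n in a Python replacement string, and count=1 mirrors Global = False.

```python
import re

# (pattern with carets removed, Python-style replacement, made-up sample line)
EXAMPLES = [
    ("=", "No equal sign", "a=b=c"),
    (r"^\[(.*)\]", r"cat\1", "[fonts]"),
    (r"^.{10}(.*)$", r"\1", "0123456789abc"),
    (r"^.+,(.+),.+,(.+)$", r"\1,\2", "a,b,c,d"),
]

# count=1 replaces only the first match on each line, like regEx1.Global = False
results = [re.sub(p, r, line, count=1, flags=re.IGNORECASE) for p, r, line in EXAMPLES]
print(results)  # ['aNo equal signb=c', 'catfonts', 'abc', 'b,d']
```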

Filter reads only from standard input and writes only to standard output. These streams are only available when the script runs in a command prompt under cscript.exe.

filter <inputfile >outputfile
filter <inputfile | other_command
other_command | filter >outputfile
other_command | filter | other_command

Download the full source here: https://skydrive.live.com/redir?resid=E2F0CE17A268A4FA!121