Still new to cmd batch scripting...
I've got a batch to remove tab characters from a file. This usually works great with this code:
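setlocal DisableDelayedExpansion
rem findstr /n prefixes each line with its number so that empty lines are kept
for /f "delims=" %%A in ('"findstr /n ^^ %FILENAME%"') do (
    set "line=%%A"
    setlocal EnableDelayedExpansion
    rem Strip the line-number prefix added by findstr /n
    set "line=!line:*:=!"
    if defined line (
        rem The character between the colon and the equals sign must be a literal TAB
        set "line=!line:	=!"
        (echo(!line!)>>%TEMPFILE%
    ) ELSE echo(
    endlocal
)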
But recently it didn't simply delete the tab characters, it dropped the whole line! I figured out it must have something to do with the unusual length of the line (>9500 characters). If I split the line manually, it works as usual.
Right now I'm looking for a way to either
- make the code above work for any line length or
- insert a check for lines that are too long to process, so the batch file can stop and display an appropriate message.
The problem with long lines in Batch files is that cmd.exe can only work with variable values of up to about 8 KB (its command-line limit is 8191 characters). However, it is possible to process longer lines in smaller chunks: when a set /P command reads a long line, it reads up to 1022 characters, and the remaining characters are read by the next set /P command. The Batch file below uses this method (combined with findstr /O "^", which reports the offset of each line and therefore its length) to copy a file with lines of unlimited size:
@echo off
setlocal EnableDelayedExpansion
rem Start at 1022 so the first offset, which is 0, yields zero chunks
set "last=1022"
< input.txt (
    rem Each offset marks the start of a line, so it closes the previous line
    for /F "delims=:" %%a in ('findstr /O "^" input.txt') do (
        set /A "len=%%a-last-2, last=%%a, chunks=(len-1)/1022+1"
        set "chunk="
        for /L %%i in (1,1,!chunks!) do (
            set /P "chunk="
            set /P "=!chunk!" < NUL
        )
        if !chunks! gtr 0 echo/
    )
    rem The last line has no following offset, so use the file size instead
    for %%a in (input.txt) do set /A "len=%%~Za-last-2, chunks=(len-1)/1022+1"
    set "chunk="
    for /L %%i in (1,1,!chunks!) do (
        set /P "chunk="
        set /P "=!chunk!" < NUL
    )
    echo/
) > output.txt
move /Y output.txt input.txt
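To make the arithmetic in the set /A line concrete, here is a small sketch with made-up numbers. The names prev and next are hypothetical stand-ins for two consecutive offsets printed by findstr /O; they are not part of the program above. A line that starts at offset 0 and is followed by a line starting at offset 2502 is 2500 characters long once the CR+LF pair is subtracted, so it needs three reads of at most 1022 characters each:

```batch
@echo off
setlocal
rem prev and next are hypothetical offsets of two consecutive lines
set /A "prev=0, next=2502"
rem Subtract 2 for the CR+LF pair, then round up to whole chunks of 1022
set /A "len=next-prev-2, chunks=(len-1)/1022+1"
rem Prints: len=2500 chunks=3
echo len=%len% chunks=%chunks%
```

The same formula explains the initial last=1022 in the program: for the first offset (0) it gives len=-1024 and, because set /A truncates toward zero, chunks=0, so nothing is read before the first real line length is known.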
This method requires that the input lines end in CR+LF characters (the Windows standard) and has the problems inherent to set /P: it may strip control characters from the end of the line or from the end of each chunk of 1022 characters, or spaces from the beginning of the line/chunk; further details at this post. To eliminate tab characters, modify this program by changing set /P "=!chunk!" < NUL to set /P "=!chunk:	=!" < NUL, where the character between the colon and the equals sign is a literal TAB.
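The offsets from findstr /O can also serve the second option in the question: checking for over-long lines before processing. The following is only a sketch under stated assumptions, not a tested solution: the variable names file, maxLen, prev and lineNo are made up for illustration, the limit of 8000 is an arbitrary value below 8191, and it relies on the same findstr /O plus for /F combination as the program above handling long lines correctly.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical names: file is the text file to check, maxLen is a limit
rem chosen to stay safely below the 8191-character limit of cmd.exe
set "file=input.txt"
set "maxLen=8000"
set "prev=0"
set "lineNo=0"
for /F "delims=:" %%O in ('findstr /O "^" "%file%"') do (
    rem The token is the offset where a line starts, so it ends the previous line
    if !lineNo! gtr 0 (
        set /A "len=%%O-prev-2"
        if !len! gtr %maxLen% (
            echo Line !lineNo! has !len! characters and is too long to process.
            exit /B 1
        )
    )
    set "prev=%%O"
    set /A "lineNo+=1"
)
rem The last line has no following offset, so measure it against the file size
for %%G in ("%file%") do set /A "len=%%~zG-prev"
if %len% gtr %maxLen% (
    echo The last line has about %len% characters and is too long to process.
    exit /B 1
)
echo All lines are short enough.
```

The length of the last line is slightly overestimated when the file ends with CR+LF, which errs on the safe side. A calling script could test the exit code with if errorlevel 1 and stop before running the tab-removal loop.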
cmd.exe can process lines of up to 8k characters. I also needed to process longer lines, and after some research I found that the easiest way is to use an external program. I use sed from UnxUtils.
This sed command should remove all tab characters:
sed -e "s/\t//g" <infile> > <outfile>
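For use inside a batch file, the command could be wrapped roughly as follows. This is a sketch with assumptions: the file names are hypothetical, sed.exe is assumed to be reachable through PATH, and the installed sed build is assumed to understand \t as a tab.

```batch
@echo off
setlocal
rem Hypothetical file names; sed.exe is assumed to be reachable through PATH
set "infile=input.txt"
set "outfile=%infile%.tmp"
where sed >nul 2>&1 || (
    echo sed.exe was not found on the PATH.
    exit /B 1
)
rem Write to a temporary file first, then replace the original on success
sed -e "s/\t//g" "%infile%" > "%outfile%" && move /Y "%outfile%" "%infile%" >nul
```

Writing to a temporary file and moving it afterwards avoids reading and overwriting the same file at once, and the original is left untouched if sed fails.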
The theoretical maximum line length in VBS is 2,000,000,000 bytes (or 1 x 2^30 characters). You'll never get anywhere near that; the actual limit is the largest block of free contiguous memory, which still allows millions of characters.
Set Arg = WScript.Arguments
Set WshShell = CreateObject("WScript.Shell")
Set Inp = WScript.StdIn
Set Outp = WScript.StdOut
'Going by the usage text, Arg(0) is the command name, Arg(1) the search options,
'Arg(2) the expression and Arg(3) the replacement
'Remove ^ from quoting command line. Quote, ampersand and brackets
Pttn = Replace(Arg(2), "^(", "(")
Pttn = Replace(Pttn, "^)", ")")
Pttn = Replace(Pttn, "^&", "&")
Pttn = Replace(Pttn, "^""", """")
Set regEx1 = New RegExp
If InStr(LCase(Arg(1)), "i") > 0 Then
    regEx1.IgnoreCase = True
Else
    regEx1.IgnoreCase = False
End If
'Global = False replaces only the first match on each line;
'set it to True to replace every match, e.g. to strip all tabs
regEx1.Global = False
regEx1.Pattern = Pttn
Do Until Inp.AtEndOfStream
    Line = Inp.ReadLine
    Line = regEx1.Replace(Line, Arg(3))
    Outp.WriteLine Line
Loop
How to use.
Replace
filter replace {i|n} expression replace
filter repl {i|n} expression replace
Finds and replaces text using regular expressions.
Also used to extract substrings from a file.
Ampersands and brackets in the expression must be escaped with the caret. Do not escape carets. Use the hexadecimal code \x22 for quotes.
SearchOptions
i - ignore case
n - none
Expression
https://msdn.microsoft.com/en-us/library/ae5bf541(v%3Dvs.90).aspx
Replace
The replacement text. Use $1, $2, $..., $n to specify sub matches in the replace string
Example
filter replace i "=" "No equal sign" < "%systemroot%\win.ini"
This searches for text within square brackets and replaces the line with cat followed by the text within brackets
Filter replace i "^\[^(.*^)\]" "cat$1" < %windir%\win.ini
This searches for any text and prints from the 11th character to the end of the line.
Filter replace i "^.{10}^(.*^)$" "$1" < %windir%\win.ini
This searches a CSV file and prints the second and fourth field
Filter replace i "^.+,^(.+^),.+,^(.+^)$" "$1,$2" < csv.txt
Filter reads and writes standard in and standard out only. These are only available in a command prompt.
filter <inputfile >outputfile
filter <inputfile | other_command
other_command | filter >outputfile
other_command | filter | other_command
Download full source here https://skydrive.live.com/redir?resid=E2F0CE17A268A4FA!121