Find and extract text from within existing text fi

Posted 2019-08-02 08:31

Question:

I need to be able to extract data from within an existing text file. The structure of the text file looks something like this...

this line contains a type of header and always starts at column 1
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in

this line contains a type of header and always starts at column 1
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in

this line contains a type of header and always starts at column 1
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in

this line contains a type of header and always starts at column 1
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in

As you can see, the text file is arranged in sections. There is always a single header line, followed by a variable number of data lines, and there is always a blank line between sections. Unfortunately, there is no rhyme or reason to the naming of the headers or to the contents of the data lines; only the structure described above is somewhat consistent. The data I need to search for is located in one of the data lines, in only one of the sections, which could be anywhere in the text file.

I can use the FIND command to locate the text, but once I do that, I need to extract the entire section to a new text file. I can't figure out how to go up however many lines to the first preceding blank line, then go down to the next following blank line, and extract everything in between. Does that make sense? Unfortunately, VBScript is simply not an option for this application, or it would have been over and done with long ago. Any ideas? Thanks.

Answer 1:

@echo off
setlocal enableDelayedExpansion
set input="test.txt"
set output="extract.txt"
set search="MY TEXT"

::find the line with the text
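set "lineNum="
for /f "delims=:" %%N in ('findstr /n /c:!search! %input%') do set lineNum=%%N
::stop if the text was not found; the comparisons below need a line number
if not defined lineNum echo Search text not found 1>&2 & exit /b 1
set "begin=0"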

::find blank lines and set begin to the last blank before text and end to the first blank after text
for /f "delims=:" %%N in ('findstr /n "^$" %input%') do (
  if %%N lss !lineNum! (set "begin=%%N") else set "end=%%N" & goto :break
)
::end of section not found so we must count the number of lines in the file
for /f %%N in ('find /c /v "" ^<%input%') do set /a end=%%N+1
:break

::extract the section bracketed by begin and end
set /a count=end-begin-1
<%input% (
  rem throw away the beginning lines until we reach the desired section
  for /l %%N in (1 1 %begin%) do set /p "ln="
  rem read and write the section
  for /l %%N in (1 1 %count%) do (
    set "ln="
    set /p "ln="
    echo(!ln!
  )
)>%output%

Limitations for this solution:

  • Lines must be terminated by <CR><LF> (Windows style)
  • Lines must be <= 1021 bytes long (not including <CR><LF>)
  • Trailing control characters will be stripped from each line

If these limitations are a problem, a less efficient variant can be written that reads the section using FOR /F instead of SET /P.
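One way such a FOR /F variant might look is sketched below. It is not part of the original answer: the file names and search text are the same placeholders used above, and the line-number prefix added by FINDSTR /N is stripped with the `!ln:*:=!` substitution. Delayed expansion is toggled per line so that any `!` characters in the data should survive. Lines are still subject to the command processor's own length limit, and FINDSTR "^$" may not recognise blank lines in files with Unix-style line endings.

```batch
@echo off
setlocal disableDelayedExpansion
rem Placeholder file names and search text, mirroring the script above
set "input=test.txt"
set "output=extract.txt"
set "search=MY TEXT"

rem Locate the line holding the search text (the last match wins, as above)
set "lineNum="
for /f "delims=:" %%N in ('findstr /n /c:"%search%" "%input%"') do set "lineNum=%%N"
if not defined lineNum (
  echo Search text not found 1>&2
  exit /b 1
)

rem begin = last blank line before the match, end = first blank line after it
set "begin=0"
set "end="
for /f "delims=:" %%N in ('findstr /n "^$" "%input%"') do (
  if %%N lss %lineNum% (set "begin=%%N") else if not defined end set "end=%%N"
)
rem No blank line after the match: the section runs to the end of the file
if not defined end set "end=2147483647"

rem FINDSTR /N "^" prefixes every line, including empty ones, with "number:"
>"%output%" (
  for /f "delims=" %%A in ('findstr /n "^" "%input%"') do (
    for /f "delims=:" %%N in ("%%A") do if %%N gtr %begin% if %%N lss %end% (
      set "ln=%%A"
      setlocal enableDelayedExpansion
      rem Remove everything up to and including the first colon
      echo(!ln:*:=!
      endlocal
    )
  )
)
```

This version scans the whole file once more with FINDSTR and starts a SETLOCAL/ENDLOCAL pair for every extracted line, which is why it is expected to be slower than the SET /P approach on large files.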



Answer 2:

The program below reads the file line by line and stores the lines of one section in an array of variables; at the same time, it checks whether the search text appears in the current section. When the section ends, the current section is output as the result if the search text was found; otherwise, processing moves on to the next section.

@echo off
setlocal EnableDelayedExpansion
set infile=input.txt
set outfile=output.txt
set "search=Any text"
set textFound=
call :SearchSection < %infile% > %outfile%
goto :EOF

:SearchSection
   set i=0
   :readNextLine
      set line=
      set /P line=
      if not defined line goto endSection
      set /A i+=1
      set "ln%i%=!line!"
      if not "!ln%i%!" == "!line:%search%=!" set textFound=True
   goto readNextLine
   :endSection
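   rem An empty section means the end of the file was reached; report on stderr, not in the output file
   if %i% == 0 echo Error: Search text not found 1>&2 & exit /B 1
   if not defined textFound goto SearchSection
   rem echo( prints the line safely even when it reads ON, OFF or /?
   for /L %%i in (1,1,%i%) do echo(!ln%%i!
   exit /B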

The limitations of this program are the same as those dbenham stated for his program.
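The containment test used in the program above (comparing a line with a copy of itself from which the search text has been removed) can be tried in isolation. The sketch below uses two made-up sample strings and a made-up search word; none of them come from the original post. Note that this kind of substitution is case-insensitive, and it is likely to misbehave if the search text contains `=` or `!` or begins with `*`.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical search text and sample lines, for illustration only
set "search=needle"
for %%L in ("hay needle hay" "just hay") do (
  set "line=%%~L"
  rem If removing the search text changes the line, the line contained it
  if not "!line!" == "!line:%search%=!" (
    echo found in: !line!
  ) else (
    echo not in: !line!
  )
)
```

Run as a batch file, this should report the first sample string as containing the search text and the second as not containing it.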