I have a word document that has data that I would like to parse into an excel file. The source files are hundreds of pages long. I have been working with VBA, but I just started learning the language and have run into lots of difficulties with trying to input a .doc file. I have been able to use the Open and the Line Input statement to retrieve from a .txt file but only gibberish when I try the .doc file.
I have included two links of screen shots.
The first is a screenshot of a sample of my input data.
http://img717.imageshack.us/i/input.jpg/
The second is a screenshot of my desired output.
http://img3.imageshack.us/i/outputg.jpg/
I have developed an algorithm of what I want to accomplish. I am just having difficulties coding. Below is the pseudocode that I have developed.
Variables:
string line = blank
series_title = blank
folder_title = blank
int series_number = 0
box_number = 0
folder_number = 0
year = 0
do while the <end_of_document> has not been reached
input line
If the first word in the line is “series”
store <series_number>
store the string after “:”into the <series_title>
end if
call parse_box(rest of line)
output < series_number > <series_title> < box_number > < folder_number ><folder_title> <year>
end do while
function parse_box(current line)
If the first word in the line is “box”
store <box_number>
end if
call parse_folder(rest of line)
end function
function parse_folder(current line)
If first word is “Folder”
store <folder_number>
end if
call parse_folder_title(rest of line)
end function
function parse_folder_title_and_year(current line)
string temp_folder_title
store everything as <temp_folder_title> until end of line
if last word in <temp_folder_title> is a year
store <year>
end if
if < temp_folder_title> is empty/blank
//use <folder_title> from before
else
<folder_title> is < temp_folder_title> minus <year>
end if
end parse_folder_title_and_year
Thanks ahead of time for all your help and suggestions