Working in Python 3.6, I'm reading a text file and want to extract the lines at a fixed offset relative to a search match, then convert them into a pandas DataFrame.
What works: searching for a phrase in the text file and converting each matching line into a pandas DataFrame.
import pandas as pd
df = pd.DataFrame()
list1 = []
list2 = []
with open('myfile.txt') as f:
    for lineno, line in enumerate(f, 1):
        if 'Project:' in line:
            line = line.strip('\n')
            list1.append(repr(line))
# Convert list1 into a df column
df = pd.DataFrame({'Project_Name': list1})
What doesn't work: returning a line at a relative offset from the search result. In my case I need to store the "relative" lines -6 to -2 (earlier in the text than the match) as pandas columns.
with open('myfile.txt') as f:
    for lineno, line in enumerate(f, 1):
        if 'Project:' in line:
            list2.append(repr(line)-6)  # <--- can't use math here
This raises: TypeError: unsupported operand type(s) for -: 'str' and 'int'
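The error comes from `repr(line)` being a string, so subtracting 6 from it has no meaning; the offset has to be applied to the line's position rather than to its text. A minimal sketch of that idea, assuming the lines have already been read into a list (the helper name `line_before` and the sample list are hypothetical, for illustration only):

```python
def line_before(lines, index, offset):
    """Return the line `offset` positions before `index`, or None if out of range."""
    target = index - offset
    return lines[target] if 0 <= target < len(lines) else None

# Hypothetical stand-in for the file contents, one entry per line
sample = ['2018', '$100.50', '$120.33', 'Project A']
match_index = sample.index('Project A')

# Two lines above the match
previous_line = line_before(sample, match_index, 2)
print(previous_line)
```

The bounds check matters because a negative index in Python would silently wrap around to the end of the list instead of failing.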
Also tried using a range with partial success:
with open('myfile.txt') as f:
    for lineno, line in enumerate(f, 1):
        if 'Project' in line:
            all_lines = f.readlines()
            required_lines = [all_lines[i] for i in range(lineno-6, lineno-2)]
            print(required_lines)
            list2.append(required_lines)  # <-- does not work
Python prints the first 4 target lines, but I can't get it to save them as a list or to repeat for every occurrence of "Project" in the text file. Is there a better way to save the lines at a relative offset above (or below) the search term? Thanks much.
Text data looks like:
0 Exhibit 3
1 Date: February 2018
2 Description
3 Description
4 Description
5 2015
6 2016
7 2017
8 2018
9 $100.50 <---- Add these as different dataframe columns
10 $120.33 <----
11 $135.88 <----
12 $140.22 <----
13 Project A
14
15 Exhibit 4
16 Date: February 2018
17 Description
18 Description
19 2015
20 2016
21 2017
22 2018
23 $899.25 <----
24 $901.00 <----
25 $923.43 <----
26 $1002.02 <----
27 Project B
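One likely reason the range attempt stops after the first hit is that `f.readlines()` inside the loop consumes the rest of the file, so the `for` loop has nothing left to iterate over and `all_lines` only holds the lines after the match. A possible alternative, sketched under the assumption that the file is small enough to read into a list once and that the real file does not contain the index numbers or arrows shown above (the function name `collect_preceding` and the shortened sample list are hypothetical):

```python
def collect_preceding(lines, keyword, n_before):
    """For each line containing `keyword`, return the n_before lines above it plus the match."""
    rows = []
    for i, line in enumerate(lines):
        # Skip matches that do not have enough lines above them
        if keyword in line and i >= n_before:
            rows.append(lines[i - n_before:i] + [line])
    return rows

# With a real file, the list could be built once up front, e.g.:
#     with open('myfile.txt') as f:
#         lines = [l.rstrip('\n') for l in f]
# Shortened, hypothetical stand-in for the sample data:
lines = ['2017', '2018', '$100.50', '$120.33', '$135.88', '$140.22', 'Project A',
         '', 'Exhibit 4', '2018', '$899.25', '$901.00', '$923.43', '$1002.02', 'Project B']

rows = collect_preceding(lines, 'Project', 4)
for row in rows:
    print(row)
```

Each entry in `rows` is one match with its four preceding values, so passing it to `pd.DataFrame(rows, columns=[...])` with five column names of your choosing should give one row per project and one column per value.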