I have a file with lines of data like below I need to pull out the characters at 74-79 and 122-124 some lines will not have any character at 74-79 and I want to skip those lines.
import re
def main():
file=open("CCDATA.TXT","r")
lines =file.readlines()
file.close()
for line in lines:
lines=re.sub(r" +", " ", line)
print(lines)
main()
CF214L214L1671310491084111159 Customer Name 46081 171638440 0000320800000000HCCCIUAW 0612170609170609170300000000003135
CF214L214L1671310491107111509 Customer Name 46144 171639547 0000421200000000DRNRIUAW 0612170613170613170300000000003135
CF214L214L1671380999999900002000007420
CF214L214L1671310491084111159 Customer Name 46081 171638440 0000320800000000DRCSIU 0612170609170609170300000000003135
CF214L214L1671380999999900001000003208
CF214L214L1671510446646410055 Customer Name 46436 171677320 0000027200000272AA 0616170623170623170300000050003001
CF214L214L1671510126566110169 Customer Name 46450 171677321 0000117900001179AA 0616170623170623170300000250003001
CF214L214L1671510063942910172 Customer Name 46413 171677322 0000159300001593AA 0616170623170623170300000150003001
CF214L214L1671510808861010253 Customer Name 46448 171677323 0000298600002986AA 0616170623170623170300000350003001
CF214L214L1671510077309510502 Customer Name 46434 171677324 0000294300002943AA 0616170622170622170300000150003001
CF214L214L1671580999999900029000077728
CF214L214L1671610049631611165 Customer Name 46221 171677648 0000178700000000 0616170619170619170300000000003000
CF214L214L1671610895609911978 Customer Name 46433 171677348 0000011800000118AC 0616170622170622170300000150003041
CF214L214L1671680999999900002000001905
Edit: Thanks for commenting, apparently it should have been:
Short answer:
Just take
line[74:79]
and such as Roelant suggested. Since the lines in your input are always 230 chars long though, there'll never be anIndexError
, so you rather need to check if the result is all whitespace withisspace()
:A more robust approach that would also validate input (check if you're required to do so) is to parse the entire line and use a specific element from the result.
One way is a regex as per Parse a text file and extract a specific column, Tips for reading in a complex file - Python and an example at get the path in a file inside {} by python .
But for your specific format that appears to be an archaic, punchcard-derived one, with column number defining the datum's meaning, the format can probably be more conveniently expressed as a sequence of column numbers associated with field names (you never told us what they mean so I'm using generic names):
And parsed like this:
All that leaves is skip lines where the field is empty: