I have to read a text file into Python. The file encoding is:
file -bi test.csv
text/plain; charset=us-ascii
This is a third-party file, and I get a new one every day, so I would rather not change it. The file has non ascii characters, such as Ö, for example. I need to read the lines using python, and I can afford to ignore a line which has a non-ascii character.
My problem is that when I read the file in Python, I get the UnicodeDecodeError when reaching the line where a non-ascii character exists, and I cannot read the rest of the file.
Is there a way to avoid this. If I try this:
fileHandle = codecs.open("test.csv", encoding='utf-8');
try:
for line in companiesFile:
print(line, end="");
except UnicodeDecodeError:
pass;
then when the error is reached the for loop ends and I cannot read the remaining of the file. I want to skip the line that causes the mistake and go on. I would rather not do any changes to the input file, if possible.
Is there any way to do this? Thank you very much.
Your file doesn't appear to use the UTF-8 encoding. It is important to use the correct codec when opening a file.
You can tell
open()
how to treat decoding errors, with theerrors
keyword:Opening the file with
'ignore'
or'replace'
will then let you read the file without exceptions being raised.