My company is using a database and I am writing a script that interacts with that database. There is already an script for putting the query on database and based on the query that script will return results from database.
I am working on unix environment and I am using that script in my script for getting some data from database and I am redirecting the result from the query to a file. Now when I try to read this file then I am getting an error saying-
UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 9741: ordinal not in range(128)
I know that python is not able to read file because of the encoding of the file. The encoding of the file is not ascii that's why the error is coming. I tried checking the encoding of the file and tried reading the file with its own encoding.
The code that I am using is-
os.system("Query.pl \"select title from bug where (ste='KGF-A' AND ( status = 'Not_Approved')) \">patchlet.txt")
encoding_dict3={}
encoding_dict3=chardet.detect(open("patchlet.txt", "rb").read())
print(encoding_dict3)
# Open the patchlet.txt file for storing the last part of titles for latest ACF in a list
with codecs.open("patchlet.txt",encoding='{}'.format(encoding_dict3['encoding'])) as csvFile
readCSV = csv.reader(csvFile,delimiter=":")
for row in readCSV:
if len(row)!=0:
if len(row) > 1:
j=len(row)-1
patchlets_in_latest.append(row[j])
elif len(row) ==1:
patchlets_in_latest.append(row[0])
patchlets_in_latest_list=[]
# calling the strip_list_noempty function for removing newline and whitespace characters
patchlets_in_latest_list=strip_list_noempty(patchlets_in_latest)
# coverting list of titles in set to remove any duplicate entry if present
patchlets_in_latest_set= set(patchlets_in_latest_list)
# Finding duplicate entries in list
duplicates_in_latest=[k for k,v in Counter(patchlets_in_latest_list).items() if v>1]
# Printing imp info for logs
print("list of titles of patchlets in latest list are : ")
for i in patchlets_in_latest_list:
**print(str(i))**
print("No of patchlets in latest list are : {}".format(str(len(patchlets_in_latest_list))))
Where Query.pl is the perl script that is written to bring in the result of query from database.The encoding that I am getting for "patchlet.txt" (the file used for storing result from HSD) is:
{'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}
Even when I have provided the same encoding for reading the file, then also I am getting the error.
Please help me in resolving this error.
EDIT: I am using python3.6
EDIT2:
While outputting the result I am getting the error and there is one line in the file which is having some unknown character. The line looks like:
Some failure because of which vtrace cannot be used along with some trace.
I am using gvim and in gvim the "vtrace" looks like "~Vvtrace" . Then I checked on database manually for this character and the character is "–" which is according to my keyboard is neither hyphen nor underscore.These kinds of characters are creating the problem.
Also I am working on linux environment.
EDIT 3:
I have added more code that can help in tracing the error. Also I have highlighted a "print" statement (print(str(i)))
where I am getting the error.