Question:

I made this script, which removes all trailing whitespace characters and replaces all badly encoded French characters with the right ones.

Removing the trailing whitespace works, but the part that replaces the French characters does not.

The files to read and write are encoded in UTF-8, so I added the utf-8 declaration at the top of my script, but in the end every bad character (like \u00e9) is replaced by a little square.

Any idea why?

Script:

# --*-- encoding: utf-8 --*--

import fileinput
import sys

CRLF = "\r\n"

ACCENT_AIGU = "\\u00e9"
ACCENT_GRAVE = "\\u00e8"
C_CEDILLE = "\\u00e7"
A_ACCENTUE = "\\u00e0"
E_CIRCONFLEXE = "\\u00ea"

CURRENT_ENCODING = "utf-8"
#Getting filepath
print "Veuillez entrer le chemin du fichier (utiliser des \\ ou /, c'est pareil) :"
path = str(raw_input())
path.replace("\\", "/")

#removing trailing whitespace characters
for line in fileinput.FileInput(path, inplace=1):
    if line != CRLF:
        line = line.rstrip()
        print line
        print >>sys.stderr, line
    else:
        print CRLF
        print >>sys.stderr, CRLF
fileinput.close()

#Replacing bad characters
for line in fileinput.FileInput(path, inplace=1):
        line = line.decode(CURRENT_ENCODING)
        line = line.replace(ACCENT_AIGU, "é")
        line = line.replace(ACCENT_GRAVE, "è")
        line = line.replace(A_ACCENTUE, "à")
        line = line.replace(E_CIRCONFLEXE, "ê")
        line = line.replace(C_CEDILLE, "ç")
        line.encode(CURRENT_ENCODING)
        sys.stdout.write(line) #avoid CRLF added by print
        print >>sys.stderr, line
fileinput.close()

EDIT

The input file contains this type of text:

 * Cette m\u00e9thode permet d'appeller le service du module de tourn\u00e9e
 * <code>rechercherTechnicien</code> et retourne la liste repr\u00e9sentant le num\u00e9ro 
 * de la tourn\u00e9e ainsi que le nom et le pr\u00e9nom du technicien et la dur\u00e9e 
 * th\u00e9orique por se rendre au point d'intervention.
 *

EDIT2

Final code, in case anyone is interested: the first part replaces the badly encoded characters, and the second part removes all trailing whitespace characters.

# --*-- encoding: iso-8859-1 --*--

import fileinput
import re

CRLF = "\r\n"

print "Veuillez entrer le chemin du fichier (utiliser des \\ ou /, c'est pareil) :"
path = str(raw_input())
path = path.replace("\\", "/")

def unicodize(seg):
    if re.match(r'\\u[0-9a-f]{4}', seg):
        return seg.decode('unicode-escape')
    return seg.decode('utf-8')

print "Replacing badly encoded characters"
with open(path,"r") as f:
    content = f.read()
replaced = (unicodize(seg) for seg in re.split(r'(\\u[0-9a-f]{4})',content))

with open(path, "w") as o:
    o.write(''.join(replaced).encode("utf-8"))

print "Removing trailing whitespace characters"
for line in fileinput.FileInput(path, inplace=1):
    if line != CRLF:
        line = line.rstrip()
        print line
    else:
        print CRLF
fileinput.close()

print "Done!"
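
For readers on Python 3, the whitespace step could also be done without fileinput. What follows is only a minimal sketch under a few assumptions: the whole file fits in memory, only spaces and tabs count as trailing whitespace, and the helper name `strip_trailing_whitespace` is hypothetical rather than part of the script above.

```python
import re

# Spaces/tabs that sit immediately before a line ending (LF or CRLF)
# or before the very end of the text.
TRAILING = re.compile(r"[ \t]+(?=\r?\n|\Z)")

def strip_trailing_whitespace(text):
    """Drop trailing spaces and tabs on every line, leaving line endings as-is."""
    return TRAILING.sub("", text)

sample = "a  \r\n\r\nb\t\n"
cleaned = strip_trailing_whitespace(sample)
print(repr(cleaned))
```

When reading and writing the real file, passing newline="" to open() keeps existing CRLF line endings untouched, so empty lines do not need special handling.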

Answer 1:

Not so quick, and mostly dirty, but...

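import re

with open("enc.txt","r") as f:
    content = f.read()

def unicodize(seg):
    # A segment that is a \uXXXX escape is decoded as such;
    # everything else is assumed to be UTF-8 encoded bytes.
    if re.match(r'\\u[0-9a-f]{4}', seg):
        return seg.decode('unicode-escape')

    return seg.decode('utf-8')

# Splitting on a capturing group keeps each escape as its own segment.
replaced = (unicodize(seg) for seg in re.split(r'(\\u[0-9a-f]{4})',content))

print(''.join(replaced))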

Given this input file (mixing Unicode escape sequences and properly encoded UTF-8 text):

 * Cette m\u00e9thode permet d'appeller le service du module de
 * tourn\u00e9e
 * <code>rechercherTechnicien</code> et retourne la liste
 * repr\u00e9sentant le num\u00e9ro 
 * de la tourn\u00e9e ainsi que le nom et le pr\u00e9nom du technicien
 * et la dur\u00e9e 
 * th\u00e9orique por se rendre au point d'intervention.
 * 
 * S'il le désire le technicien peut dormir à l'hôtel

It produces this result:

 * Cette méthode permet d'appeller le service du module de
 * tournée
 * <code>rechercherTechnicien</code> et retourne la liste
 * représentant le numéro 
 * de la tournée ainsi que le nom et le prénom du technicien
 * et la durée 
 * théorique por se rendre au point d'intervention.
 * 
 * S'il le désire le technicien peut dormir à l'hôtel
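
The code in this answer relies on Python 2's str.decode. On Python 3, where a file opened with encoding="utf-8" already yields text, the same split-and-join idea might look roughly like the sketch below. The names `ESCAPE` and `decode_escapes` are illustrative, and accepting uppercase hex digits is an assumption about the input rather than something the question states.

```python
import re

# Capturing group so that re.split keeps the escapes as separate segments.
ESCAPE = re.compile(r"(\\u[0-9a-fA-F]{4})")

def unicodize(seg):
    # Only segments that are exactly one \uXXXX escape get decoded;
    # ordinary text is already a str and is returned unchanged.
    if ESCAPE.fullmatch(seg):
        return seg.encode("ascii").decode("unicode_escape")
    return seg

def decode_escapes(text):
    """Replace literal \\uXXXX sequences in text with the characters they name."""
    return "".join(unicodize(seg) for seg in ESCAPE.split(text))

line = r" * th\u00e9orique por se rendre au point d'intervention."
print(decode_escapes(line) == " * th\xe9orique por se rendre au point d'intervention.")
```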

Answer 2:

You are looking for s.decode('unicode_escape'):

>>> s = r"""
...  * Cette m\u00e9thode permet d'appeller le service du module de tourn\u00e9e
...  * <code>rechercherTechnicien</code> et retourne la liste repr\u00e9sentant le num\u00e9ro
...  * de la tourn\u00e9e ainsi que le nom et le pr\u00e9nom du technicien et la dur\u00e9e
...  * th\u00e9orique por se rendre au point d'intervention.
...  *
... """
>>> print(s.decode('unicode_escape'))

 * Cette méthode permet d'appeller le service du module de tournée
 * <code>rechercherTechnicien</code> et retourne la liste représentant le numéro
 * de la tournée ainsi que le nom et le prénom du technicien et la durée
 * théorique por se rendre au point d'intervention.
 *

And don't forget to encode your string before writing it to a file (e.g. as UTF-8):

writable_s = s.decode('unicode_escape').encode('utf-8')
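
One caveat if the file also contains real UTF-8 characters: the unicode_escape codec decodes every byte that is not part of an escape as Latin-1, so properly encoded accents get mangled. The Python 3 sketch below (the examples above are Python 2) illustrates the difference; the sample string and variable names are made up for the demonstration.

```python
import re

# Bytes as they might be read from a file that mixes \uXXXX escapes with real UTF-8.
raw = "m\\u00e9thode à l'hôtel".encode("utf-8")

# unicode_escape treats each non-escape byte as Latin-1, so the two-byte
# UTF-8 sequences for the accented letters each become two wrong characters.
naive = raw.decode("unicode_escape")

# Decoding as UTF-8 first and then replacing only the \uXXXX runs keeps both kinds intact.
text = raw.decode("utf-8")
fixed = re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)

print(len(naive), len(fixed))
```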

Answer 3:

To read a UTF-8 encoded file that contains non-ASCII characters and also contains literal \, u, 0, 0, e, 9 character sequences that you want to decode:

import codecs
import re

repl = lambda m: m.group().encode('ascii', 'strict').decode('unicode-escape')
with codecs.open(filename, encoding='utf-8') as file:
    text = re.sub(r'\\u[0-9a-f]{4}', repl, file.read())

Note: normally, non-ASCII characters and Unicode escapes (\uxxxx) should not be mixed in a single file. Use one or the other, but not both.
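
If the escaped convention is the one you want to keep, going the other way is also straightforward. This is a rough Python 3 sketch, not something taken from the question: `escape_non_ascii` is a hypothetical helper, and it refuses characters above U+FFFF because those would need more than four hex digits.

```python
def escape_non_ascii(text):
    """Rewrite every non-ASCII character as a \\uXXXX escape."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                 # plain ASCII stays as it is
        elif code <= 0xFFFF:
            out.append("\\u%04x" % code)   # four lowercase hex digits
        else:
            # Assumption of this sketch: such characters are not expected here.
            raise ValueError("character outside the BMP: %r" % ch)
    return "".join(out)

escaped = escape_non_ascii("durée théorique")
print(escaped)
```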

The files to read and write are encoded in UTF-8, so I added the utf-8 declaration at the top of my script

The utf-8 declaration in your Python source affects only the character encoding of the source file itself; e.g., it allows you to use non-ASCII characters in bytestring and Unicode literals. It has no effect on the character encoding of the files that you read.
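
To make the distinction concrete, here is a small Python 3 sketch (the rest of this answer is Python 2) in which the encoding passed to open() alone decides how the bytes of a file are decoded. The temporary file and the variable names are only for illustration.

```python
import os
import tempfile

# Write the two bytes that encode "é" in UTF-8 to a scratch file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xc3\xa9")
    path = tmp.name

try:
    # The encoding argument, not the source declaration, controls decoding.
    with open(path, encoding="utf-8") as f:
        as_utf8 = f.read()      # one character
    with open(path, encoding="latin-1") as f:
        as_latin1 = f.read()    # two characters, one per byte
finally:
    os.remove(path)

print(len(as_utf8), len(as_latin1))
```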

but in the end every bad character (like \u00e9) is replaced by a little square.

The "little square" might be an artifact of printing to the console. Try this in a console to see whether squares are present:

>>> s = "\u00e9" # 6 bytes in a bytestring
>>> len(s)
6
>>> u = u"\u00e9" # unicode escape in a Unicode string
>>> len(u)
1
>>> print s
\u00e9
>>> print u
é
>>> b = "é" # non-ascii char in a bytestring
>>> len(b) # note: it is 2 bytes 
2
>>> ub = u"é"  # non-ascii char in a Unicode string
>>> len(ub)
1
>>> print b
é
>>> print ub
é
>>> se = u.encode('ascii', 'backslashreplace') # non-ascii chars are escaped
>>> len(se)
4
>>> (s.decode('unicode-escape') == u == b.decode('utf-8') == ub == 
     se.decode('unicode-escape') == unichr(0xe9))
True
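
The session above is Python 2. A rough Python 3 counterpart is sketched below, assuming the usual split between str (text) and bytes; the variable names mirror the ones used above, and unichr becomes chr.

```python
s = r"\u00e9"                                # six characters: \, u, 0, 0, e, 9
u = "\u00e9"                                 # one character
b = u.encode("utf-8")                        # two bytes
se = u.encode("ascii", "backslashreplace")   # non-ASCII chars are escaped: four bytes

print(len(s), len(u), len(b), len(se))

# All of these routes lead back to the same single character.
same = (s.encode("ascii").decode("unicode_escape") == u == b.decode("utf-8")
        == se.decode("unicode_escape") == chr(0xE9))
print(same)
```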