What is the Unicode encoding error in the code below? [duplicate]

This question already has an answer here:

UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' - -when using urlib.request python3 (2 answers)

When I run the program below, I get a Unicode encoding error.

import bs4
import requests
from xhtml2pdf import pisa  # import python module
from xhtml2pdf.config.httpconfig import httpConfig

res = requests.get("https://www.insightsonindia.com/2018/06/04/insights-daily-current-affairs-04-june-2018/")
soup = bs4.BeautifulSoup(res.text, 'lxml')
pf = soup.find("div", class_="pf-content")

sourceHtml = str(pf)
outputFilename = "test.pdf"

def convertHtmlToPdf(sourceHtml, outputFilename):
    # disable SSL certificate checks for resources fetched by xhtml2pdf
    httpConfig.save_keys('nosslcheck', True)

    # open output file for writing (truncated binary)
    resultFile = open(outputFilename, "w+b")

    # convert HTML to PDF
    pisaStatus = pisa.CreatePDF(sourceHtml, dest=resultFile, encoding="utf-8")

    # close output file
    resultFile.close()

    # err is falsy on success and truthy if errors occurred
    return pisaStatus.err

# Main program
if __name__ == "__main__":
    pisa.showLogging()
    convertHtmlToPdf(sourceHtml, outputFilename)

The error is given below:

self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 37: ordinal not in range(128)

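I am trying to download part of a website as a PDF. To do this I scrape the site with bs4, keep the part I need, and then save it as a PDF using xhtml2pdf. Most of the time it works like a charm, but in this case it gives me the error above.

The complete code is on GitHub: click here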

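It seems that xhtml2pdf is using the ASCII encoding, and because my HTML contains non-ASCII characters it raises the error. I do not know how to change the encoding that xhtml2pdf uses. Omitting the non-ASCII characters is not an option: if I drop them, the links to the images will be broken and the images will not be shown in the PDF.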

Full traceback:

```
Traceback (most recent call last):
  File "test3.py", line 80, in <module>
    convertHtmlToPdf(sourceHtml, outputFilename)
  File "test3.py", line 68, in convertHtmlToPdf
    pisaStatus = pisa.CreatePDF(sourceHtml, dest=resultFile, encoding='utf-8')
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\document.py", line 97, in pisaDocument
    encoding, context=context, xml_output=xml_output)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\document.py", line 59, in pisaStory
    pisaParser(src, context, default_css, xhtml, encoding, xml_output)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 759, in pisaParser
    pisaLoop(document, context)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 700, in pisaLoop
    pisaLoop(node, context, path, **kw)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 644, in pisaLoop
    pisaLoop(nnode, context, path, **kw)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 644, in pisaLoop
    pisaLoop(nnode, context, path, **kw)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 644, in pisaLoop
    pisaLoop(nnode, context, path, **kw)
  [Previous line repeated 2 more times]
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 514, in pisaLoop
    attr = pisaGetAttributes(context, node.tagName, node.attributes)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\parser.py", line 124, in pisaGetAttributes
    nv = c.getFile(nv)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\context.py", line 818, in getFile
    return getFile(name, relative or self.pathDirectory)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\util.py", line 738, in getFile
    file = pisaFileObject(*a, **kw)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xhtml2pdf\util.py", line 644, in __init__
    conn.request("GET", path)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1240, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Users\Ananthu\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1107, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 37: ordinal not in range(128)
```

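The problem is that the retrieved HTML contains img tags, and some of their src attributes hold URLs that contain the '\u2019' ("RIGHT SINGLE QUOTATION MARK") character.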
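One way to confirm which image URLs are affected is to list the src values that are not pure ASCII. The following is only a sketch using the standard library's html.parser; the names ImgSrcCollector and non_ascii_sources and the sample markup are made up for illustration, and str.isascii() requires Python 3.7 or later.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)


def non_ascii_sources(html):
    """Return the img src values that contain non-ASCII characters."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return [src for src in collector.sources if not src.isascii()]


# Hypothetical sample markup, not taken from the page in the question.
sample_html = (
    '<div><img src="https://example.com/img/plain.jpg">'
    '<img src="https://example.com/img/India\u2019s-map.jpg"></div>'
)
bad_sources = non_ascii_sources(sample_html)
print(bad_sources)
```

With the real page, the same check could be run on str(pf) before it is handed to xhtml2pdf.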

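xhtml2pdf passes these URLs to Python's http.client module without escaping them first. http.client tries to encode the URLs as ASCII before retrieving them, and that is where the error occurs.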
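The failure can be reproduced without xhtml2pdf or any network access. The sketch below encodes a made-up request line as ASCII, which is the same call that appears on the last frame of the traceback; the request line itself is an assumption, not the one the real page produces.

```python
# Hypothetical request line; the path contains U+2019.
request_line = "GET /img/India\u2019s-map.jpg HTTP/1.1"

error = None
try:
    # Same operation as request.encode('ascii') in http.client.putrequest
    request_line.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
```

The exception records the codec and the offending position, so error.object[error.start] gives back the character that could not be encoded.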

This can be worked around by escaping the URLs in the retrieved HTML before generating the PDF.

urllib.parse provides the tools to do this.

from urllib import parse
...
res = requests.get("https://www.insightsonindia.com/2018/06/04/insights-daily-current-affairs-04-june-2018/")
soup = bs4.BeautifulSoup(res.text, 'lxml')
pf = soup.find("div", class_="pf-content")

imgs = pf.find_all('img')
for img in imgs:
    url = img['src']
    # split the URL so that only the path component is escaped
    scheme, netloc, path, params, query, fragment = parse.urlparse(url)
    # percent-encode non-ASCII characters such as '\u2019'
    new_path = parse.quote(path)
    new_url = parse.urlunparse((scheme, netloc, new_path, params, query, fragment))
    img['src'] = new_url

sourceHtml = str(pf)
outputFilename = "test.pdf"
...
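The escaping step can also be wrapped in a small function so that it is easy to check on its own. This is a sketch rather than the code above: escape_url_path is a hypothetical name, it uses urlsplit/urlunsplit instead of urlparse/urlunparse, and it passes safe="/%" to quote on the assumption that some paths may already be percent-encoded and should not be encoded twice. Non-ASCII characters in the query string are not handled.

```python
from urllib import parse


def escape_url_path(url):
    """Percent-encode the path of url, leaving the other parts untouched."""
    parts = parse.urlsplit(url)
    # Keeping "%" safe avoids double-encoding paths that are already escaped.
    new_path = parse.quote(parts.path, safe="/%")
    return parse.urlunsplit(parts._replace(path=new_path))


# Hypothetical URL, not taken from the page in the question.
escaped = escape_url_path("https://example.com/img/India\u2019s-map.jpg?x=1")
print(escaped)
```

In the loop over the img tags this would be used as img['src'] = escape_url_path(img['src']). The result contains only ASCII characters, so http.client can encode it.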

The answers to this question provide some background information on Unicode and URLs.