How to make new line commands work in a .txt file

Posted 2019-02-15 05:50

Question:

I just started using Python, and I am trying to write a program that prints the lyrics of a song to the screen, reading them from a text file on the internet ("www....../lyrics.txt"). My first code:

    import urllib.request
    lyrics=urllib.request.urlopen("http://hereIsMyUrl/lyrics.txt")
    text=lyrics.read()
    print(text)

When I ran this code, it didn't print the lyrics as they are written on the website. Instead it printed all the lyrics as one long, messy string, with the newline sequence '\r\n' at every place that should have been a line break. For example: Some lyrics here\r\nthis should already be the next line\r\nand so on.

I searched the internet for code to replace the '\r\n' sequences with new lines and tried the following:

    import urllib.request
    lyrics=urllib.request.urlopen("http://hereIsMyUrl/lyrics.txt")
    text=lyrics.read()
    text=text.replace("\r\n","\n")
    print(text)

I hoped it would at least replace something, but instead it gave me a runtime error:

    TypeError: expected bytes, bytearray or buffer compatible object

I searched the internet about that error, but I didn't find anything connected to opening files from the internet.

I have been stuck at this point for hours and have no idea how to continue. Please help! Thanks in advance!

Answer 1:

Your example is not working because the data returned by read() is a bytes object, not a string. You need to decode it using an appropriate encoding. See also the docs for request.urlopen, file.read and byte array operations.

A complete working example is given below:

    #!/usr/bin/env python3

    import urllib.request

    # Example URL
    url="http://ntl.matrix.com.br/pfilho/oldies_list/top/lyrics/black_or_white.txt"

    # Open URL: returns file-like object
    lyrics=urllib.request.urlopen(url)

    # Read raw data, this will return a "bytes object"
    text=lyrics.read()

    # Print raw data
    print(text)

    # Print decoded data:
    print(text.decode('utf-8'))
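
The example above hard-codes 'utf-8', which is a guess about the file. As a rough sketch of one alternative, the response headers returned by urlopen are an email.message.Message subclass, so get_content_charset() can report the charset the server declared, if any. The helper name decode_body, the sample bytes, and the sample Content-Type header below are made up for illustration, and the sketch runs without a network connection:

```python
from email.message import Message


def decode_body(raw, headers, default="utf-8"):
    """Decode raw bytes using the charset declared in the headers, if any.

    decode_body is a hypothetical helper; default="utf-8" is an assumption
    used when the server declares no charset.
    """
    charset = headers.get_content_charset() or default
    return raw.decode(charset, errors="replace")


# Stand-in for lyrics.headers from urllib.request.urlopen(url)
headers = Message()
headers["Content-Type"] = "text/plain; charset=iso-8859-1"

text = decode_body(b"caf\xe9\r\nsecond line", headers)
print(text.replace("\r\n", "\n"))
```

With a real response, the same call would look like decode_body(lyrics.read(), lyrics.headers).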


Answer 2:

In Python 3, bytes are treated differently from text strings. After the line

    text=lyrics.read()

if you try this

    print(type(text))

it prints

    <class 'bytes'>

So it is not a string; it is a bytes object, an immutable sequence of bytes.

When you call text=text.replace("\r\n","\n"), you are passing str arguments to the replace method of a bytes object, which is the reason for the error message. So you have two options.

  1. Convert the variable "text" from bytes to text by adding this line after the text=lyrics.read() line:

        text = text.decode("utf-8")

  2. Change the replace call to use bytes instead of strings:

        text=text.replace(b"\r\n",b"\n")
    

I recommend option 1 just in case you have more string manipulation to do on the text.
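
Both options can be tried without a network connection. The snippet below is only a sketch: the bytes literal stands in for what lyrics.read() would return (the sample lyrics are taken from the question), and the variable names raw, as_text and as_bytes are made up for illustration.

```python
# Stand-in for the value returned by lyrics.read()
raw = b"Some lyrics here\r\nthis should already be the next line\r\nand so on"

# Mixing bytes and str reproduces the error from the question
try:
    raw.replace("\r\n", "\n")
except TypeError as exc:
    print("TypeError:", exc)

# Option 1: decode to str first, then replace with str arguments
as_text = raw.decode("utf-8").replace("\r\n", "\n")

# Option 2: stay in bytes and replace with bytes arguments
as_bytes = raw.replace(b"\r\n", b"\n")

print(as_text)
```

Option 1 ends with a str, so print shows real line breaks; option 2 still ends with bytes, which would need decoding before printing as text.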



Answer 3:

The following works for me in Python 3.2:

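    import urllib.request
    lyrics=urllib.request.urlopen("http://google.com/")
    # Pass an encoding so str() decodes the bytes ("utf-8" is an assumption about the page)
    text=str(lyrics.read(), "utf-8", "replace")
    text=text.replace("\r\n","\n")
    print(text)

The key difference is that lyrics.read() returns a bytes object rather than a string, and its replace() does not accept str arguments. Wrapping the result in str() with an explicit encoding before performing the replace works. Without the encoding, str() only returns the printable representation of the bytes (b'...'), in which "\r\n" appears as literal backslash sequences, so the replace would match nothing.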
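
One caveat is worth checking when wrapping bytes in str(): the one-argument form does not decode anything. A small sketch with a made-up bytes literal (the names raw, as_repr and as_text are illustrative only):

```python
raw = b"first line\r\nsecond line"

# One argument: str() returns the repr of the bytes, including the b'...' wrapper
as_repr = str(raw)

# With an encoding: str() decodes, equivalent to raw.decode("utf-8")
as_text = str(raw, "utf-8")

print("\r\n" in as_repr)   # False: the repr holds literal backslash characters
print("\r\n" in as_text)   # True: real carriage return + line feed
print(as_text.replace("\r\n", "\n"))
```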