I just started using Python. I am trying to make a program that displays on the screen the lyrics of a song loaded from the internet ("www....../lyrics.txt").
My first code:
import urllib.request
lyrics=urllib.request.urlopen("http://hereIsMyUrl/lyrics.txt")
text=lyrics.read()
print(text)
When I ran this code, it didn't print the lyrics as they are laid out on the website. Instead, it printed '\r\n' line-break sequences at all the places that should have been new lines, and gave me all the lyrics in one long, messy string. For example:
Some lyrics here\r\nthis should already be the next line\r\nand so on.
I searched the internet for a way to replace the '\r\n' sequences with real new lines and tried the following:
import urllib.request
lyrics=urllib.request.urlopen("http://hereIsMyUrl/lyrics.txt")
text=lyrics.read()
text=text.replace("\r\n","\n")
print(text)
I hoped it would at least replace something, but instead it gave me a runtime error:
TypeError: expected bytes, bytearray or buffer compatible object
I searched the internet for that error, but I didn't find anything related to opening files from the internet.
I have been stuck at this point for hours and have no idea how to continue.
Please help!
Thanks in advance!
Your example is not working because the data returned by read() is a bytes object, not a string. You need to decode it using an appropriate encoding. See also the docs for urllib.request.urlopen, file.read and bytes/bytearray operations.
A complete working example is given below:
#!/usr/bin/env python3
import urllib.request
# Example URL
url="http://ntl.matrix.com.br/pfilho/oldies_list/top/lyrics/black_or_white.txt"
# Open URL: returns file-like object
lyrics=urllib.request.urlopen(url)
# Read raw data, this will return a "bytes object"
text=lyrics.read()
# Print raw data
print(text)
# Print decoded data (assuming the page is UTF-8 encoded):
print(text.decode('utf-8'))
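The example above hard-codes UTF-8. A minimal sketch, assuming the server may or may not declare a charset in its Content-Type header: it uses the declared charset when one is present and falls back to UTF-8 otherwise. The helper names decode_body and fetch_text are hypothetical, not part of urllib.

```python
import urllib.request


def decode_body(raw, charset=None):
    """Decode raw bytes, falling back to UTF-8 when no charset is known.

    Hypothetical helper for illustration; not part of urllib.
    """
    return raw.decode(charset or "utf-8")


def fetch_text(url):
    """Fetch a URL and return its body as str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # get_content_charset() returns None if the header has no charset
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


if __name__ == "__main__":
    # Decoding works the same way on bytes already in memory
    print(decode_body(b"caf\xc3\xa9"))          # UTF-8 encoded bytes
    print(decode_body(b"caf\xe9", "latin-1"))   # Latin-1 encoded bytes
```

If the declared charset is wrong for the actual content, decode() can still raise UnicodeDecodeError, so the fallback only covers the case where no charset is declared.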
In Python 3, bytes are treated differently from text strings. After the line
text=lyrics.read()
if you try
print(type(text))
it returns
<class 'bytes'>
So it is not a string; it is a bytes object, an immutable sequence of byte values.
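A quick way to see this, and to reproduce the error, on a made-up sample instead of the downloaded data:

```python
data = b"Some lyrics here\r\nthis should already be the next line"

print(type(data))     # <class 'bytes'>
# Indexing bytes gives an integer byte value, not a one-character string
print(data[0])        # 83, the code of "S"

# Passing str arguments to bytes.replace() raises TypeError;
# the exact wording of the message differs between Python versions
try:
    data.replace("\r\n", "\n")
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
else:
    error = None
```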
When you call text=text.replace("\r\n","\n"), you are passing str arguments to a bytes method, which is the reason for the error message. So you have two options:
1. Convert the variable text from bytes to str by adding this line after the text=lyrics.read() line:
   text = text.decode("utf-8")
2. Change the replace call to use bytes instead of strings:
   text = text.replace(b"\r\n", b"\n")
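Side by side, on a made-up sample, the two options produce the same text and differ only in the type of the result:

```python
data = b"Some lyrics here\r\nthis should already be the next line"

# Option 1: decode to str first, then replace using str arguments
option1 = data.decode("utf-8").replace("\r\n", "\n")

# Option 2: stay in bytes and replace using bytes arguments
option2 = data.replace(b"\r\n", b"\n")

print(type(option1), type(option2))        # <class 'str'> <class 'bytes'>
# Decoding the option 2 result gives the same text as option 1
print(option1 == option2.decode("utf-8"))  # True
print(option1)
```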
I recommend option 1 just in case you have more string manipulation to do on the text.
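With option 1, a possible alternative to calling decode() and replace() yourself is to wrap a binary file-like object in io.TextIOWrapper, which decodes as it reads and, with its default newline=None, translates '\r\n' into '\n'. Whether wrapping the urlopen() response directly works may depend on your Python version, so this sketch uses io.BytesIO as a stand-in and assumes the text is UTF-8:

```python
import io

# Stand-in for a binary file-like object such as an HTTP response
raw = io.BytesIO(b"Some lyrics here\r\nthis should already be the next line\r\n")

# newline=None (the default) enables universal newlines: \r\n becomes \n on read
with io.TextIOWrapper(raw, encoding="utf-8") as stream:
    text = stream.read()

print(repr(text))   # 'Some lyrics here\nthis should already be the next line\n'
```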
The following works for me in Python 3.2:
import urllib.request
lyrics=urllib.request.urlopen("http://google.com/")
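text=str(lyrics.read(), "utf-8")
text=text.replace("\r\n","\n")
print(text)
The key difference is that lyrics.read() returns a bytes object rather than a string, and replace() with str arguments does not work on it. Converting it with str() and an explicit encoding before the replace works. Without the encoding, str() returns the repr of the bytes (b'...'), in which \r\n are literal backslash characters, so replace() finds nothing to change.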
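A small check on made-up data shows the difference between calling str() with and without an encoding:

```python
data = b"line one\r\nline two"

# Without an encoding, str() returns the repr of the bytes object
as_repr = str(data)
print(as_repr)               # b'line one\r\nline two' (backslashes are literal characters)
print("\r\n" in as_repr)     # False, so replace("\r\n", "\n") would change nothing

# With an encoding, str() decodes the bytes, like data.decode("utf-8")
decoded = str(data, "utf-8")
print("\r\n" in decoded)     # True
print(decoded.replace("\r\n", "\n"))
```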