I want to extract the "SNG_TITLE" and "ART_NAME" values from the code inside a "script" tag using BeautifulSoup in Python (the whole script is too long to paste).
<script>window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":[{"SNG_ID":"126884459","PRODUCT_TRACK_ID":"360276641","UPLOAD_ID":0,"SNG_TITLE":"Heathens","ART_ID":"647650","PROVIDER_ID":"3","ART_NAME":"Twenty One Pilots","ARTISTS":[{"ART_ID":"647650","ROLE_ID":"0","ARTISTS_SONGS_ORDER":"1","ART_NAME":"Twenty One Pilots","ART_PICTURE":"259dcf52853363d79753ec301377645d","SMARTRADIO":"1","RANK":"487762","LOCALES":[],"__TYPE__":"artist"}],"ALB_ID":"13371165","ALB_TITLE":"Heathens","TYPE":0,"MD5_ORIGIN":"5cea723b83af1ff0a62d65d334b978d4","VIDEO":false,"DURATION":"195","ALB_PICTURE":"3dfc8c9e406cf1bba8ce0695a44a9b7e","ART_PICTURE":"259dcf52853363d79753ec301377645d","RANK_SNG":"967143","SMARTRADIO":"1","FILESIZE_AAC_64":0,"FILESIZE_MP3_64":"0","FILESIZE_MP3_128":"3135946","FILESIZE_MP3_256":0,"FILESIZE_MP3_320":"7839868","FILESIZE_FLAC":"21777150","FILESIZE":"3135946","GAIN":"-12","MEDIA_VERSION":"4","DISK_NUMBER":"1","TRACK_NUMBER":"1","VERSION":"","EXPLICIT_LYRICS":"0","RIGHTS":{"STREAM_ADS_AVAILABLE":true,"STREAM_ADS":"2000-01-01","STREAM_SUB_AVAILABLE":true,"STREAM_SUB":"2000-01-01"},"ISRC":"USAT21601930","DATE_ADD":1497886149,"HIERARCHICAL_TITLE":"","SNG_CONTRIBUTORS":{"mainartist":["Twenty One Pilots"],"engineer":["Adam Hawkins"],"mixer":["Adam Hawkins"],"masterer":["Chris Gehringer"],"drums":["Josh Dun"],"producer":["Mike Elizondo","Tyler Joseph"],"programmer":["Mike Elizondo","Tyler Joseph"],"vocals":["Tyler Joseph"],"writer":["Tyler Joseph"]},"LYRICS_ID":30553991,"__TYPE__":"song"},{"SNG_ID":"99976952","PRODUCT_TRACK_ID":"171067651","UPLOAD_ID":0,"SNG_TITLE":"Stressed Out","ART_ID":"647650","PROVIDER_ID":"3","ART_NAME":"Twenty One Pilots","ARTISTS":[{"ART_ID":"647650","ROLE_ID":"0","ARTISTS_SONGS_ORDER":"1","ART_NAME":"Twenty One Pilots", ...</script>
The idea of the code is to print the user name and all the song and artist names that can be found on the given page.
import requests
from bs4 import BeautifulSoup
base_url = 'https://www.deezer.com/en/profile/1589856782/loved'
r = requests.get(base_url)
soup = BeautifulSoup(r.text, 'html.parser')
user_name = soup.find(class_='user-name')
print(user_name.text)
This prints the user name.
for script in soup.find_all('script'):
    print(script.contents)
If I understand correctly, the script I need holds a dictionary, so I just need to find it and get its contents. The problem is that I don't know how to find exactly this "script". It doesn't have any attributes or anything else that makes it unique. So I tried a loop that finds all scripts on the page and prints out their contents, but I'm not sure how to proceed from there.
How do I find only this particular "script" on the page? Can I access the values in a different way?
If my understanding is correct, you want only the script element with "SNG_TITLE" in it.
You can use the re module to get only the script element that contains the fields you are interested in.
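The answer's original code is not preserved here, so the following is only a minimal sketch of the idea. It assumes the html string is a stand-in for r.text, cut down to two scripts and a shortened payload, and it relies on BeautifulSoup's string= filter accepting a compiled regular expression.

```python
import re
from bs4 import BeautifulSoup

# Stand-in for r.text (assumption): the real page has many more scripts
# and a much longer payload.
html = '''
<html><body>
<script>var unrelated = 1;</script>
<script>window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":[{"SNG_TITLE":"Heathens","ART_NAME":"Twenty One Pilots"}]}}}</script>
</body></html>
'''

soup = BeautifulSoup(html, 'html.parser')

# string= takes a compiled pattern and searches each script's text for it,
# so only the script containing "SNG_TITLE" is returned.
script = soup.find('script', string=re.compile('SNG_TITLE'))

# Show only the beginning of the matched script.
print(script.string[:60])
```

On older BeautifulSoup releases the same filter is spelled text= instead of string=.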
@furas's answer is the complete solution, using the json module to extract the 'SNG_TITLE' and 'ART_NAME' values. My answer helps you find only the script containing 'SNG_TITLE'. You can combine both to get better code.
Scripts don't change places in the code, so you can count them and use an index to get the correct script.
A script's content is a normal string, so you can also use standard string functions to pick out the right one.
Code with both methods. I use [:100] to display only part of the string.
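The original code and its output are missing from this copy, so here is a rough sketch of what the two methods could look like. The html string is a made-up three-script stand-in for r.text, and the index 1 is only correct for this sample; on the real page the position would have to be found by counting the scripts.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for r.text (assumption): three scripts, shortened payload.
html = '''
<html><body>
<script src="app.js"></script>
<script>window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":[{"SNG_TITLE":"Heathens","ART_NAME":"Twenty One Pilots"}]}}}</script>
<script>var unrelated = 1;</script>
</body></html>
'''

soup = BeautifulSoup(html, 'html.parser')
all_scripts = soup.find_all('script')

# Method 1: pick the script by position.
# The index 1 fits this sample only (assumption).
by_index = all_scripts[1].string
print(by_index[:100])

# Method 2: pick the script with ordinary string functions.
by_content = None
for script in all_scripts:
    text = script.string or ''  # .string is None for scripts without inline code
    if text.strip().startswith('window.__DZR_APP_STATE__'):
        by_content = text
        break
print(by_content[:100])
```

The index version is shorter but breaks if the page adds or removes a script, while the string check keeps working as long as the variable name stays the same.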
EDIT: When you have the correct script, you can use slicing to get only the JSON string, use the json module to convert it to a Python dictionary, and then get the data from it.
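The slicing step might look like the sketch below. It assumes script_text holds the text of the script found earlier; the value here is a trimmed copy of the payload from the question, and the TAB, loved, data key path is taken from that payload. Slicing from the first "{" to the last "}" is one possible choice and assumes nothing but the assignment surrounds the JSON.

```python
import json

# Trimmed stand-in for the text of the script found earlier (assumption).
script_text = (
    'window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":['
    '{"SNG_TITLE":"Heathens","ART_NAME":"Twenty One Pilots"},'
    '{"SNG_TITLE":"Stressed Out","ART_NAME":"Twenty One Pilots"}'
    ']}}}'
)

# Keep only the JSON part: from the first "{" to the last "}".
start = script_text.find('{')
end = script_text.rfind('}') + 1
data = json.loads(script_text[start:end])

# Collect (title, artist) pairs from the list of loved songs.
songs = [(song['SNG_TITLE'], song['ART_NAME'])
         for song in data['TAB']['loved']['data']]

for title, artist in songs:
    print(title, '-', artist)
```

If the script ever ends with extra JavaScript after the object, json.loads will raise an error and the slice boundaries would need adjusting.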