I understand how to obtain the text from a specific div or span style from this question: How to find the most common span styles.
Now the difficulty is finding all the span styles with font sizes larger than the most common one.
I suspect I should use regular expressions, but do I first need to extract the most common font size?
Also, how do you determine "larger than" when the value being compared is a string?
This may help you:
To find all the span styles with font sizes larger than the most common span style using BeautifulSoup, you need to parse each CSS style that has been returned.
Parsing CSS is better done using a library such as cssutils. This would then let you access the fontSize attribute directly.
This would have a value such as 12px, which does not naturally sort correctly. To get around this, you could use a library such as natsort.
So, first parse each of the styles into CSS objects. At the same time, keep a list of the soup for each span, along with the parsed CSS for its style.
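To illustrate why string values such as 12px need special handling, here is a small sketch using only the standard library. The size_key helper and the sample values are hypothetical; it uses a regular expression plus float() as a stand-in for natsort, and it assumes every value is a number followed by the same unit.

```python
import re

# Hypothetical sample values, as they might come back from fontSize.
sizes = ["9px", "12px", "10px", "24px"]

def size_key(value):
    """Return the leading number of a CSS size such as '12px' as a float."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", value)
    if match is None:
        raise ValueError(f"no numeric size in {value!r}")
    return float(match.group(1))

# Plain string sorting compares character by character, so "10px" < "9px".
by_string = sorted(sizes)
print(by_string)  # ['10px', '12px', '24px', '9px']

# Sorting on the numeric part gives the order a reader would expect.
by_number = sorted(sizes, key=size_key)
print(by_number)  # ['9px', '10px', '12px', '24px']
```

Passing reverse=True to sorted() would put the largest size first.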
Now use the fontSize attribute as the key for sorting with natsort. This gives you a correctly sorted list of styles according to their font size, largest first (by using reverse=True).
takewhile() is then used to create a list of all entries up to the point where the size matches the most common one, resulting in a list of entries larger than the most common one.
In the example shown, the most commonly used font size is 12px, so there are 3 other entries larger than this.
To install the libraries, you will probably need: pip install cssutils natsort
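The steps above can be sketched roughly as follows. This is not the original example: to stay dependency-free it works on a hypothetical list of (text, style) pairs rather than real soup objects, and it swaps cssutils and natsort for a small regular expression and float(). The names font_size, spans, and larger are illustrative only.

```python
import re
from collections import Counter
from itertools import takewhile

# Hypothetical (text, style) pairs standing in for span.text and span["style"].
spans = [
    ("Title", "font-size: 24px; color: red"),
    ("Body 1", "font-size: 12px"),
    ("Subtitle", "font-size: 18px"),
    ("Body 2", "font-size: 12px"),
    ("Footnote", "font-size: 9px"),
    ("Lead", "font-size: 14px"),
    ("Body 3", "font-size: 12px"),
]

def font_size(style):
    """Extract the numeric font-size from an inline style (stand-in for cssutils)."""
    match = re.search(r"font-size\s*:\s*(\d+(?:\.\d+)?)", style)
    return float(match.group(1)) if match else None

# Pair each span's text with its numeric size, skipping spans without a font-size.
sized = [(text, font_size(style)) for text, style in spans]
sized = [(text, size) for text, size in sized if size is not None]

# The most common size across all spans.
most_common = Counter(size for _, size in sized).most_common(1)[0][0]

# Sort largest first, then keep entries until the most common size is reached.
ordered = sorted(sized, key=lambda item: item[1], reverse=True)
larger = list(takewhile(lambda item: item[1] != most_common, ordered))

print(most_common)  # 12.0
print(larger)       # [('Title', 24.0), ('Subtitle', 18.0), ('Lead', 14.0)]
```

With BeautifulSoup, the pairs would instead be built from the spans that carry a style attribute, for example by iterating over soup.find_all("span", style=True).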
Note that this assumes the font sizes on your website use consistent units; it cannot compare different font metrics, only the numerical values.
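One way to guard against mixed units is to check them before comparing. This is a minimal sketch, assuming sizes are plain strings such as 12px or 1.5em; split_size and check_same_unit are hypothetical helpers, not part of cssutils or natsort.

```python
import re

def split_size(value):
    """Split a CSS size such as '12px' into (12.0, 'px')."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z%]*)\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return float(match.group(1)), match.group(2)

def check_same_unit(values):
    """Return the single unit shared by all values, or raise if they differ."""
    units = {split_size(value)[1] for value in values}
    if len(units) != 1:
        raise ValueError(f"cannot compare sizes numerically, units found: {sorted(units)}")
    return units.pop()

print(check_same_unit(["12px", "18px", "9px"]))  # px
```

If the check raises, the numeric comparison described above should not be trusted for that page.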