I need to get the average div height and width of an HTML document.
I tried this solution, but it doesn't work:
import numpy as np
average_width = np.mean([div.attrs['width'] for div in my_doc.get_div() if 'width' in div.attrs])
average_height = np.mean([div.attrs['height'] for div in my_doc.get_div() if 'height' in div.attrs])
print average_height,average_width
The get_div method returns the list of all divs retrieved by BeautifulSoup's find_all method.
Here is an example:
print my_doc.get_div()[1]
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;">
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">Journal of Infection (2015)
</span>
<span style="font-family: EICMDB+AdvTrebu-B; font-size:8px">xx</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">, 1</span>
<span style="font-family: EICMDD+AdvPS44A44B; font-size:7px">e</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">4
<br/>
</span>
</div>
When I get the attributes, it works perfectly:
print my_doc.get_div()[1].attrs
{u'style': u'position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;'}
But when I try to get the value:
print my_doc.get_div()[1].attrs['width']
I get an error:
KeyError: 'width'
But I don't understand why, because when I check the type:
print type(my_doc.get_div()[1].attrs)
it's a dictionary: <type 'dict'>
There may be a better way.
Way 1:
Below is my tested code to extract width and height.
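The KeyError arises because the div's only attribute is `style`; `width` and `height` are CSS declarations inside that string, not keys of `attrs`. A minimal sketch of this kind of approach, assuming the dimensions are always given in `px` as in the example div (the helper names `parse_style`, `px_value` and `average` are hypothetical, and the `styles` list stands in for the strings you would collect from your own divs):

```python
def parse_style(style):
    """Turn 'left:45px; width:127px;' into {'left': '45px', 'width': '127px'}."""
    props = {}
    for declaration in style.split(';'):
        if ':' not in declaration:
            continue  # skip the empty piece after the trailing ';'
        name, value = declaration.split(':', 1)
        props[name.strip().lower()] = value.strip()
    return props


def px_value(props, name):
    """Return the property as a float, or None if it is absent or not in px."""
    value = props.get(name, '')
    if not value.endswith('px'):
        return None
    try:
        return float(value[:-2])
    except ValueError:
        return None


def average(values):
    """Mean of the non-None values, or None if there are none."""
    values = [v for v in values if v is not None]
    return sum(values) / float(len(values)) if values else None


# Assumption: stand-in for
# [div.attrs.get('style', '') for div in my_doc.get_div()]
styles = [
    'position:absolute; left:45px; top:81px; width:127px; height:9px;',
    'position:absolute; left:45px; top:95px; width:73px; height:11px;',
]
parsed = [parse_style(s) for s in styles]
average_width = average(px_value(p, 'width') for p in parsed)
average_height = average(px_value(p, 'height') for p in parsed)
print(average_width, average_height)
```

Divs without a `style` attribute, or without a `width`/`height` declaration, are simply left out of the average rather than raising an error.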
Way 2: Use a regular expression, as described here.
Working code:
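A hedged sketch of the regular-expression route, assuming pixel values only; the pattern and the helper name `dimensions` are illustrative choices, not taken from the linked answer. It is applied here to the `style` string printed in the question:

```python
import re

# Match "width" or "height" as a whole property name followed by a px number.
# The lookbehind stops "max-width" or "line-height" from matching.
DIMENSION_RE = re.compile(r'(?<![\w-])(width|height)\s*:\s*(\d+(?:\.\d+)?)px')


def dimensions(style):
    """Return e.g. {'width': 127.0, 'height': 9.0} for a style string."""
    return {name: float(number) for name, number in DIMENSION_RE.findall(style)}


style = (u'position:absolute; border: textbox 1px solid; writing-mode:lr-tb; '
         u'left:45px; top:81px; width:127px; height:9px;')
print(dimensions(style))
```

To average over the whole document, you would call `dimensions(div.attrs.get('style', ''))` for each div and pass the collected widths and heights to `np.mean`.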