pandas read_html - no tables found

Posted 2020-08-01 06:12

Question:

I am attempting to read a table of data from WU.com, but I am getting a ValueError: No tables found. (I am also a first-timer at web scraping.) Another person asked a very similar Stack Overflow question about a WU table of data, but the solution is a little too complicated for me.

import pandas as pd

df_list = pd.read_html('https://www.wunderground.com/history/daily/us/wi/milwaukee/KMKE/date/2013-6-26')

print(df_list)

On the webpage of historical data for Milwaukee, the table I am attempting to retrieve into pandas is the Daily Observations table.

Any tips would help, thank you.

Answer 1:

The page is dynamic, meaning the table is built by JavaScript after the page loads, so you need to render the page first. Use something like Selenium to render it, then pull the table with pandas' read_html():

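from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
driver.get("https://www.wunderground.com/history/daily/us/wi/milwaukee/KMKE/date/2013-6-26")

# page_source holds the HTML after the browser has run the page's JavaScript
html = driver.page_source

# Wrap the string in StringIO; newer pandas versions deprecate passing literal HTML
tables = pd.read_html(StringIO(html))
data = tables[1]

# quit() ends the whole browser session, not just the current window
driver.quit()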

Output:

print(data)
        Time Temperature      ...       Precip Accum      Condition
0    6:52 PM        68 F      ...             0.0 in  Mostly Cloudy
1    7:52 PM        69 F      ...             0.0 in  Mostly Cloudy
2    8:52 PM        70 F      ...             0.0 in  Mostly Cloudy
3    9:52 PM        67 F      ...             0.0 in         Cloudy
4   10:52 PM        65 F      ...             0.0 in  Partly Cloudy
5   11:42 PM        66 F      ...             0.0 in  Mostly Cloudy
6   11:52 PM        68 F      ...             0.0 in  Mostly Cloudy
7   12:08 AM        68 F      ...             0.0 in         Cloudy
8   12:52 AM        68 F      ...             0.0 in  Mostly Cloudy
9    1:52 AM        70 F      ...             0.0 in         Cloudy
10   2:13 AM        70 F      ...             0.0 in         Cloudy
11   2:52 AM        71 F      ...             0.0 in         Cloudy
12   3:52 AM        70 F      ...             0.0 in  Mostly Cloudy
13   4:19 AM        70 F      ...             0.0 in         Cloudy
14   4:29 AM        70 F      ...             0.0 in         Cloudy
15   4:52 AM        70 F      ...             0.0 in         Cloudy
16   5:25 AM        70 F      ...             0.0 in  Mostly Cloudy
17   5:52 AM        71 F      ...             0.0 in         Cloudy
18   6:52 AM        73 F      ...             0.0 in         Cloudy
19   7:52 AM        74 F      ...             0.0 in         Cloudy
20   8:52 AM        73 F      ...             0.0 in         Cloudy
21   9:52 AM        71 F      ...             0.0 in         Cloudy
22  10:52 AM        71 F      ...             0.0 in         Cloudy
23  11:52 AM        70 F      ...             0.0 in         Cloudy
24  12:52 PM        72 F      ...             0.0 in  Mostly Cloudy
25   1:52 PM        70 F      ...             0.0 in  Mostly Cloudy
26   2:52 PM        71 F      ...             0.0 in  Mostly Cloudy
27   3:52 PM        71 F      ...             0.0 in  Partly Cloudy
28   4:52 PM        68 F      ...             0.0 in  Mostly Cloudy
29   5:52 PM        66 F      ...             0.0 in  Mostly Cloudy

[30 rows x 11 columns]
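
Before reaching for Selenium, it can help to confirm that the raw HTML really contains no <table> element. The following is a minimal sketch using only the standard library. The names TableCounter and count_tables are hypothetical helpers, and the two HTML strings are made-up stand-ins for a static response and a rendered page, not the actual content of the site.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method
        if tag == "table":
            self.count += 1


def count_tables(html: str) -> int:
    """Return how many <table> elements appear in the given HTML."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical snippets: a JavaScript shell versus a fully rendered page
static_html = "<html><body><div id='app'></div><script src='main.js'></script></body></html>"
rendered_html = "<html><body><table><tr><td>6:52 PM</td></tr></table></body></html>"

print(count_tables(static_html))
print(count_tables(rendered_html))
```

If the count is zero for the HTML fetched without a browser but nonzero for driver.page_source, the table is most likely being added by JavaScript, which is the situation this answer describes.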


Answer 2:

Also check that the file name is correct. If you try to read a file that does not exist, you will get the same "No tables found" error. I made this mistake myself: my file was X.htm, but I was trying to read X.html.
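
One way to catch this kind of mistake early, assuming the HTML is a local file, is to check that the path exists before handing it to read_html. The sketch below uses only the standard library. resolve_html_file is a hypothetical helper, not part of pandas, and it only looks for files with the same base name and a different extension.

```python
from pathlib import Path


def resolve_html_file(name: str) -> Path:
    """Return the path if the file exists; otherwise raise with near-miss hints."""
    path = Path(name)
    if path.is_file():
        return path

    # Look for siblings with the same stem but another extension (X.htm vs X.html)
    candidates = []
    if path.parent.is_dir():
        candidates = sorted(
            p.name for p in path.parent.glob(path.stem + ".*") if p.is_file()
        )

    hint = f" Did you mean: {', '.join(candidates)}?" if candidates else ""
    raise FileNotFoundError(f"{name} does not exist.{hint}")
```

The returned path can then be passed to pd.read_html(), so a mistyped file name fails with a clear FileNotFoundError instead of a misleading message about tables.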