Use first row as column names? Pandas read_html

Posted 2019-04-23 22:49

I have this simple one-line script:

from pandas import read_html

print(read_html('http://money.cnn.com/data/hotstocks/', flavor='bs4'))

This works fine, but the column names are missing: the columns are labeled 0, 1, 2, 3. Is there an easy way to tell pandas to use the first row as the column names? I know I could store the names in a list, assign them, and then skip the first row, but I am wondering if there is an easier or better way.

Currently it prints:

                           0       1       2         3
0                    Company   Price  Change  % Change
1             AAPL Apple Inc  115.31   +6.17    +5.65%
2   BAC Bank of America Corp   15.20   -0.43    -2.75%
3            YHOO Yahoo! Inc   46.46   -1.53    -3.19%
4        MSFT Microsoft Corp   41.19   -1.47    -3.45%
5            FB Facebook Inc   76.24   +0.46    +0.61%
6     GE General Electric Co   23.84   -0.54    -2.21%
7                 T AT&T Inc   32.68   -0.13    -0.40%
8            F Ford Motor Co   14.46   -0.24    -1.63%
9            INTC Intel Corp   33.78   -0.41    -1.20%
10    CSCO Cisco Systems Inc   26.80   -0.09    -0.35%
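For reference, the manual workaround described in the question (promote the first row to column names, then drop it) can be sketched as follows. The DataFrame here is a hypothetical stand-in built by hand from a few rows of the output above, so the example runs without fetching the page.

```python
import pandas as pd

# Hypothetical stand-in for the frame read_html returned: the header row
# ended up as data row 0 and the columns are labeled 0..3.
raw = pd.DataFrame([
    ["Company", "Price", "Change", "% Change"],
    ["AAPL Apple Inc", "115.31", "+6.17", "+5.65%"],
    ["BAC Bank of America Corp", "15.20", "-0.43", "-2.75%"],
])

# Drop row 0, renumber the remaining rows from 0, and use row 0 as labels.
df = raw.iloc[1:].reset_index(drop=True)
df.columns = raw.iloc[0].tolist()

# Because the numbers were read as text alongside the header row, they stay
# strings; convert the columns you need explicitly.
df["Price"] = pd.to_numeric(df["Price"])
print(df)
```

The extra `to_numeric` step is one reason to prefer having the parser handle the header: when the header row is never treated as data, pandas can infer numeric types on its own.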

1 answer
Melony
Reply #2 · 2019-04-23 23:44

`read_html` takes a `header` parameter. Pass the index of the row to use as the column names:

read_html('http://money.cnn.com/data/hotstocks/', header=0, flavor='bs4')
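To try `header=0` offline, here is a minimal sketch that parses a small, made-up HTML table modeled on the output in the question. It assumes pandas plus an HTML parser backend are installed (lxml, or BeautifulSoup4 with html5lib); it leaves `flavor` at its default instead of forcing `'bs4'`, and wraps the string in `StringIO` because recent pandas versions expect a file-like object rather than a literal HTML string.

```python
from io import StringIO

import pandas as pd

# Made-up table modeled on the CNN output; the header row uses <td> cells,
# so pandas does not recognize it as a header without help.
HTML = """
<table>
  <tr><td>Company</td><td>Price</td><td>Change</td><td>% Change</td></tr>
  <tr><td>AAPL Apple Inc</td><td>115.31</td><td>+6.17</td><td>+5.65%</td></tr>
  <tr><td>BAC Bank of America Corp</td><td>15.20</td><td>-0.43</td><td>-2.75%</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> it finds.
# header=0 turns the first row of each table into the column labels.
tables = pd.read_html(StringIO(HTML), header=0)
df = tables[0]
print(df)
```

Since the header row is consumed as labels, the `Price` column is parsed as floats directly, with no manual conversion.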

It's worth noting this caveat from the docs:

For example, you might need to manually assign column names if the column names are converted to NaN when you pass the header=0 argument
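One way to act on that caveat is to check the parsed labels and fall back to an explicit list when any of them is NaN. The helper below is hypothetical (it is not part of pandas), and the "broken" frame is constructed by hand to imitate NaN labels rather than produced by `read_html`.

```python
import pandas as pd


def assign_names_if_missing(df, names):
    """Return a copy of df labeled with `names` if any column label is NaN.

    Hypothetical helper, not a pandas API: it follows the docs' advice to
    assign column names by hand when header=0 yields NaN labels.
    """
    if df.columns.isna().any():
        df = df.copy()          # leave the caller's frame untouched
        df.columns = names
    return df


# Hand-built frame imitating a parse where the first label came back as NaN.
broken = pd.DataFrame([["AAPL Apple Inc", 115.31]], columns=[float("nan"), "Price"])
fixed = assign_names_if_missing(broken, ["Company", "Price"])
print(fixed.columns.tolist())
```

Frames whose labels are all present pass through unchanged, so the check is cheap to apply to every table `read_html` returns.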

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.html.read_html.html
