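I have just started using pandas in the IPython Notebook and ran into the following problem: when a DataFrame read from a CSV file is small, the IPython Notebook shows it as a nice table. When the DataFrame is large, something like this is output instead: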
In [27]:
evaluation = readCSV("evaluation_MO_without_VNS_quality.csv").filter(["solver", "instance", "runtime", "objective"])
In [37]:
evaluation
Out[37]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 333 entries, 0 to 332
Data columns:
solver 333 non-null values
instance 333 non-null values
runtime 333 non-null values
objective 333 non-null values
dtypes: int64(1), object(3)
I'd like to see a small part of the DataFrame as a table, just to make sure it is in the right format. What options do I have?
Answer 1:
In this case, where the DataFrame is long but not too wide, you can simply slice it:
>>> df = pd.DataFrame({"A": range(1000), "B": range(1000)})
>>> df
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns:
A 1000 non-null values
B 1000 non-null values
dtypes: int64(2)
>>> df[:5]
A B
0 0 0
1 1 1
2 2 2
3 3 3
4 4 4
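Note: .ix has since been deprecated, and it was removed in pandas 1.0.
If the frame is both wide and long, I index rows and columns at once. The original answer used .ix; in current pandas, .loc gives the same result here, because the labels are integers and label slices include both end points:
>>> df = pd.DataFrame({i: range(1000) for i in range(100)})
>>> df.loc[:5, :10]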
0 1 2 3 4 5 6 7 8 9 10
0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5
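The integer label slices in the example above only work because the row and column labels happen to be integers. A minimal sketch for frames with other labels (the string column names here are made up for illustration) uses position-based .iloc instead; positional slices exclude the end point, so :6 and :11 are needed for the same 6 x 11 corner:

```python
import pandas as pd

# 1000 rows x 100 columns with string column labels ("c0" .. "c99"),
# so integer *label* slicing of the columns is not possible here
df = pd.DataFrame({f"c{i}": range(1000) for i in range(100)})

# .iloc slices by position and excludes the end point:
# rows 0-5 and the first 11 columns (c0 .. c10)
corner = df.iloc[:6, :11]
print(corner)
```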
Answer 2:
# Say you have a df object containing your dataframe
df.head(5) # will print out the first 5 rows
df.tail(5) # will print out the 5 last rows
# Note: it is similar to R
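In a notebook only the last expression of a cell is displayed, so running both lines above in one cell shows just the tail. A minimal sketch, assuming you want both ends in a single table, stacks them with pd.concat (the frame is made-up sample data):

```python
import pandas as pd

df = pd.DataFrame({"A": range(100), "B": range(100, 200)})

# stack the first and last 5 rows into one frame so both ends
# appear in a single displayed table
preview = pd.concat([df.head(5), df.tail(5)])
print(preview)
```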
Answer 3:
I wrote a method that displays the four corners of the data, and monkey-patched it onto DataFrame so it can be called like this:
import numpy as np
import pandas as pd
from pandas import DataFrame

def _sw(df, up_rows=10, down_rows=5, left_cols=4, right_cols=3, return_df=False):
    ''' display df data at four corners
        A,B (up_pt)
        C,D (down_pt)
    parameters : up_rows=10, down_rows=5, left_cols=4, right_cols=3
    usage:
        df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10])
        df.sw(5,2,3,2)
        df1 = df.set_index(['A','B'], drop=True, inplace=False)
        df1.sw(5,2,3,2)
    '''
    nrow, ncol = df.shape
    # .ix slicing was end-inclusive on an integer index, so the original showed
    # up_rows + 1 top rows (rows 0..10 by default); keep that behaviour
    top = df.iloc[:up_rows + 1]
    bottom = df.iloc[max(nrow - down_rows, 0):]
    # handle columns
    if ncol <= left_cols + right_cols:      # screen width can contain all columns
        up_pt, down_pt = top, bottom
    else:                                   # screen width can not contain all columns
        up_pt = pd.concat([top.iloc[:, :left_cols], top.iloc[:, ncol - right_cols:]], axis=1)
        down_pt = pd.concat([bottom.iloc[:, :left_cols], bottom.iloc[:, ncol - right_cols:]], axis=1)
        up_pt.insert(left_cols, '..', '..')
        down_pt.insert(left_cols, '..', '..')
    overlap_qty = len(up_pt) + len(down_pt) - nrow
    down_pt = down_pt.iloc[max(overlap_qty, 0):]    # remove rows already shown in the up part
    # transfer down_pt to a list of text lines
    dt_str_list = down_pt.to_string().split('\n') if len(down_pt) else []
    # Display up part data
    print(up_pt)
    start_row = 1 if df.index.names[0] is None else 2   # skip the header line(s) of the down part
    # Display an omit line if not all rows are shown
    if overlap_qty < 0 and dt_str_list:
        print('.' * len(dt_str_list[start_row]))
    # Display down part data row by row
    for line in dt_str_list[start_row:]:
        print(line)
    # Display foot note
    print('\n')
    print('Index :', df.index.names)
    print('Column:', ','.join(map(str, df.columns)))
    print('row: %d col: %d' % (nrow, ncol))
    print('\n')
    return df if return_df else None

DataFrame.sw = _sw  # add a method to the DataFrame class
Here are some examples:
>>> df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10])
>>> df.sw()
A B C D .. H I J
0 -0.8166 0.0102 0.0215 -0.0307 .. -0.0820 1.2727 0.6395
1 1.0659 -1.0102 -1.3960 0.4700 .. 1.0999 1.1222 -1.2476
2 0.4347 1.5423 0.5710 -0.5439 .. 0.2491 -0.0725 2.0645
3 -1.5952 -1.4959 2.2697 -1.1004 .. -1.9614 0.6488 -0.6190
4 -1.4426 -0.8622 0.0942 -0.1977 .. -0.7802 -1.1774 1.9682
5 1.2526 -0.2694 0.4841 -0.7568 .. 0.2481 0.3608 -0.7342
6 0.2108 2.5181 1.3631 0.4375 .. -0.1266 1.0572 0.3654
7 -1.0617 -0.4743 -1.7399 -1.4123 .. -1.0398 -1.4703 -0.9466
8 -0.5682 -1.3323 -0.6992 1.7737 .. 0.6152 0.9269 2.1854
9 0.2361 0.4873 -1.1278 -0.2251 .. 1.4232 2.1212 2.9180
10 2.0034 0.5454 -2.6337 0.1556 .. 0.0016 -1.6128 -0.8093
..............................................................
15 1.4091 0.3540 -1.3498 -1.0490 .. 0.9328 0.3668 1.3948
16 0.4528 -0.3183 0.4308 -0.1818 .. 0.1295 1.2268 0.1365
17 -0.7093 1.3991 0.9501 2.1227 .. -1.5296 1.1908 0.0318
18 1.7101 0.5962 0.8948 1.5606 .. -0.6862 0.9558 -0.5514
19 1.0329 -1.2308 -0.6896 -0.5112 .. 0.2719 1.1478 -0.1459
Index : [None]
Column: A,B,C,D,E,F,G,H,I,J
row: 20 col: 10
>>> df.sw(4,2,3,4)
A B C .. G H I J
0 -0.8166 0.0102 0.0215 .. 0.3671 -0.0820 1.2727 0.6395
1 1.0659 -1.0102 -1.3960 .. 1.0984 1.0999 1.1222 -1.2476
2 0.4347 1.5423 0.5710 .. 1.6675 0.2491 -0.0725 2.0645
3 -1.5952 -1.4959 2.2697 .. 0.4856 -1.9614 0.6488 -0.6190
4 -1.4426 -0.8622 0.0942 .. -0.0947 -0.7802 -1.1774 1.9682
..............................................................
18 1.7101 0.5962 0.8948 .. -0.8592 -0.6862 0.9558 -0.5514
19 1.0329 -1.2308 -0.6896 .. -0.3954 0.2719 1.1478 -0.1459
Index : [None]
Column: A,B,C,D,E,F,G,H,I,J
row: 20 col: 10
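Current pandas versions can produce a similar four-corner view without a helper: when a frame exceeds the display limits, the default repr keeps the first and last rows and columns and puts ... in between. A minimal sketch, using pd.option_context so the (arbitrarily chosen) limits apply only inside the with-block; the wide display.width just keeps the output from wrapping:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).standard_normal((20, 10)),
                  columns=list("ABCDEFGHIJ"))

# limit the display to 10 rows and 6 columns; the frame is larger,
# so pandas truncates the middle of both axes
with pd.option_context("display.max_rows", 10,
                       "display.max_columns", 6,
                       "display.width", 200):
    text = repr(df)
print(text)
```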
Answer 4:
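Here is a quick way to preview a large table without it running too wide.
The display function:
# display large dataframes in an html iframe
import html
import IPython.display

def ldf_display(df, lines=500):
    # escape the table HTML so quotes inside it cannot close the srcdoc attribute early
    txt = ('<iframe ' +
           'srcdoc="' + html.escape(df.head(lines).to_html(), quote=True) + '" ' +
           'width=1000 height=500>' +
           '</iframe>')
    return IPython.display.HTML(txt)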
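Now just run the following in any cell:
ldf_display(large_dataframe)
This converts the DataFrame to HTML and then displays it in an iframe. The advantage is that you can control the size of the output and you get convenient scroll bars.
It worked for me; maybe it will help someone else.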
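A variation on the same idea, sketched here as an assumption rather than the answerer's method, skips the iframe and wraps the table in a fixed-height div with overflow:auto, which also scrolls and needs no attribute escaping. The helper name scrollable_html is hypothetical:

```python
import pandas as pd

def scrollable_html(df, lines=500, height=500):
    """Hypothetical helper: wrap the first `lines` rows of df
    in a fixed-height <div> that scrolls when the table is taller."""
    table = df.head(lines).to_html()
    return (f'<div style="max-height:{height}px; overflow:auto">'
            f"{table}</div>")

df = pd.DataFrame({"A": range(1000), "B": range(1000)})
snippet = scrollable_html(df, lines=50)
```

In a notebook, pass the string to IPython.display.HTML(snippet) to render it.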
Answer 5:
To view the first n rows of a DataFrame:
df.head(n) # (n=5 by default)
To view the last n rows:
df.tail(n)
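In recent pandas versions both methods also accept a negative n, which drops rows from the opposite end instead. A short sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": range(10)})

# head(-3): everything except the last 3 rows
all_but_last = df.head(-3)
# tail(-3): everything except the first 3 rows
all_but_first = df.tail(-3)
```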
Answer 6:
You can simply use nrows. For example,
pd.read_csv('data.csv', nrows=6)
reads (and shows) only the first 6 rows of data.csv.
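nrows can be combined with skiprows to read a chunk from the middle of the file without loading the rest. A minimal sketch, using an in-memory CSV as a stand-in for data.csv; passing a range that starts at 1 keeps line 0, the header:

```python
import io
import pandas as pd

# stand-in for data.csv: a header line plus 100 data rows
csv_text = "x,y\n" + "".join(f"{i},{i * i}\n" for i in range(100))

# only the first 6 data rows are parsed
first = pd.read_csv(io.StringIO(csv_text), nrows=6)

# skip file lines 1-10 (data rows x=0..9) but keep the header,
# then parse the next 5 rows
middle = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 11), nrows=5)
```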
Answer 7:
An updated version of the function above that returns a string instead of printing it, adapted for pandas 0.13+:
from collections import Counter

import numpy as np
import pandas as pd
from pandas import DataFrame

def _sw2(df, up_rows=5, down_rows=3, left_cols=4, right_cols=2, return_df=False):
    """ return df data display string at four corners
        A,B (up_pt)
        C,D (down_pt)
    parameters : up_rows=5, down_rows=3, left_cols=4, right_cols=2
    usage:
        df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10])
        print(df.sw(5,2,3,2))
        df1 = df.set_index(['A','B'], drop=True, inplace=False)
        print(df1.sw(5,2,3,2))
    """
    nrow, ncol = df.shape
    # .ix slicing was end-inclusive on an integer index, so keep up_rows + 1 top rows
    top = df.iloc[:up_rows + 1]
    bottom = df.iloc[max(nrow - down_rows, 0):]
    # handle columns
    if ncol <= left_cols + right_cols:      # screen width can contain all columns
        up_pt, down_pt = top, bottom
    else:                                   # screen width can not contain all columns
        up_pt = pd.concat([top.iloc[:, :left_cols], top.iloc[:, ncol - right_cols:]], axis=1)
        down_pt = pd.concat([bottom.iloc[:, :left_cols], bottom.iloc[:, ncol - right_cols:]], axis=1)
        up_pt.insert(left_cols, '..', '..')
        down_pt.insert(left_cols, '..', '..')
    overlap_qty = len(up_pt) + len(down_pt) - nrow
    down_pt = down_pt.iloc[max(overlap_qty, 0):]    # remove overlap rows
    # transfer down_pt to a list of text lines
    dt_str_list = down_pt.to_string().split('\n') if len(down_pt) else []
    # Up part: to_string() never appends the "[n rows x m columns]" footer that
    # str() adds in some pandas versions, so nothing has to be cut off the end
    display_str = up_pt.to_string()
    start_row = 1 if df.index.names[0] is None else 2   # skip the header line(s) of the down part
    # Add an omit line if not all rows are shown
    if overlap_qty < 0 and dt_str_list:
        display_str += '\n' + '.' * len(dt_str_list[start_row])
    # Add down part data row by row
    for line in dt_str_list[start_row:]:
        display_str += '\n' + line
    # Add foot note
    display_str += '\n\nIndex : %s\n' % str(df.index.names)
    col_names = [str(c) for c in df.columns]
    if ncol < 10:
        col_name_str = ', '.join(col_names)
    else:
        col_name_str = ', '.join(col_names[:7]) + ' ... ' + ', '.join(col_names[-2:])
    display_str += 'Column: ' + col_name_str + '\n'
    display_str += 'row: %d col: %d ' % (nrow, ncol)
    # number of columns per dtype, e.g. "float64: 10"
    for dtype_name, count in Counter(str(t) for t in df.dtypes).items():
        display_str += '{0}: {1} '.format(dtype_name, count)
    display_str += '\n\n'
    return df if return_df else display_str

DataFrame.sw = _sw2  # attach as a DataFrame method, as the usage above expects
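If all you need is the truncated corner text as a string, DataFrame.to_string can do the truncation itself through its max_rows and max_cols parameters. A minimal sketch on random sample data (the limits 6 and 4 are arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).standard_normal((20, 10)),
                  columns=list("ABCDEFGHIJ"))

# to_string returns the text instead of printing it; max_rows/max_cols keep
# the first and last rows/columns and replace the middle with "..."
text = df.to_string(max_rows=6, max_cols=4, show_dimensions=True)
print(text)
```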
Answer 8:
To view only the first few entries, you can use the pandas head function:
dataframe.head(n)  # n defaults to 5
dataframe.head(n=value)
Alternatively, you can slice the frame, which gives the same result:
dataframe[:n]
Similarly, to view the last few entries, you can use pandas tail():
dataframe.tail(n)  # n defaults to 5
dataframe.tail(n=value)
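A quick check of the claim that slicing gives the same result as head and tail, on made-up data with the default integer index:

```python
import pandas as pd

dataframe = pd.DataFrame({"A": range(20)})
n = 4

# head(n) selects the same rows as the slice [:n],
# tail(n) the same rows as the slice [-n:]
same_head = dataframe.head(n).equals(dataframe[:n])
same_tail = dataframe.tail(n).equals(dataframe[-n:])
```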
Answer 9:
In Python, pandas provides head() and tail() to show the head and the tail of the data, respectively.
import pandas as pd
train = pd.read_csv('file_name')
train.head()   # first 5 rows (n defaults to 5)
train.head(n)  # first n rows
train.tail()   # last 5 rows (n defaults to 5)
train.tail(n)  # last n rows
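head() and tail() only show the two ends of the data. To spot-check rows from anywhere in the frame, DataFrame.sample draws random rows; passing random_state makes the draw repeatable. A sketch with a made-up frame standing in for the CSV:

```python
import pandas as pd

# stand-in for pd.read_csv('file_name')
train = pd.DataFrame({"id": range(1000)})

# 5 rows drawn at random from the whole frame; random_state fixes the draw
spot_check = train.sample(n=5, random_state=0)
print(spot_check)
```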
Answer 10:
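This line lets you see all rows (up to the number set in max_rows), without any rows hidden behind the dots ("...") that usually appear between the head and the tail of the printed output.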
pd.options.display.max_rows = 500
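The assignment above changes the setting for the rest of the session. A minimal sketch of raising the limit only temporarily with pd.option_context, which restores the previous value when the with-block ends:

```python
import pandas as pd

default_rows = pd.get_option("display.max_rows")

# the raised limit applies only inside the with-block
with pd.option_context("display.max_rows", 500):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
```

pd.set_option("display.max_rows", 500) is the function form of the same session-wide assignment.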
Answer 11:
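I found the following to be the most efficient way to sample a DataFrame:
print(df[A:B]) ## A and B are the start and end positions; position B itself is excluded
For example, print(df[10:15])
prints rows 10 through 14 of the data set (position 15 is not included).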
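A short sketch of the difference between the positional slice used above and a label slice with .loc, on made-up data with the default integer index (where positions and labels coincide):

```python
import pandas as pd

df = pd.DataFrame({"value": range(100, 130)})

# [] with integers slices by position and excludes the end point
by_position = df[10:15]      # rows 10, 11, 12, 13, 14
# .loc slices by label and includes the end point
by_label = df.loc[10:15]     # rows 10 through 15
```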
Source: How to preview a part of a large pandas DataFrame, in iPython notebook?