I have data in different columns, but I don't know how to extract some of them and save them in another variable.

index  a   b   c
1      2   3   4
2      3   4   5

How do I select columns 'a' and 'b' and save them into df1?

I tried

df1 = df['a':'b']
df1 = df.ix[:, 'a':'b']

Neither seems to work.

Tags: python pandas dataframe select

14 answers

唯独是你

Reply #2 · 2018-12-31 15:56

Just use the following; it selects columns b and c:

df1 = df[['b', 'c']]

Then you can simply display df1:

df1
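A minimal, self-contained sketch of list-based selection, using the sample data from the question (the DataFrame construction is assumed from the table shown there). Passing a list of labels to [] returns a new DataFrame containing only those columns, in the order listed:

```python
import pandas as pd

# Rebuild the sample data from the question (index 1 and 2, columns a, b, c)
df = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}, index=[1, 2])

# A list inside [] selects columns by label, in the order given
df1 = df[['a', 'b']]
print(df1)

# The order of the list controls the order of the resulting columns
reordered = df[['b', 'a']]
print(list(reordered.columns))
```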

牵手、夕阳

Reply #3 · 2018-12-31 15:57

I realize this question is quite old, but recent versions of pandas offer an easy way to do exactly this: pass the existing DataFrame to the pd.DataFrame constructor together with the list of column names you want to keep.

columns = ['b', 'c']
df1 = pd.DataFrame(df, columns=columns)
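A sketch of the constructor approach on the question's sample data. As far as I can tell, the constructor reindexes the columns, so a label that does not exist comes back as a column of NaN instead of raising KeyError the way df[[...]] does. The label 'z' below is a made-up name used only to show that behavior:

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}, index=[1, 2])

# Passing an existing DataFrame plus a column list keeps only those columns
columns = ['b', 'c']
df1 = pd.DataFrame(df, columns=columns)
print(df1)

# 'z' is a hypothetical label that is not in df: it is filled with NaN
df2 = pd.DataFrame(df, columns=['b', 'z'])
print(df2)
```

If a silently added NaN column would hide a typo in a column name, the plain df[['b', 'c']] form is the stricter choice.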

君临天下

Reply #4 · 2018-12-31 15:58

The approaches in the answers above assume that the user either knows which columns to keep or drop, or wants a contiguous range of columns (for instance, 'C':'E'). pandas.DataFrame.drop() is certainly an option for subsetting by a user-defined list of columns (though be careful to work on a copy of the DataFrame and not to set inplace=True, which modifies the original).

Another option is df.columns.difference() (pandas.Index.difference), which computes a set difference on the column names and returns an Index containing the remaining columns. Here is the solution:

import pandas as pd

df = pd.DataFrame([[2, 3, 4], [3, 4, 5]], columns=['a', 'b', 'c'], index=[1, 2])
columns_for_differencing = ['a']  # columns to leave out
df1 = df.copy()[df.columns.difference(columns_for_differencing)]  # keep every other column
print(df1)

The output would be:

   b  c
1  3  4
2  4  5
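One detail worth knowing: Index.difference sorts its result by default, so the original column order is not necessarily preserved. The sketch below (the column order c, a, b is made up for illustration) compares it with DataFrame.drop(columns=...), which returns a new DataFrame with the remaining columns in their original order and leaves df untouched as long as inplace=True is not passed:

```python
import pandas as pd

# Hypothetical frame whose columns are deliberately not in alphabetical order
df = pd.DataFrame([[4, 2, 3], [5, 3, 4]], columns=['c', 'a', 'b'], index=[1, 2])
to_exclude = ['a']

# Set difference on the column labels: the result is sorted, giving b, c
by_difference = df[df.columns.difference(to_exclude)]

# drop() keeps the remaining columns in their original order, giving c, b
by_drop = df.drop(columns=to_exclude)

print(list(by_difference.columns))
print(list(by_drop.columns))
```

If you prefer difference but want to keep the original order, Index.difference also accepts sort=False.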

孤独总比滥情好

Reply #5 · 2018-12-31 16:00

The column names (which are strings) cannot be sliced the way you tried: df['a':'b'] slices the rows, not the columns.

Here you have a couple of options. If you know from context which columns you want, you can select just those by passing a list into the __getitem__ syntax (the []'s).

df1 = df[['a','b']]
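Label-based slicing of columns is possible through .loc, which is probably the closest working form of the df.ix[:, 'a':'b'] attempt in the question (.ix was removed in pandas 1.0). Unlike positional slicing, a .loc label slice includes its end label. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}, index=[1, 2])

# ':' keeps every row; 'a':'b' is a label slice over the columns,
# and with .loc the end label 'b' is included
df1 = df.loc[:, 'a':'b']
print(df1)

# Same columns as listing the labels explicitly
df2 = df[['a', 'b']]
```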

Alternatively, if you need to select the columns by position rather than by name (say your code should do this automatically without knowing the names of the first two columns), you can do this instead:

df1 = df.iloc[:,0:2] # Remember that Python does not slice inclusive of the ending index.
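Because .iloc works on positions, the same line works whatever the columns are called. In this sketch the column names are invented placeholders, chosen only to show that nothing depends on them:

```python
import pandas as pd

# Column names here are arbitrary placeholders
df = pd.DataFrame([[2, 3, 4], [3, 4, 5]], columns=['x1', 'foo', 'bar'], index=[1, 2])

# Positions 0 and 1; the stop value 2 is excluded, as in ordinary Python slicing
df1 = df.iloc[:, 0:2]

# The names of the selected columns can be recovered afterwards if needed
print(list(df1.columns))
```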

Additionally, you should familiarize yourself with the idea of a view into a pandas object vs. a copy of that object. The first of the above methods returns a new copy in memory of the desired sub-object (the desired slices).

Sometimes, however, indexing in pandas does not do this and instead gives you a new variable that refers to the same chunk of memory as the slice in the original object. This can happen with the second (iloc) way of indexing, so call the copy() method on the result to get a regular copy. When this happens, changing what you think is an independent sliced object can alter the original object, so it is always good to be on the lookout for this.

df1 = df.iloc[:, 0:2].copy() # All rows, first two columns; copy() avoids the case where changing df1 also changes df
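A small check of the copy() advice: after taking an explicit copy, writing into df1 leaves df unchanged. Whether a slice taken without copy() shares memory with the original depends on the pandas version (newer releases with Copy-on-Write enabled make such slices behave like copies), so this sketch relies only on the explicit copy:

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}, index=[1, 2])

# Explicit copy of the first two columns, all rows
df1 = df.iloc[:, 0:2].copy()

# Modify the copy...
df1.iloc[0, 0] = 100

# ...and the original is untouched
print(df.iloc[0, 0], df1.iloc[0, 0])
```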

浅入江南

Reply #6 · 2018-12-31 16:01

With pandas:

with column names,

dataframe[['column1','column2']]

with iloc, column positions can be used, like

dataframe.iloc[:,[1,2]]

with loc, column names can be used, like

dataframe.loc[:,['column1','column2']]

Hope it helps!
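The three forms above, run against the question's sample data. Note that .iloc positions are 0-based, so [1, 2] picks the second and third columns:

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}, index=[1, 2])

# Plain [] with a list of names
by_name = df[['b', 'c']]

# .iloc: all rows, columns at positions 1 and 2
by_position = df.iloc[:, [1, 2]]

# .loc: all rows, columns by label
by_label = df.loc[:, ['b', 'c']]

print(by_position)
```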

路过你的时光

Reply #7 · 2018-12-31 16:01

This is certainly not an optimized approach, but it can be considered an alternative, using iterrows:

import pandas as pd

df1 = pd.DataFrame()  # create an empty DataFrame
for index, row in df.iterrows():
    # copy columns 'a' and 'b' one cell at a time (slow; for illustration only)
    df1.loc[index, 'a'] = df.loc[index, 'a']
    df1.loc[index, 'b'] = df.loc[index, 'b']
df1.head()
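If a row-by-row loop is really needed (for example, when each row needs some custom processing), a common pattern is to collect the rows first and build the DataFrame once, rather than growing it cell by cell with .loc. A sketch, assuming the question's lowercase column names; the name records is just an illustrative variable:

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}, index=[1, 2])

# Collect the wanted values per row in a plain dict keyed by the row label
records = {}
for index, row in df.iterrows():
    records[index] = {'a': row['a'], 'b': row['b']}

# Build the result in one step; orient='index' makes the outer keys the row labels
df1 = pd.DataFrame.from_dict(records, orient='index')
print(df1)
```

For plain column selection, df[['a', 'b']] gives the same values without any loop.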
