I have data in different columns but I don't know how to extract it to save it in another variable.
index a b c
1 2 3 4
2 3 4 5
How do I select 'a' and 'b' and save them into df1?
I tried
df1 = df['a':'b']
df1 = df.ix[:, 'a':'b']
None seem to work.
Below is my code:
Output:
The first DataFrame is the master one; I just copied two columns from it into df1.
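The code and output for this answer are not shown here, so the following is only a minimal sketch of the idea, not the author's exact code. It assumes the master frame is called df and holds the sample columns a, b and c from the question, with the 'index' values used as the row index. It avoids df.ix, which the question tried, because that indexer was removed in pandas 1.0.

```python
import pandas as pd

# Sample data from the question; treating 'index' as the row index is an assumption.
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# A list of labels inside the brackets selects those columns.
# .copy() makes df1 independent of df, so later edits to df1 leave df untouched.
df1 = df[["a", "b"]].copy()

print(df1)
```

Note the double brackets: df['a', 'b'] would look for a single column labelled ('a', 'b'), while df[['a', 'b']] passes a list of two labels.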
I found this method to be very useful:
More details can be found here
As of version 0.11.0, columns can be sliced in the manner you tried using the .loc indexer, for example to return columns C through E. A demo on a randomly generated DataFrame:
To get the columns from C to E (note that unlike integer slicing, 'E' is included in the columns):
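A sketch of that column slice, under stated assumptions: the 10x6 shape, the column labels A to F, the row labels R1 to R10 and the fixed seed are all made up here to match the labels this answer mentions, and are not the original demo data.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame; shape, labels and seed are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((10, 6)),
    columns=list("ABCDEF"),
    index=[f"R{i}" for i in range(1, 11)],
)

# Label slices with .loc include both endpoints, so 'E' is part of the result.
cols_c_to_e = df.loc[:, "C":"E"]

print(cols_c_to_e.columns.tolist())
```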
The same works for selecting rows based on labels. Get the rows 'R6' to 'R10' from those columns:
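The row-and-column version might look like the sketch below. As before, the frame is a stand-in: its shape, labels and seed are assumptions, not the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame; shape, labels and seed are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((10, 6)),
    columns=list("ABCDEF"),
    index=[f"R{i}" for i in range(1, 11)],
)

# First slice picks rows by label, second slice picks columns by label.
# Both slices are inclusive of their end labels.
rows_and_cols = df.loc["R6":"R10", "C":"E"]

print(rows_and_cols.index.tolist())
```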
.loc also accepts a boolean array, so you can select the columns whose corresponding entry in the array is True. For example, df.columns.isin(list('BCD')) returns array([False, True, True, True, False, False], dtype=bool): True if the column name is in the list ['B', 'C', 'D'], False otherwise.
You could provide a list of columns to be dropped and get back a DataFrame with only the columns needed, using the drop() function on a pandas DataFrame. Dropping the unwanted columns this way would, for example, return a DataFrame with just the columns b and c. The drop method is documented here.
Assuming your column names (df.columns) are ['index', 'a', 'b', 'c'], the data you want is in the 2nd and 3rd columns. If you don't know their names when your script runs, you can select them by position by slicing df.columns. As EMS points out in his answer, df.ix slices columns a bit more concisely, but the .columns slicing interface might be more natural because it uses the vanilla 1-D Python list indexing/slicing syntax.
WARN: 'index' is a bad name for a DataFrame column. That same label is also used for the real df.index attribute, an Index array. So your column is returned by df['index'] and the real DataFrame index is returned by df.index. An Index is a special kind of array optimized for lookup of its elements' values. For df.index it's for looking up rows by their label. The df.columns attribute is also a pd.Index array, for looking up columns by their labels.
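The approaches from these last answers (a boolean mask with .loc, drop(), and a positional slice of df.columns) can be sketched side by side, each applied to the columns a and b that the question asks for. This is an illustration under assumptions, not anyone's original code: the frame uses the question's sample values and keeps 'index' as an ordinary column, as the last answer assumes. The final two lines show the difference between that column and the real row index.

```python
import pandas as pd

# Sample data from the question, with 'index' kept as an ordinary column (assumption).
df = pd.DataFrame({"index": [1, 2], "a": [2, 3], "b": [3, 4], "c": [4, 5]})

# 1. Boolean mask: keep the columns whose label is in the given list.
mask = df.columns.isin(["a", "b"])
by_mask = df.loc[:, mask]

# 2. drop(): name the columns to discard and keep the rest.
by_drop = df.drop(columns=["index", "c"])

# 3. Positional slice of df.columns: positions 1 and 2 (0-based) hold 'a' and 'b'
#    in this frame, so no column names are needed.
by_position = df[df.columns[1:3]]

# All three routes produce the same two columns.
assert by_mask.equals(by_drop) and by_drop.equals(by_position)

# The column named 'index' and the real row index are different things.
print(df["index"].tolist())  # values stored in the column labelled 'index'
print(df.index.tolist())     # labels of the actual row index
```

Which one reads best depends on what you know up front: the names to keep, the names to discard, or only the positions.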