I have a dataset with multiple columns, and I am only interested in analyzing the data from six of them. The data is in a .txt file; I want to load the file and pull out columns 0, 1, 2, 4, 6, and 7 with the headings time, mode, event, xcoord, ycoord, and phi. There are ten columns in total. Here is an example of what the data looks like:
1385940076332 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076336 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076339 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076342 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076346 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076350 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076353 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076356 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
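To double-check which raw position holds which field, here is a minimal sketch in plain Python (no pandas). It splits the first sample row on whitespace and picks out the six requested indices. `row`, `wanted`, and `names` are illustrative names, not part of the original code.

```python
# One sample row from the data above, copied verbatim.
row = "1385940076332 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000"

fields = row.split()          # whitespace-separated fields
wanted = [0, 1, 2, 4, 6, 7]   # column indices from the question
names = ["time", "mode", "event", "xcoord", "ycoord", "phi"]

# Pair each heading with the raw field at the requested position.
selected = dict(zip(names, (fields[i] for i in wanted)))
print(len(fields))
print(selected)
```

With this row, index 3 is the `subject_avatar` label and indices 5, 8, and 9 are the numeric columns that get dropped.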
When I use the following code to parse the data into columns, it only seems to print a summary (a count per column) rather than the data itself, but I would like to list the values for further analysis. Here is the code I am using, from @alko:
import pandas as pd
# Read all ten whitespace-separated columns, keep six of them by position, then name them.
df = pd.read_csv('filtered.txt', header=None, false_values=None, sep=r'\s+')[[0, 1, 2, 4, 6, 7]]
df.columns = ['time', 'mode', 'event', 'xcoord', 'ycoord', 'phi']
print(df)
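The snippet can be made self-contained for testing by feeding it a few of the sample rows through `io.StringIO` instead of `filtered.txt`. This is a sketch under that assumption (`SAMPLE` is an illustrative name); the parsing and column selection are the same as in the code above.

```python
import io

import pandas as pd

# A few rows from the sample above, standing in for 'filtered.txt'.
SAMPLE = """\
1385940076332 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076336 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076339 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
"""

# Read all 10 columns, then keep six of them by position.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, sep=r"\s+")[[0, 1, 2, 4, 6, 7]]
df.columns = ["time", "mode", "event", "xcoord", "ycoord", "phi"]
print(df)
```

With only three rows, `print(df)` shows the values as a table, which suggests the parsing itself is fine.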
Here is what that code prints:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 115534 entries, 0 to 115533
Data columns (total 6 columns):
time 115534 non-null values
mode 115534 non-null values
event 115534 non-null values
xcoord 115534 non-null values
ycoord 115534 non-null values
phi 115534 non-null values
dtypes: float64(3), int64(2), object(1)
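As far as I can tell, this output is a summary of the DataFrame, not an error. Older pandas releases print this info-style view instead of the values when a frame is too large for the display limits; newer releases truncate the middle rows with `...` instead. A minimal sketch of forcing the full listing, assuming a hypothetical 100-row frame as a stand-in for the real 115534-row one:

```python
import pandas as pd

# Hypothetical stand-in for the real frame: 100 rows is already
# more than the default display.max_rows of 60.
df = pd.DataFrame({"time": range(100), "mode": [3] * 100})

# to_string() renders every row, whatever the display options say.
full_text = df.to_string()
print(full_text)

# Alternatively, lift the row limit only inside this block.
with pd.option_context("display.max_rows", None):
    print(df)
```

For 115534 rows the full listing is very long; writing it to a file with `df.to_csv(...)` may be more practical than printing it.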
So the goal is to pull out these six columns from the ten original ones, label them, and list their values.
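Once the six columns are in `df`, the values are already available for analysis; printing is only needed to look at them. A minimal sketch on a hypothetical two-row frame built from the sample values, showing a few common ways to list or access the data:

```python
import pandas as pd

# Hypothetical two-row frame with the question's column labels.
df = pd.DataFrame(
    {
        "time": [1385940076332, 1385940076336],
        "mode": [3, 2],
        "event": ["M", "M"],
        "xcoord": [-30.0, -30.0],
        "ycoord": [-59.028107, -59.028107],
        "phi": [180.0, 180.0],
    }
)

print(df.head())  # first rows, printed as a table

modes = df["mode"].tolist()               # one column as a Python list
rows = list(df.itertuples(index=False))   # each row as a namedtuple
print(modes)
print(rows[0].xcoord, rows[0].phi)
```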
You can use pandas' read_csv parser:
Note that I corrected the column indices, as the ones provided in the question appear to be incorrect.
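A minimal sketch of doing the column selection inside `read_csv` itself via its `usecols` parameter, so the four unwanted columns are skipped at parse time rather than dropped afterwards. `SAMPLE`, `COLUMNS`, and `NAMES` are illustrative names; in practice you would pass `'filtered.txt'` instead of the `io.StringIO` object. The indices are the ones from the question, so adjust them if they do not match your file.

```python
import io

import pandas as pd

# Two rows from the question's sample, standing in for 'filtered.txt'.
SAMPLE = """\
1385940076332 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076336 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
"""

COLUMNS = [0, 1, 2, 4, 6, 7]
NAMES = ["time", "mode", "event", "xcoord", "ycoord", "phi"]

# usecols tells the parser to keep only these positions.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, sep=r"\s+", usecols=COLUMNS)
# usecols returns columns in file order, which matches NAMES here.
df.columns = NAMES
print(df)
```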