
Question:

The Scenario

I've read a CSV file (which is tab-separated, \t) into a DataFrame, and it now needs to be in NumPy array format for clustering, without changing the type.

The Problem

So far, none of the references I've tried (listed below) give me the output I need. The two columns I'm trying to fetch are int64 and float64 respectively, as shown below:

         uid   iid       rat
0        196   242  3.000000
1        186   302  3.000000
2         22   377  1.000000

For the moment I'm interested only in iid and rat. I want to pass them to the KMeans.fit() method, and without the scientific (e) notation in them. I need them in the following format:

Expected format

[[242, 3.000000],
 [302, 3.000000],
 [377, 1.000000]]

Unsuccessful Attempt

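# values is assumed to be df.values (the DataFrame's underlying array)
X = values[:, 1:2]   # iid column, shape (n, 1)
Y = values[:, 2:3]   # rat column, shape (n, 1)
someArray = np.array([X, Y])
print(someArray)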

but running it gives this:

[[[  2.42000000e+02]
  [  3.02000000e+02]
  [  3.77000000e+02]
  ..., 
  [  1.35200000e+03]
  [  1.62600000e+03]
  [  1.65900000e+03]]
 [[  3.00000000e+00]
  [  3.00000000e+00]
  [  1.00000000e+00]
  ..., 
  [  1.00000000e+00]
  [  1.00000000e+00]
  [  1.00000000e+00]]]
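The attempt fails because values[:, 1:2] and values[:, 2:3] are each column slices of shape (n, 1), so np.array([X, Y]) stacks them along a new leading axis into shape (2, n, 1). A minimal sketch of the fix, assuming values is the DataFrame's underlying array (rebuilt here from the three sample rows shown above), is to join the two slices side by side, or simply take both columns in one slice:

```python
import numpy as np

# Hypothetical stand-in for df.values: the three sample rows (uid, iid, rat)
values = np.array([[196, 242, 3.0],
                   [186, 302, 3.0],
                   [22, 377, 1.0]])

X = values[:, 1:2]  # iid as an (n, 1) column
Y = values[:, 2:3]  # rat as an (n, 1) column

stacked = np.array([X, Y])        # new leading axis: shape (2, n, 1), the unwanted result
side_by_side = np.hstack([X, Y])  # join along the column axis: shape (n, 2)
direct = values[:, 1:3]           # the same (n, 2) result with a single slice

print(stacked.shape)           # (2, 3, 1)
print(side_by_side.shape)      # (3, 2)
print(side_by_side.tolist())   # [[242.0, 3.0], [302.0, 3.0], [377.0, 1.0]]
```

Either side_by_side or direct has one row per record and can be passed to KMeans.fit() as is.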

References that haven't helped so far:

This one
This two
This three
This four

EDIT 1

I tried np_df = np.genfromtxt('AllData.csv', delimiter='\t', unpack=True) and got this:

[[             nan   1.96000000e+02   1.86000000e+02 ...,   4.79000000e+02
    4.79000000e+02   4.79000000e+02]
 [             nan   2.42000000e+02   3.02000000e+02 ...,   1.36000000e+03
    1.39400000e+03   1.65200000e+03]
 [             nan   3.00000000e+00   3.00000000e+00 ...,   2.00000000e+00
    1.92803605e+00   1.00000000e+00]]
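The nan values in the first position of each row come from the header line (uid, iid, rat), which genfromtxt cannot parse as numbers, and unpack=True transposes the result so that each column becomes a row. A sketch, assuming the file has exactly one header line and the tab-separated layout shown in the question (an in-memory io.StringIO stands in for AllData.csv here):

```python
import io
import numpy as np

# Stand-in for 'AllData.csv': one header line plus tab-separated rows
data = io.StringIO("uid\tiid\trat\n196\t242\t3.0\n186\t302\t3.0\n22\t377\t1.0\n")

arr = np.genfromtxt(
    data,
    delimiter="\t",
    skip_header=1,   # skip the header line that otherwise becomes nan
    usecols=(1, 2),  # keep only iid and rat
)                    # no unpack=True, so records stay as rows

print(arr.shape)     # (3, 2)
print(arr.tolist())  # [[242.0, 3.0], [302.0, 3.0], [377.0, 1.0]]
```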

Answer 1:

It seems you need read_csv to build the DataFrame first, keeping only the second and third columns, and then convert it to a NumPy array with values:

import pandas as pd
from sklearn.cluster import KMeans
from io import StringIO

temp=u"""col,iid,rat
4,1,0
5,2,4
6,3,3
7,4,1"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), usecols = [1,2])
print (df)
   iid  rat
0    1    0
1    2    4
2    3    3
3    4    1

X = df.values 
print (X)
[[1 0]
 [2 4]
 [3 3]
 [4 1]]

kmeans = KMeans(n_clusters=2)
a = kmeans.fit(X)
print (a)
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=2, n_init=10, n_jobs=1, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)
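Since the file in the question is tab-separated, read_csv also needs sep="\t", and usecols accepts column names as well as positions. A sketch using the question's sample rows in place of the real file; n_init and random_state are set explicitly here as assumptions (for reproducible labels, and because some scikit-learn versions warn when n_init is left at its default):

```python
import io

import pandas as pd
from sklearn.cluster import KMeans

# Stand-in for the tab-separated file from the question
temp = "uid\tiid\trat\n196\t242\t3.0\n186\t302\t3.0\n22\t377\t1.0\n"

# after testing, replace io.StringIO(temp) with the real file name
df = pd.read_csv(io.StringIO(temp), sep="\t", usecols=["iid", "rat"])

X = df.to_numpy()  # shape (n, 2): one row per record, columns iid and rat
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # cluster index assigned to each row
```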

Answer 2:

Use label-based selection and the .values attribute of the resulting pandas objects, which will be some sort of numpy array:

>>> df
   uid  iid  rat
0  196  242  3.0
1  186  302  3.0
2   22  377  1.0
>>> df.loc[:,['iid','rat']]
   iid  rat
0  242  3.0
1  302  3.0
2  377  1.0
>>> df.loc[:,['iid','rat']].values
array([[ 242.,    3.],
       [ 302.,    3.],
       [ 377.,    1.]])

Note that your integer column will be promoted to float, because a NumPy array holds a single dtype.
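A small sketch reproducing the three sample rows shows the promotion: iid starts as int64 and rat as float64, and the combined array is float64. It uses DataFrame.to_numpy(), which the pandas documentation recommends over .values in recent versions; both return the same array here.

```python
import pandas as pd

df = pd.DataFrame({"uid": [196, 186, 22],
                   "iid": [242, 302, 377],
                   "rat": [3.0, 3.0, 1.0]})

print(df.dtypes)   # uid and iid are int64, rat is float64

arr = df[["iid", "rat"]].to_numpy()
print(arr.dtype)   # float64: the integer column was promoted
```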

Also note that this particular selection can be done in several ways:

>>> df.iloc[:, 1:] # integer-position based
   iid  rat
0  242  3.0
1  302  3.0
2  377  1.0
>>> df[['iid','rat']] # plain indexing performs column-based selection
   iid  rat
0  242  3.0
1  302  3.0
2  377  1.0

I like label-based because it is more explicit.
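A quick check, again on the three sample rows, that the three selections above yield the same array. The positional version only works because iid and rat happen to be the last two columns, which is one reason the label-based form is more robust.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"uid": [196, 186, 22],
                   "iid": [242, 302, 377],
                   "rat": [3.0, 3.0, 1.0]})

by_label = df.loc[:, ["iid", "rat"]].to_numpy()  # label-based
by_position = df.iloc[:, 1:].to_numpy()          # integer-position based
by_indexing = df[["iid", "rat"]].to_numpy()      # plain column indexing

print(np.array_equal(by_label, by_position) and np.array_equal(by_label, by_indexing))  # True
```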

Edit

The reason you aren't seeing commas is an artifact of how numpy arrays are printed:

>>> df[['iid','rat']].values
array([[ 242.,    3.],
       [ 302.,    3.],
       [ 377.,    1.]])
>>> print(df[['iid','rat']].values)
[[ 242.    3.]
 [ 302.    3.]
 [ 377.    1.]]

And actually, it is the difference between the str and repr results of the numpy array:

>>> print(repr(df[['iid','rat']].values))
array([[ 242.,    3.],
       [ 302.,    3.],
       [ 377.,    1.]])
>>> print(str(df[['iid','rat']].values))
[[ 242.    3.]
 [ 302.    3.]
 [ 377.    1.]]
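The e-notation and the missing commas are display choices only; neither changes the values passed to KMeans.fit(). If you want the printout to look like the expected format, two options are sketched below with values taken from the question's output: np.set_printoptions(suppress=True) makes NumPy print fixed-point numbers instead of scientific notation (it is a global setting), and .tolist() converts the array to nested Python lists, whose printed form uses commas.

```python
import numpy as np

arr = np.array([[242.0, 3.0], [302.0, 3.0], [1652.0, 1.92803605]])

np.set_printoptions(suppress=True)  # fixed-point output instead of e-notation
print(arr)

as_list = arr.tolist()  # nested Python lists; printing shows commas
print(as_list)          # [[242.0, 3.0], [302.0, 3.0], [1652.0, 1.92803605]]
```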

Answer 3:

Why don't you just import the 'csv' as a numpy array?

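import numpy as np

def read_file(fname):
    # tab delimiter ("\t", not "/t"); skip the header row, which would become nan; keep iid and rat as rows of an (n, 2) array
    return np.genfromtxt(fname, delimiter="\t", comments="%", skip_header=1, usecols=(1, 2))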

Dataframe into numpy array with values comma seper
