CSV File Into SkFlow

Posted 2019-09-05 15:12

I'm just starting out with Tensorflow. As I understand it, SkFlow is a...

Simplified interface for TensorFlow

And for me simple is good.

TensorFlow's Github has some useful starter examples using the Iris dataset included in SkFlow. This is from the first example, the Linear Classifier.

iris = datasets.load_iris()
feature_columns = learn.infer_real_valued_columns_from_input(iris.data)

This iris object has the type <class 'sklearn.datasets.base.Bunch'> and is a dict-like structure containing two arrays: the data and the targets.
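For reference, a minimal sketch of what that Bunch holds (just inspecting shapes; assumes scikit-learn is installed):

```python
from sklearn import datasets

# load_iris() returns a Bunch: a dict-like container whose keys
# ("data", "target", ...) are also accessible as attributes
iris = datasets.load_iris()
print(iris.data.shape)    # (150, 4) -- the feature matrix
print(iris.target.shape)  # (150,)   -- the class labels 0/1/2
```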

This link shows how to load data from a CSV (or at least from a URL). At the top of the page it shows how to load via the method above, and then from a URL, like so:

# Load the Pima Indians diabetes dataset from CSV URL
import numpy as np
from urllib.request import urlopen  # Python 3; the original used Python 2's urllib.urlopen
# URL REMOVED - SO DOES NOT LIKE SHORTENED URL
# URL for the Pima Indians Diabetes dataset
raw_data = urlopen(url)
# load the CSV file as a numpy matrix
dataset = np.loadtxt(raw_data, delimiter=",")
print(dataset.shape)
# separate the data from the target attributes
X = dataset[:,0:8]  # the first 8 columns are features (0:7 would silently drop one)
y = dataset[:,8]    # the last column is the target
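Since the URL was stripped, here is a self-contained sketch of the same `np.loadtxt` pattern, using a tiny in-memory CSV with made-up values in place of the download:

```python
import io
import numpy as np

# a tiny in-memory CSV standing in for the removed URL (made-up values)
csv_text = "1.0,2.0,3.0\n4.0,5.0,6.0\n"
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",")
X = dataset[:, 0:2]  # all but the last column are features
y = dataset[:, 2]    # the last column is the target
print(X.shape, y.shape)  # (2, 2) (2,)
```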

I get that X is the data and y is the target. But that's not the structure of the data in the GitHub example, or in the first example of the guide.

Am I meant to turn the CSV data into a single object as in

    iris = datasets.load_iris()

Or do I work with the X and y outputs? And if so, how do I do that with the Linear Classifier example on GitHub?

1 Answer

叼着烟拽天下
#2 · 2019-09-05 15:32

I was working on the same tutorial. I used scikit-learn's cross_validation module to break the scikit Bunch object into train/test splits, then just passed those to the classifier.fit and classifier.evaluate methods.

from sklearn import cross_validation
import tensorflow as tf
import numpy as np
from sklearn import datasets

# load from scikit learn
iris = datasets.load_iris()
# break into train/test splits
x_train, x_test, y_train, y_test = cross_validation.train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# commented out the previous loading code
'''
# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TEST = "iris_test.csv"
# Load datasets.
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING,
    target_dtype=np.int,
    features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TEST,
    target_dtype=np.int,
    features_dtype=np.float32)
'''
# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]

# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10],
                                            n_classes=3,
                                            model_dir="./tmp/iris_model")

# Fit model. Add your train data here
classifier.fit(x=x_train, y=y_train, steps=2000)

# Evaluate accuracy. Add your test data here
accuracy_score = classifier.evaluate(x=x_test, y=y_test)["accuracy"]
print('Accuracy: {0:f}'.format(accuracy_score))

# Classify two new flower samples.
new_samples = np.array(
    [[6.4, 3.2, 4.5, 1.5], [5.8, 3.1, 5.0, 1.7]], dtype=float)
y = list(classifier.predict(new_samples, as_iterable=True))
print('Predictions: {}'.format(str(y)))
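A side note for readers on newer scikit-learn: the cross_validation module was removed in version 0.20, and the same split now lives in model_selection. A minimal sketch with the same parameters:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split  # replaces cross_validation

iris = datasets.load_iris()
# identical split to the answer above: 80% train, 20% test, fixed seed
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)
print(x_train.shape, x_test.shape)  # (120, 4) (30, 4)
```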