So, what I'm trying to do is to classify between exoplanets and non exoplanets using the kepler data obtained here. The data type is a time series with the dimension of (num_of_samples,3197). I figured out that this can be done by using 1D Convolutional Layer in Keras. But I keep messing the dimensions and get the following error
Error when checking model input: expected conv1d_1_input to have shape (None, 3197, 1) but got array with shape (1, 570, 3197)
So, the questions are:
1.Does the data (training_set and test_set) need to be converted into 3D tensor? If yes, what is the correct dimension?
2.What is the correct input shape? I know that I have 3197 timesteps for 1 feature but the documentation does not specify whether they use TF or theano backend so I'm still getting headaches.
By the way, I'm using TF backend. Any kind help is greatly appreciated! Thanks!
"""
Created on Wed May 17 18:23:31 2017
@author: Amajid Sinar
"""
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use("ggplot")
import numpy as np
#Importing training set
training_set = pd.read_csv("exoTrain.csv")
X_train = training_set.iloc[:,1:].values
y_train = training_set.iloc[:,0:1].values
#Importing test set
test_set = pd.read_csv("exoTest.csv")
X_test = test_set.iloc[:,1:].values
y_test = test_set.iloc[:,0:1].values
#Scale the data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.fit_transform(X_test)
#Convert data into 3d tensor
X_train = np.reshape(X_train,(1,X_train.shape[0],X_train.shape[1]))
X_test = np.reshape(X_test,(1,X_test.shape[0],X_test.shape[1]))
#Importing convolutional layers
from keras.models import Sequential
from keras.layers import Convolution1D
from keras.layers import MaxPooling1D
from keras.layers import Flatten
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers.normalization import BatchNormalization
#Convolution steps
#1.Convolution
#2.Max Pooling
#3.Flattening
#4.Full Connection
#Initialising the CNN
classifier = Sequential()
#Input shape must be explicitly defined, DO NOT USE (None,shape)!!!
#1.Multiple convolution and max pooling
classifier.add(Convolution1D(filters=8, kernel_size=11, activation="relu", input_shape=(3197,1)))
classifier.add(MaxPooling1D(strides=4))
classifier.add(BatchNormalization())
classifier.add(Convolution1D(filters=16, kernel_size=11, activation='relu'))
classifier.add(MaxPooling1D(strides=4))
classifier.add(BatchNormalization())
classifier.add(Convolution1D(filters=32, kernel_size=11, activation='relu'))
classifier.add(MaxPooling1D(strides=4))
classifier.add(BatchNormalization())
#classifier.add(Convolution1D(filters=64, kernel_size=11, activation='relu'))
#classifier.add(MaxPooling1D(strides=4))
#2.Flattening
classifier.add(Flatten())
#3.Full Connection
classifier.add(Dropout(0.5))
classifier.add(Dense(64, activation='relu'))
classifier.add(Dropout(0.25))
classifier.add(Dense(64, activation='relu'))
classifier.add(Dense(1, activation='sigmoid'))
#Configure the learning process
classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
#Train!
classifier.fit_generator(X_train, steps_per_epoch=X_train.shape[0], epochs=1, validation_data=(X_test,y_test))
score = classifier.evaluate(X_test, y_test)