I am trying to replicate Chevalier's LSTM Human Activity Recognition algorithm and ran into a problem when trying to feed it my own data, which is in CSV format. The repository uses txt files. My CSV data looks like this:
0.000995,8
0.020801,8
0.040977,8
0.060786,8
0.080970,8
... ...
The original file can be found here. The x-values (time) are in column 0 (-80.060003, etc.) and the y-values (value) are in column 1 (8, 8, etc.). I tried to read it with pandas:
pandas.read_csv(DATASET_PATH + TRAIN + "data_train.csv", skiprows=1, header=None, sep=',', usecols=[0, 1])
but the result does not seem to be compatible with the data format expected by the "Prepare Dataset" section (and possibly other sections as well):
TRAIN = "train/"
TEST = "test/"
# Load "X" (the neural network's training and testing inputs)
def load_X(X_signals_paths):
    X_signals = []

    for signal_type_path in X_signals_paths:
        file = open(signal_type_path, 'r')
        # Read dataset from disk, dealing with text files' syntax
        X_signals.append(
            [np.array(serie, dtype=np.float32) for serie in [
                row.replace('  ', ' ').strip().split(' ') for row in file
            ]]
        )
        file.close()

    return np.transpose(np.array(X_signals), (1, 2, 0))

X_train_signals_paths = [
    DATASET_PATH + TRAIN + "Inertial Signals/" + signal + "train.txt" for signal in INPUT_SIGNAL_TYPES
]
X_test_signals_paths = [
    DATASET_PATH + TEST + "Inertial Signals/" + signal + "test.txt" for signal in INPUT_SIGNAL_TYPES
]
X_train = load_X(X_train_signals_paths)
X_test = load_X(X_test_signals_paths)
# Load "y" (the neural network's training and testing outputs)
def load_y(y_path):
    file = open(y_path, 'r')
    # Read dataset from disk, dealing with text file's syntax
    y_ = np.array(
        [elem for elem in [
            row.replace('  ', ' ').strip().split(' ') for row in file
        ]],
        dtype=np.int32
    )
    file.close()

    # Subtract 1 from each output class for friendly 0-based indexing
    return y_ - 1

y_train_path = DATASET_PATH + TRAIN + "y_train.txt"
y_test_path = DATASET_PATH + TEST + "y_test.txt"
y_train = load_y(y_train_path)
y_test = load_y(y_test_path)
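For reference, this is my understanding of the layout the loader above expects: each txt file holds one signal type, with one row per window and one space-separated value per time step, so stacking the files and transposing gives an array shaped (windows, time steps, signals). Below is a minimal sketch with two made-up files. The file names and numbers are placeholders rather than anything from the dataset, and the double-space replace is my assumption about what that line is meant to do.

```python
import os
import tempfile

import numpy as np


def load_X(X_signals_paths):
    X_signals = []
    for signal_type_path in X_signals_paths:
        with open(signal_type_path, "r") as file:
            # Collapse double spaces (assumed intent), then split each row
            # into one float per time step.
            X_signals.append(
                [np.array(row.replace("  ", " ").strip().split(" "), dtype=np.float32)
                 for row in file]
            )
    # (signals, windows, steps) -> (windows, steps, signals)
    return np.transpose(np.array(X_signals), (1, 2, 0))


# Two hypothetical signal files, each with 3 windows (rows) of 4 time steps.
toy_rows = "0.1 0.2 0.3 0.4\n0.5 0.6 0.7 0.8\n0.9 1.0 1.1 1.2\n"

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("signal_a.txt", "signal_b.txt"):
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(toy_rows)
        paths.append(path)
    X_toy = load_X(paths)

print(X_toy.shape)  # (windows, time steps, signals)
```

So a two-column CSV with one sample per line is a different layout from what these files contain: it is one long series, not a table of fixed-length windows.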
This is what happens with my implementation in IPython3:
In[0]:
TRAIN = "train/"
TEST = "test/"
def load_X(X_signals_paths):
    X_signals = []

    for signal_type_path in X_signals_paths:
        file = pandas.read_csv(DATASET_PATH + TRAIN + "data_train.csv", skiprows=1, header=None, sep=',', usecols=[0])
        X_signals.append(
            [np.array(serie, dtype=np.float32) for serie in [
                str(row).replace('  ', ' ').strip().split(' ') for row in file
            ]]
        )

    return np.transpose(np.array(X_signals), (1, 2, 0))

X_train_signals_paths = [
    DATASET_PATH + TRAIN + signal + "train.csv" for signal in INPUT_SIGNAL_TYPES
]
X_test_signals_paths = [
    DATASET_PATH + TEST + signal + "test.csv" for signal in INPUT_SIGNAL_TYPES
]
X_train = load_X(X_train_signals_paths)
X_test = load_X(X_test_signals_paths)
print(X_train, X_test)
Out[0]:
[[[ 0.]]] [[[ 0.]]]
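A likely explanation for the single zero, as far as I can tell: iterating over a pandas DataFrame with `for row in file` yields its column labels, not its rows. With `header=None` and `usecols=[0]` the only label is `0`, so the loop parses the string "0" once. The sketch below reproduces this on an in-memory stand-in for data_train.csv; the header line and the three rows are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for data_train.csv: one header line, then a few rows.
csv_text = "time,value\n0.000995,8\n0.020801,8\n0.040977,8\n"

frame = pd.read_csv(
    io.StringIO(csv_text), skiprows=1, header=None, sep=",", usecols=[0]
)

# Iterating over a DataFrame yields column labels, here just the integer 0.
labels = [str(row) for row in frame]

# The same parsing expression as in load_X, applied to those labels.
parsed = [np.array(s.strip().split(" "), dtype=np.float32) for s in labels]

# The actual numbers are reachable by selecting the column instead.
values = frame[0].to_numpy(dtype=np.float32)

print(labels, parsed, values.shape)
```

Here `values` holds the three time stamps, while `parsed` only ever contains the parsed column label, which is where the lone `0.` in the output appears to come from.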
I would appreciate help with formatting my data so that it works with this algorithm. If anything is unclear, please let me know.
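To make the target concrete, here is a rough sketch of the kind of reshaping I think is needed: cutting one CSV column into fixed-length windows so that it has the (windows, time steps, signals) shape produced by `load_X`. The helper `window_signal`, the window length, and the input array are all hypothetical, and the windows here do not overlap, which is a simplification and may not match how the original dataset was segmented.

```python
import numpy as np


def window_signal(values, n_steps):
    """Cut a 1-D signal into non-overlapping windows of n_steps samples."""
    values = np.asarray(values, dtype=np.float32)
    n_windows = len(values) // n_steps  # samples in an incomplete tail are dropped
    trimmed = values[: n_windows * n_steps]
    # Only one signal type here, hence the trailing axis of size 1.
    return trimmed.reshape(n_windows, n_steps, 1)


# Made-up stand-in for one column of the CSV (10 samples).
signal = np.arange(10, dtype=np.float32)

X = window_signal(signal, n_steps=4)
print(X.shape)  # (windows, time steps, signals)
```

With real data, `signal` would be a column taken from the DataFrame (for example via `to_numpy()`), and `n_steps` would have to match whatever sequence length the network is configured for.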