I've made this neural net to figure out whether a house is a good buy or a bad buy. For some reason the code is not updating the weights and biases, and my loss stays the same. This is my code:
import pandas as pd
import tensorflow as tf
data = pd.read_csv("E:/workspace_py/datasets/good_bad_buy.csv")
features = data.drop(['index', 'good buy'], axis = 1)
lbls = data.drop(['index', 'area', 'bathrooms', 'price', 'sq_price'], axis = 1)
features = features[0:20]
lbls = lbls[0:20]
print(features)
print(lbls)
n_examples = len(lbls)
# Model
# Hyper parameters
epochs = 100
learning_rate = 0.1
batch_size = 1
input_data = tf.placeholder('float', [None, 4])
labels = tf.placeholder('float', [None, 1])
weights = {
    'hl1': tf.Variable(tf.random_normal([4, 10])),
    'hl2': tf.Variable(tf.random_normal([10, 10])),
    'hl3': tf.Variable(tf.random_normal([10, 4])),
    'ol': tf.Variable(tf.random_normal([4, 1]))
}
biases = {
    'hl1': tf.Variable(tf.random_normal([10])),
    'hl2': tf.Variable(tf.random_normal([10])),
    'hl3': tf.Variable(tf.random_normal([4])),
    'ol': tf.Variable(tf.random_normal([1]))
}
hl1 = tf.nn.relu(tf.add(tf.matmul(input_data, weights['hl1']), biases['hl1']))
hl2 = tf.nn.relu(tf.add(tf.matmul(hl1, weights['hl2']), biases['hl2']))
hl3 = tf.nn.relu(tf.add(tf.matmul(hl2, weights['hl3']), biases['hl3']))
ol = tf.nn.sigmoid(tf.add(tf.matmul(hl3, weights['ol']), biases['ol']))
loss = tf.reduce_mean((labels - ol)**2)
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
iterations = int(n_examples/batch_size)
for epoch_no in range(epochs):
    ptr = 0
    for iteration_no in range(iterations):
        epoch_input = features[ptr:ptr + batch_size]
        epoch_label = lbls[ptr:ptr + batch_size]
        ptr = ptr + batch_size
        _, err = sess.run([train, loss], feed_dict={input_data: features, labels: lbls})
    print("Error at epoch ", epoch_no, ": ", err)
print(sess.run(ol, feed_dict={input_data: [[2104, 3, 399900, 190.0665]]}))
This is the dataset:
Features:
area bathrooms price sq_price
0 2104 3 399900 190.066540
1 1600 3 329900 206.187500
2 2400 3 369000 153.750000
3 1416 2 232000 163.841808
4 3000 4 539900 179.966667
5 1985 4 299900 151.083123
6 1534 3 314900 205.280313
7 1427 3 198999 139.452698
8 1380 3 212000 153.623188
9 1494 3 242500 162.315930
10 1940 4 239999 123.710825
11 2000 3 347000 173.500000
12 1890 3 329999 174.602645
13 4478 5 699900 156.297454
14 1268 3 259900 204.968454
15 2300 4 449900 195.608696
16 1320 2 299900 227.196970
17 1236 3 199900 161.731392
18 2609 4 499998 191.643542
19 3031 4 599000 197.624546
Labels:
good buy
0 1.0
1 0.0
2 1.0
3 0.0
4 1.0
5 0.0
6 0.0
7 1.0
8 0.0
9 0.0
10 1.0
11 1.0
12 1.0
13 1.0
14 0.0
15 1.0
16 0.0
17 1.0
18 1.0
19 1.0
Any suggestions on how to fix this? I've tried tf.reduce_sum instead of tf.reduce_mean, and I've also tried a larger batch_size.
A few things to consider.

A setup like this can reach roughly 95% accuracy on the training data. You'd want to test with a held-out data set not used for training to evaluate the test error.
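A minimal sketch of such a held-out evaluation (the 80/20 split and the 0.5 decision threshold are illustrative assumptions, not part of the original code):

# Hold out the last 4 of the 20 rows for testing (an illustrative 80/20 split).
train_x, test_x = features[:16], features[16:]
train_y, test_y = lbls[:16], lbls[16:]

# Train exactly as in the question, but feed only train_x / train_y.

# After training, evaluate on rows the model never saw:
preds = sess.run(ol, feed_dict={input_data: test_x})
test_accuracy = ((preds > 0.5) == (test_y.values > 0.5)).mean()
print("held-out accuracy:", test_accuracy)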
Also, please do not post usage questions like this on GitHub; they belong here on Stack Overflow.
I'm not sure if this is the problem for you, but the sigmoid function's gradient can get very small if its input is too big, and that can make updates very slow.

To check whether this is the case for you, try initializing all your weights to very small values. You can adjust this by setting a standard deviation in tf.random_normal.
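For example (the stddev of 0.01 is an illustrative value, not a tuned one):

# Smaller initial weights keep the pre-activation values small, which matters even
# more here because the raw feature values (e.g. price) are large.
weights = {
    'hl1': tf.Variable(tf.random_normal([4, 10], stddev=0.01)),
    'hl2': tf.Variable(tf.random_normal([10, 10], stddev=0.01)),
    'hl3': tf.Variable(tf.random_normal([10, 4], stddev=0.01)),
    'ol': tf.Variable(tf.random_normal([4, 1], stddev=0.01))
}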
There are several things not OK with your code. First, you mean to feed the minibatch you just sliced, not the full data set:
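# feed the minibatch slices computed in the loop, not the full features/lbls
_, err = sess.run([train, loss], feed_dict={input_data: epoch_input, labels: epoch_label})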
Now it actually uses the minibatch.
Debugging the gradient:

You can always check some stuff by adding it to the list you pass to sess.run, e.g. [tf.reduce_sum(weights['hl1'])]. sess.run will then return the value of each element of that list. For example:
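# Build the inspection op once, outside the training loop, so the graph
# doesn't keep growing with every iteration.
w1_sum = tf.reduce_sum(weights['hl1'])

# Inside the loop, fetch it alongside the train op and the loss:
_, err, w1_val = sess.run([train, loss, w1_sum],
                          feed_dict={input_data: epoch_input, labels: epoch_label})
print("loss:", err, "sum of hl1 weights:", w1_val)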
To investigate your problem further, you can check the gradients themselves instead of using minimize (in TF1, minimize is just compute_gradients followed by apply_gradients):
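optimizer = tf.train.AdamOptimizer(learning_rate)
# compute_gradients returns a list of (gradient, variable) pairs
grads_and_vars = optimizer.compute_gradients(loss)
train = optimizer.apply_gradients(grads_and_vars)

# The gradients can now be fetched and inspected directly:
grad_vals = sess.run([g for g, v in grads_and_vars],
                     feed_dict={input_data: epoch_input, labels: epoch_label})
for (g, v), val in zip(grads_and_vars, grad_vals):
    print(v.name, "gradient sum:", val.sum())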
And finally, the loss function is inappropriate/numerically unstable for classification. With your version, almost all of the gradients printed this way come out as zero. Why?
(labels - ol) has magnitude at most 1, and the derivative of the sigmoid s(x) is s'(x) = s(x)*(1 - s(x)), which is at most 0.25, so the gradients get multiplied by factors that are again much smaller than one. But after using sparse_softmax_cross_entropy_with_logits, which is numerically stable and operates in the log-domain, the gradients are (while still very small) non-zero this time. The code for reproducing that is below.
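A sketch of that version, assuming a two-unit logits output and integer class labels (which sparse_softmax_cross_entropy_with_logits requires); the small-stddev initialization and learning rate here are illustrative choices:

import pandas as pd
import tensorflow as tf

data = pd.read_csv("E:/workspace_py/datasets/good_bad_buy.csv")
features = data.drop(['index', 'good buy'], axis=1)[0:20]
lbls = data['good buy'][0:20]

learning_rate = 0.1
input_data = tf.placeholder(tf.float32, [None, 4])
labels = tf.placeholder(tf.int64, [None])  # class ids 0/1, not one-hot

weights = {
    'hl1': tf.Variable(tf.random_normal([4, 10], stddev=0.01)),
    'hl2': tf.Variable(tf.random_normal([10, 10], stddev=0.01)),
    'hl3': tf.Variable(tf.random_normal([10, 4], stddev=0.01)),
    'ol': tf.Variable(tf.random_normal([4, 2], stddev=0.01))  # two logits now
}
biases = {
    'hl1': tf.Variable(tf.zeros([10])),
    'hl2': tf.Variable(tf.zeros([10])),
    'hl3': tf.Variable(tf.zeros([4])),
    'ol': tf.Variable(tf.zeros([2]))
}

hl1 = tf.nn.relu(tf.matmul(input_data, weights['hl1']) + biases['hl1'])
hl2 = tf.nn.relu(tf.matmul(hl1, weights['hl2']) + biases['hl2'])
hl3 = tf.nn.relu(tf.matmul(hl2, weights['hl3']) + biases['hl3'])
logits = tf.matmul(hl3, weights['ol']) + biases['ol']  # raw logits, no sigmoid

# Cross-entropy computed in the log-domain: numerically stable.
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)
train = optimizer.apply_gradients(grads_and_vars)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
feed = {input_data: features, labels: lbls.values.astype('int64')}
grad_vals = sess.run([g for g, v in grads_and_vars], feed_dict=feed)
for (g, v), val in zip(grads_and_vars, grad_vals):
    print(v.name, "gradient sum:", val.sum())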