Question:
I want to provide a mask the same size as the input image and adjust the weights learned from the image according to this mask (similar to attention, but precomputed for each input image). How can I do this with Keras (or TensorFlow)?
Answer 1:
Question
How can I add another feature layer to an image, such as a mask, and have the neural network take this new feature layer into account?
Answer
The short answer is to add it as another colour channel of the image. If your image already has three colour channels (red, green, blue), adding a fourth channel containing the mask's 1s and 0s gives the neural network that much more information to use when making decisions.
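A minimal NumPy sketch of that idea, assuming a channels-last image and a binary mask of the same height and width. The array names and the 32x32 size are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical 32x32 RGB image with values in [0, 1] (channels-last layout).
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3)).astype(np.float32)

# Hypothetical binary mask: 1 where the signal is, 0 elsewhere.
mask = np.zeros((32, 32), dtype=np.float32)
mask[8:24, 8:24] = 1.0

# Append the mask as a fourth channel.
masked_input = np.concatenate([image, mask[..., np.newaxis]], axis=-1)

print(masked_input.shape)  # (32, 32, 4)
```

The network's input shape then becomes (height, width, 4) instead of (height, width, 3); the rest of the model can stay as it was.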
Thought Experiment
As a thought experiment, let's tackle MNIST. MNIST images are 28x28. Take one image, the 'true' image, and three other images, the 'distractions', and tile the four 28x28 images into a single 56x56 image. MNIST is greyscale, so it has only one channel: brightness. Now add a second channel that is a mask, with 1s over the region of the 56x56 image that holds the 'true' image and 0s elsewhere.
If we use the usual MNIST architecture, convolutions all the way down, we can expect the network to use this extra channel to learn to attend only to the 'true' region and classify the image correctly.
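The input described in this thought experiment could be assembled roughly as follows. This is only a sketch: random arrays stand in for real MNIST digits, and the helper name make_composite and the quadrant ordering are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for four 28x28 MNIST digits (random data, not real MNIST).
tiles = rng.random((4, 28, 28)).astype(np.float32)

def make_composite(tiles, true_index):
    """Tile four 28x28 images into a 56x56 grid and add a mask channel
    that is 1 over the quadrant holding tiles[true_index], 0 elsewhere."""
    top = np.concatenate([tiles[0], tiles[1]], axis=1)     # 28x56
    bottom = np.concatenate([tiles[2], tiles[3]], axis=1)  # 28x56
    brightness = np.concatenate([top, bottom], axis=0)     # 56x56

    mask = np.zeros((56, 56), dtype=np.float32)
    row, col = divmod(true_index, 2)  # quadrant position in the 2x2 grid
    mask[row * 28:(row + 1) * 28, col * 28:(col + 1) * 28] = 1.0

    # Channel 0 is brightness, channel 1 is the mask.
    return np.stack([brightness, mask], axis=-1)

sample = make_composite(tiles, true_index=2)
print(sample.shape)  # (56, 56, 2)
```

The label for each composite would be the digit class of the 'true' tile; the distraction tiles contribute pixels but not labels.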
Code Example
In this example we try to solve the XOR problem. We take the classic two-input XOR, pad each input to length 4 with random noise, and add a mask channel that is 1 for the real inputs and 0 for the noise.
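# Adapted from https://github.com/panchishin/learn-to-tensorflow/blob/master/solutions/04-xor-2d.py
# Written for the TensorFlow 1.x API (tf.placeholder, tf.Session).
# Under TensorFlow 2.x it should run with:
#   import tensorflow.compat.v1 as tf
#   tf.disable_v2_behavior()
# -- imports --
import numpy as np
import tensorflow as tf
# print arrays with 2 decimal places and no scientific notation
np.set_printoptions(precision=2, suppress=True)
# -- The xor problem --
x = np.array([[0., 0.], [1., 1.], [1., 0.], [0., 1.]])
y_ = [[1., 0.], [1., 0.], [0., 1.], [0., 1.]]
def makeBatch():
    # Build a batch of shape (4, 4, 2): 4 samples, 4 positions, 2 channels.
    # Channel 0 holds values, channel 1 holds the mask.
    # Start with random 0/1 noise everywhere.
    rx = (np.random.rand(4, 4, 2) > 0.5).astype(np.float32)
    # set the mask to 0 for all items
    rx[:, :, 1] = 0
    # place the two real 'x's at a random offset (0, 1 or 2);
    # the other two positions keep their noise
    index = int(np.random.random() * 3)
    rx[:, index:index + 2, 0] = x
    # set the mask to 1 for 'real' values
    rx[:, index:index + 2, 1] = 1
    return rx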
# -- induction --
# Layer 0
x0 = tf.placeholder(dtype=tf.float32, shape=[None, 4, 2])
y0 = tf.placeholder(dtype=tf.float32, shape=[None, 2])
# Layer 1
f1 = tf.reshape(x0,shape=[-1,8])
m1 = tf.Variable(tf.random_uniform([8, 9], minval=0.1, maxval=0.9, dtype=tf.float32))
b1 = tf.Variable(tf.random_uniform([9], minval=0.1, maxval=0.9, dtype=tf.float32))
h1 = tf.sigmoid(tf.matmul(f1, m1) + b1)
# Layer 2
m2 = tf.Variable(tf.random_uniform([9, 2], minval=0.1, maxval=0.9, dtype=tf.float32))
b2 = tf.Variable(tf.random_uniform([2], minval=0.1, maxval=0.9, dtype=tf.float32))
y_out = tf.nn.softmax(tf.matmul(h1, m2) + b2)
# -- loss --
# loss : sum of the squares of y0 - y_out
loss = tf.reduce_sum(tf.square(y0 - y_out))
# training step : gradient descent (learning rate 1.0) to minimize loss
train = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
# -- training --
# run 5000 training steps, each on a freshly generated batch
# print the loss every 1000 steps, then the learned weights and outputs
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("\nloss")
    for step in range(5000):
        sess.run(train, feed_dict={x0: makeBatch(), y0: y_})
        if (step + 1) % 1000 == 0:
            print(sess.run(loss, feed_dict={x0: makeBatch(), y0: y_}))
    results = sess.run([m1, b1, m2, b2, y_out, loss], feed_dict={x0: makeBatch(), y0: y_})
    labels = "m1,b1,m2,b2,y_out,loss".split(",")
    for label, result in zip(labels, results):
        print("")
        print(label)
        print(result)
print("")
Output
We can see that the network solves the problem and gives the correct output with high certainty:
y_ (truth) = [[1., 0.], [1., 0.], [0., 1.], [0., 1.]]
y_out
[[0.99 0.01]
[0.99 0.01]
[0.01 0.99]
[0.01 0.99]]
loss
0.00056630466
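To see why the mask makes the task learnable, the following sketch rebuilds one batch and checks that the positions where the mask is 1 contain exactly the clean XOR inputs. It mirrors makeBatch above; the name make_batch and the use of np.random.randint are choices made here for illustration, not part of the original code.

```python
import numpy as np

x = np.array([[0., 0.], [1., 1.], [1., 0.], [0., 1.]])

def make_batch():
    # Same construction as makeBatch: channel 0 = values, channel 1 = mask.
    rx = (np.random.rand(4, 4, 2) > 0.5).astype(np.float32)
    rx[:, :, 1] = 0
    index = np.random.randint(3)  # random offset: 0, 1 or 2
    rx[:, index:index + 2, 0] = x
    rx[:, index:index + 2, 1] = 1
    return rx

batch = make_batch()
values, mask = batch[:, :, 0], batch[:, :, 1]

# Selecting the positions where mask == 1 recovers the clean XOR inputs.
recovered = values[mask == 1].reshape(4, 2)
print(np.array_equal(recovered, x))  # True
```

Whatever offset is drawn, the mask channel tells the network which two of the four positions to trust.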
Confirmation that the mask is doing something
Let's make the mask channel random by commenting out the two lines that set it to 0 for noise and 1 for signal:
def makeBatch():
    rx = (np.random.rand(4, 4, 2) > 0.5).astype(np.float32)
    #rx[:, :, 1] = 0
    index = int(np.random.random() * 3)
    rx[:, index:index + 2, 0] = x
    #rx[:, index:index + 2, 1] = 1
    return rx
Then rerun the code. With a mask that carries no information, the network fails to learn the task:
y_out
[[0.99 0.01]
[0.76 0.24]
[0.09 0.91]
[0.58 0.42]]
loss
0.8080765
Conclusion
If you have signal and noise in an image (or other data structure) and add another channel (a mask) that marks where the signal is and where the noise is, a neural net can use that mask to focus on the signal while still having access to the noise.