TensorFlow - predicting sequences: what are X and Y?

I want to predict the next element in a sequence with a TensorFlow LSTM/RNN, taking the previous 5 elements into account. What should I feed in as X and Y?

From 1 2 3 4 5, I want to predict 6

Suppose my input sequence X is:

X = 1 2 3 4 5 
    6 7 8 9 10 
    11 12 13 14 15
    ...

Would my Y be:

Y = 2 3 4 5 6 
    7 8 9 10 11 
    12 13 14 15 16
    ... ?

Or should I feed it:

X = 1 2 3 4 5 
    2 3 4 5 6
    3 4 5 6 7 
    ....

Would my Y be:

Y = 6
    7 
    8 
    ... ?

Or does TensorFlow do this automatically?
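To make the two layouts above concrete, here is a minimal NumPy sketch that builds both from one flat sequence. The helper names `make_shifted_pairs` and `make_window_pairs` are hypothetical, made up for illustration; they are not TensorFlow functions.

```python
import numpy as np

def make_shifted_pairs(seq, num_steps):
    """First layout: non-overlapping rows; Y is X shifted by one element."""
    seq = np.asarray(seq)
    n_rows = (len(seq) - 1) // num_steps  # keep one element spare for the last label
    x = seq[:n_rows * num_steps].reshape(n_rows, num_steps)
    y = seq[1:n_rows * num_steps + 1].reshape(n_rows, num_steps)
    return x, y

def make_window_pairs(seq, num_steps):
    """Second layout: sliding windows; Y is the single element after each window."""
    seq = np.asarray(seq)
    n_rows = len(seq) - num_steps
    # Row i of the index grid is [i, i+1, ..., i+num_steps-1]
    idx = np.arange(n_rows)[:, None] + np.arange(num_steps)[None, :]
    x = seq[idx]
    y = seq[num_steps:]
    return x, y

seq = np.arange(1, 17)  # 1 .. 16
x1, y1 = make_shifted_pairs(seq, 5)  # x1 rows: 1..5, 6..10, 11..15
x2, y2 = make_window_pairs(seq, 5)   # x2 rows: 1..5, 2..6, 3..7, ...
```

In the first layout every row of Y has `num_steps` labels (one per input position); in the second, every row of X has exactly one label.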

I am using the first approach now, inspired by a tutorial, with:

    x = tf.placeholder(tf.int32, [None, num_steps], name='input_placeholder')
    y = tf.placeholder(tf.int32, [None, num_steps], name='labels_placeholder')

    rnn_outputs = tf.reshape(rnn_outputs, [-1, state_size])
    y_reshaped = tf.reshape(y, [-1])
    logits = tf.matmul(rnn_outputs, W) + b
    predictions = tf.nn.softmax(logits)
    total_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped))

When I ask for a prediction (in my actual code the number of time steps is 16 and the number of classes is 14313, sorry for that):

    # int32 to match the dtype of the input placeholder
    prevSlice = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]], dtype=np.int32)
    feed_dict = {g['x']: prevSlice}
    preds, state = sess.run([g['preds'], g['final_state']], feed_dict)

I get 16 predictions, which is 15 more than I want. How should I interpret them? I don't need predictions for the next 16 slices, only for the single next one.

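Since the LSTM as you have set it up performs sequence-to-sequence prediction, the output of your predictor is a sequence of num_steps predictions (one per input time step), not a single time step.

So in short, you get a prediction of the same length as the input sequence. Because your Y is X shifted by one, the prediction at position t is the model's guess for element t+1, so the last one is the prediction for the next element.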
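If only the next element is needed, one option is to keep just the last time step of the output. The sketch below uses NumPy on a stand-in array. It assumes `preds` comes back with shape [batch_size * num_steps, num_classes], with all time steps of one input row stored contiguously, which is what reshaping a [batch_size, num_steps, state_size] tensor to [-1, state_size] would give. That layout is an assumption, since the question does not show where `rnn_outputs` comes from. `last_step_prediction` is a hypothetical helper name.

```python
import numpy as np

def last_step_prediction(preds, num_steps):
    """Return the most likely class for the element following each input row.

    Assumes preds has shape [batch_size * num_steps, num_classes], with the
    time steps of each input row stored next to each other.
    """
    num_classes = preds.shape[-1]
    per_step = preds.reshape(-1, num_steps, num_classes)  # [batch, step, class]
    last = per_step[:, -1, :]                             # keep the final time step only
    return last.argmax(axis=-1)                           # class id per input row

# Stand-in for the `preds` returned by sess.run: 1 row, 16 steps, 20 classes
rng = np.random.default_rng(0)
fake_preds = rng.random((1 * 16, 20))
next_ids = last_step_prediction(fake_preds, num_steps=16)  # shape (1,)
```

The other 15 rows are the model's guesses for positions 2 to 16 of the input, which are already known at prediction time and can be ignored.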

Edit:

    import numpy as np

    def predict_point_by_point(model, data):
        # Predict each time step given the last sequence of true data,
        # in effect predicting only 1 step ahead each time.
        predicted = model.predict(data)
        # Flatten the predictions into a 1-D array.
        predicted = np.reshape(predicted, (predicted.size,))
        return predicted

You could do something along those lines and add a moving window: feed the model one window of time steps at a time, shift the window by one element after each prediction, and you will get one output at a time as well.
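A rough sketch of such a moving window follows. It makes assumptions beyond what is stated above: `predict_next` is a hypothetical callable standing in for the trained model (it takes a [1, num_steps] array and returns one value), and each predicted value is appended to the window. Appending the true next value instead, when it is available, is the other reading of the same idea.

```python
import numpy as np

def predict_with_moving_window(predict_next, seed_window, n_predictions):
    """Predict one element at a time, sliding the window forward after each step."""
    window = list(seed_window)
    predicted = []
    for _ in range(n_predictions):
        batch = np.asarray(window)[np.newaxis, :]  # shape [1, num_steps]
        next_value = predict_next(batch)
        predicted.append(next_value)
        # Drop the oldest element and append the newest one
        window = window[1:] + [next_value]
    return predicted

# Toy stand-in for a trained model: "predicts" the last element plus one
def toy_predict_next(batch):
    return int(batch[0, -1]) + 1

print(predict_with_moving_window(toy_predict_next, [1, 2, 3, 4, 5], 3))  # [6, 7, 8]
```

With a real model, `predict_next` would wrap the model call and return only the final time step's prediction.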