How to format a data set for a fully convolutional network

I am trying to prepare my data set for a fully convolutional network (FCN). I've looked through some existing data sets and I'm having a really hard time figuring out how to format mine. For instance, in the KITTI data set, the training folder contains these two images and this text file:

image 1

image 2

text

P0: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 0.000000000000e+00 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P1: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.875744000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P2: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 4.485728000000e+01 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.163791000000e-01 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.745884000000e-03
P3: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.395242000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.199936000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.729905000000e-03
R0_rect: 9.999239000000e-01 9.837760000000e-03 -7.445048000000e-03 -9.869795000000e-03 9.999421000000e-01 -4.278459000000e-03 7.402527000000e-03 4.351614000000e-03 9.999631000000e-01
Tr_velo_to_cam: 7.533745000000e-03 -9.999714000000e-01 -6.166020000000e-04 -4.069766000000e-03 1.480249000000e-02 7.280733000000e-04 -9.998902000000e-01 -7.631618000000e-02 9.998621000000e-01 7.523790000000e-03 1.480755000000e-02 -2.717806000000e-01
Tr_imu_to_velo: 9.999976000000e-01 7.553071000000e-04 -2.035826000000e-03 -8.086759000000e-01 -7.854027000000e-04 9.998898000000e-01 -1.482298000000e-02 3.195559000000e-01 2.024406000000e-03 1.482454000000e-02 9.998881000000e-01 -7.997231000000e-01
Tr_cam_to_road: 9.999570839814e-01 -5.508724949246e-03 -7.452906591504e-03 9.610489538319e-03 5.425697507328e-03 9.999234779341e-01 -1.111504746388e-02 -1.597134401910e+00 7.513565886504e-03 1.107413060494e-02 9.999104059534e-01 2.788606298060e-01

This data set is very different from the regular data sets I've seen being used for CNNs. Hence, I had the following questions:

1. What is happening in the text file?
2. How do I generate the 2nd image, the one with solid-colored pixels?
3. One of the proposed advantages of FCNs is the ability to feed input images of arbitrary sizes. How small can I make the input images - is 50x50 too small? I looked for literature on this but couldn't find much.

Essentially, I'm trying to create a data set that I can use with the network from this GitHub repository, which has only 2 folders for training: training_img_lmdb and training_label_lmdb. So I'm not sure whether the text file or the pixelated image goes in the label folder. Any help would be greatly appreciated!

It looks like some kind of telemetry, judging by names such as Tr_cam_to_road and Tr_velo_to_cam. Usually the data set comes with documentation that explains the file.
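To make the text file easier to inspect, here is a minimal sketch that reads each named entry into a small matrix. It assumes the layout seen in the question (a name ending in a colon, followed by a flat list of numbers) and that 12 values form a 3x4 matrix and 9 values a 3x3 matrix. The names suggest camera projection matrices (P0 to P3), a rectifying rotation (R0_rect), and transforms between sensors, but the data set's own documentation is the authority on that. The function name `parse_calibration` is made up for this example.

```python
def parse_calibration(text):
    """Parse 'NAME: v1 v2 ...' entries into {name: list of rows of floats}."""
    entries = {}
    key = None
    # Splitting on whitespace works whether the entries sit on one line or many.
    for token in text.split():
        if token.endswith(":"):
            key = token[:-1]
            entries[key] = []
        elif key is not None:
            entries[key].append(float(token))

    matrices = {}
    for name, values in entries.items():
        # Assumption: 9 values -> 3x3 matrix, 12 values -> 3x4 matrix.
        cols = {9: 3, 12: 4}.get(len(values))
        if cols is None:
            raise ValueError(f"unexpected value count for {name}: {len(values)}")
        matrices[name] = [values[i:i + cols] for i in range(0, len(values), cols)]
    return matrices


# Two entries copied from the file in the question.
sample = (
    "P0: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 0.000000000000e+00 "
    "0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 "
    "0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00 "
    "R0_rect: 9.999239000000e-01 9.837760000000e-03 -7.445048000000e-03 "
    "-9.869795000000e-03 9.999421000000e-01 -4.278459000000e-03 "
    "7.402527000000e-03 4.351614000000e-03 9.999631000000e-01"
)

calib = parse_calibration(sample)
for name, rows in calib.items():
    print(name, f"{len(rows)}x{len(rows[0])}")
```

If the entries really are calibration matrices, they describe the sensors rather than the scene, so they would not normally serve as per-pixel training labels.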
Please clarify. You posted the image. Surely you know how to load an image?
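On the solid-colored image: in segmentation data sets such an image is usually a per-pixel annotation, where every color stands for one class, and it is typically drawn by hand or with a labeling tool rather than computed from the photo. For training, it is commonly converted into one integer class index per pixel. Below is a rough sketch of that conversion in plain Python. The palette, the class ids, and the `ignore_id` value are assumptions for illustration only; use whatever colors your own annotations contain.

```python
def colors_to_class_ids(rgb_rows, color_to_class, ignore_id=255):
    """Map each RGB pixel to a class index; unknown colors get ignore_id."""
    return [
        [color_to_class.get(tuple(pixel), ignore_id) for pixel in row]
        for row in rgb_rows
    ]


# Hypothetical palette: red = background (0), magenta = road (1).
PALETTE = {
    (255, 0, 0): 0,
    (255, 0, 255): 1,
}

# A tiny 2x2 "label image" given as rows of RGB tuples.
label_image = [
    [(255, 0, 0), (255, 0, 255)],
    [(255, 0, 255), (0, 0, 0)],  # black is not in the palette
]

class_ids = colors_to_class_ids(label_image, PALETTE)
print(class_ids)  # [[0, 1], [1, 255]]
```

With a two-folder layout such as training_img_lmdb and training_label_lmdb, the label side would presumably hold maps like this, one per input image and with the same height and width. That is an assumption about the repository, so check its data-preparation script before relying on it.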
You are correct; however, any purely convolutional network has a minimum input size equal to the input neighborhood of a single output pixel (its receptive field).
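The size of that neighborhood can be estimated from the kernel sizes and strides of the layers. The sketch below uses the standard recurrence: each layer grows the neighborhood by (kernel - 1) times the product of all earlier strides. The layer list is a hypothetical small network, not the one from the repository in the question.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from input to output.

    Returns the side length, in input pixels, seen by one output pixel.
    """
    rf = 1    # one output pixel starts out covering one input pixel
    jump = 1  # distance in input pixels between neighboring units so far
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical network: two 3x3 convs, a 2x2 pool (stride 2),
# two more 3x3 convs, and another 2x2 pool (stride 2).
layers = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]

min_size = receptive_field(layers)
print(min_size)  # 16
```

For this toy stack a 50x50 input would be large enough. A deeper network has a larger neighborhood, so run the same calculation on the actual layer list. If the layers use padding, smaller inputs may still pass through, but much of each output pixel would then be computed from padding rather than image content.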