I am trying to prepare my data set for fully convolutional network. I've looked through some data sets and I'm having a really hard time figuring out how to format it. For instance, in the Kitti data set, there are these 2 images and this text file in the training folder:
image 1
image 2
text
P0: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 0.000000000000e+00 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00 P1: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.875744000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00 P2: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 4.485728000000e+01 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.163791000000e-01 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.745884000000e-03 P3: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.395242000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.199936000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.729905000000e-03 R0_rect: 9.999239000000e-01 9.837760000000e-03 -7.445048000000e-03 -9.869795000000e-03 9.999421000000e-01 -4.278459000000e-03 7.402527000000e-03 4.351614000000e-03 9.999631000000e-01 Tr_velo_to_cam: 7.533745000000e-03 -9.999714000000e-01 -6.166020000000e-04 -4.069766000000e-03 1.480249000000e-02 7.280733000000e-04 -9.998902000000e-01 -7.631618000000e-02 9.998621000000e-01 7.523790000000e-03 1.480755000000e-02 -2.717806000000e-01 Tr_imu_to_velo: 9.999976000000e-01 7.553071000000e-04 -2.035826000000e-03 -8.086759000000e-01 -7.854027000000e-04 9.998898000000e-01 -1.482298000000e-02 3.195559000000e-01 2.024406000000e-03 1.482454000000e-02 9.998881000000e-01 -7.997231000000e-01 Tr_cam_to_road: 9.999570839814e-01 -5.508724949246e-03 -7.452906591504e-03 9.610489538319e-03 5.425697507328e-03 9.999234779341e-01 -1.111504746388e-02 -1.597134401910e+00 7.513565886504e-03 1.107413060494e-02 9.999104059534e-01 2.788606298060e-01
This data set is very different from the regular data sets I've seen being used for CNNs. Hence, I had the following questions:
- What is happening in the text file?
- How to generate the 2nd image with solid colored pixels?
- One of the proposed advantages of FCNs is the ability to feed input images of arbitrary sizes. How small can I make the input images - is 50x50 too small? I looked for some literature surrounding this but couldn't find much.
Essentially, I'm trying to create a data set to use this network from this github. Which has only 2 folders for training: training_img_lmdb
and training_label_lmdb
. So, I'm not exactly sure if the text file or the pixelated image goes in the label folder. Any help would be greatly appreciated!!