
Unable to get AWS SageMaker to read RecordIO files

Posted 2019-08-18 08:31

Question:

I'm trying to convert an object detection lst file to a rec file and train with it in SageMaker. My list looks something like this:

10  2   5   9.0000  1008.0000   1774.0000   1324.0000   1953.0000   3.0000  2697.0000   3340.0000   948.0000    1559.0000   0.0000  0.0000  0.0000  0.0000  0.0000  IMG_1091.JPG
58  2   5   11.0000 1735.0000   2065.0000   1047.0000   1300.0000   6.0000  2444.0000   2806.0000   1194.0000   1482.0000   1.0000  2975.0000   3417.0000   1739.0000   2139.0000   IMG_7000.JPG
60  2   5   12.0000 1243.0000   1861.0000   1222.0000   1710.0000   6.0000  2423.0000   2971.0000   1205.0000   1693.0000   0.0000  0.0000  0.0000  0.0000  0.0000  IMG_7061.JPG
80  2   5   1.0000  1865.0000   2146.0000   818.0000    969.0000    14.0000 1559.0000   1918.0000   1658.0000   1914.0000   6.0000  2638.0000   3042.0000   2125.0000   2490.0000   IMG_9479.JPG
79  2   5   13.0000 1556.0000   1812.0000   1440.0000   1637.0000   7.0000  2216.0000   2452.0000   1595.0000   1816.0000   0.0000  0.0000  0.0000  0.0000  0.0000  IMG_9443.JPG

Where the columns are

index, header length, object length, class id, xmin, ymin, xmax, ymax, (repeat any other ids...), image path
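As a sanity check on the column layout above, here is a minimal sketch that parses one row of the list into its parts. The names `Box`, `LstRecord`, `parse_lst_line`, and `inverted_boxes` are hypothetical helpers for illustration, not part of im2rec or SageMaker. It assumes fields are whitespace-separated and that image paths contain no spaces.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    class_id: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


class LstRecord(NamedTuple):
    index: int
    header_len: int
    object_len: int
    boxes: List[Box]
    image_path: str


def parse_lst_line(line: str) -> LstRecord:
    """Parse one row using the column order stated in the post (an assumption)."""
    fields = line.split()
    index = int(fields[0])
    image_path = fields[-1]
    # Everything between the index and the image path is label data.
    labels = [float(v) for v in fields[1:-1]]
    header_len = int(labels[0])
    object_len = int(labels[1])
    # The header (header length, object length) is skipped to reach the objects.
    values = labels[header_len:]
    if object_len < 5 or len(values) % object_len != 0:
        raise ValueError("label values do not divide into whole objects")
    boxes = []
    for i in range(0, len(values), object_len):
        class_id, xmin, ymin, xmax, ymax = values[i:i + 5]
        boxes.append(Box(int(class_id), xmin, ymin, xmax, ymax))
    return LstRecord(index, header_len, object_len, boxes, image_path)


def inverted_boxes(record: LstRecord) -> List[Box]:
    """Return non-padding boxes whose max coordinate is not above its min."""
    return [
        b for b in record.boxes
        if any((b.xmin, b.ymin, b.xmax, b.ymax))  # skip all-zero padding
        and (b.xmax <= b.xmin or b.ymax <= b.ymin)
    ]


row = ("10 2 5 9.0000 1008.0000 1774.0000 1324.0000 1953.0000 "
       "3.0000 2697.0000 3340.0000 948.0000 1559.0000 "
       "0.0000 0.0000 0.0000 0.0000 0.0000 IMG_1091.JPG")
record = parse_lst_line(row)
print(record.image_path, len(record.boxes), inverted_boxes(record))
```

Read with the stated column order, the second object in the first sample row has an xmax (948) smaller than its xmin (2697), so it may be worth checking whether the coordinates in the list actually follow the order given above.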

I then run the list through im2rec with

$ /incubator-mxnet/tools/im2rec.py my_lst.lst my_image_folder

I then upload the resulting .rec file to S3.

I then pull the necessary parts from this AWS sample notebook.

I think the only key piece is probably this:

def set_hyperparameters(num_epochs, lr_steps):
    num_classes = 16
    num_training_samples = 227
    print('num classes: {}, num training images: {}'.format(num_classes, num_training_samples))

    # od_model is assumed to be the Estimator created earlier in the notebook (not shown here).
    od_model.set_hyperparameters(base_network='resnet-50',
                                 use_pretrained_model=1,
                                 num_classes=num_classes,
                                 mini_batch_size=16,
                                 epochs=num_epochs,               
                                 learning_rate=0.001, 
                                 lr_scheduler_step=lr_steps,      
                                 lr_scheduler_factor=0.1,
                                 optimizer='sgd',
                                 momentum=0.9,
                                 weight_decay=0.0005,
                                 overlap_threshold=0.5,
                                 nms_threshold=0.45,
                                 image_shape=512,
                                 label_width=350,
                                 num_training_samples=num_training_samples)

set_hyperparameters(100, '33,67')

Ultimately I get the error: Not enough label packed in img_list or rec file.

Can someone help me identify what parts I'm missing in order to properly train with SageMaker and RecordIO files?

Thanks for your help!

Also, if I instead use

$ /incubator-mxnet/tools/im2rec.py my_lst.lst my_image_folder --pass-through --pack-label

I get the error:

Expected number of batches: 14, did not match the number of batches processed: 5. This may happen when some images or annotations are invalid and cannot be parsed. Please check the dataset and ensure it follows the format in the documentation.
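The numbers in that message can be related to the hyperparameters above with a little arithmetic. This is only a sketch: it assumes the expected batch count is `num_training_samples` divided by `mini_batch_size` and rounded down, which matches 14 for the values used here, and that each processed batch was full. The function names are hypothetical.

```python
def expected_batches(num_samples: int, batch_size: int) -> int:
    """Number of full batches, assuming the count is rounded down."""
    return num_samples // batch_size


def records_read_range(processed_batches: int, batch_size: int) -> tuple:
    """Smallest and largest record counts consistent with the processed batches."""
    low = processed_batches * batch_size
    high = (processed_batches + 1) * batch_size - 1
    return low, high


num_training_samples = 227  # from set_hyperparameters above
mini_batch_size = 16        # from set_hyperparameters above

print(expected_batches(num_training_samples, mini_batch_size))
print(records_read_range(5, mini_batch_size))
```

Under those assumptions, 5 processed batches would correspond to roughly 80 to 95 usable records out of 227, which suggests that more than half of the records were skipped rather than just a few.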