Invalid .lst file in SageMaker

Posted 2019-08-15 03:28

Folder structure for my S3 bucket is:

Bucket
    ->training-set
           ->medium
                 ->    img1.jpeg
                 ->    img2.jpeg
                 ->    img3.PNG

My training-set.lst file looks like this:

1  \t 1  \t medium/img1.jpeg
2  \t 1  \t medium/img2.jpeg
3  \t 1  \t medium/img3.PNG

I created this file in Excel.

Error: Training failed with the following error: ClientError: Invalid lst file: training-set.lst

   "InputDataConfig": [
        {
          "ChannelName": "train",
          "CompressionType": "None",
          "ContentType": "application/x-image",
          "DataSource": {
            "S3DataSource": {
              "S3DataDistributionType": "FullyReplicated",
              "S3DataType": "S3Prefix",
              "S3Uri": 's3://{}/training-set/'.format(bucket)
            }
          },
          "RecordWrapperType": "None"
        },
        {
          "ChannelName": "validation",
          "CompressionType": "None",
          "ContentType": "application/x-image",
          "DataSource": {
            "S3DataSource": {
              "S3DataDistributionType": "FullyReplicated",
              "S3DataType": "S3Prefix",
              "S3Uri": 's3://{}/test-set/'.format(bucket)
            }
          },
          "RecordWrapperType": "None"
        },
        {
          "ChannelName": "train_lst",
          "CompressionType": "None",
          "ContentType": "application/x-image",
          "DataSource": {
            "S3DataSource": {
              "S3DataDistributionType": "FullyReplicated",
              "S3DataType": "S3Prefix",
              "S3Uri": "s3://bucket/training-set/training-set.lst"
            }
          },
          "RecordWrapperType": "None"
        },
        {
          "ChannelName": "validation_lst",
          "CompressionType": "None",
          "ContentType": "application/x-image",
          "DataSource": {
            "S3DataSource": {
              "S3DataDistributionType": "FullyReplicated",
              "S3DataType": "S3Prefix",
              "S3Uri": "s3://bucket/test-set/test-set.lst"
            }
          },
          "RecordWrapperType": "None"
        }
    ]

I am trying to use this in Amazon SageMaker, but training fails with the error above. Can someone please help?

2 answers
ら.Afraid
#2 · 2019-08-15 03:33

Could you please post the .lst files you are using? According to the documentation, you need a tab-delimited file placed at the top of the folder hierarchy in your S3 bucket. Here is an example of a train_set.lst file from a flower classification example I built:

1   0   daisy/754296579_30a9ae018c_n.jpg
2   1   dandelion/18089878729_907ed2c7cd_m.jpg
3   1   dandelion/284497199_93a01f48f6.jpg
4   1   dandelion/3554992110_81d8c9b0bd_m.jpg
5   0   daisy/4065883015_4bb6010cb7_n.jpg

Please note that the sequence index (the first column) is required, and that the classes for your classification problem need to be number coded (starting at zero).
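If it helps, here is a minimal Python sketch of generating such a file instead of exporting it from a spreadsheet. The function name build_lst and the rule that the first folder in each path is the class name are my own assumptions for illustration, not part of SageMaker:

```python
def build_lst(image_paths, class_names):
    """Return .lst text with one row per image: index<TAB>label<TAB>path.

    image_paths: relative paths such as "daisy/a.jpg". This sketch assumes
    the first path component is the class name.
    class_names: ordered list of class names; a label is the class position.
    """
    # Number-code the classes, starting at zero.
    label_of = {name: i for i, name in enumerate(class_names)}
    rows = []
    for index, path in enumerate(image_paths, start=1):
        class_name = path.split("/", 1)[0]
        # Join the three columns with a single tab and no padding spaces.
        rows.append(f"{index}\t{label_of[class_name]}\t{path}")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    paths = [
        "daisy/754296579_30a9ae018c_n.jpg",
        "dandelion/18089878729_907ed2c7cd_m.jpg",
    ]
    print(build_lst(paths, ["daisy", "dandelion"]), end="")
```

Writing the result with open(name, "w", encoding="utf-8", newline="\n") should keep the encoding and line endings predictable.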

Hope this helps!

手持菜刀,她持情操
#3 · 2019-08-15 03:56

Your question doesn't explicitly say this - but based on your description of the problem am I right in assuming you are trying to use the SageMaker Image Classification algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/image-classification.html)?

Can you please double-check by downloading "s3://bucket/training-set/training-set.lst" (don't use the local copy you have) and checking its contents? Don't open it in Excel; open it in a text editor and check that the format conforms to the specification documented above. In particular, I'd make sure the file is not saved in a non-standard encoding (it should be UTF-8) and that there are no extra tabs or spaces.
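A rough way to automate that inspection is a small script run over the raw bytes of the downloaded copy. The function name check_lst and the individual checks are my own guesses at what commonly goes wrong with spreadsheet exports. They are not the validation SageMaker itself performs, and the carriage-return check only flags something worth looking at, since I don't know whether the algorithm rejects it:

```python
def check_lst(raw: bytes):
    """Return a list of human-readable problems found in .lst file bytes."""
    problems = []
    # A byte order mark is a common leftover of spreadsheet exports.
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 byte order mark")
        raw = raw[3:]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return problems + [f"file is not valid UTF-8: {exc}"]
    if "\r" in text:
        problems.append("file contains carriage returns (Windows line endings)")
    for number, line in enumerate(text.splitlines(), start=1):
        if not line:
            problems.append(f"line {number}: empty line")
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(
                f"line {number}: expected 3 tab-separated fields, found {len(fields)}"
            )
            continue
        if any(field != field.strip() for field in fields):
            problems.append(f"line {number}: field has leading or trailing whitespace")
        index, label, _path = (field.strip() for field in fields)
        if not index.isdigit():
            problems.append(f"line {number}: index {index!r} is not an integer")
        # Assumes single-label classification with integer class codes.
        if not label.isdigit():
            problems.append(f"line {number}: label {label!r} is not an integer")
    return problems


if __name__ == "__main__":
    # A row padded with spaces around the tabs, as in the question.
    for problem in check_lst(b"1  \t 1  \t medium/img1.jpeg\n"):
        print(problem)
```

An empty list means none of these particular checks fired. It does not guarantee the training job will accept the file.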

Also have a look at your training job's logs; there may be additional clues there as to what went wrong.
