Folder structure for my S3 bucket is:
Bucket
-> training-set
   -> medium
      -> img1.jpeg
      -> img2.jpeg
      -> img3.PNG
My training-set.lst file looks like this:
1 \t 1 \t medium/img1.jpeg
2 \t 1 \t medium/img2.jpeg
3 \t 1 \t medium/img3.PNG
I created this file using an Excel sheet.
Error: Training failed with the following error: ClientError: Invalid lst file: training-set.lst
"InputDataConfig": [
{
"ChannelName": "train",
"CompressionType": "None",
"ContentType": "application/x-image",
"DataSource": {
"S3DataSource": {
"S3DataDistributionType": "FullyReplicated",
"S3DataType": "S3Prefix",
"S3Uri": 's3://{}/training-set/'.format(bucket)
}
},
"RecordWrapperType": "None"
},
{
"ChannelName": "validation",
"CompressionType": "None",
"ContentType": "application/x-image",
"DataSource": {
"S3DataSource": {
"S3DataDistributionType": "FullyReplicated",
"S3DataType": "S3Prefix",
"S3Uri": 's3://{}/test-set/'.format(bucket)
}
},
"RecordWrapperType": "None"
},
{
"ChannelName": "train_lst",
"CompressionType": "None",
"ContentType": "application/x-image",
"DataSource": {
"S3DataSource": {
"S3DataDistributionType": "FullyReplicated",
"S3DataType": "S3Prefix",
"S3Uri": "s3://bucket/training-set/training-set.lst"
}
},
"RecordWrapperType": "None"
},
{
"ChannelName": "validation_lst",
"CompressionType": "None",
"ContentType": "application/x-image",
"DataSource": {
"S3DataSource": {
"S3DataDistributionType": "FullyReplicated",
"S3DataType": "S3Prefix",
"S3Uri": "s3://bucket/test-set/test-set.lst"
}
},
"RecordWrapperType": "None"
}
]
I am trying to use this in Amazon SageMaker, but I'm unable to. Can someone please help?
Could you please post the lst files you are using? According to the documentation, you need a tab-delimited file placed at the top of the folder hierarchy in your S3 bucket. Here is an example of a train_set.lst file from a flower classification example I built:
Please note that the sequence index (the first column) is required, and that the classes for your classification problem need to be encoded as numbers (starting at zero).
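If it helps, here is a minimal Python sketch of one way to generate such a file without going through Excel. The function name build_lst, the zero-based index, and the single class label 0 are my own assumptions for illustration; the image paths are the ones from the question.

```python
def build_lst(rows):
    """Return .lst text: one 'index<TAB>label<TAB>path' line per image.

    rows is an iterable of (class_label, relative_path) pairs.
    The index is simply a running counter (an assumption of this sketch).
    """
    lines = []
    for index, (label, path) in enumerate(rows):
        # Real tab characters between fields, no padding spaces.
        lines.append(f"{index}\t{label}\t{path}\n")
    return "".join(lines)


# Hypothetical input: every image belongs to class 0 ("medium").
images = [
    (0, "medium/img1.jpeg"),
    (0, "medium/img2.jpeg"),
    (0, "medium/img3.PNG"),
]
lst_text = build_lst(images)
print(lst_text, end="")
```

To save the result, write lst_text to a file opened with encoding="utf-8" and newline="" so that Python does not translate the line endings, then upload that file to S3.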
Hope this helps!
Your question doesn't explicitly say this - but based on your description of the problem am I right in assuming you are trying to use the SageMaker Image Classification algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/image-classification.html)?
Can you please double-check by downloading "s3://bucket/training-set/training-set.lst" (don't use the local copy you have) and checking its contents? Don't open it with Excel; open it with a text editor and check that the format conforms to the specification documented above. In particular, I'd make sure the file is not saved in a non-standard encoding (it should be UTF-8) and that there are no extra tabs or spaces.
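A rough Python sketch of those checks, in case a script is easier than eyeballing the file. This is my own helper (check_lst is a made-up name) and not the validation SageMaker itself performs; it only looks for the issues mentioned above: non-UTF-8 bytes, a byte order mark, a field count other than three, and stray whitespace.

```python
def check_lst(raw: bytes):
    """Return a list of human-readable problems found in .lst file bytes."""
    problems = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 byte order mark")
        raw = raw[3:]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return problems + [f"not valid UTF-8: {exc}"]

    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after a final newline

    for number, line in enumerate(lines, start=1):
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(
                f"line {number}: expected 3 tab-separated fields, found {len(fields)}"
            )
            continue
        # Catches padding spaces and a trailing carriage return alike.
        if any(field != field.strip() for field in fields):
            problems.append(f"line {number}: stray whitespace around a field")
        if not fields[0].strip().isdigit():
            problems.append(f"line {number}: index is not a non-negative integer")
        try:
            float(fields[1])
        except ValueError:
            problems.append(f"line {number}: class label is not a number")
    return problems


# A line padded with spaces around the tabs is reported as a problem.
print(check_lst(b"1 \t 1 \t medium/img1.jpeg\n"))
```

To run it on the downloaded copy, read the file in binary mode (open(path, "rb").read()) and pass the bytes in; an empty list means none of these particular problems were found.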
Also have a look at your training job's logs; there may be additional clues there as to what went wrong.