I am trying to run the ImageNet example in Caffe. The example's page (https://github.com/BVLC/caffe/tree/master/examples/imagenet) says:
We assume that you already have downloaded the ImageNet training data and validation data, and they are stored on your disk like:
/path/to/imagenet/train/n01440764/n01440764_10026.JPEG
/path/to/imagenet/val/ILSVRC2012_val_00000001.JPEG
Where do I find this data?
It's a bit of a process.
1. Go to ImageNet's download page and select "Download Image URLs".
2. Download the image URL list from the links at the bottom of the page, e.g., fall 2011's list.
3. Download the images from their URLs (this may take a few days).
Note that some of the URLs (~5% last time I checked) are no longer valid and will return a "stub" Flickr image.
Here's a Perl script I used to download the images using ImageMagick's convert utility:
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use File::Copy qw(move);
use File::Path qw(make_path);

my $base    = "/path/to/imagenet/train/";
my $invalid = "/path/to/imagenet/invalid/";    # same as $base . "../invalid/"
make_path($invalid);

open my $fh, '<', '/path/to/train_image_urls.txt'
    or die "Cannot open url list: $!";
# list of rejected stub images (the path is a placeholder, adjust as needed)
open my $invl, '>>', '/path/to/invalid_images.txt'
    or die "Cannot open invalid-image list: $!";

while ( my $line = <$fh> ) {
    # a line in the url list looks like:
    # n00005787_13 http://www.powercai.net/Photo/UploadPhotos/200503/20050307172201492.jpg
    chomp($line);
    if ( $line =~ /^(n\d+)_(\d+)\s+(\S.+)$/ ) {
        my $type     = $1;
        my $filename = $1 . "_" . $2;
        my $url      = $3;
        my $dir      = $base . $type;
        my $dst      = "$dir/$filename.JPEG";
        make_path($dir) unless -d $dir;

        # list form of system() keeps the URL away from the shell
        if ( system( 'convert', $url, $dst ) == 0 ) {
            if ( -e $dst ) {
                my $size = -s $dst;
                # check that image is not a "flickr" stub:
                if ( $size == 24921 || $size == 6898 ) {
                    open( my $FILE, '<', $dst ) or next;
                    binmode($FILE);
                    my $md5sum = Digest::MD5->new->addfile($FILE)->hexdigest;
                    close($FILE);    # close before moving the file
                    if (   $md5sum eq "513dd080b92472dab22ad3e09f58f1af"
                        || $md5sum eq "ed15d4fe8b5680d1b3e01c0d2778d145" )
                    {
                        print $invl "$dst\n";
                        move( $dst, $invalid );
                    }
                }
            }
        } else {
            # convert failed: download error or invalid image file
        }
    } else {
        # line does not match the expected "<id> <url>" format
    }
}
close $invl;
close $fh;
exit(0);
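Once the download finishes, it may be worth checking the resulting tree before handing it to Caffe. The following is a minimal sketch, not part of the original script: it walks the train directory, counts images per synset folder, and reports any file whose MD5 matches one of the two stub digests used above. The helper names `file_md5` and `scan_tree` are made up for illustration, and the sketch assumes the `train/<synset>/<synset>_<id>.JPEG` layout from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# MD5 digests of the two Flickr stub images, copied from the download script.
my %stub_md5 = map { $_ => 1 } qw(
    513dd080b92472dab22ad3e09f58f1af
    ed15d4fe8b5680d1b3e01c0d2778d145
);

# Return the hex MD5 digest of a file's contents.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Walk $root; return a hashref of per-synset image counts
# and an arrayref of files that look like stubs.
sub scan_tree {
    my ($root) = @_;
    my ( %count, @stubs );
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path && $path =~ m{/(n\d+)_\d+\.JPEG$};
                $count{$1}++;
                push @stubs, $path if $stub_md5{ file_md5($path) };
            },
        },
        $root
    );
    return ( \%count, \@stubs );
}

# Usage: perl check_tree.pl /path/to/imagenet/train
if (@ARGV) {
    my ( $count, $stubs ) = scan_tree( $ARGV[0] );
    printf "%s\t%d\n", $_, $count->{$_} for sort keys %$count;
    print "stub: $_\n" for @$stubs;
}
```

Synsets with very few images, or any remaining stub files, are a hint that part of the download should be retried.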