I have 60000 train_images loaded as a numpy.ndarray of shape (28, 28, 60000). I want to convert it to an array of 1-dimensional images, meaning each image is represented as a single flat array of numbers, and I want 60000 such arrays. In other words, I want to go from (28, 28, 60000) to (60000, 28*28). In plain Python, it would be something like:
images_features = []
for image in images:
    imageLine = []
    for y in range(len(image)):
        for x in range(len(image[0])):
            imageLine.append(image[y][x])
    images_features.append(imageLine)
How can I do this? I suspect that I need to use reshape but I couldn't figure out how exactly I can do this.
This is how I'm getting the images:
data = scipy.io.loadmat('train.mat')
images = data["train_images"]
So the "images" is the array I'm talking about.
Someone suggested to me that:
"You may need to change axes or combine them to get the functionality you want. I recommend plotting them as well in case an image ends up sideways. Make sure you are diligent with your axes to avoid further problems there."
I have no idea what "axes" is being referred to here and how to take what's said above into account.
Can someone explain what I need to do and why? (What it does)
Since this is coming via loadmat, a shape of (28,28,60000) makes sense: MATLAB stores arrays in column-major order, with the first index varying fastest, so the image index conventionally goes last.
images.transpose() # or images.T
reorders the axes, so the result is (60000,28,28). The last two dimensions can then be combined with a reshape:
images.T.reshape(60000,28*28)
images.T.reshape(60000,-1) # shorthand: -1 lets NumPy infer the 784
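As a quick check of the two lines above, here is a minimal sketch on a small made-up array of shape (2, 3, 5) standing in for (28, 28, 60000). It compares the transposed-and-reshaped result with an explicit per-image loop; the array contents are purely illustrative.

```python
import numpy as np

# Stand-in for the loadmat result: 5 "images" of shape (2, 3), image index last.
images = np.arange(30).reshape(2, 3, 5)
n = images.shape[-1]

flat = images.T.reshape(n, -1)   # (5, 3, 2) -> (5, 6)

# Reference: flatten image k by hand, in the same (transposed) pixel order.
expected = np.array([images[:, :, k].T.ravel() for k in range(n)])

print(flat.shape)                       # (5, 6)
print(np.array_equal(flat, expected))   # True
```

Each row of flat holds one image, with its pixels read column by column because .T also transposes the individual images.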
You may need to transpose the 28x28 images as well, e.g.
images.transpose([2,0,1]) # instead of the default [2,1,0]
.T is the same as the MATLAB ' (or .') for 2-D arrays; on a 3-D array it reverses all the axes.
images may also be stored with order='F' (column-major memory layout).
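To make the difference between the two axis orders concrete, this sketch (on a made-up (2, 3, 5) array, not the real data) flattens each image both ways. With transpose(2, 0, 1) each image keeps its orientation and is flattened row by row; with .T each image is itself transposed, so it is flattened column by column.

```python
import numpy as np

images = np.arange(30).reshape(2, 3, 5)   # hypothetical data, image index last
first = images[:, :, 0]                   # the first 2x3 image

rows_first = images.transpose(2, 0, 1).reshape(5, -1)  # row-by-row pixels
cols_first = images.T.reshape(5, -1)                   # column-by-column pixels

print(np.array_equal(rows_first[0], first.ravel()))            # True
print(np.array_equal(cols_first[0], first.ravel(order='F')))   # True
```

Which of the two is wanted depends on how the pixels should be laid out in each feature row; plotting one reshaped image is the easiest way to tell.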
octave:38> images=reshape(1:30,2,3,5);
octave:39> save test.mat -v7 images
octave:40> images
images =
ans(:,:,1) =
1 3 5
2 4 6
ans(:,:,2) =
7 9 11
8 10 12
....
I chose test dimensions to be small, and to make it easy to distinguish the different axes.
In an IPython session:
In [15]: data=io.loadmat('test.mat')
In [16]: data
Out[16]:
{'__globals__': [],
'__header__': 'MATLAB 5.0 MAT-file, written by Octave 3.8.2, 2016-02-10 05:19:18 UTC',
'__version__': '1.0',
'images': array([[[ 1., 7., 13., 19., 25.],
[ 3., 9., 15., 21., 27.],
[ 5., 11., 17., 23., 29.]],
[[ 2., 8., 14., 20., 26.],
[ 4., 10., 16., 22., 28.],
[ 6., 12., 18., 24., 30.]]])}
In [18]: data['images'].T
Out[18]:
array([[[ 1., 2.],
[ 3., 4.],
[ 5., 6.]],
[[ 7., 8.],
[ 9., 10.],
[ 11., 12.]],
....
In [19]: data['images'].transpose([2,0,1])
Out[19]:
array([[[ 1., 3., 5.],
[ 2., 4., 6.]],
[[ 7., 9., 11.],
[ 8., 10., 12.]],
....
In [22]: data['images'].transpose([2,1,0]).reshape(5,-1)
Out[22]:
array([[ 1., 2., 3., 4., 5., 6.],
[ 7., 8., 9., 10., 11., 12.],
...
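The session above can also be reproduced without Octave or a .mat file. This is only a sketch: it assumes that filling the array with order='F' is an adequate stand-in for Octave's column-major reshape(1:30,2,3,5), and it says nothing about what loadmat does internally.

```python
import numpy as np

# Mimic Octave's reshape(1:30,2,3,5): fill in column-major (Fortran) order.
images = np.arange(1, 31, dtype=float).reshape(2, 3, 5, order='F')

print(images[:, :, 0].tolist())   # [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]

flat = images.transpose(2, 1, 0).reshape(5, -1)
print(flat[0].tolist())   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(flat[1].tolist())   # [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
```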
I think you just need to use reshape:
>>> import numpy as np
>>> images = np.empty([60000, 28, 28])  # uninitialized placeholder array
>>> images.shape
(60000, 28, 28)
>>> images_rs = images.reshape([60000, 28*28])
>>> images_rs.shape
(60000, 784)
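Note that the snippet above starts from an array already shaped (60000, 28, 28). For an array with the image index last, as in the question, a plain reshape runs without error but mixes pixels from different images. A small sketch on made-up data, using np.moveaxis as one possible way to bring the image axis to the front:

```python
import numpy as np

images = np.arange(30).reshape(2, 3, 5)   # hypothetical: 5 images of 2x3, index last

wrong = images.reshape(5, -1)                       # no axis reordering
right = np.moveaxis(images, -1, 0).reshape(5, -1)   # image axis first, then flatten

first = images[:, :, 0].ravel()
print(np.array_equal(right[0], first))   # True
print(np.array_equal(wrong[0], first))   # False
```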
You can reshape train_images and verify the result by plotting the images.
Reshaping:
train_features_images = train_images.reshape(train_images.shape[0],28,28)
Plotting images:
import matplotlib.pyplot as plt

def show_images(features_images, labels, start, howmany):
    # Draw each image in its own figure, titled with its label.
    for i in range(start, start + howmany):
        plt.figure(i)
        plt.imshow(features_images[i], cmap=plt.get_cmap('gray'))
        plt.title(labels[i])
    plt.show()

# labels is assumed to have been loaded alongside train_images.
show_images(train_features_images, labels, 1, 10)
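Alongside plotting, a numeric round trip can confirm that no pixels were shuffled. The sketch below uses random stand-in data with the image index last (6 images instead of 60000); the variable names are illustrative and not taken from the original code.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(28, 28, 6))   # hypothetical stand-in data

# Image axis first, then flatten each 28x28 image into 784 features.
features = images.transpose(2, 0, 1).reshape(images.shape[-1], -1)

# Undo the flattening and compare image by image with the original array.
restored = features.reshape(features.shape[0], 28, 28)
same = all(np.array_equal(restored[k], images[:, :, k]) for k in range(6))

print(features.shape)   # (6, 784)
print(same)             # True
```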