I have one big array:
[(1.0, 3.0, 1, 427338.4297000002, 4848489.4332)
(1.0, 3.0, 2, 427344.7937000003, 4848482.0692)
(1.0, 3.0, 3, 427346.4297000002, 4848472.7469) ...,
(1.0, 1.0, 7084, 427345.2709999997, 4848796.592)
(1.0, 1.0, 7085, 427352.9277999997, 4848790.9351)
(1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)]
I want to split this array into multiple arrays based on the 2nd value in each row (3.0, 3.0, 3.0 ... 1.0, 1.0, 1.0).
Every time the 2nd value changes, I want a new array, so basically each new array has the same 2nd value. I've looked this up on Stack Overflow and know of the command
np.split(array, number)
but I'm not trying to split the array into a certain number of arrays, but rather by a value. How would I be able to split the array in the way specified above?
Any help would be appreciated!
You can find the indices where the values differ by using numpy.where and numpy.diff on the second column:
>>> import numpy as np
>>> arr = np.array([(1.0, 3.0, 1, 427338.4297000002, 4848489.4332),
...                 (1.0, 3.0, 2, 427344.7937000003, 4848482.0692),
...                 (1.0, 3.0, 3, 427346.4297000002, 4848472.7469),
...                 (1.0, 1.0, 7084, 427345.2709999997, 4848796.592),
...                 (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351),
...                 (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)])
>>> np.split(arr, np.where(np.diff(arr[:,1]))[0]+1)
[array([[ 1.00000000e+00, 3.00000000e+00, 1.00000000e+00,
4.27338430e+05, 4.84848943e+06],
[ 1.00000000e+00, 3.00000000e+00, 2.00000000e+00,
4.27344794e+05, 4.84848207e+06],
[ 1.00000000e+00, 3.00000000e+00, 3.00000000e+00,
4.27346430e+05, 4.84847275e+06]]),
array([[ 1.00000000e+00, 1.00000000e+00, 7.08400000e+03,
4.27345271e+05, 4.84879659e+06],
[ 1.00000000e+00, 1.00000000e+00, 7.08500000e+03,
4.27352928e+05, 4.84879094e+06],
[ 1.00000000e+00, 1.00000000e+00, 7.08600000e+03,
4.27359161e+05, 4.84878743e+06]])]
Explanation:
First, we fetch the items in the second column (index 1):
>>> arr[:,1]
array([ 3., 3., 3., 1., 1., 1.])
Now, to find out where the items actually change, we can use numpy.diff:
>>> np.diff(arr[:,1])
array([ 0., 0., -2., 0., 0.])
Any non-zero value means that the item after it is different. We can use numpy.where to find the indices of the non-zero items and then add 1, because each new group starts one position past the returned index:
>>> np.where(np.diff(arr[:,1]))[0]+1
array([3])
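The steps above can be wrapped in a small helper. This is only a sketch: the name split_on_change and the sample data are made up for illustration, and it assumes a plain 2-D array (not a structured array). It compares neighbouring values with != instead of using numpy.diff, which gives the same split points here and also works for non-numeric columns:

```python
import numpy as np

def split_on_change(arr, col):
    """Split a 2-D array into runs of consecutive rows that share
    the same value in column `col` (hypothetical helper name)."""
    keys = arr[:, col]
    # Indices i where keys[i + 1] differs from keys[i]; the new run
    # starts at i + 1, hence the + 1.
    boundaries = np.flatnonzero(keys[1:] != keys[:-1]) + 1
    return np.split(arr, boundaries)

# Made-up sample data: the second column changes three times.
data = np.array([
    (1.0, 3.0, 1.0),
    (1.0, 3.0, 2.0),
    (1.0, 1.0, 3.0),
    (1.0, 1.0, 4.0),
    (1.0, 2.0, 5.0),
    (1.0, 3.0, 6.0),
])

parts = split_on_change(data, 1)
print([p[:, 1].tolist() for p in parts])
# [[3.0, 3.0], [1.0, 1.0], [2.0], [3.0]]
```

Note that this splits on every change, so a value that shows up again later (3.0 in the sample) ends up in a separate array, which matches "every time the 2nd value changes" in the question.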