Say I have a 10,000 pt vector that I want to take a slice of only 100 logarithmically spaced points. I want a function to give me integer values for the indices. Here's a simple solution that is simply using around + logspace, then getting rid of duplicates.
def genLogSpace( array_size, num ):
lspace = around(logspace(0,log10(array_size),num)).astype(uint64)
return array(sorted(set(lspace.tolist())))-1
ls=genLogspace(1e4,100)
print ls.size
>>84
print ls
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 13, 14, 15, 17, 19, 21, 23, 25, 27, 30,
33, 37, 40, 44, 49, 54, 59, 65, 71, 78, 86,
94, 104, 114, 125, 137, 151, 166, 182, 200, 220, 241,
265, 291, 319, 350, 384, 422, 463, 508, 558, 613, 672,
738, 810, 889, 976, 1071, 1176, 1291, 1416, 1555, 1706, 1873,
2056, 2256, 2476, 2718, 2983, 3274, 3593, 3943, 4328, 4750, 5213,
5721, 6279, 6892, 7564, 8301, 9111, 9999], dtype=uint64)
Notice that there were 16 duplicates, so now I only have 84 points.
Does anyone have a solution that will efficiently ensure the number of output samples is num? For this specific example, input values for num of 121 and 122 give 100 output points.
This is a bit tricky. You can't always get logarithmically spaced numbers. As in your example, first part is rather linear. If you are OK with that, I have a solution. But for the solution, you should understand why you have duplicates.
Logarithmic scale satisfies the condition:
Let's call this constant
r
forratio
. Forn
of these numbers between range1...size
, you'll get:So this gives you:
In your case,
n=100
andsize=10000
,r
will be~1.0974987654930561
, which means, if you start with1
, your next number will be1.0974987654930561
which is then rounded to1
again. Thus your duplicates. This issue is present for small numbers. After a sufficiently large number, multiplying with ratio will result in a different rounded integer.Keeping this in mind, your best bet is to add consecutive integers up to a certain point so that this multiplication with the ratio is no longer an issue. Then you can continue with the logarithmic scaling. The following function does that:
Python 3 update: Last line used to be
return np.array(map(lambda x: round(x)-1, result), dtype=np.uint64)
in Python 2Here are some examples using it:
And just to show you how logarithmic the results are, here is a semilog plot of the output for
x = gen_log_scale(10000, 100)
(as you can see, left part is not really logarithmic):The approach in Avaris's answer of generating your log-spaced points directly, is definitely the way to go. But I thought it would be interesting to see how to pick the appropriate value to pass to
logspace
to get what you want.The values in the array generated by
logspace(0, k, n)
are the numbers 10ik / (n−1) for 0 ≤ i < n:This sequence consists of an initial segment where the values are more closely than unit spaced (and so there may be duplicates when they are rounded to the nearest integer), followed by a segment where the values are more widely than unit spaced and there are no duplicates.
The spacing between values is s(i) = 10iK − 10(i−1)K, where K = k / (n − 1). Let m be the smallest value such that s(m) ≥ 1. (m = 7 in the example above.) Then when duplicates are removed, there are exactly ⌊½ + 10(m−1)K⌋ + n − m remaining numbers.
A bit of algebra finds:
Let's check that.
The doctests pass, so this looks good to me. So all you need to do is find
n
such thatlogspace_size(4, n) == 100
. You could do this by binary chop or one of thescipy.optimize
methods:I've got here while searching a simple method to get logarithmically spaced series (with base of 10) in python (omitting use of numpy). But your solutions are way to complicated for my ultra simple demands.
Since it's generator in order to getting list you have to:
Gives following the following output: