Selecting data from a table in Mathematica

Posted 2019-05-09 23:56

Question:

I'm trying to write a function that will select the first element in a table that satisfies a criterion. For example, given the following table with times in the first column and the number of people infected with a disease in the second, I want to write a function that returns the time at which at least 100 people are infected.

0   1
1   2
2   4
3   8
4   15
5   29
6   50
7   88
8   130
9   157
10  180
11  191
12  196
13  199
14  200

So from this table, I want the function to tell me that at 8 seconds, at least 100 people were infected. I tried using Select to do this, but I'm not sure how to use Select with a table of 2 columns and have it return a value in the first column based on a criterion from the second column.

Answer 1:

An alternative that uses replacement rules is

ImportString["0 1 1 2 2 4 3 8 4 15 5 29 6 50 7 88 8 130 9 157 10 180 11 191 12 196 13 199 14 200", "Table"];
Partition[Flatten[%], 2]
% /. {___, x : {_, _?(# >= 100 &)}, ___} :> x

The algorithm with which Mathematica searches for patterns ensures that this will return the first such case. If you want all cases then you can use ReplaceList. I suggest you read the tutorial on Patterns and Rules.
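As a minimal sketch of the ReplaceList variant mentioned above: it assumes the sample data has been entered directly as a list of pairs (the names data, allRows and allTimes are only illustrative), and it reuses the same pattern as the replacement rule.

```mathematica
(* Sample data from the question as {time, infected} pairs; entered by hand here *)
data = {{0, 1}, {1, 2}, {2, 4}, {3, 8}, {4, 15}, {5, 29}, {6, 50}, {7, 88},
   {8, 130}, {9, 157}, {10, 180}, {11, 191}, {12, 196}, {13, 199}, {14, 200}};

(* ReplaceList tries every way the pattern can match, so x is bound to each
   qualifying row in turn instead of only the first one *)
allRows = ReplaceList[data, {___, x : {_, _?(# >= 100 &)}, ___} :> x]

(* Keep only the first column (the times) of those rows *)
allTimes = allRows[[All, 1]]
```

For the sample table this should collect the seven rows with times 8 through 14; taking First of the result would recover the single-match answer.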


Edit: ImportString works on the newly formatted data as well - but you no longer need to use Partition.



Answer 2:

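You can also use a simple NestWhile. It returns the index of the first qualifying row; subtracting 1 gives the time here only because the times start at 0 and increase in steps of 1: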

data = {{0,1},{1,2},{2,4},{3,8},{4,15},{5,29},{6,50},{7,88},{8,130},{9,157},{10,180},
 {11,191},{12,196},{13,199},{14,200}};
NestWhile[# + 1 &, 1, data[[#, 2]] < 100 &] - 1
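If the times were not simply index minus one, the index could be used to look up the first column instead. The following is a hedged sketch rather than part of the original answer: firstTimeAtLeast is a hypothetical helper name, and the bounds check is an added assumption so that a table with no qualifying row does not trigger a Part error.

```mathematica
data = {{0, 1}, {1, 2}, {2, 4}, {3, 8}, {4, 15}, {5, 29}, {6, 50}, {7, 88},
   {8, 130}, {9, 157}, {10, 180}, {11, 191}, {12, 196}, {13, 199}, {14, 200}};

(* Hypothetical helper: time of the first row whose count reaches the threshold *)
firstTimeAtLeast[table_, threshold_] := Module[{idx},
  (* Advance the row index while it is in range and the count is still too low *)
  idx = NestWhile[# + 1 &, 1,
    (# <= Length[table] && table[[#, 2]] < threshold) &];
  (* idx ends at Length[table] + 1 when no row qualifies *)
  If[idx <= Length[table], table[[idx, 1]], Missing["NotFound"]]
]

firstTimeAtLeast[data, 100]    (* 8 for the sample data *)
firstTimeAtLeast[data, 1000]   (* Missing["NotFound"]: no row reaches 1000 *)
```

Returning Missing["NotFound"] is just one possible convention for the no-match case; it mirrors what built-ins such as SelectFirst return by default.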


Answer 3:

Here are a few different ways to do this, assuming I've interpreted your data correctly...

In[3]:= data = {{0,1},{1,2},{2,4},{3,8},{4,15},{5,29},{6,50},{7,88},{8,130},{9,157},{10,180},{11,191},{12,196},{13,199},{14,200}};

In[8]:= Cases[data, {_, _?(#>=100&)}, 1, 1][[1, 1]]
Out[8]= 8

In[9]:= Select[data, #[[2]]>=100&, 1][[1, 1]]
Out[9]= 8

I suggest you read up on Part[] to understand this better.
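To make the role of Part (the double brackets) concrete, here is a small sketch that unpacks the Select result step by step. The last two lines use FirstCase and SelectFirst, which were added in Mathematica 10 and are not part of the original answer; they are shown only as possible shortcuts.

```mathematica
data = {{0, 1}, {1, 2}, {2, 4}, {3, 8}, {4, 15}, {5, 29}, {6, 50}, {7, 88},
   {8, 130}, {9, 157}, {10, 180}, {11, 191}, {12, 196}, {13, 199}, {14, 200}};

rows = Select[data, #[[2]] >= 100 &, 1]   (* a list holding one row: {{8, 130}} *)
rows[[1]]                                  (* the row itself: {8, 130} *)
rows[[1, 1]]                               (* first element of that row: 8 *)

(* Version 10+ alternatives that return the first match directly *)
viaFirstCase = FirstCase[data, {t_, n_ /; n >= 100} :> t]        (* 8 *)
viaSelectFirst = First[SelectFirst[data, #[[2]] >= 100 &]]       (* 8 *)
```

Both Cases and Select return a list of matches even when limited to one result, which is why the extra [[1, 1]] is needed in the original code.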



Answer 4:

I believe there is a faster way than those already given. First, though, Joshua's Cases method can be made a little faster by testing with a condition (/;) rather than a pure function (&).

This is the solution I propose (edit: adding white space for clarity, since the double brackets do not format here):

dat[[
  Position[
    dat[[All, 2]],
    x_ /; x >= 100,
    1, 1
  ][[1, 1]],
  1
]]
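The nested expression above can be read from the inside out. As an illustrative sketch on the question's small table (the intermediate names counts, pos and idx are introduced here only for explanation):

```mathematica
dat = {{0, 1}, {1, 2}, {2, 4}, {3, 8}, {4, 15}, {5, 29}, {6, 50}, {7, 88},
   {8, 130}, {9, 157}, {10, 180}, {11, 191}, {12, 196}, {13, 199}, {14, 200}};

counts = dat[[All, 2]]                         (* second column only *)
pos = Position[counts, x_ /; x >= 100, 1, 1]   (* {{9}}: first match at level 1 *)
idx = pos[[1, 1]]                              (* 9: the row index *)
dat[[idx, 1]]                                  (* 8: first column of that row *)
```

Note that if no entry satisfies the condition, Position returns an empty list and the [[1, 1]] step fails, so a check would be needed for data where a match is not guaranteed.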

Here are timings for the various methods offered. Please note that the /. method is run only once, while the others are run loops times (loops = 100 in this first test). Its single run takes as long as 100 runs of the Position method, so it is about 100x slower. Also, the NestWhile method returns only an index, not the actual first-column element.

In[]:= 
dat = {Range[5000], Sort@RandomInteger[1*^6, 5000]} // Transpose;
lim = 300000; loops = 100;
dat /. {___, {x_, _?(# >= lim &)}, ___} :> x; // Timing
Do[  Cases[dat, {_, _?(# >= lim &)}, 1, 1][[1, 1]]  , {loops}] // Timing
Do[  Cases[dat, {_, y_ /; y >= lim}, 1, 1][[1, 1]]  , {loops}] // Timing
Do[  Select[dat, #[[2]] >= lim &, 1][[1, 1]]  , {loops}] // Timing
Do[  NestWhile[# + 1 &, 1, dat[[#, 2]] < lim &]  , {loops}] // Timing
Do[  dat[[Position[dat[[All, 2]], x_ /; x >= lim, 1, 1][[1, 1]], 1]]  , {loops}] // Timing

Out[]= {0.125, Null}

Out[]= {0.438, Null}

Out[]= {0.406, Null}

Out[]= {0.469, Null}

Out[]= {0.281, Null}

Out[]= {0.125, Null}

With a longer table (I leave out the slow method):

In[]:= 
dat = {Range[35000], Sort@RandomInteger[1*^6, 35000]} // Transpose;
lim = 300000; loops = 25;
Do[  Cases[dat, {_, _?(# >= lim &)}, 1, 1][[1, 1]]  , {loops}] // Timing
Do[  Cases[dat, {_, y_ /; y >= lim}, 1, 1][[1, 1]]  , {loops}] // Timing
Do[  Select[dat, #[[2]] >= lim &, 1][[1, 1]]  , {loops}] // Timing
Do[  NestWhile[# + 1 &, 1, dat[[#, 2]] < lim &]  , {loops}] // Timing
Do[  dat[[Position[dat[[All, 2]], x_ /; x >= lim, 1, 1][[1, 1]], 1]]  , {loops}] // Timing

Out[]= {0.734, Null}

Out[]= {0.641, Null}

Out[]= {0.734, Null}

Out[]= {0.5, Null}

Out[]= {0.266, Null}

Finally, confirmation of agreement:

In[]:= SameQ[
         Select[dat, #[[2]] >= lim &, 1][[1, 1]],
         dat[[Position[dat[[All, 2]], x_ /; x >= lim, 1, 1][[1, 1]], 1]]
       ]

Out[]= True