How to make a random sampling of 20% of records in

2019-07-30 00:50发布

问题:

In Tableau 9.2, is it possible to generate a random sample of records? If so, how could I do this? Say I have a field called Field1, then I intend to only select 20% of the records. So far, I have found how to a generate random integer in Tableau, though it is bewildering that Tableau does not already have a function for this:

Random Seed

(DATEPART('second', NOW()) + 1) * (DATEPART('minute', NOW()) + 1) * (DATEPART('hour', NOW()) + 1) * (DATEPART('day', NOW()) + 1)

Random Number

((PREVIOUS_VALUE(MIN([Seed])) * 1140671485 + 12820163) % (2^24))

Random Int

INT([Random Number] / (2^24) * [Random Upper Limit]) + 1

So how could I create a calculated field to only show random records that make up 20% of Field1?

回答1:

When you make an extract, there is a dialog panel where you can filter records and specify rolling up to visible dimensions.

For at least some data sources, you can also specify a limit of the number of records (say grab the first 2000 records) or a random percentage (say, 10% of the records)

Then you can work with the small extract quickly to design you viz, and then remove the extract or refresh with all the data when you are ready. I don't think every data source supports the random selection though.



回答2:

There is a random number function ins Tableau, but it is hidden and doesn't appear on the list of available functions.

It is "random()". It generates a uniformly distributed number between 0 and 1.

It isn't documented but it works. See, for example, this previous answer: how to generate pseudo random numbers and row-count in Tableau



回答3:

I ended up solving my issue through the back-end in my MS Access database with the following MS Access SQL Query within a MS Access VBA macro I made:

    value1 = "some_value"
    fieldName = "[my_field_name]"
    sqlQuery = "SELECT [my_table].* " & _
                 " INTO new_table_name" & _
                 " FROM [my_table] " & _
                 " WHERE [some_field] = '" & value1 & "'" & _
                 " ORDER BY Rnd(-(100000*" & fieldName & ")*Time())"

    Debug.Print sqlQuery
    CurrentDb.Execute sqlQuery

I ended up deciding that something like this would be best left to the back-end and to leave the visual analytics to Tableau.