
How to decide group assignments in Dirichlet proce

Posted 2019-07-24 05:06

Question:

As in Dirichlet clustering, the Dirichlet process can be represented in the following ways:

  • Chinese Restaurant Process
  • Stick Breaking Process
  • Pólya Urn Model

For instance, the Chinese Restaurant Process works as follows:

  • Initially the restaurant is empty
  • The first person to enter (Alice) sits down at a table (selects a group).
  • The second person to enter (Bob) sits down at a table.
  • Which table does he sit at?
  • He sits down at a new table with probability α/(1+α)
  • He sits at the existing table with Alice (meaning he joins the existing group) with probability 1/(1+α)
  • The (n+1)-st person sits down at a new table with probability α/(n+α), and at table k with probability n_k/(n+α), where n_k is the number of people currently sitting at table k.
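
The seating rule in the list above can be written as a small function. This is only a minimal sketch: the name `crp_seating_probabilities` and its arguments are made up for illustration, `counts` is assumed to hold the current number of people at each occupied table, and `alpha` is assumed to be positive.

```python
def crp_seating_probabilities(counts, alpha):
    """Seating probabilities for the next person in a Chinese Restaurant Process.

    counts: number of people currently at each occupied table (n_k).
    alpha:  concentration parameter, assumed > 0.
    Returns one probability per existing table, followed by the
    probability of opening a new table as the last entry.
    """
    n = sum(counts)              # people already seated
    denom = n + alpha
    existing = [n_k / denom for n_k in counts]
    new_table = alpha / denom
    return existing + [new_table]


# Bob's case: only Alice is seated, and alpha = 1 is an arbitrary choice.
print(crp_seating_probabilities([1], 1.0))  # [0.5, 0.5]
```

The last entry is P(N) and the entries before it are the P(E) values, one per existing group. Together they always sum to 1.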

The question is:

Initially, the first person joins, say, G1 (i.e. group 1).
The second person then joins either:

new group      = G2 with probability α/(1+α) = P(N)  
existing group = G1 with probability 1/(1+α) = P(E)

Now, if I calculate the probabilities for the new entry, I'll have values for both, i.e. P(N) and P(E). Then:

  • How do I decide which group, G1 or G2, the new entry will join?
  • Would it be decided on the basis of the values of both probabilities?

That is,

If (P(N) > P(E))  
then  
   _new entry_ will join G2    
AND  
If (P(E) > P(N))  
then
_new entry_ will join G1  

Answer 1:

Based on the CRP representation,

  • customer 1 sits at table 1
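  • customer i sits at an already occupied table k with probability p_k and at a new table with probability p_new, where

    p_k = n_k/(i-1+α),   p_new = α/(i-1+α),   with n_k the number of customers already seated at table k.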

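Note that these probabilities sum to 1. The table is not chosen by picking the largest probability. Instead, you draw a random sample from this distribution (a biased coin toss when there are only two options) and seat the customer at whichever table comes up.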

For example, for customer i, assume you have the following probability vector:

    p = [0.2, 0.4, 0.3, 0.1]

which means the probability of sitting at table 1 is 0.2, table 2 is 0.4, table 3 is 0.3, and a new table is 0.1. To sample the table, construct the cumulative probability vector [0.2, 0.6, 0.9, 1.0] and draw a uniform random number in [0, 1). Say the random number is 0.81: it falls between 0.6 and 0.9, so the customer sits at table 3.
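
The cumulative-vector sampling described above could be sketched in Python roughly as follows. The names `sample_index` and `simulate_crp` are hypothetical helpers, not part of any library, and the seed and the value of alpha in the usage lines are arbitrary assumptions. Indices are 0-based, so index 2 corresponds to table 3.

```python
import bisect
import itertools
import random


def sample_index(probs, u):
    """Return the index whose cumulative interval contains u (0 <= u < 1)."""
    cumulative = list(itertools.accumulate(probs))
    idx = bisect.bisect_right(cumulative, u)
    # Guard against floating-point round-off in the last cumulative value.
    return min(idx, len(probs) - 1)


def simulate_crp(n_customers, alpha, seed=0):
    """Seat n_customers one by one; return (assignments, table counts)."""
    rng = random.Random(seed)
    counts = []        # counts[k] = customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for i in range(n_customers):
        total = i + alpha                       # i customers already seated
        probs = [c / total for c in counts] + [alpha / total]
        k = sample_index(probs, rng.random())
        if k == len(counts):                    # last slot means "new table"
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


# The example from the text: u = 0.81 lands in the third interval.
print(sample_index([0.2, 0.4, 0.3, 0.1], 0.81))  # 2, i.e. table 3

assignments, counts = simulate_crp(20, alpha=1.0)
print(counts)
```

Because each assignment is a random draw, a customer will sometimes join a table with a small probability. Running the simulation with different seeds gives different groupings, which is the intended behaviour of the process.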