I am having trouble understanding the likelihood function for GDA given in Andrew Ng's CS229 notes.
l(φ,µ0,µ1,Σ) = log (product from i to m) {p(x(i)|y(i);µ0,µ1,Σ)p(y(i);φ)}
The link is http://cs229.stanford.edu/notes/cs229-notes2.pdf Page 5.
For Linear regression the function was product from i to m p(y(i)|x(i);theta)
which made sense to me.
Why is there a change here saying it is given by p(x(i)|y(i) and that is multiplied by p(y(i);phi)?
Thanks in advance
The starting formula on page 5 is
l(φ,µ0,µ1,Σ) = log <product from i to m> p(x_i, y_i;µ0,µ1,Σ,φ)
leaving out the parameters φ,µ0,µ1,Σ
for now, that can be simplified to
l = log <product> p(x_i, y_i)
using the chain rule you can convert that to either
l = log <product> p(x_i|y_i)p(y_i)
or
l = log <product> p(y_i|x_i)p(x_i).
In the page 5 formula, the φ
is moved to p(y_i)
, because only p(y)
depends on it.
The likelihood starts with the joint probability distribution p(x,y)
instead of the conditional probability distribution p(y|x)
, which is why GDA is called a generative model (models from x to y and from y to x), while logistic regression is considered a discriminatory model (models from x to y, one-way). Both have their advantages and disadvantages. There seems to be a chapter about that further below.