<aside> 💡 When we use a normal (Gaussian) distribution for each class, this leads to Linear or Quadratic Discriminant Analysis.
</aside>
$$ Posterior = \frac{Prior \times Likelihood}{Evidence} \implies P(Y=k \mid X=x) = \frac{P(Y=k) \times P(X=x \mid Y=k)}{P(X=x)} $$
$f_k(x) = P(X=x \mid Y=k)\;$ — the density of $X$ within class $k$.
$\pi_k = P(Y=k)\;$ — the marginal (prior) probability of class $k$.
$$ \therefore \; \pi_1 f_1(x) > \pi_2 f_2(x) \implies class_1 \newline \pi_1 f_1(x) < \pi_2 f_2(x) \implies class_2 $$
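The decision rule above can be sketched in a few lines. This is a minimal, hypothetical 1-D example (the means, priors, and shared $\sigma$ are made up): each class density $f_k$ is Gaussian, and we pick the class with the largest $\pi_k f_k(x)$; the evidence $P(X=x)$ is the same for every class, so it can be dropped.

```python
import math

def normal_pdf(x, mu, sigma):
    # Gaussian density f_k(x) for class mean mu and shared sigma
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def classify(x, priors, means, sigma=1.0):
    # Bayes rule: argmax_k  pi_k * f_k(x); P(X=x) cancels across classes
    scores = [p * normal_pdf(x, m, sigma) for p, m in zip(priors, means)]
    return max(range(len(scores)), key=scores.__getitem__)

# x = 0.2 lies closer to the class-0 mean, so class 0 wins
print(classify(0.2, priors=[0.5, 0.5], means=[0.0, 2.0]))  # → 0
```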
<aside> 💡 When the classes are well-separated, the parameter estimates for the logistic regression model are surprisingly unstable. Linear discriminant analysis does not suffer from this problem. Why?
</aside>
<aside> 💡 Answer: If the data is linearly separable, the likelihood can always be increased by scaling the coefficients up, so when solving for $\beta$ numerically the negative log-likelihood (NLL) keeps shrinking toward zero while $\|\hat\beta\|$ grows without bound. The algorithm does not converge, and we terminate it after a fixed number of iterations.
</aside>
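The divergence can be seen directly with a toy experiment, a sketch using a made-up, perfectly separable 1-D dataset and plain gradient descent on the logistic NLL: the loss keeps shrinking toward 0, but $|\beta|$ never settles, so in practice the solver stops at an iteration cap.

```python
import math

xs = [-2.0, -1.0, 1.0, 2.0]   # toy, perfectly separable data (assumed)
ys = [0, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

beta = 0.0
for step in range(5000):      # fixed iteration budget: the loop never "converges"
    grad = sum((sigmoid(beta * x) - y) * x for x, y in zip(xs, ys))
    beta -= 0.1 * grad        # gradient step on the NLL

nll = -sum(y * math.log(sigmoid(beta * x)) + (1 - y) * math.log(1 - sigmoid(beta * x))
           for x, y in zip(xs, ys))
print(beta, nll)              # beta keeps growing with more steps; NLL approaches 0
```

Doubling the iteration budget makes `beta` larger still, which is exactly why the maximum-likelihood estimate does not exist for separable data.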
If $n$ is small and the distribution of the predictors $X$ is approximately normal in each class, LDA is more stable than logistic regression.