<aside> 💡 When we use a normal (Gaussian) distribution for each class, this leads to Linear or Quadratic Discriminant Analysis.
</aside>
$$ Posterior = \frac{Prior \times Likelihood}{Evidence} \implies P(Y=k \mid X=x) = \frac{P(Y=k) \times P(X=x \mid Y=k)}{P(X=x)} $$
$f_k(x) = P(X=x \mid Y=k)\;$ — the density of $X$ within class $k$.
$\pi_k = P(Y=k)\;$ — the marginal (prior) probability of class $k$.
$$ \therefore \; \pi_1 f_1(x) > \pi_2 f_2(x) \implies class_1 \newline \pi_1 f_1(x) < \pi_2 f_2(x) \implies class_2 $$
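The decision rule above can be sketched in a few lines. This is a minimal, hypothetical 1-D example (the means, priors, and shared $\sigma$ are made up): each class density $f_k$ is Gaussian, and we pick the class with the largest $\pi_k f_k(x)$; the evidence $P(X=x)$ is the same for every class, so it can be dropped.

```python
import math

def normal_pdf(x, mu, sigma):
    # Gaussian density f_k(x) for class mean mu and shared sigma
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def classify(x, priors, means, sigma=1.0):
    # Bayes rule: argmax_k  pi_k * f_k(x); P(X=x) cancels across classes
    scores = [p * normal_pdf(x, m, sigma) for p, m in zip(priors, means)]
    return max(range(len(scores)), key=scores.__getitem__)

# x = 0.2 lies closer to the class-0 mean, so class 0 wins
print(classify(0.2, priors=[0.5, 0.5], means=[0.0, 2.0]))  # → 0
```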
<aside> 💡 When the classes are well-separated, the parameter estimates for the logistic regression model are surprisingly unstable. Linear discriminant analysis does not suffer from this problem. Why?
</aside>
<aside> 💡 Answer: If the data is linearly separable, the likelihood can always be increased by scaling the coefficients up, so when solving for $\beta$ numerically the negative log-likelihood (NLL) keeps shrinking toward zero while $\|\hat\beta\|$ grows without bound. The algorithm does not converge, and we terminate it after a fixed number of iterations.
</aside>
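The divergence can be seen directly with a toy experiment, a sketch using a made-up, perfectly separable 1-D dataset and plain gradient descent on the logistic NLL: the loss keeps shrinking toward 0, but $|\beta|$ never settles, so in practice the solver stops at an iteration cap.

```python
import math

xs = [-2.0, -1.0, 1.0, 2.0]   # toy, perfectly separable data (assumed)
ys = [0, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

beta = 0.0
for step in range(5000):      # fixed iteration budget: the loop never "converges"
    grad = sum((sigmoid(beta * x) - y) * x for x, y in zip(xs, ys))
    beta -= 0.1 * grad        # gradient step on the NLL

nll = -sum(y * math.log(sigmoid(beta * x)) + (1 - y) * math.log(1 - sigmoid(beta * x))
           for x, y in zip(xs, ys))
print(beta, nll)              # beta keeps growing with more steps; NLL approaches 0
```

Doubling the iteration budget makes `beta` larger still, which is exactly why the maximum-likelihood estimate does not exist for separable data.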
If $n$ is small and the distribution of the predictors $X$ is approximately normal in each class, LDA is more stable than logistic regression.