ISYE 6740 - Homework 3

Date: 6/20/2021

By: Erik Magnusson

1. Density estimation: Psychological experiments

Imports

1 (a): 1-dimensional histogram and KDE

1-dimensional histogram for amygdala

Bins are set to 7 to show the higher density of values in the left portion of the histogram.

1-dimensional KDE for amygdala

Bandwidth is set to 0.3 to reveal the smaller density peak at the left of the distribution.

1-dimensional histogram for acc

Bins are set to 9 to show the higher density of values in the right portion of the histogram.

1-dimensional KDE for acc

Bandwidth is set to 0.3 to reveal the smaller density peak at the right of the distribution.
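
For reference, a minimal sketch of how these plots could be produced. The file name n90pol.csv and the column names amygdala and acc are assumptions about the dataset, and bw_method=0.3 (a multiplier on scipy's rule-of-thumb bandwidth) is just one way to realize the "bandwidth 0.3" setting described above:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Assumed file and column names; adjust to match the actual dataset.
data = pd.read_csv("n90pol.csv")

fig, axes = plt.subplots(2, 2, figsize=(10, 6))
for col, bins, col_axes in [("amygdala", 7, axes[:, 0]), ("acc", 9, axes[:, 1])]:
    x = data[col].to_numpy()
    # Histogram with the bin count chosen above.
    col_axes[0].hist(x, bins=bins, density=True)
    col_axes[0].set_title(f"histogram of {col} ({bins} bins)")
    # Gaussian KDE; bw_method scales scipy's rule-of-thumb bandwidth.
    kde = gaussian_kde(x, bw_method=0.3)
    grid = np.linspace(x.min(), x.max(), 200)
    col_axes[1].plot(grid, kde(grid))
    col_axes[1].set_title(f"KDE of {col} (bandwidth factor 0.3)")
plt.tight_layout()
plt.show()
```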

1 (b): 2-dimensional histogram

The bin sizes for amygdala and acc are 7 and 9, respectively, because those values show the main peak around (-0.01, -0.01).

Additionally, those bin sizes reveal the smaller amygdala peak around -0.05 and the smaller acc peak around 0.05.
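
A short sketch of the 2-dimensional histogram with these bin counts, under the same file/column-name assumptions as above:

```python
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("n90pol.csv")  # assumed file name, as above
# 7 bins along amygdala, 9 along acc, matching the choices above.
plt.hist2d(data["amygdala"], data["acc"], bins=(7, 9), density=True)
plt.colorbar(label="density")
plt.xlabel("amygdala")
plt.ylabel("acc")
plt.show()
```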

1 (c): 2-dimensional KDE

Based on the 2-dimensional kernel density estimate below, amygdala and acc do appear to be independent: although the KDE shows bimodal peaks near 0 for both variables, the values of amygdala and acc follow no clear relationship with one another.
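
One way to build such a 2-dimensional KDE with scipy (same dataset assumptions as above; gaussian_kde uses its rule-of-thumb bandwidth here):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

data = pd.read_csv("n90pol.csv")  # assumed file/column names, as above
xy = data[["amygdala", "acc"]].to_numpy().T  # shape (2, n) for gaussian_kde

kde = gaussian_kde(xy)
gx, gy = np.mgrid[xy[0].min():xy[0].max():100j,
                  xy[1].min():xy[1].max():100j]
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

plt.contourf(gx, gy, density, levels=20)
plt.colorbar(label="density")
plt.xlabel("amygdala")
plt.ylabel("acc")
plt.show()
```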

1 (d): Conditional KDEs with Orientation

Estimated conditional distribution of amygdala conditioning on political orientation

Estimated conditional distribution of acc conditioning on political orientation

Explanation

Political orientation is ranked from 1 (very conservative) to 5 (very liberal). In the conditional distributions above, more conservative orientations are associated with smaller acc volumes and larger amygdala volumes, whereas liberal orientations show acc and amygdala volumes closer to the overall average. A kernel bandwidth of 0.4 made these associations visible without over-smoothing the distributions.
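
A sketch of how the conditional KDEs can be produced, assuming an 'orientation' column holding the 1-5 scale values and reading "bandwidth 0.4" as scipy's bandwidth factor:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

data = pd.read_csv("n90pol.csv")  # assumed file/column names, as above

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, col in zip(axes, ["amygdala", "acc"]):
    # One KDE per orientation group: an estimate of p(col | orientation).
    for c in sorted(data["orientation"].unique()):
        x = data.loc[data["orientation"] == c, col].to_numpy()
        kde = gaussian_kde(x, bw_method=0.4)  # bandwidth factor 0.4, as above
        grid = np.linspace(x.min(), x.max(), 200)
        ax.plot(grid, kde(grid), label=f"orientation {c}")
    ax.set_title(f"p({col} | orientation)")
    ax.legend()
plt.tight_layout()
plt.show()
```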

Conditional sample means for amygdala and acc
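
A minimal way to compute these, assuming the same data frame as in the sketches above:

```python
import pandas as pd

data = pd.read_csv("n90pol.csv")  # assumed file name, as above
# Sample mean of each brain-region volume within each orientation group.
print(data.groupby("orientation")[["amygdala", "acc"]].mean())
```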

1 (e): Conditional Joint Distribution of the volume of the amygdala and acc

Using a kernel bandwidth of 0.7, I was able to limit the noise in the joint distribution and see the overall associations between orientation and amygdala/acc volumes. The joint distribution above confirms my earlier conclusions from the individual conditional distributions: the lighter-colored contours (those associated with conservative orientations) sit at larger amygdala volumes and smaller acc volumes than those for more liberal orientations, which, as before, are located more centrally along both axes.
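
One way to overlay a conditional joint KDE per orientation group, under the same dataset assumptions and again reading "bandwidth 0.7" as scipy's bandwidth factor:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

data = pd.read_csv("n90pol.csv")  # assumed file/column names, as above
gx, gy = np.mgrid[data["amygdala"].min():data["amygdala"].max():100j,
                  data["acc"].min():data["acc"].max():100j]
grid = np.vstack([gx.ravel(), gy.ravel()])

groups = sorted(data["orientation"].unique())
colors = plt.cm.viridis(np.linspace(0, 1, len(groups)))
for c, color in zip(groups, colors):
    # 2-D KDE of (amygdala, acc) within one orientation group.
    sub = data.loc[data["orientation"] == c, ["amygdala", "acc"]].to_numpy().T
    kde = gaussian_kde(sub, bw_method=0.7)  # bandwidth factor 0.7, as above
    plt.contour(gx, gy, kde(grid).reshape(gx.shape), levels=4, colors=[color])
    plt.plot([], [], color=color, label=f"orientation {c}")  # legend proxy
plt.xlabel("amygdala")
plt.ylabel("acc")
plt.legend()
plt.show()
```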

2. Implementing EM for MNIST dataset

Import data (each of the 1990 columns is one 28x28 pixel image, flattened)

Data is already scaled to the range (0, 1)

2 (a): Detailed expression of the E-step and M-step in the EM algorithm

*from lecture slides

E-Step

$$\tau_k^i = p(z^i = k \mid x^i, \pi, \mu, \Sigma) = \frac{\pi_k N(x^i \mid \mu_k, \Sigma_k)}{\sum_{k'} \pi_{k'} N(x^i \mid \mu_{k'}, \Sigma_{k'})}$$

$$= \frac{\pi_k |\Sigma_k|^{-1/2} \exp\left(-\frac{1}{2}(x^i - \mu_k)^T \Sigma_k^{-1} (x^i - \mu_k)\right)}{\sum_{k'} \pi_{k'} |\Sigma_{k'}|^{-1/2} \exp\left(-\frac{1}{2}(x^i - \mu_{k'})^T \Sigma_{k'}^{-1} (x^i - \mu_{k'})\right)}$$

$$(k = 1, \dots, K, \quad i = 1, \dots, m)$$

The $(2\pi)^{-d/2}$ normalizing factor is the same for every component and cancels between numerator and denominator, but the $|\Sigma_k|^{-1/2}$ determinant terms differ across components and must be kept.

M-Step

$$\pi_k = \frac{\sum_i \tau_k^i}{m}, \qquad \mu_k = \frac{\sum_i \tau_k^i x^i}{\sum_i \tau_k^i}, \qquad \Sigma_k = \frac{\sum_i \tau_k^i (x^i - \mu_k)(x^i - \mu_k)^T}{\sum_i \tau_k^i}$$

$$(k = 1, \dots, K, \quad i = 1, \dots, m)$$
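
These updates translate directly into code. Below is a minimal sketch, assuming X is the m x 4 matrix of PCA scores from part (b); the E-step is computed in log-space with logsumexp (a standard numerical-stability trick not shown in the formulas above), and a small ridge keeps the covariances invertible:

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def em_gmm(X, K, n_iter=100, seed=0):
    """Fit a K-component full-covariance GMM to X (m x d) with EM."""
    m, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(m, K, replace=False)]  # random data points as initial means
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])

    for _ in range(n_iter):  # convergence check omitted for brevity
        # E-step: log tau_k^i = log pi_k + log N(x^i | mu_k, Sigma_k), normalized.
        log_r = np.stack([np.log(pi[k]) +
                          multivariate_normal.logpdf(X, mu[k], sigma[k])
                          for k in range(K)], axis=1)
        log_r -= logsumexp(log_r, axis=1, keepdims=True)
        tau = np.exp(log_r)                  # m x K responsibilities

        # M-step: the closed-form updates from the formulas above.
        Nk = tau.sum(axis=0)                 # effective counts per component
        pi = Nk / m
        mu = (tau.T @ X) / Nk[:, None]
        for k in range(K):
            Xc = X - mu[k]
            sigma[k] = (tau[:, k, None] * Xc).T @ Xc / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, sigma, tau
```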

2 (b): Implement EM algorithm

PCA and project data to the top 4 principal directions
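
A sketch of the projection step via SVD. It assumes the data has been arranged as one flattened image per row (the raw matrix described above stores one image per column, so a transpose would come first):

```python
import numpy as np

def project_top_k(X, k=4):
    """Project rows of X (m x 784, scaled to (0, 1)) onto the top-k
    principal directions via SVD of the centered data."""
    Xc = X - X.mean(axis=0)                  # center the data
    # Rows of Vt are the principal directions (right singular vectors).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]             # (m x k) scores, (k x 784) directions

# Hypothetical usage: Z, V = project_top_k(X, k=4)
```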

2 (c): Report the fitted GMM model

Posterior means of each latent Gaussian

Posterior weights

Posterior covariance matrices

First average image

First covariance matrix as a heatmap

Second average image

Second covariance matrix as a heatmap
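
A sketch of how each fitted component can be visualized. The names mu_k, sigma_k, V, and x_mean are hypothetical, standing for the component mean and covariance in 4-dim PCA space, the principal directions, and the pixel-space data mean from the steps above; the reshape orientation depends on how the images were flattened and may need a transpose:

```python
import matplotlib.pyplot as plt

def show_component(mu_k, sigma_k, V, x_mean, title=""):
    """Show one component: mean mapped back to pixel space, plus its
    4x4 covariance matrix as a heatmap."""
    img = (x_mean + mu_k @ V).reshape(28, 28)  # low-dim mean -> pixel space
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.imshow(img, cmap="gray")
    ax1.set_title(f"{title} average image")
    im = ax2.imshow(sigma_k, cmap="viridis")   # covariance heatmap
    ax2.set_title(f"{title} covariance")
    fig.colorbar(im, ax=ax2)
    plt.show()
```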

2 (d): Infer the labels of the images

GMM misclassification rate for digit "2" (labels inferred from $\tau_k^i$)

GMM misclassification rate for digit "6" (labels inferred from $\tau_k^i$)

K-means clustering with K=2

K-means misclassification rate for digit "2"

K-means misclassification rate for digit "6"

Overall
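
A sketch of the label-inference and scoring logic, assuming tau (m x 2 responsibilities), Z (m x 4 PCA scores), and y (true digit labels) from the steps above. Each cluster is mapped to the majority true digit among its members, and errors are scored per digit and overall:

```python
import numpy as np
from sklearn.cluster import KMeans

def misclassification_rates(cluster_labels, true_labels):
    """Map each cluster to its majority true digit, then score errors."""
    pred = np.empty_like(true_labels)
    for c in np.unique(cluster_labels):
        members = cluster_labels == c
        vals, counts = np.unique(true_labels[members], return_counts=True)
        pred[members] = vals[np.argmax(counts)]      # majority vote
    rates = {d: float(np.mean(pred[true_labels == d] != d))
             for d in np.unique(true_labels)}
    rates["overall"] = float(np.mean(pred != true_labels))
    return rates

# Hypothetical usage with the names assumed above:
# gmm_rates = misclassification_rates(tau.argmax(axis=1), y)
# km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(Z)
# km_rates = misclassification_rates(km.labels_, y)
```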

Performance Comparison

When classifying the digit "2", GMM had a misclassification rate of 0.9%, whereas K-means had a rate of 6.5%, so GMM clearly outperformed K-means on this digit.

When classifying the digit "6", GMM had a misclassification rate of 6.5%, whereas K-means had a rate of 5.9%, giving K-means a slight edge.

Overall, GMM outperforms K-means on total misclassification rate: 3.8% versus 6.2%.