# GaMPEN

### Galaxy Morphology Posterior Estimation Network

The first ML framework for automatic cropping & estimating posteriors of different galaxy morphological parameters

### What does GaMPEN do & Why was it needed?

Although Convolutional Neural Networks (CNNs) have used for galaxy morphology determination
for quite some time now, a few challenges associated with the process has persisted. From the
early attempts at using a CNN to classify galaxies morphologically to the largest ML-produced
morphological catalogs currently available, ** most CNNs have provided broad morphological
classifications.** There has been very limited work on estimating morphological parameters
using CNNs; and

*. Meanwhile, the computation of full Bayesian posteriors for different morphological parameters is crucial for drawing scientific inferences that account for uncertainty and thus are indispensable in the derivation of robust scaling relations or tests of theoretical models using morphology.*

**there has been no work on estimating robust uncertainties of CNN determined morphological parameters. Even popular non-ML tools like Galfit severely underestimate uncertainties by values as high as $\sim75\%$** One other challenge of using CNNs in astronomy, is the necessity ot use fixed
cutout sizes. Most previous work has resorted to selecting a large cutout size for which "most galaxies"
would remain in the frame. However, this means that for many galaxies in the dataset, especially smaller
ones, typical cutouts contain other galaxies in the frame, often leading to less accurate results. *
Thus, this becomes a problem when applying CNNs to galaxies over a wide magnitude and redshift range.*

In order to address both these challenges, we developed GaMPEN.

**GaMPEN automatically crops input images to an optimal size before determining their morphology.****GaMPEN estimates Bayesian posterior distributions for a galaxy's bulge-to-total light ratio ($L_B/L_T$), radius($R_e$), and flux. (It can be easily adapted to predict other parameters).**- GaMPEN takes into account both aleatoric & epistemic uncertainties
- GaMPEN's produced uncertainties/posteriors are extremely well-calibrated and accurate. ($\lesssim 5\%$ derivation)
- GaMPEN incorporates the full covariance matrix in its loss function allowing it to achieve well-calibrated uncertainties for all three variables simultaneously.

*GaMPEN is the first machine learning framework for determining joint posterior distributions of multiple morphological parameters and is also the first application of an STN to optical imaging in astronomy.*### GaMPEN's Architecture

GaMPEN's design is based on our previously successful classification CNN, GaMorNet, as well as as different variants of the Oxford Visual Geometry Group networks. We tried a variety of different architectures before finally converging on this design.

### How does GaMPEN automatically crop galaxies?

In GaMPEN, the STN is upstream of the CNN, where it applies a two-dimensional affine
transformation to the input image. *Each input image is transformed differently by the STN,
which learns the appropriate cropping during the training of the downstream CNN without
additional supervision.*

To perform the transformation the STN predicts the six-parameter matrix of the affine transformation $A_{\theta}$ to be applied to the input image. $$ \left(\begin{array}{c} x_{i}^{s} \\ y_{i}^{s} \end{array}\right) = A_{\theta}\left(\begin{array}{c} x_{i}^{t} \\ y_{i}^{t} \\ 1 \end{array}\right) = \left[\begin{array}{lll} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{array}\right]\left(\begin{array}{c} x_{i}^{t} \\ y_{i}^{t} \\ 1 \end{array}\right) $$ where $ \left(x_i^s,y_i^s\right)$ and $ \left( x_i^t, y_i^t \right) $ are the source and target co-ordinates respectively. As the network trains, the predicted transformation alters slowly based on the loss function.

*Placing the STN upstream in the GaMPEN framework allows the network to learn how to
actively transform the input image in such a way that minimizes the overall loss
function (i.e., the predicted transformation is guided by and helps the downstream
morphological parameter estimation).* Because the transformation we use is differentiable with respect to the parameters,
gradients can be backpropagated to the predicted parameters $A_{\theta}$.
*This crucial property allows the STN to be trained using standard backpropagation
along with the downstream CNN, without any additional supervision.*

The primary benefit of using an STN is that:- *without any additional training needed,
the STN learns to systematically crop out secondary galaxies in the cutouts and focus on
the galaxy of interest at the center of the cutout.*

### Prediction Stability Against Rotational Transformations

Although CNNs learn to recognize features that are
invariant under translation -- the learned features are typically not rotationally
invariant. However, this is a problem if CNNs are to be used in astronomy -- especially,
for determining the morphology of galaxies. A CNN should be able to identify the same
galaxy at two different orientations and return the same values. *But is this true?
To what level are the predictions stable? *

The above video shows the stability of predictions by GaMPEN when an input galaxy image
is rotated through various angles. GaMPEN's predictions of all three
output parameters -- bulge-to-total light ratio ($L_B/L_T$), effective radius ($R_e$),
and flux -- are stable against rotations.The modes of the predicted values deviate by
$\lesssim 5\%$. *This importantly shows to what level the predictions are stable
against rotations.
*

### Posterior Estimation/Uncertainty Prediction

**However, only performing Monte Carlo dropout is not enough!**

Our training data consists of noisy input images by design, but we know the corresponding morphological parameters with perfect accuracy. However, due to the different amounts of noise in each image, the predictions of GaMPEN at test time should have different levels of uncertainties. Thus, in this situation, we train GaPEN to predict aleatoric uncertainties.

Although we would like to use GaMPEN to predict aleatoric uncertainties, the covariance matrix,
$\boldsymbol{\Sigma}$, is not known {\it a priori}. Instead, we train GaMPEN to learn these
values by minimizing the negative log-likelihood of the output parameters for the training set,
which can be written as
$$
\begin{split}
- \log \mathcal{L}_{VI} \propto \sum_{n} & \frac{1}{2}\left[\boldsymbol{Y}_{n}-\boldsymbol{\hat{\mu}}_{n}\right]^{\top} \boldsymbol{\hat{\Sigma_n}}^{-1}\left[\boldsymbol{Y}_{n}-\boldsymbol{\hat{\mu}}_{n}\right] \\
& + \frac{1}{2} \log [\operatorname{det}(\boldsymbol{\hat{\Sigma_n}})] + \lambda \sum_{i}\left\|\boldsymbol{\omega_{i}}\right\|^{2} .
\end{split}
$$
where $\boldsymbol{\hat{\mu}}_n$ and $\boldsymbol{\hat{\Sigma}}_n$ are the mean and covariance matrix of the multivariate
Gaussian distribution predicted by GaMPEN for an image, $\boldsymbol{X}_n$. $\lambda$ is the strength of the regularization
term, and $\boldsymbol{\omega}_i$ are sampled from $q(\boldsymbol{\omega})$. *Note that the above function contains the
inverse and determinant of the covariance matrix -- calculating can be numerically unstable. Refer to the GaMPEN paper on
how to navigate this.
*

Thus to predict uncertainties:-

- For every image, GaMPEN predicts the parameters of a multivariate Gaussian distribution ($\boldsymbol{\mu}$,$\boldsymbol{\Sigma}$). We then draw a sample from this distribution.
- Now the network is slightly altered using Monte Carlo Dropout, and the above step is re-performed with a slightly different estimate of ($\boldsymbol{\mu}$,$\boldsymbol{\Sigma}$).
- The last step is now repeated 1000 times for each galaxy

The combination of the above two steps allows us to estimate robust uncertainties. By testing
GaMPEN on simulated Hyper Suprime-Cam galaxies, **we have verified that GaMPEN's predicted
posteriors are well-calibrated and accurate. ($\lesssim 5\%$ derivation).
**

### Data & Code Release

GaMPEN's source code and documentation can be accessed on GitHub.

The HSC morphological parameter catalogs are accessible at this link