Models Module#

Module for defining models for landmark detection.

class landmarker.models.AddCoordChannels[source]#

Adds the x and y coordinates of each pixel as additional channels to the input tensor. Optionally, it can also add the radial distance of each pixel to the center of the image as an additional channel. This is done to provide the network with spatial information.

Parameters:: radial_channel (bool, optional) – whether to add the radial distance of each pixel to the center of the image as an additional channel. Defaults to False.

__init__(radial_channel=False)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:: radial_channel (bool) –
Return type:: None

forward(x)[source]#

Parameters:: x (Tensor) – input tensor of shape (batch_size, in_channels, *spatial_dims)
Returns:: output tensor of shape (batch_size, in_channels + 2 + radial_channel, *spatial_dims)
Return type:: Tensor
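
The shape change described above can be illustrated with a small standalone sketch. It is not the library's implementation: the function name add_coord_channels, the 2D-only restriction, and the normalization of coordinates to [-1, 1] are all assumptions made for illustration.

```python
import torch


def add_coord_channels(x: torch.Tensor, radial_channel: bool = False) -> torch.Tensor:
    """Append x/y (and optionally radial) coordinate channels to a 2D batch.

    Hypothetical helper: coordinates are normalized to [-1, 1] here, which
    is an assumption and may differ from what landmarker does internally.
    """
    b, _, h, w = x.shape
    ys = torch.linspace(-1.0, 1.0, h, dtype=x.dtype, device=x.device)
    xs = torch.linspace(-1.0, 1.0, w, dtype=x.dtype, device=x.device)
    # With indexing="ij": yy[i, j] = ys[i] and xx[i, j] = xs[j]
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    coords = [xx, yy]
    if radial_channel:
        # Distance of each pixel to the image center (the origin of the grid)
        coords.append(torch.sqrt(xx**2 + yy**2))
    coords = torch.stack(coords).unsqueeze(0).expand(b, -1, -1, -1)
    return torch.cat([x, coords], dim=1)


x = torch.zeros(2, 3, 4, 5)
out = add_coord_channels(x, radial_channel=True)
print(out.shape)  # torch.Size([2, 6, 4, 5]) = in_channels + 2 + radial_channel
```

The original channels are left untouched; only extra channels are concatenated along the channel dimension.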

class landmarker.models.CholeskyHourglass[source]#

Proposed in “UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss” - Kumar et al. (2019). Note that Kumar et al. use a DU-Net as the backbone, whereas this implementation uses the residual hourglass.

Parameters:

img_size (tuple[int, int]) – size of the input image.
in_channels (int) – number of input channels.
out_channels (int) – number of output channels.
channels (Sequence[int], optional) – number of output channels for each convolutional layer.
subunits (int, optional) – number of subunits in each convolutional layer.
up_sample_mode (str, optional) – upsampling mode. Defaults to ‘nearest’.

Returns:

pred: predicted heatmaps of shape (batch_size, out_channels, *img_dims)
cen: covariance matrices of the predicted heatmaps of shape (batch_size, out_channels, 2, 2)

__init__(img_size, in_channels, out_channels, channels=[64, 128, 256, 512], conv_block=<class 'monai.networks.blocks.convolutions.ResidualUnit'>, up_sample_mode='nearest')[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:

img_size (tuple[int, int]) –
in_channels (int) –
out_channels (int) –
channels (Sequence[int]) –
conv_block (Module) –
up_sample_mode (str) –

Return type:

None

forward(x)[source]#

Parameters:

x (Tensor) – input tensor of shape (batch_size, in_channels, *img_dims)

Returns:

pred: predicted heatmaps of shape (batch_size, out_channels, *img_dims)
cen: covariance matrices of the predicted heatmaps of shape (batch_size, out_channels, 2, 2)

class landmarker.models.CoordConvLayer[source]#

CoordConv is a convolutional layer that adds the x and y coordinates of each pixel as additional channels to the input tensor. Optionally, it can also add the radial distance of each pixel to the center of the image as an additional channel. This is done to provide the network with spatial information.

Source: “An intriguing failing of convolutional neural networks and the CoordConv solution” - Liu et al. (2018)

Parameters:

spatial_dims (int) – number of spatial dimensions of the input image.
in_channels (int) – number of input channels.
out_channels (int) – number of output channels.
radial_channel (bool, optional) – whether to add the radial distance of each pixel to the center of the image as an additional channel. Defaults to False.
conv_block (nn.Module, optional) – convolutional block to use. Defaults to ResidualUnit.

__init__(spatial_dims, in_channels, out_channels, radial_channel=False, conv_block=<class 'monai.networks.blocks.convolutions.ResidualUnit'>)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:

spatial_dims (int) –
in_channels (int) –
out_channels (int) –
radial_channel (bool) –
conv_block (Module) –

forward(x)[source]#

Parameters:: x (Tensor) – input tensor of shape (batch_size, in_channels, *spatial_dims)
Returns:: output tensor of shape (batch_size, out_channels, *spatial_dims)
Return type:: Tensor
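
A CoordConv layer can be sketched as "append coordinate channels, then convolve". The class below is a hypothetical 2D illustration, not the library's code: it swaps the default ResidualUnit for a plain nn.Conv2d, and the [-1, 1] coordinate normalization is an assumption.

```python
import torch
from torch import nn


class CoordConv2dSketch(nn.Module):
    """Illustrative 2D CoordConv: coordinate channels followed by a convolution."""

    def __init__(self, in_channels: int, out_channels: int, radial_channel: bool = False):
        super().__init__()
        self.radial_channel = radial_channel
        extra = 2 + int(radial_channel)  # x, y and optionally the radius
        # Plain convolution stands in for the configurable conv_block (assumption)
        self.conv = nn.Conv2d(in_channels + extra, out_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        ys = torch.linspace(-1.0, 1.0, h, dtype=x.dtype, device=x.device)
        xs = torch.linspace(-1.0, 1.0, w, dtype=x.dtype, device=x.device)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        coords = [xx, yy]
        if self.radial_channel:
            coords.append(torch.sqrt(xx**2 + yy**2))
        coords = torch.stack(coords).unsqueeze(0).expand(b, -1, -1, -1)
        return self.conv(torch.cat([x, coords], dim=1))


layer = CoordConv2dSketch(in_channels=3, out_channels=8, radial_channel=True)
y = layer(torch.randn(2, 3, 16, 16))
print(y.shape)  # torch.Size([2, 8, 16, 16])
```

Because the convolution sees the coordinates as ordinary input channels, its filters can become position dependent, which is the motivation given by Liu et al.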

class landmarker.models.Hourglass[source]#

Hourglass network is a network with symmetrical encoder and decoder paths. The encoder path downsamples the input image while the decoder path upsamples the image. Skip connections are added between the encoder and decoder paths to preserve spatial information. This network is used for pose estimation.

Proposed in: “Stacked Hourglass Networks for Human Pose Estimation” - Newell et al. (2016)

Parameters:

spatial_dims (int) – number of spatial dimensions of the input image.
in_channels (int) – number of input channels.
out_channels (int) – number of output channels.
channels (Sequence[int], optional) – number of output channels for each convolutional layer.
conv_block (nn.Module, optional) – convolutional block to use. Defaults to ResidualUnit.
pooling (nn.Module, optional) – pooling layer to use. Defaults to nn.MaxPool2d.
up_sample_mode (str, optional) – upsampling mode. Defaults to ‘nearest’.

__init__(spatial_dims, in_channels, out_channels, channels=[64, 128, 256, 512], conv_block=<class 'monai.networks.blocks.convolutions.ResidualUnit'>, pooling=<class 'torch.nn.modules.pooling.MaxPool2d'>, up_sample_mode='nearest')[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:

spatial_dims (int) –
in_channels (int) –
out_channels (int) –
channels (Sequence[int]) –
conv_block (Module) –
pooling (Module) –
up_sample_mode (str) –

forward(x)[source]#

Parameters:: x (Tensor) – input tensor of shape (batch_size, in_channels, *img_dims)
Returns:: output tensor of shape (batch_size, out_channels, *img_dims)
Return type:: Tensor
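
The encoder/decoder structure with skip connections can be sketched in a few lines of plain PyTorch. This is a simplified stand-in, not the library's Hourglass: the class name TinyHourglass, the use of single nn.Conv2d layers instead of ResidualUnit blocks, and additive skips are assumptions chosen to keep the example short.

```python
import torch
from torch import nn
import torch.nn.functional as F


class TinyHourglass(nn.Module):
    """Minimal 2D hourglass: max-pool encoder, nearest-neighbour decoder, additive skips."""

    def __init__(self, in_channels: int, out_channels: int, channels=(16, 32, 64)):
        super().__init__()
        pairs = list(zip(channels[:-1], channels[1:]))
        self.stem = nn.Conv2d(in_channels, channels[0], 3, padding=1)
        self.down = nn.ModuleList([nn.Conv2d(c_in, c_out, 3, padding=1) for c_in, c_out in pairs])
        self.up = nn.ModuleList([nn.Conv2d(c_out, c_in, 3, padding=1) for c_in, c_out in pairs])
        self.pool = nn.MaxPool2d(2)
        self.head = nn.Conv2d(channels[0], out_channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.stem(x))
        skips = []
        for conv in self.down:
            skips.append(x)                   # remember features before downsampling
            x = F.relu(conv(self.pool(x)))    # halve the resolution, widen the channels
        for conv, skip in zip(list(self.up)[::-1], skips[::-1]):
            x = F.interpolate(x, size=skip.shape[2:], mode="nearest")
            x = F.relu(conv(x)) + skip        # skip connection restores spatial detail
        return self.head(x)


net = TinyHourglass(in_channels=1, out_channels=5)
heatmaps = net(torch.randn(1, 1, 32, 32))
print(heatmaps.shape)  # torch.Size([1, 5, 32, 32])
```

Upsampling to the stored skip size, rather than by a fixed factor of 2, keeps the sketch usable for input sizes that are not divisible by a power of two.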

class landmarker.models.OriginalSpatialConfigurationNet[source]#

Implementation of the Spatial Configuration Network (SCN) from the paper “Integrating spatial configuration into heatmap regression based CNNs for landmark localization” by Payer et al. (2019). https://www.sciencedirect.com/science/article/pii/S1361841518305784

Parameters:

in_channels (int, optional) – number of input channels. Defaults to 1.
out_channels (int, optional) – number of output channels. Defaults to 4.
la_channels (int, optional) – number of output channels for each convolutional layer. Defaults to 128.
la_depth (int, optional) – number of convolutional layers. Defaults to 3.
la_kernel_size (int, optional) – kernel size for the convolutional layers. Defaults to 3.
la_dropout (float, optional) – dropout probability. Defaults to 0.5.
sp_channels (int, optional) – number of channels for the convolutional layers. Defaults to 128.
sp_kernel_size (int, optional) – kernel size for the convolutional layers. Defaults to 11.
sp_downsample (int, optional) – factor by which the image is downsampled. Defaults to 16.
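init_weigths (bool, optional) – whether to initialize the weights of the convolutional layers. Defaults to False. Note that the keyword is spelled init_weigths in the signature.
spatial_dim (int, optional) – number of spatial dimensions of the input image. Defaults to 2.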

__init__(in_channels=1, out_channels=4, la_channels=128, la_depth=3, la_kernel_size=3, la_dropout=0.5, sp_channels=128, sp_kernel_size=11, sp_downsample=16, init_weigths=False, spatial_dim=2)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:

in_channels (int) –
out_channels (int) –
la_channels (int) –
la_depth (int) –
la_kernel_size (int | tuple[int, ...]) –
la_dropout (float) –
sp_channels (int) –
sp_kernel_size (int) –
sp_downsample (int) –
init_weigths (bool) –
spatial_dim (int) –

Return type:

None

forward(x)[source]#

Parameters:: x (Tensor) – input tensor of shape (batch_size, in_channels, *img_dims)
Returns:: output tensor of shape (batch_size, out_channels, *img_dims)
Return type:: torch.Tensor
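
The way the spatial configuration branch interacts with the local appearance heatmaps can be sketched as follows. This is a rough illustration under stated assumptions, not the library's network: the class name SpatialConfigurationSketch, the two-layer branch, average pooling for downsampling, bilinear upsampling, and the tanh activation are all choices made for this example. Only the parameter names sp_channels, sp_kernel_size and sp_downsample are taken from the signature above.

```python
import torch
from torch import nn
import torch.nn.functional as F


class SpatialConfigurationSketch(nn.Module):
    """Downsample the local heatmaps, apply large-kernel convolutions, upsample, multiply."""

    def __init__(self, n_landmarks: int = 4, sp_channels: int = 8,
                 sp_kernel_size: int = 11, sp_downsample: int = 16):
        super().__init__()
        pad = sp_kernel_size // 2  # "same" padding for odd kernel sizes
        self.sp_downsample = sp_downsample
        self.sp = nn.Sequential(
            nn.Conv2d(n_landmarks, sp_channels, sp_kernel_size, padding=pad),
            nn.LeakyReLU(),
            nn.Conv2d(sp_channels, n_landmarks, sp_kernel_size, padding=pad),
            nn.Tanh(),
        )

    def forward(self, la_heatmaps: torch.Tensor) -> torch.Tensor:
        # Work at low resolution so the large kernels cover most of the image
        low = F.avg_pool2d(la_heatmaps, self.sp_downsample)
        sc = self.sp(low)
        sc = F.interpolate(sc, size=la_heatmaps.shape[2:], mode="bilinear", align_corners=False)
        # Element-wise product: the spatial branch re-weights locally plausible responses
        return la_heatmaps * sc


scn = SpatialConfigurationSketch()
la = torch.rand(2, 4, 64, 64)   # stand-in for local appearance heatmaps
out = scn(la)
print(out.shape)  # torch.Size([2, 4, 64, 64])
```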

class landmarker.models.OriginalSpatialConfigurationNet3d[source]#

This is the 3D version of the original SCN.

__init__(in_channels=1, out_channels=4, la_channels=64, la_depth=3, la_kernel_size=3, la_dropout=0.5, sp_channels=64, sp_kernel_size=7, sp_downsample=4)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:

in_channels (int) –
out_channels (int) –
la_channels (int) –
la_depth (int) –
la_kernel_size (int | tuple[int, ...]) –
la_dropout (float) –
sp_channels (int) –
sp_kernel_size (int) –
sp_downsample (int) –

class landmarker.models.ProbSpatialConfigurationNet[source]#

Probabilistic Spatial Configuration Network (PSCN)

Adapted implementation of the Probabilistic Spatial Configuration Network (PSCN), based on the Spatial Configuration Network (SCN) from the paper “Integrating spatial configuration into heatmap regression based CNNs for landmark localization” by Payer et al. (2019). It is the same as the SCN except for the last layer: instead of multiplying the output of the local appearance network with the output of the spatial configuration network, the two outputs are added together, since the output of the spatial configuration network is a probability distribution in the logit space.

Parameters:

spatial_dims (int, optional) – number of spatial dimensions of the input image. Defaults to 2.
in_channels (int, optional) – number of input channels. Defaults to 1.
out_channels (int, optional) – number of output channels. Defaults to 4.
la_channels (Sequence[int], optional) – number of output channels for each convolutional layer. Defaults to (128, 128, 128, 128, 128).
la_kernel_size (int | tuple[int, int], optional) – kernel size for the convolutional layers. Defaults to 3.
la_strides (Sequence[int], optional) – strides for the convolutional layers. Defaults to (2, 2, 2, 2).
la_num_res_units (int, optional) – number of residual units in the convolutional layers. Defaults to 2.
la_norm (str, optional) – type of normalization to use. Defaults to “instance”.
la_activation (str, optional) – type of activation to use. Defaults to “PRELU”.
la_adn_ordering (str, optional) – ordering of the layers in the residual units. Defaults to “NDA”.
la_dropout (float, optional) – dropout probability. Defaults to 0.0.
sp_channels (int, optional) – number of channels for the convolutional layers. Defaults to 128.
sp_kernel_size (int, optional) – kernel size for the convolutional layers. Defaults to 11.
sp_downsample (int, optional) – factor by which the image is downsampled. Defaults to 16.
sp_image_input (bool, optional) – whether to use the input image as input for the spatial configuration network. Defaults to True.

__init__(spatial_dims=2, in_channels=1, out_channels=4, la_channels=(128, 128, 128, 128, 128), la_kernel_size=3, la_strides=(2, 2, 2, 2), la_num_res_units=2, la_norm='instance', la_activation='PRELU', la_adn_ordering='NDA', la_dropout=0.0, sp_channels=128, sp_kernel_size=11, sp_downsample=16, sp_image_input=True)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:

spatial_dims (int) –
in_channels (int) –
out_channels (int) –
la_channels (Sequence[int]) –
la_kernel_size (int | tuple[int, int]) –
la_strides (Sequence[int]) –
la_num_res_units (int) –
la_norm (str) –
la_activation (str) –
la_adn_ordering (str) –
la_dropout (float) –
sp_channels (int) –
sp_kernel_size (int) –
sp_downsample (int) –
sp_image_input (bool) –

Return type:

None

forward(x)[source]#

Parameters:: x (Tensor) – input tensor of shape (batch_size, in_channels, *img_dims)
Returns:: output tensor of shape (batch_size, out_channels, *img_dims)
Return type:: torch.Tensor
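
The reason addition replaces multiplication in logit space can be checked numerically: adding two logit maps and applying a softmax gives the same distribution as multiplying the two softmax distributions and renormalizing. The snippet below is only a sketch of that identity. The helper spatial_softmax is hypothetical, and whether the library applies a softmax over the spatial dimensions is an assumption.

```python
import torch

torch.manual_seed(0)
la_logits = torch.randn(2, 4, 8, 8)   # stand-in for the local appearance output
sp_logits = torch.randn(2, 4, 8, 8)   # stand-in for the spatial configuration output


def spatial_softmax(logits: torch.Tensor) -> torch.Tensor:
    """Softmax over all spatial positions of each (batch, channel) heatmap."""
    b, c = logits.shape[:2]
    return torch.softmax(logits.reshape(b, c, -1), dim=-1).reshape(logits.shape)


# Route 1: add in logit space, then normalize
p_sum = spatial_softmax(la_logits + sp_logits)

# Route 2: multiply in probability space, then renormalize
prod = spatial_softmax(la_logits) * spatial_softmax(sp_logits)
p_prod = prod / prod.sum(dim=(2, 3), keepdim=True)

same = torch.allclose(p_sum, p_prod, atol=1e-5)
```

Both routes agree because exp(a + b) = exp(a) · exp(b), so the normalization constants cancel after renormalizing.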

class landmarker.models.SpatialConfigurationNet[source]#

Adapted implementation of the Spatial Configuration Network (SCN) from the paper “Integrating spatial configuration into heatmap regression based CNNs for landmark localization” by Payer et al. (2019). https://www.sciencedirect.com/science/article/pii/S1361841518305784

Parameters:

spatial_dims (int) – number of spatial dimensions of the input image.
in_channels (int) – number of input channels.
out_channels (int) – number of output channels.
la_channels (Sequence[int], optional) – number of output channels for each convolutional layer. Defaults to (128, 128, 128, 128).
la_kernel_size (int, optional) – kernel size for the convolutional layers. Defaults to 3.
la_strides (Sequence[int], optional) – strides for the convolutional layers. Defaults to (2, 2, 2).
la_num_res_units (int, optional) – number of residual units in the convolutional layers. Defaults to 2.
la_norm (str, optional) – type of normalization to use. Defaults to “INSTANCE”.
la_activation (str, optional) – type of activation to use. Defaults to “PRELU”.
la_adn_ordering (str, optional) – ordering of the layers in the residual units. Defaults to “ADN”.
la_dropout (float, optional) – dropout probability. Defaults to 0.0.
sp_channels (int, optional) – number of channels for the convolutional layers. Defaults to 128.
sp_kernel_size (int, optional) – kernel size for the convolutional layers. Defaults to 11.
sp_downsample (int, optional) – factor by which the image is downsampled. Defaults to 16.
sp_image_input (bool, optional) – whether to use the input image as input for the spatial configuration network. Defaults to True.

__init__(spatial_dims=2, in_channels=1, out_channels=4, la_channels=(128, 128, 128, 128), la_kernel_size=3, la_strides=(2, 2, 2), la_num_res_units=2, la_norm='INSTANCE', la_activation='PRELU', la_adn_ordering='ADN', la_dropout=0.0, sp_channels=128, sp_kernel_size=11, sp_downsample=16, sp_image_input=True)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:

spatial_dims (int) –
in_channels (int) –
out_channels (int) –
la_channels (Sequence[int]) –
la_kernel_size (int | tuple[int, int]) –
la_strides (Sequence[int]) –
la_num_res_units (int) –
la_norm (str) –
la_activation (str) –
la_adn_ordering (str) –
la_dropout (float) –
sp_channels (int) –
sp_kernel_size (int) –
sp_downsample (int) –
sp_image_input (bool) –

forward(x)[source]#

Parameters:: x (Tensor) – input tensor of shape (batch_size, in_channels, *img_dims)
Returns:: output tensor of shape (batch_size, out_channels, *img_dims)
Return type:: torch.Tensor

class landmarker.models.StackedCholeskyHourglass[source]#

Stacked Cholesky Hourglass Network as proposed in “UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss” - Kumar et al. (2019). It is a stack of hourglass networks with a Cholesky Estimator Network at the bottleneck of each hourglass. The output of the Cholesky Estimator Network is a lower triangular matrix that is used to estimate the covariance matrix of the Gaussian distribution of the predicted heatmaps. The covariance matrix is then used to compute the Gaussian Log-Likelihood Loss.

Parameters:

nb_stacks (int) – number of hourglass networks to stack.
img_size (tuple[int, int]) – size of the input image.
in_channels (int) – number of input channels.
out_channels (int) – number of output channels.
channels (Sequence[int], optional) – number of output channels for each convolutional layer.
conv_block (nn.Module, optional) – convolutional block to use. Defaults to ResidualUnit.
up_sample_mode (str, optional) – upsampling mode. Defaults to ‘nearest’.

__init__(nb_stacks, img_size, in_channels, out_channels, channels=[64, 128, 256, 512], conv_block=<class 'monai.networks.blocks.convolutions.ResidualUnit'>, up_sample_mode='nearest')[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:

nb_stacks (int) –
img_size (tuple[int, int]) –
in_channels (int) –
out_channels (int) –
channels (Sequence[int]) –
conv_block (Module) –
up_sample_mode (str) –

Return type:

None

forward(x)[source]#

Parameters:

x (Tensor) – input tensor of shape (batch_size, in_channels, *img_dims)

Returns:

heatmaps: list of predicted heatmaps of shape (batch_size, out_channels, *img_dims)
cens: list of covariance matrices of the predicted heatmaps of shape (batch_size, out_channels, 2, 2)
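
How a lower triangular matrix yields a valid covariance matrix, and how that covariance enters a Gaussian log-likelihood loss, can be sketched as below. The function names, the three-values-per-landmark input layout, and the softplus used to keep the diagonal positive are assumptions for illustration. They are not taken from the library or claimed to match Kumar et al. exactly.

```python
import math

import torch
import torch.nn.functional as F


def cholesky_to_covariance(raw: torch.Tensor) -> torch.Tensor:
    """Map raw values (..., 3) = (l11, l21, l22) to covariances (..., 2, 2)."""
    l11 = F.softplus(raw[..., 0]) + 1e-6   # positive diagonal (assumed activation)
    l21 = raw[..., 1]
    l22 = F.softplus(raw[..., 2]) + 1e-6
    zeros = torch.zeros_like(l11)
    row0 = torch.stack([l11, zeros], dim=-1)
    row1 = torch.stack([l21, l22], dim=-1)
    L = torch.stack([row0, row1], dim=-2)  # lower triangular factor
    return L @ L.transpose(-1, -2)         # Sigma = L L^T is symmetric positive definite


def gaussian_nll(pred: torch.Tensor, target: torch.Tensor, cov: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of 2D targets under N(pred, cov), per landmark."""
    diff = (target - pred).unsqueeze(-1)                       # (..., 2, 1)
    maha = diff.transpose(-1, -2) @ torch.linalg.solve(cov, diff)
    maha = maha.squeeze(-1).squeeze(-1)                        # squared Mahalanobis distance
    return 0.5 * (torch.logdet(cov) + maha + 2 * math.log(2 * math.pi))


raw = torch.randn(2, 4, 3)        # stand-in for the Cholesky estimator output
cov = cholesky_to_covariance(raw)
pred = torch.randn(2, 4, 2)       # predicted landmark coordinates
target = torch.randn(2, 4, 2)     # ground-truth landmark coordinates
loss = gaussian_nll(pred, target, cov).mean()
print(cov.shape)  # torch.Size([2, 4, 2, 2])
```

Parameterizing the factor L rather than the covariance itself guarantees a valid covariance matrix for any network output, which is the usual motivation for a Cholesky parameterization.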

class landmarker.models.StackedHourglass[source]#

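Stacked hourglass network: a sequence of nb_stacks hourglass modules, as proposed in “Stacked Hourglass Networks for Human Pose Estimation” - Newell et al. (2016). The forward pass returns one output tensor per stack.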

Parameters:

nb_stacks (int) – number of hourglass modules to stack.
spatial_dims (int) – number of spatial dimensions of the input image.
in_channels (int) – number of input channels.
out_channels (int) – number of output channels.
channels (Sequence[int], optional) – number of output channels for each convolutional layer.
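conv_block (nn.Module, optional) – convolutional block to use. Defaults to ResidualUnit.
pooling (nn.Module, optional) – pooling layer to use. Defaults to nn.MaxPool2d.
up_sample_mode (str, optional) – upsampling mode. Defaults to ‘nearest’.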

__init__(nb_stacks, spatial_dims, in_channels, out_channels, channels=[64, 128, 256, 512], conv_block=<class 'monai.networks.blocks.convolutions.ResidualUnit'>, pooling=<class 'torch.nn.modules.pooling.MaxPool2d'>, up_sample_mode='nearest')[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:

nb_stacks (int) –
spatial_dims (int) –
in_channels (int) –
out_channels (int) –
channels (Sequence[int]) –
conv_block (Module) –
pooling (Module) –
up_sample_mode (str) –

forward(x)[source]#

Parameters:: x (Tensor) – input tensor of shape (batch_size, in_channels, *img_dims)
Returns:: list of output tensors of shape (batch_size, out_channels, *img_dims)
Return type:: list[Tensor]
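
Because forward returns one tensor per stack, a loss is typically applied to every element of the list (the intermediate supervision described by Newell et al.). The sketch below assumes a mean squared error heatmap loss and a simple sum over stacks. The helper name intermediate_supervision_loss is hypothetical, and random tensors stand in for the network outputs.

```python
import torch
import torch.nn.functional as F


def intermediate_supervision_loss(outputs: list, target: torch.Tensor) -> torch.Tensor:
    """Sum a heatmap loss over the outputs of all stacks."""
    return torch.stack([F.mse_loss(out, target) for out in outputs]).sum()


target = torch.rand(2, 4, 32, 32)
# Stand-ins for the list returned by a stacked hourglass with nb_stacks=3
outputs = [torch.rand(2, 4, 32, 32, requires_grad=True) for _ in range(3)]

loss = intermediate_supervision_loss(outputs, target)
loss.backward()  # every stack receives a gradient signal

final_heatmaps = outputs[-1]  # at inference, the last stack's output is commonly used
```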