Image Classification Models

This page lists the ML models that the cucaracha library provides for the image classification task.

Model: Small Xception

Bases: ModelArchitect

SmallXception is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model is a smaller version of the Xception architecture, designed to be lightweight and efficient for smaller datasets or less computationally intensive tasks.

Attributes:

Name	Type	Description
`img_shape`	`tuple`	The shape of the input images (height, width).
`num_classes`	`int`	The number of output classes for classification.

Methods:

Name	Description
`get_model`	Builds and returns the Keras model based on the SmallXception architecture.
`__str__`	Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/small_xception.py

class SmallXception(ModelArchitect):
    """
    SmallXception is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is a smaller version
    of the Xception architecture, designed to be lightweight and efficient for
    smaller datasets or less computationally intensive tasks.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the SmallXception architecture.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        input_shape = (self.img_shape[0], self.img_shape[1], 3)
        inputs = keras.Input(shape=input_shape)

        # Entry block
        x = layers.Rescaling(1.0 / 255)(inputs)
        x = layers.Conv2D(128, 3, strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)

        previous_block_activation = x  # Set aside residual

        for size in [256, 512, 728]:
            x = layers.Activation('relu')(x)
            x = layers.SeparableConv2D(size, 3, padding='same')(x)
            x = layers.BatchNormalization()(x)

            x = layers.Activation('relu')(x)
            x = layers.SeparableConv2D(size, 3, padding='same')(x)
            x = layers.BatchNormalization()(x)

            x = layers.MaxPooling2D(3, strides=2, padding='same')(x)

            # Project residual
            residual = layers.Conv2D(size, 1, strides=2, padding='same')(
                previous_block_activation
            )
            x = layers.add([x, residual])  # Add back residual
            previous_block_activation = x  # Set aside next residual

        x = layers.SeparableConv2D(1024, 3, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)

        x = layers.GlobalAveragePooling2D()(x)

        x = layers.Dropout(0.25)(x)
        outputs = layers.Dense(self.num_classes, activation='softmax')(x)

        return keras.Model(inputs, outputs)

    def __str__(self):
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output
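
As a rough aid for choosing `img_shape`, the sketch below estimates the spatial size of the feature map that reaches `GlobalAveragePooling2D` in the listing above. It assumes the usual Keras rule that a layer with `padding='same'` and stride `s` yields `ceil(n / s)` outputs per dimension. The helper name `small_xception_feature_map` is hypothetical and not part of cucaracha.

```python
import math


def small_xception_feature_map(img_shape):
    """Estimate (height, width) of the final feature map.

    Hypothetical helper, not part of cucaracha. It mirrors the listing above:
    one stride-2 Conv2D in the entry block, then one stride-2 MaxPooling2D in
    each of the three residual blocks, all with padding='same'.
    """
    height, width = img_shape
    for _ in range(4):  # 1 entry convolution + 3 pooling layers
        height = math.ceil(height / 2)
        width = math.ceil(width / 2)
    return height, width


print(small_xception_feature_map((180, 180)))  # (12, 12)
print(small_xception_feature_map((224, 224)))  # (14, 14)
```

Because every downsampling step uses `padding='same'`, odd intermediate sizes are rounded up, and the stride-2 residual projection produces the same size as the pooled branch, which is what lets `layers.add` combine the two.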

Model: Alex Net

Bases: ModelArchitect

AlexNet is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model is based on the original AlexNet architecture, designed to handle large-scale image classification tasks with high computational efficiency.

Reference

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25, 1097-1105.

Attributes:

Name	Type	Description
`img_shape`	`tuple`	The shape of the input images (height, width).
`num_classes`	`int`	The number of output classes for classification.

Methods:

Name	Description
`get_model`	Builds and returns the Keras model based on the AlexNet architecture.
`__str__`	Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/alex_net.py

class AlexNet(ModelArchitect):
    """
    AlexNet is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is based on the
    original AlexNet architecture, designed to handle large-scale image classification
    tasks with high computational efficiency.

    Reference:
        Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012).
        ImageNet Classification with Deep Convolutional Neural Networks.
        Advances in Neural Information Processing Systems, 25, 1097-1105.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the AlexNet architecture.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        input_shape = (self.img_shape[0], self.img_shape[1], 3)
        inputs = keras.Input(shape=input_shape)

        # x = keras.models.Sequential()

        # Entry block
        # Layer 1: Convolutional layer with 64 filters of size 11x11x3
        x = layers.Rescaling(1.0 / 255)(inputs)
        x = layers.Conv2D(
            filters=64,
            kernel_size=(11, 11),
            strides=(4, 4),
            padding='valid',
            activation='relu',
        )(x)

        # Layer 2: Max pooling layer with pool size of 3x3
        x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2))(x)

        # Layer 3-5: 3 more convolutional layers with similar structure as Layer 1
        x = layers.Conv2D(
            filters=192, kernel_size=(5, 5), padding='same', activation='relu'
        )(x)
        x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2))(x)
        x = layers.Conv2D(
            filters=384, kernel_size=(3, 3), padding='same', activation='relu'
        )(x)
        x = layers.Conv2D(
            filters=256, kernel_size=(3, 3), padding='same', activation='relu'
        )(x)
        x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2))(x)

        # Layer 6: Fully connected layer with 4096 neurons
        x = layers.Flatten()(x)
        x = layers.Dense(4096, activation='relu')(x)

        # Layer 7: Fully connected layer with 4096 neurons
        x = layers.Dense(4096, activation='relu')(x)
        outputs = layers.Dense(self.num_classes, activation='softmax')(x)

        return keras.Model(inputs, outputs)

    def __str__(self):
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output
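
Because this network ends in `Flatten` followed by `Dense(4096)`, the size of the first dense layer depends directly on `img_shape`. The sketch below traces the spatial size through the layers of the listing above, assuming the standard output-size rule for unpadded (`'valid'`) layers, `floor((n - k) / s) + 1`. The helper names are hypothetical and not part of cucaracha.

```python
def valid_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution or pooling layer."""
    out = (size - kernel) // stride + 1
    if out < 1:
        raise ValueError('input is too small for this stack of layers')
    return out


def alexnet_flatten_units(img_shape):
    """Hypothetical helper: number of inputs to the first Dense(4096) layer."""
    dims = []
    for size in img_shape:
        size = valid_out(size, 11, 4)  # Conv2D 11x11, stride 4, padding='valid'
        size = valid_out(size, 3, 2)   # first MaxPooling2D 3x3, stride 2
        size = valid_out(size, 3, 2)   # second pooling (the 5x5 'same' conv keeps the size)
        size = valid_out(size, 3, 2)   # third pooling (the 3x3 'same' convs keep the size)
        dims.append(size)
    return dims[0] * dims[1] * 256     # 256 filters in the last Conv2D


units = alexnet_flatten_units((227, 227))
print(units)                # 9216
print(units * 4096 + 4096)  # 37752832 weights and biases in the first Dense layer
```

Under these assumptions a 224x224 input gives 6400 flattened units instead of 9216, and very small inputs (for example 32x32) shrink to nothing before the last pooling layer.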

Model: Dense Net 121

Bases: ModelArchitect

DenseNet121 is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model is based on the DenseNet121 architecture, designed to handle large-scale image classification tasks with high computational efficiency.

Reference

Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely Connected Convolutional Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261-2269.

Attributes:

Name	Type	Description
`img_shape`	`tuple`	The shape of the input images (height, width).
`num_classes`	`int`	The number of output classes for classification.

Methods:

Name	Description
`get_model`	Builds and returns the Keras model based on the DenseNet121 architecture.
`__str__`	Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/dense_net_121.py

class DenseNet121(ModelArchitect):
    """
    DenseNet121 is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is based on the
    DenseNet121 architecture, designed to handle large-scale image classification
    tasks with high computational efficiency.

    Reference:
        Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017).
        Densely Connected Convolutional Networks.
        Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261-2269.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the DenseNet121 architecture.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        return keras.applications.densenet.DenseNet121(
            weights=None,
            input_shape=(self.img_shape[0], self.img_shape[1], 3),
            classes=self.num_classes,
        )

    def __str__(self):
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output

Model: GoogleLeNet

Bases: ModelArchitect

GoogleLeNet is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model is based on the GoogLeNet (Inception v1) architecture, designed to handle large-scale image classification tasks with high computational efficiency. The model returned by `get_model` has three outputs: the main classifier and two auxiliary classifiers (`X1` and `X2` in the source below).

Reference

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-9.

Attributes:

Name	Type	Description
`img_shape`	`tuple`	The shape of the input images (height, width).
`num_classes`	`int`	The number of output classes for classification.

Methods:

Name	Description
`get_model`	Builds and returns the Keras model based on the GoogleLeNet architecture.
`__str__`	Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/google_le_net.py

class GoogleLeNet(ModelArchitect):
    """
    GoogleLeNet is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is based on the
    GoogleLeNet (Inception v1) architecture, designed to handle large-scale image
    classification tasks with high computational efficiency.

    Reference:
        Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015).
        Going deeper with convolutions.
        Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-9.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the GoogleLeNet architecture.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        # input layer
        input_shape = (self.img_shape[0], self.img_shape[1], 3)
        input_layer = keras.Input(shape=input_shape)

        # convolutional layer: filters = 64, kernel_size = (7,7), strides = 2
        X = layers.Conv2D(
            filters=64,
            kernel_size=(7, 7),
            strides=2,
            padding='valid',
            activation='relu',
        )(input_layer)

        # max-pooling layer: pool_size = (3,3), strides = 2
        X = layers.MaxPooling2D(pool_size=(3, 3), strides=2)(X)

        # convolutional layer: filters = 64, strides = 1
        X = layers.Conv2D(
            filters=64,
            kernel_size=(1, 1),
            strides=1,
            padding='same',
            activation='relu',
        )(X)

        # convolutional layer: filters = 192, kernel_size = (3,3)
        X = layers.Conv2D(
            filters=192, kernel_size=(3, 3), padding='same', activation='relu'
        )(X)

        # max-pooling layer: pool_size = (3,3), strides = 2
        X = layers.MaxPooling2D(pool_size=(3, 3), strides=2)(X)

        # 1st Inception block
        X = Inception_block(
            X,
            f1=64,
            f2_conv1=96,
            f2_conv3=128,
            f3_conv1=16,
            f3_conv5=32,
            f4=32,
        )

        # 2nd Inception block
        X = Inception_block(
            X,
            f1=128,
            f2_conv1=128,
            f2_conv3=192,
            f3_conv1=32,
            f3_conv5=96,
            f4=64,
        )

        # max-pooling layer: pool_size = (3,3), strides = 2
        X = layers.MaxPooling2D(pool_size=(3, 3), strides=2)(X)

        # 3rd Inception block
        X = Inception_block(
            X,
            f1=192,
            f2_conv1=96,
            f2_conv3=208,
            f3_conv1=16,
            f3_conv5=48,
            f4=64,
        )

        # Extra network 1:
        X1 = layers.AveragePooling2D(pool_size=(5, 5), strides=3)(X)
        X1 = layers.Conv2D(
            filters=128, kernel_size=(1, 1), padding='same', activation='relu'
        )(X1)
        X1 = layers.Flatten()(X1)
        X1 = layers.Dense(1024, activation='relu')(X1)
        X1 = layers.Dropout(0.7)(X1)
        X1 = layers.Dense(self.num_classes, activation='softmax')(X1)

        # 4th Inception block
        X = Inception_block(
            X,
            f1=160,
            f2_conv1=112,
            f2_conv3=224,
            f3_conv1=24,
            f3_conv5=64,
            f4=64,
        )

        # 5th Inception block
        X = Inception_block(
            X,
            f1=128,
            f2_conv1=128,
            f2_conv3=256,
            f3_conv1=24,
            f3_conv5=64,
            f4=64,
        )

        # 6th Inception block
        X = Inception_block(
            X,
            f1=112,
            f2_conv1=144,
            f2_conv3=288,
            f3_conv1=32,
            f3_conv5=64,
            f4=64,
        )

        # Extra network 2:
        X2 = layers.AveragePooling2D(pool_size=(5, 5), strides=3)(X)
        X2 = layers.Conv2D(
            filters=128, kernel_size=(1, 1), padding='same', activation='relu'
        )(X2)
        X2 = layers.Flatten()(X2)
        X2 = layers.Dense(1024, activation='relu')(X2)
        X2 = layers.Dropout(0.7)(X2)
        X2 = layers.Dense(self.num_classes, activation='softmax')(X2)

        # 7th Inception block
        X = Inception_block(
            X,
            f1=256,
            f2_conv1=160,
            f2_conv3=320,
            f3_conv1=32,
            f3_conv5=128,
            f4=128,
        )

        # max-pooling layer: pool_size = (3,3), strides = 2
        X = layers.MaxPooling2D(pool_size=(3, 3), strides=2)(X)

        # 8th Inception block
        X = Inception_block(
            X,
            f1=256,
            f2_conv1=160,
            f2_conv3=320,
            f3_conv1=32,
            f3_conv5=128,
            f4=128,
        )

        # 9th Inception block
        X = Inception_block(
            X,
            f1=384,
            f2_conv1=192,
            f2_conv3=384,
            f3_conv1=48,
            f3_conv5=128,
            f4=128,
        )

        # Global Average pooling layer
        X = layers.GlobalAveragePooling2D(name='GAPL')(X)

        # Dropout layer
        X = layers.Dropout(0.4)(X)

        # output layer
        X = layers.Dense(self.num_classes, activation='softmax')(X)

        # model
        return keras.Model(input_layer, [X, X1, X2], name='GoogLeNet')

    def __str__(self):
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output
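
The `Inception_block` helper is not shown in this listing. As a sketch of how the filter arguments determine the width of the network, the code below assumes the block follows the original paper: four parallel branches whose outputs (`f1`, `f2_conv3`, `f3_conv5`, `f4`) are concatenated along the channel axis, with `f2_conv1` and `f3_conv1` acting only as internal 1x1 reductions. That behaviour is an assumption about the helper, and `inception_output_channels` is a hypothetical name.

```python
# Filter arguments copied from the nine Inception_block calls in the listing:
# (f1, f2_conv1, f2_conv3, f3_conv1, f3_conv5, f4)
BLOCKS = [
    (64, 96, 128, 16, 32, 32),
    (128, 128, 192, 32, 96, 64),
    (192, 96, 208, 16, 48, 64),
    (160, 112, 224, 24, 64, 64),
    (128, 128, 256, 24, 64, 64),
    (112, 144, 288, 32, 64, 64),
    (256, 160, 320, 32, 128, 128),
    (256, 160, 320, 32, 128, 128),
    (384, 192, 384, 48, 128, 128),
]


def inception_output_channels(f1, f2_conv1, f2_conv3, f3_conv1, f3_conv5, f4):
    """Channels after concatenating the four branches.

    ASSUMPTION: the 1x1 reduction filters (f2_conv1, f3_conv1) feed the 3x3
    and 5x5 convolutions and do not appear in the concatenated output.
    """
    return f1 + f2_conv3 + f3_conv5 + f4


channels = [inception_output_channels(*block) for block in BLOCKS]
print(channels)  # [256, 480, 512, 512, 512, 528, 832, 832, 1024]
```

If the assumption holds, the final `GlobalAveragePooling2D` layer sees 1024 channels, and the two auxiliary branches attach to feature maps with 512 and 528 channels.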

Model: Model Soup

Bases: ModelArchitect

ModelSoup is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model uses a ResNet50 backbone without its classification top, followed by global average pooling, dropout (0.5), a 2048-unit dense layer, and a softmax prediction layer with `num_classes` outputs.

Reference

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.

Attributes:

Name	Type	Description
`img_shape`	`tuple`	The shape of the input images (height, width).
`num_classes`	`int`	The number of output classes for classification.

Methods:

Name	Description
`get_model`	Builds and returns the Keras model based on the ResNet50 architecture, as created by the Model Soup.
`__str__`	Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/model_soup.py

class ModelSoup(ModelArchitect):
    """
    ModelSoup is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is based on the
    ResNet50 architecture, designed to handle large-scale image classification
    tasks with high computational efficiency.

    Reference:
        He, K., Zhang, X., Ren, S., & Sun, J. (2016).
        Deep residual learning for image recognition.
        Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the ResNet50 architecture, as created by the Model Soup.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        input_shape = (self.img_shape[0], self.img_shape[1], 3)
        model = keras.applications.ResNet50(
            input_shape=input_shape, include_top=False, weights=None
        )
        flatten = keras.layers.GlobalAveragePooling2D()(model.output)
        drop_out = keras.layers.Dropout(0.5)(flatten)
        dense = keras.layers.Dense(2048, activation='relu')(drop_out)
        prediction = keras.layers.Dense(
            self.num_classes, activation='softmax', name='prediction'
        )(dense)

        return keras.Model(model.input, prediction)

    def __str__(self):
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output
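
To give a sense of how much the custom head adds on top of the backbone, the sketch below counts its trainable parameters. It relies on the fact that ResNet50 with `include_top=False` ends in 2048 channels, so `GlobalAveragePooling2D` yields a 2048-element vector; pooling and dropout add no parameters. The helper names are hypothetical and not part of cucaracha.

```python
RESNET50_FEATURES = 2048  # channels produced by ResNet50 with include_top=False


def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


def model_soup_head_params(num_classes):
    """Hypothetical helper: trainable parameters added after the backbone."""
    hidden = dense_params(RESNET50_FEATURES, 2048)  # Dense(2048, activation='relu')
    prediction = dense_params(2048, num_classes)    # Dense(num_classes, softmax)
    return hidden + prediction


print(model_soup_head_params(10))  # 4216842
```

Almost all of that count comes from the 2048-unit hidden layer; the prediction layer contributes only `2049 * num_classes` parameters.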

Model: Res Net 50

Bases: ModelArchitect

ResNet50 is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model is based on the ResNet50 architecture, designed to handle large-scale image classification tasks with high accuracy and efficiency.

Reference

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.

Attributes:

Name	Type	Description
`img_shape`	`tuple`	The shape of the input images (height, width).
`num_classes`	`int`	The number of output classes for classification.

Methods:

Name	Description
`get_model`	Builds and returns the Keras model based on the ResNet50 architecture.
`__str__`	Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/res_net_50.py

class ResNet50(ModelArchitect):
    """
    ResNet50 is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is based on the
    ResNet50 architecture, designed to handle large-scale image classification
    tasks with high accuracy and efficiency.

    Reference:
        He, K., Zhang, X., Ren, S., & Sun, J. (2016).
        Deep residual learning for image recognition.
        Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the ResNet50 architecture.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        return keras.applications.resnet50.ResNet50(
            weights=None,
            input_shape=(self.img_shape[0], self.img_shape[1], 3),
            classes=self.num_classes,
        )

    def __str__(self):
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output