Image Classification Models

This page lists the ML models that the cucaracha library provides for the image classification task.

Model: Small Xception

Bases: ModelArchitect

SmallXception is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model is a smaller version of the Xception architecture, designed to be lightweight and efficient for smaller datasets or less computationally intensive tasks.

Attributes:

Name         Type   Description
img_shape    tuple  The shape of the input images (height, width).
num_classes  int    The number of output classes for classification.

Methods:

Name       Description
get_model  Builds and returns the Keras model based on the SmallXception architecture.
__str__    Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/small_xception.py
class SmallXception(ModelArchitect):
    """
    SmallXception is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is a smaller version
    of the Xception architecture, designed to be lightweight and efficient for
    smaller datasets or less computationally intensive tasks.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the SmallXception architecture.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        input_shape = (self.img_shape[0], self.img_shape[1], 3)
        inputs = keras.Input(shape=input_shape)

        # Entry block
        x = layers.Rescaling(1.0 / 255)(inputs)
        x = layers.Conv2D(128, 3, strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)

        previous_block_activation = x  # Set aside residual

        for size in [256, 512, 728]:
            x = layers.Activation('relu')(x)
            x = layers.SeparableConv2D(size, 3, padding='same')(x)
            x = layers.BatchNormalization()(x)

            x = layers.Activation('relu')(x)
            x = layers.SeparableConv2D(size, 3, padding='same')(x)
            x = layers.BatchNormalization()(x)

            x = layers.MaxPooling2D(3, strides=2, padding='same')(x)

            # Project residual
            residual = layers.Conv2D(size, 1, strides=2, padding='same')(
                previous_block_activation
            )
            x = layers.add([x, residual])  # Add back residual
            previous_block_activation = x  # Set aside next residual

        x = layers.SeparableConv2D(1024, 3, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)

        x = layers.GlobalAveragePooling2D()(x)

        x = layers.Dropout(0.25)(x)
        outputs = layers.Dense(self.num_classes, activation='softmax')(x)

        return keras.Model(inputs, outputs)

    def __str__(self):
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output
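
The downsampling in get_model can be checked with a little arithmetic. The sketch below is not part of cucaracha; the helper name small_xception_feature_map is hypothetical. It relies only on the fact that a stride-2 layer with padding='same' produces ceil(n / 2) positions, and it reports the shape of the tensor that reaches GlobalAveragePooling2D.

```python
import math


def small_xception_feature_map(img_shape):
    """Return (height, width, channels) of the tensor fed to GlobalAveragePooling2D."""
    height, width = img_shape

    def halve(n):
        # A stride-2 layer with padding='same' outputs ceil(n / 2) positions.
        return math.ceil(n / 2)

    # Entry block: Conv2D(128, 3, strides=2, padding='same').
    height, width = halve(height), halve(width)

    # Three residual blocks, each ending in MaxPooling2D(3, strides=2, padding='same').
    for _ in (256, 512, 728):
        height, width = halve(height), halve(width)

    # The final SeparableConv2D(1024, 3, padding='same') keeps the spatial size.
    return height, width, 1024


print(small_xception_feature_map((180, 180)))  # (12, 12, 1024)
```

Because the network ends in global average pooling, the classifier always sees 1024 features regardless of the input resolution.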

Model: AlexNet

Bases: ModelArchitect

AlexNet is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model is based on the original AlexNet architecture, designed to handle large-scale image classification tasks with high computational efficiency.

Reference

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25, 1097-1105.

Attributes:

Name         Type   Description
img_shape    tuple  The shape of the input images (height, width).
num_classes  int    The number of output classes for classification.

Methods:

Name       Description
get_model  Builds and returns the Keras model based on the AlexNet architecture.
__str__    Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/alex_net.py
class AlexNet(ModelArchitect):
    """
    AlexNet is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is based on the
    original AlexNet architecture, designed to handle large-scale image classification
    tasks with high computational efficiency.

    Reference:
        Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012).
        ImageNet Classification with Deep Convolutional Neural Networks.
        Advances in Neural Information Processing Systems, 25, 1097-1105.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the AlexNet architecture.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        input_shape = (self.img_shape[0], self.img_shape[1], 3)
        inputs = keras.Input(shape=input_shape)

        # Entry block
        # Layer 1: Convolutional layer with 64 filters of size 11x11x3
        x = layers.Rescaling(1.0 / 255)(inputs)
        x = layers.Conv2D(
            filters=64,
            kernel_size=(11, 11),
            strides=(4, 4),
            padding='valid',
            activation='relu',
        )(x)

        # Layer 2: Max pooling layer with pool size of 3x3
        x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2))(x)

        # Layer 3-5: 3 more convolutional layers with similar structure as Layer 1
        x = layers.Conv2D(
            filters=192, kernel_size=(5, 5), padding='same', activation='relu'
        )(x)
        x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2))(x)
        x = layers.Conv2D(
            filters=384, kernel_size=(3, 3), padding='same', activation='relu'
        )(x)
        x = layers.Conv2D(
            filters=256, kernel_size=(3, 3), padding='same', activation='relu'
        )(x)
        x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2))(x)

        # Layer 6: Fully connected layer with 4096 neurons
        x = layers.Flatten()(x)
        x = layers.Dense(4096, activation='relu')(x)

        # Layer 7: Fully connected layer with 4096 neurons
        x = layers.Dense(4096, activation='relu')(x)
        outputs = layers.Dense(self.num_classes, activation='softmax')(x)

        return keras.Model(inputs, outputs)

    def __str__(self):
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output
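
Because this model flattens its last feature map instead of pooling it globally, the size of the first Dense layer depends on img_shape. The sketch below is an illustration only (the helper names valid_out and alexnet_flatten_size are hypothetical). It follows the layers in get_model and uses the usual output-size rule for padding='valid', floor((n - kernel) / stride) + 1.

```python
def valid_out(n, kernel, stride):
    """Output length of a conv/pool layer with padding='valid'."""
    return (n - kernel) // stride + 1


def alexnet_flatten_size(img_shape):
    """Number of features produced by Flatten() for a given (height, width)."""
    dims = []
    for n in img_shape:
        n = valid_out(n, 11, 4)  # Conv2D 11x11, strides 4, padding='valid'
        n = valid_out(n, 3, 2)   # MaxPooling2D 3x3, strides 2
        # Conv2D 5x5 uses padding='same': size unchanged
        n = valid_out(n, 3, 2)   # MaxPooling2D 3x3, strides 2
        # Two Conv2D 3x3 layers use padding='same': size unchanged
        n = valid_out(n, 3, 2)   # MaxPooling2D 3x3, strides 2
        dims.append(n)
    return dims[0] * dims[1] * 256  # 256 filters in the last convolution


features = alexnet_flatten_size((227, 227))
first_dense_params = features * 4096 + 4096  # weights + biases
print(features, first_dense_params)  # 9216 37752832
```

For a 227x227 input the first fully connected layer alone holds about 37.8 million parameters, so small changes to img_shape noticeably change the model size.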

Model: DenseNet121

Bases: ModelArchitect

DenseNet121 is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model is based on the DenseNet121 architecture, designed to handle large-scale image classification tasks with high computational efficiency.

Reference

Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely Connected Convolutional Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261-2269.

Attributes:

Name         Type   Description
img_shape    tuple  The shape of the input images (height, width).
num_classes  int    The number of output classes for classification.

Methods:

Name       Description
get_model  Builds and returns the Keras model based on the DenseNet121 architecture.
__str__    Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/dense_net_121.py
class DenseNet121(ModelArchitect):
    """
    DenseNet121 is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is based on the
    DenseNet121 architecture, designed to handle large-scale image classification
    tasks with high computational efficiency.

    Reference:
        Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017).
        Densely Connected Convolutional Networks.
        Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261-2269.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the DenseNet121 architecture.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        return keras.applications.densenet.DenseNet121(
            weights=None,
            input_shape=(self.img_shape[0], self.img_shape[1], 3),
            classes=self.num_classes,
        )

    def __str__(self):
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output
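
The class above delegates the whole architecture to keras.applications, so the layer layout is not visible in the source. As a rough illustration, the sketch below reproduces the channel bookkeeping of DenseNet-121 as described by Huang et al.: four dense blocks of 6, 12, 24 and 16 layers, a growth rate of 32, and transitions that halve the channel count. The helper names are hypothetical and are not part of cucaracha or Keras.

```python
def densenet121_channels(growth_rate=32, block_sizes=(6, 12, 24, 16), stem=64):
    """Track the channel count at the end of each dense block."""
    channels = stem
    trace = []
    for index, num_layers in enumerate(block_sizes):
        # Each layer in a dense block appends `growth_rate` feature maps.
        channels += num_layers * growth_rate
        trace.append(channels)
        # A transition layer (compression 0.5) follows every block but the last.
        if index < len(block_sizes) - 1:
            channels //= 2
    return trace


def densenet121_depth(block_sizes=(6, 12, 24, 16)):
    # stem conv + 2 convs per dense layer + 1 conv per transition + classifier
    return 1 + 2 * sum(block_sizes) + (len(block_sizes) - 1) + 1


print(densenet121_channels())  # [256, 512, 1024, 1024]
print(densenet121_depth())     # 121
```

The last value of the trace, 1024, is the number of features that reach the final classification layer.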

Model: GoogleLeNet

Bases: ModelArchitect

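GoogleLeNet is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model is based on the GoogLeNet (Inception v1) architecture, designed to handle large-scale image classification tasks with high computational efficiency. The model returned by get_model has three outputs: the main classifier and two auxiliary classifiers attached to intermediate Inception blocks.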

Reference

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-9.

Attributes:

Name         Type   Description
img_shape    tuple  The shape of the input images (height, width).
num_classes  int    The number of output classes for classification.

Methods:

Name       Description
get_model  Builds and returns the Keras model based on the GoogleLeNet architecture.
__str__    Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/google_le_net.py
class GoogleLeNet(ModelArchitect):
    """
    GoogleLeNet is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is based on the
    GoogLeNet (Inception v1) architecture, designed to handle large-scale image
    classification tasks with high computational efficiency.

    Reference:
        Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015).
        Going deeper with convolutions.
        Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-9.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the GoogleLeNet architecture.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        # input layer
        input_shape = (self.img_shape[0], self.img_shape[1], 3)
        input_layer = keras.Input(shape=input_shape)

        # convolutional layer: filters = 64, kernel_size = (7,7), strides = 2
        X = layers.Conv2D(
            filters=64,
            kernel_size=(7, 7),
            strides=2,
            padding='valid',
            activation='relu',
        )(input_layer)

        # max-pooling layer: pool_size = (3,3), strides = 2
        X = layers.MaxPooling2D(pool_size=(3, 3), strides=2)(X)

        # convolutional layer: filters = 64, strides = 1
        X = layers.Conv2D(
            filters=64,
            kernel_size=(1, 1),
            strides=1,
            padding='same',
            activation='relu',
        )(X)

        # convolutional layer: filters = 192, kernel_size = (3,3)
        X = layers.Conv2D(
            filters=192, kernel_size=(3, 3), padding='same', activation='relu'
        )(X)

        # max-pooling layer: pool_size = (3,3), strides = 2
        X = layers.MaxPooling2D(pool_size=(3, 3), strides=2)(X)

        # 1st Inception block
        X = Inception_block(
            X,
            f1=64,
            f2_conv1=96,
            f2_conv3=128,
            f3_conv1=16,
            f3_conv5=32,
            f4=32,
        )

        # 2nd Inception block
        X = Inception_block(
            X,
            f1=128,
            f2_conv1=128,
            f2_conv3=192,
            f3_conv1=32,
            f3_conv5=96,
            f4=64,
        )

        # max-pooling layer: pool_size = (3,3), strides = 2
        X = layers.MaxPooling2D(pool_size=(3, 3), strides=2)(X)

        # 3rd Inception block
        X = Inception_block(
            X,
            f1=192,
            f2_conv1=96,
            f2_conv3=208,
            f3_conv1=16,
            f3_conv5=48,
            f4=64,
        )

        # Extra network 1:
        X1 = layers.AveragePooling2D(pool_size=(5, 5), strides=3)(X)
        X1 = layers.Conv2D(
            filters=128, kernel_size=(1, 1), padding='same', activation='relu'
        )(X1)
        X1 = layers.Flatten()(X1)
        X1 = layers.Dense(1024, activation='relu')(X1)
        X1 = layers.Dropout(0.7)(X1)
        # Auxiliary classifier: one unit per class instead of a hard-coded size
        X1 = layers.Dense(self.num_classes, activation='softmax')(X1)

        # 4th Inception block
        X = Inception_block(
            X,
            f1=160,
            f2_conv1=112,
            f2_conv3=224,
            f3_conv1=24,
            f3_conv5=64,
            f4=64,
        )

        # 5th Inception block
        X = Inception_block(
            X,
            f1=128,
            f2_conv1=128,
            f2_conv3=256,
            f3_conv1=24,
            f3_conv5=64,
            f4=64,
        )

        # 6th Inception block
        X = Inception_block(
            X,
            f1=112,
            f2_conv1=144,
            f2_conv3=288,
            f3_conv1=32,
            f3_conv5=64,
            f4=64,
        )

        # Extra network 2:
        X2 = layers.AveragePooling2D(pool_size=(5, 5), strides=3)(X)
        X2 = layers.Conv2D(
            filters=128, kernel_size=(1, 1), padding='same', activation='relu'
        )(X2)
        X2 = layers.Flatten()(X2)
        X2 = layers.Dense(1024, activation='relu')(X2)
        X2 = layers.Dropout(0.7)(X2)
        # Auxiliary classifier: one unit per class instead of a hard-coded size
        X2 = layers.Dense(self.num_classes, activation='softmax')(X2)

        # 7th Inception block
        X = Inception_block(
            X,
            f1=256,
            f2_conv1=160,
            f2_conv3=320,
            f3_conv1=32,
            f3_conv5=128,
            f4=128,
        )

        # max-pooling layer: pool_size = (3,3), strides = 2
        X = layers.MaxPooling2D(pool_size=(3, 3), strides=2)(X)

        # 8th Inception block
        X = Inception_block(
            X,
            f1=256,
            f2_conv1=160,
            f2_conv3=320,
            f3_conv1=32,
            f3_conv5=128,
            f4=128,
        )

        # 9th Inception block
        X = Inception_block(
            X,
            f1=384,
            f2_conv1=192,
            f2_conv3=384,
            f3_conv1=48,
            f3_conv5=128,
            f4=128,
        )

        # Global Average pooling layer
        X = layers.GlobalAveragePooling2D(name='GAPL')(X)

        # Dropout layer
        X = layers.Dropout(0.4)(X)

        # output layer: one unit per class instead of a hard-coded 1000
        X = layers.Dense(self.num_classes, activation='softmax')(X)

        # model
        return keras.Model(input_layer, [X, X1, X2], name='GoogLeNet')

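    def __str__(self):
        # __str__ must return a string; keep the base-class text and print the summary
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output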
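
The Inception_block helper is defined elsewhere in the source file and is not shown on this page. Assuming it follows the original paper, with four parallel branches concatenated along the channel axis, the number of channels leaving each block can be derived from the arguments passed in get_model. The sketch below is illustrative; inception_out_channels is a hypothetical helper, not a cucaracha function.

```python
# (f1, f2_conv1, f2_conv3, f3_conv1, f3_conv5, f4) for the nine blocks, in order.
INCEPTION_CONFIGS = [
    (64, 96, 128, 16, 32, 32),
    (128, 128, 192, 32, 96, 64),
    (192, 96, 208, 16, 48, 64),
    (160, 112, 224, 24, 64, 64),
    (128, 128, 256, 24, 64, 64),
    (112, 144, 288, 32, 64, 64),
    (256, 160, 320, 32, 128, 128),
    (256, 160, 320, 32, 128, 128),
    (384, 192, 384, 48, 128, 128),
]


def inception_out_channels(f1, f2_conv1, f2_conv3, f3_conv1, f3_conv5, f4):
    # Assumption: f2_conv1 and f3_conv1 are 1x1 reductions inside their
    # branches, so they do not appear in the concatenated output.
    return f1 + f2_conv3 + f3_conv5 + f4


channels = [inception_out_channels(*cfg) for cfg in INCEPTION_CONFIGS]
print(channels)  # [256, 480, 512, 512, 512, 528, 832, 832, 1024]
```

Under that assumption the global average pooling layer named 'GAPL' receives 1024 channels, which matches the layout in Szegedy et al.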

Model: Model Soup

Bases: ModelArchitect

ModelSoup is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model is based on the ResNet50 architecture, designed to handle large-scale image classification tasks with high computational efficiency.

Reference

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.

Attributes:

Name         Type   Description
img_shape    tuple  The shape of the input images (height, width).
num_classes  int    The number of output classes for classification.

Methods:

Name       Description
get_model  Builds and returns the Keras model based on the ResNet50 architecture, as created by the Model Soup.
__str__    Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/model_soup.py
class ModelSoup(ModelArchitect):
    """
    ModelSoup is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is based on the
    ResNet50 architecture, designed to handle large-scale image classification
    tasks with high computational efficiency.

    Reference:
        He, K., Zhang, X., Ren, S., & Sun, J. (2016).
        Deep residual learning for image recognition.
        Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the ResNet50 architecture, as created by the Model Soup.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        input_shape = (self.img_shape[0], self.img_shape[1], 3)
        model = keras.applications.ResNet50(
            input_shape=input_shape, include_top=False, weights=None
        )
        flatten = keras.layers.GlobalAveragePooling2D()(model.output)
        drop_out = keras.layers.Dropout(0.5)(flatten)
        dense = keras.layers.Dense(2048, activation='relu')(drop_out)
        prediction = keras.layers.Dense(
            self.num_classes, activation='softmax', name='prediction'
        )(dense)

        return keras.Model(model.input, prediction)

    def __str__(self):
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output
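
The head that get_model places on top of the ResNet50 backbone adds a fixed number of parameters plus a term that grows with num_classes. The sketch below estimates that size. It assumes the backbone's final feature map has 2048 channels, which is the case for ResNet50 without its top; the helper names are hypothetical.

```python
def dense_params(inputs, units):
    """Trainable parameters of a Dense layer: weights plus biases."""
    return inputs * units + units


def model_soup_head_params(num_classes, backbone_channels=2048):
    # GlobalAveragePooling2D and Dropout have no trainable parameters.
    hidden = dense_params(backbone_channels, 2048)
    prediction = dense_params(2048, num_classes)
    return hidden + prediction


print(model_soup_head_params(10))  # 4216842
```

Most of the head's roughly 4.2 million parameters sit in the hidden Dense(2048) layer, so the class count has little effect on model size.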

Model: ResNet50

Bases: ModelArchitect

ResNet50 is a custom model architecture for image classification tasks, inheriting from the ModelArchitect base class. This model is based on the ResNet50 architecture, designed to handle large-scale image classification tasks with high accuracy and efficiency.

Reference

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.

Attributes:

Name         Type   Description
img_shape    tuple  The shape of the input images (height, width).
num_classes  int    The number of output classes for classification.

Methods:

Name       Description
get_model  Builds and returns the Keras model based on the ResNet50 architecture.
__str__    Returns a string representation of the model, including a summary of the model architecture with trainable parameters.

Source code in cucaracha/ml_models/image_classification/res_net_50.py
class ResNet50(ModelArchitect):
    """
    ResNet50 is a custom model architecture for image classification tasks,
    inheriting from the ModelArchitect base class. This model is based on the
    ResNet50 architecture, designed to handle large-scale image classification
    tasks with high accuracy and efficiency.

    Reference:
        He, K., Zhang, X., Ren, S., & Sun, J. (2016).
        Deep residual learning for image recognition.
        Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.

    Attributes:
        img_shape (tuple): The shape of the input images (height, width).
        num_classes (int): The number of output classes for classification.

    Methods:
        get_model():
            Builds and returns the Keras model based on the ResNet50 architecture.
        __str__():
            Returns a string representation of the model, including a summary of the
            model architecture with trainable parameters.
    """

    def __init__(self, **kwargs):
        super().__init__(modality='image_classification', **kwargs)
        self.img_shape = kwargs.get('img_shape')
        self.num_classes = kwargs.get('num_classes')

    def get_model(self):
        return keras.applications.resnet50.ResNet50(
            weights=None,
            input_shape=(self.img_shape[0], self.img_shape[1], 3),
            classes=self.num_classes,
        )

    def __str__(self):
        output = super().__str__()
        self.get_model().summary(show_trainable=True)
        return output
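
Since the architecture here comes entirely from keras.applications, the following sketch may help relate the name to the structure described by He et al.: four stages of 3, 4, 6 and 3 bottleneck blocks, each block holding three convolutions. The helper resnet50_stage_summary is hypothetical and only does the counting; it does not build a model.

```python
def resnet50_stage_summary(blocks_per_stage=(3, 4, 6, 3), base_width=64):
    """Return (weighted layer count, output channels of each stage)."""
    # Each bottleneck block holds three convolutions (1x1, 3x3, 1x1).
    # Projection shortcuts are conventionally left out of the count.
    conv_layers = 3 * sum(blocks_per_stage)
    # Add the 7x7 stem convolution and the final fully connected layer.
    depth = 1 + conv_layers + 1
    # Bottlenecks expand their width by 4; the width doubles at every stage.
    channels = [base_width * 4 * 2 ** i for i in range(len(blocks_per_stage))]
    return depth, channels


print(resnet50_stage_summary())  # (50, [256, 512, 1024, 2048])
```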