Noise removal

This module is dedicated to algorithms that remove noise from document images.

Every method follows an image-filter pattern: it takes an image as input and returns at least one image as output.

Depending on the method, additional data may also be provided, as described in that method's documentation.

sparse_dots(input, kernel_size=1)

Applies a median filter to remove sparse dots in the document.

Digitization artifacts often introduce sparse, high-contrast noise (also known as salt-and-pepper noise).

This filter computes the median over a kernel whose size is defined by the user, locating these small black/white dots and correcting them based on the neighboring values.

Note

The kernel is defined as a square area centered on each pixel, so only odd sizes are allowed: 1, 3, 5, and so on.
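To make the idea concrete, here is a minimal pure-Python sketch of a median filter applied to a tiny grayscale grid. It is not the module's implementation (which delegates to OpenCV's `cv.medianBlur`). The name `median_filter` and the choice to clamp indices at the image border are assumptions made for illustration only.

```python
from statistics import median


def median_filter(image, kernel_size=3):
    """Hypothetical pure-Python median filter for a 2-D list of gray values."""
    if kernel_size % 2 == 0:
        raise ValueError('Kernel size must be an odd value.')
    height, width = len(image), len(image[0])
    radius = kernel_size // 2
    output = []
    for row in range(height):
        new_row = []
        for col in range(width):
            # Collect the square neighborhood centered on (row, col).
            # Indices are clamped, so border pixels reuse the nearest edge value.
            window = [
                image[min(max(row + dr, 0), height - 1)][
                    min(max(col + dc, 0), width - 1)
                ]
                for dr in range(-radius, radius + 1)
                for dc in range(-radius, radius + 1)
            ]
            # An isolated outlier never reaches the middle of the sorted window.
            new_row.append(median(window))
        output.append(new_row)
    return output


# A white page (255) with one isolated black dot (0) in the middle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0

cleaned = median_filter(noisy, kernel_size=3)
```

With a 3x3 kernel the single dark pixel is outvoted by its eight white neighbors, so `cleaned` is uniformly white. With `kernel_size=1` each window contains only the pixel itself and the image is returned unchanged.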

Parameters:

Name | Type | Description | Default
input | ndarray | Input image with sparse dots noise (salt and pepper) | required
kernel_size | int | Kernel size in pixels. Defaults to 1. | 1

Raises:

Type | Description
ValueError | Kernel size must be an odd value
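Because an even kernel size raises `ValueError`, a caller that computes the size at run time may want to normalize it first. The sketch below shows one possible approach. `to_odd_kernel_size` is a hypothetical helper, not part of this module, and rounding up (rather than down) is an arbitrary choice.

```python
def to_odd_kernel_size(size: int) -> int:
    """Hypothetical helper: return `size` if it is odd, else the next odd integer."""
    if size < 1:
        raise ValueError('Kernel size must be a positive integer.')
    # Odd sizes pass through unchanged; even sizes are bumped up by one.
    return size if size % 2 == 1 else size + 1


# Candidate sizes and the values that would be safe to pass as kernel_size.
candidates = [1, 2, 3, 4, 6]
normalized = [to_odd_kernel_size(size) for size in candidates]
```

The normalized value can then be passed along, for example `sparse_dots(image, kernel_size=to_odd_kernel_size(4))`.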

Returns:

Type | Description
(ndarray, dict) | Output image with the major sparse dots noise removed, plus a dict of extra information. This method returns no extra information, so the dict is empty.

Source code in cucaracha/tasks/noise_removal.py
# Imports implied by the `cv` and `np` names used below.
import cv2 as cv
import numpy as np


def sparse_dots(input: np.ndarray, kernel_size: int = 1):
    """Applies a median filter to remove sparse dots in the document.

    Digitization artifacts often introduce sparse, high-contrast noise
    (also known as salt-and-pepper noise).

    This filter computes the median over a kernel whose size is defined by
    the user, locating these small black/white dots and correcting them
    based on the neighboring values.

    Note:
        The kernel is defined as a square area centered on each pixel, so
        only odd sizes are allowed: 1, 3, 5, and so on.

    Args:
        input (np.ndarray): Input image with sparse dots noise (salt and pepper)
        kernel_size (int, optional): Kernel size in pixels. Defaults to 1.

    Raises:
        ValueError: Kernel size must be an odd value

    Returns:
        (np.ndarray, dict): Output image without major sparse dots noise.
            This method returns no extra information, so the dict is empty.
    """
    # The kernel must be centered on a pixel, so even sizes are rejected.
    if kernel_size % 2 == 0:
        raise ValueError('Kernel size must be an odd value.')

    # Return the filtered image plus an empty dict (no extra information).
    return cv.medianBlur(input, kernel_size), {}