
Deep learning and vision, from simple manipulation to image classification: Part 2

Introduction:

Now that we have revisited some basic concepts related to computer vision in our previous post, it is time to move forward and explore more sophisticated algorithms that will recognize either a dog or cat in a given image.

Throughout this post, we will work with the dogs vs. cats problem from Kaggle and its data, which can be found here. You'll need to register with Kaggle in order to download the train and test data.

After you register and download the data, we'll perform an exploratory analysis and then build, train, and evaluate a convolutional neural network for binomial classification. The model will output 0 or 1 depending on whether it determines that the image contains a cat or a dog, respectively.

[Step 1] Data exploration:

As stated before, data exploration is, most of the time, the first step to take before we even try to come up with preliminary experiments. By just looking at the files in the two archives we've downloaded, train.zip and test1.zip, we can spot the following details:

Table 1: Initial dataset observations

train.zip: 25,000 labeled JPG images; the class is part of each file name (cat.*.jpg / dog.*.jpg)
test1.zip: 12,500 unlabeled JPG images with numeric file names

As our test set is not labeled, it will not be possible for us to use it for getting performance metrics. The files will, therefore, be only used to generate the final submission file for the Kaggle judge.

Another important observation we can make by opening some of the images in the test and train sets is that they seem to be different in size and aspect ratio. In order to confirm this, we’ll randomly plot and compare some of them.

Snippet 1: Randomly plot images from the train set

train_path = "data/train"
images = glob.glob(os.path.join(train_path, "*.jpg"))

plt.figure(figsize=(16, 8))
for index in range(6):
    plt.subplot(2, 3, index+1)
    img_index=int(np.random.uniform(0, 24999))
    plt.imshow(plt.imread(images[img_index]))
    plt.title(images[img_index])

Figure 1: Sample images from the training set

As we run the above script several times, we observe that our intuition was right: images differ from each other in size and aspect ratio. Normalization seems to be needed but several questions arise almost immediately: What size would we use for resizing and normalizing all the images so they can later be used to train our model? Wouldn’t the new size need to be determined so it works for both larger and smaller images? Finally, what proportion of images are small, medium, or large?

To address those questions, we prepare the following script to get the distribution over height and width (in 100-pixel ranges) for each image in the train set:

Snippet 2: Size distribution over the training set

from PIL import Image

max_w = 0
max_h = 0
min_w = 2048
min_h = 2048

# counters for 100-pixel buckets of width and height
arr_h = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
arr_w = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

for img_index in range(len(images)):
    img = Image.open(images[img_index]).size
    img_w = img[0]
    img_h = img[1]

    # clamp the bucket index so sizes outside 100-1,099 px don't break the count
    arr_w[min(max(int(img_w / 100) - 1, 0), 9)] += 1
    arr_h[min(max(int(img_h / 100) - 1, 0), 9)] += 1

    if img_w > max_w: max_w = img_w
    elif img_w < min_w: min_w = img_w

    if img_h > max_h: max_h = img_h
    elif img_h < min_h: min_h = img_h

print("Max Width: %i - Min Width: %i \nMax Height: %i - Min Height: %i" % (max_w, min_w, max_h, min_h))

If we plot the arr_w and arr_h vectors containing the number of images with width and height ranging from 0 to 1,000 pixels (in 100-pixel intervals), we observe that the majority of them are smaller than 400 x 400 pixels.
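The plotting code behind Figure 2 isn't part of Snippet 2; a minimal sketch that charts the arr_w and arr_h counters with the matplotlib import from Snippet 1 could look like this:

plt.figure(figsize=(12, 4))
for i, (title, data) in enumerate([("Width distribution", arr_w),
                                   ("Height distribution", arr_h)]):
    plt.subplot(1, 2, i + 1)
    plt.bar(range(10), data)
    plt.title(title)
    plt.xlabel("size bucket (~100 px each)")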

Figure 2: Height and width distributions


We can now come up with a strategy for resizing and padding our images; this is the only preprocessing task we'll do before training our convolutional neural network. The resizeImg and padImg functions below maintain the original aspect ratio of each image, padding it with black pixels when the aspect ratio isn't square:

Snippet 3: Resizing and padding functions

resize_default = 64

def resizeImg(image):
    # scale the longest side down to resize_default, preserving the aspect ratio
    img_w = image.size[0]
    img_h = image.size[1]

    if img_w >= img_h:
        img = image.resize((resize_default, int(resize_default * img_h / img_w)), Image.ANTIALIAS)
    else:
        img = image.resize((int(resize_default * img_w / img_h), resize_default), Image.ANTIALIAS)

    return img

def padImg(image):
    # paste the image onto a black square so both sides end up the same size
    img_w = image.size[0]
    img_h = image.size[1]

    if img_w > resize_default or img_h > resize_default:
        new_size = (img_w, img_w) if img_w >= img_h else (img_h, img_h)
    else:
        new_size = (resize_default, resize_default)

    img = Image.new("RGB", new_size)
    img.paste(image, (int((new_size[0] - img_w) / 2), int((new_size[1] - img_h) / 2)))

    return img


#testImage = Image.open(images[int(np.random.uniform(0, 24999))])
testImage = Image.open(images[468])
resized = resizeImg(testImage)
padded = padImg(resized)

plt.figure(figsize=(12, 8))
plt.subplot(1, 3, 1)
plt.imshow(testImage)
plt.title("Original")
plt.subplot(1, 3, 2)
plt.imshow(resized)
plt.title("Resized")
plt.subplot(1, 3, 3)
plt.imshow(padded)
plt.title("Padded")

Calling both functions produces the following output:

Figure 3: Padding and resizing of images


All images will be resized to 64×64 pixels and padded vertically or horizontally, if necessary. We can batch process all images as a preliminary step or include the functions right before we provide the samples to the trainer when fitting the model.
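We'll take the on-the-fly route in this post (see getXYBatch in Snippet 4 below), but if you prefer batch pre-processing, a hypothetical offline pass could resize and pad every file once so training never repeats the work; the output folder name here is ours, purely for illustration:

output_path = "data/train_64"
os.makedirs(output_path, exist_ok=True)

for path in images:
    img = padImg(resizeImg(Image.open(path)))
    img.save(os.path.join(output_path, os.path.basename(path)))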

[Step 2] Building the convolutional neural network:

Up to this point, we're familiar with convolutions for image processing. We've also explored the data we have available and decided that padding and resizing are needed in order to provide our model with a normalized input pattern. Each 64×64 pixel RGB image amounts to 12,288 input values (64×64×3), which we will feed into a 2-class classifier: for every image the convolutional network receives, it will try to predict whether the input belongs to the class cat or dog.

In addition to the two functions we’ve already seen for resizing and padding, we’ll need some other ones before we train the network. The get_label and getXYBatch functions shown in Snippet 4 are explained below:

get_label: every input pattern (or image) gets a 2-element label vector with only two possible values: [1, 0] counts as "cat", whereas [0, 1] counts as "dog" in terms of the result the network is predicting.

getXYBatch: given that our computers don't have infinite memory, allocating all 25,000 images for training at once just isn't possible. Instead, we resize and pad batches of 60 to 500 images and feed them to the trainer at each training step.

Snippet 4: get_label and getXYBatch functions

# extract labels from the file name
# cats = [1, 0], dogs = [0, 1]
def get_label(path):
    if path.split('/')[-1:][0].startswith('cat'):
        return np.array([1, 0])
    else:
        return np.array([0, 1])

def getXYBatch(X_input, Y_input, batch_size):
    # seed both arrays with the first sample so vstack has something to
    # stack onto; the seed row is dropped again below
    X_array = np.array(padImg(resizeImg(Image.open(X_input[0])))).reshape([-1]) / 255
    Y_array = Y_input[0]

    choice = np.random.choice(range(len(X_input)), batch_size, replace=False)
    for item in choice:
        tmpimg = np.array(padImg(resizeImg(Image.open(X_input[item])))).reshape([-1]) / 255
        X_array = np.vstack((X_array, tmpimg))
        Y_array = np.vstack((Y_array, Y_input[item]))

    # drop the seed row
    X_array = X_array[1:]
    Y_array = Y_array[1:]

    X_array = X_array.reshape([-1, resize_default, resize_default, 3])

    return X_array, Y_array

Now we split the train set into two parts: one for actual training and one for validation. We'll use 20% of the training images to measure how well the model is performing after, let's say, 100 iterations. The following code will do it for us:

Snippet 5: Splitting the training set

train_path = "data/train"
images = glob.glob(os.path.join(train_path, "*.jpg"))
random.shuffle(images)

# keep the shuffled list of file paths
data_images = images
        
data_labels = np.array([get_label(p) for p in images])
data_labels_out = np.argmax(data_labels, 1)

print("Positive samples: %i\nNegative samples: %i \n" % (len(data_labels_out)-np.count_nonzero(data_labels_out)
                                                      , np.count_nonzero(data_labels_out)))
#Split Data Sets
X_train, X_test, y_train, y_test = train_test_split(data_images, data_labels, test_size=0.2)
y_train_out = np.argmax(y_train, 1)
y_test_out = np.argmax(y_test, 1)

Finally, before jumping into the model's code itself, we'll define some convenience functions to simplify the construction of the layers:

dropout: turns off hidden neurons with a given probability (only in the training phase).
weight_variable: creates variables for the neurons' weights.
bias_variable: creates variables for the neurons' biases.
conv2d: convolution between the input and the weights, with strides of 1 and 'SAME' padding.
max_pool_2x2: max-pooling operation that keeps only the maximum element in each 2×2 window after a convolutional layer.

Snippet 6: Common TensorFlow methods

def dropout(x, prob, train_phase):
    return tf.cond(train_phase,
                   lambda: tf.nn.dropout(x, prob),
                   lambda: x)

def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_variable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

Now, let’s build the layers of the network. Our model will have an input layer followed by convolution and max-pooling layers. In the last part of the network architecture, we will flatten the feature maps and have a fully connected layer. A representation of the model is shown in Figure 4.

Figure 4: Neural Network Architecture

Convolutional neural network architecture

We define two placeholders, x and y, for the 64×64 pixel input images and their labels. As the images use the RGB schema (3 channels), the final shape for the input layer will be 64×64×3.

Snippet 7: Network implementation

sess = tf.InteractiveSession()

# tf Graph Input
x = tf.placeholder(tf.float32, [None,64,64,3]) 
y = tf.placeholder(tf.float32, [None, 2])

# dropout placeholder
keep_prob = tf.placeholder(tf.float32)

# train flag placeholder (fed at run time; lets ops such as dropout or
# batch normalization behave differently during training)
train_phase = tf.placeholder(tf.bool)

# Set model weights
W1 = weight_variable([3, 3, 3, 32])
b1 = bias_variable([32])

W2 = weight_variable([3, 3, 32, 64])
b2 = bias_variable([64])

W3 = weight_variable([3, 3, 64, 64])
b3 = bias_variable([64])

W4 = weight_variable([16 * 16 * 64, 512])
b4 = bias_variable([512])

W5 = weight_variable([512, 2])
b5 = bias_variable([2])

# hidden layers
conv1 = tf.nn.relu(conv2d(x, W1) + b1)
maxp1 = max_pool_2x2(conv1)

conv2 = tf.nn.relu(conv2d(maxp1, W2) + b2)
#maxp2 = max_pool_2x2(conv2)

conv3 = tf.nn.relu(conv2d(conv2, W3) + b3)
maxp3 = max_pool_2x2(conv3)

# fully connected
maxp3_flat = tf.reshape(maxp3, [-1, 16 * 16 * 64])

full1 = tf.nn.relu(tf.matmul(maxp3_flat, W4) + b4)
drop1 = tf.nn.dropout(full1, keep_prob)

#output
output = tf.matmul(drop1, W5) + b5
softmax=tf.nn.softmax(output)

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y))

all_variables = tf.trainable_variables() 
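The training loop in Snippet 8 below references accuracy, train_step, learning_rate, and saver, which Snippet 7 doesn't define. Here is a minimal sketch that wires them up; the Adam optimizer and the exponential decay schedule are our assumptions, not necessarily the original setup:

# hypothetical training ops; optimizer choice and decay schedule are assumptions
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(1e-3, global_step, 1000, 0.9, staircase=True)
train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)

# fraction of predictions that match the labels
correct_prediction = tf.equal(tf.argmax(softmax, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

saver = tf.train.Saver()
sess.run(tf.global_variables_initializer())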

As describing each of the functions and methods used would be tedious and make this post extremely long, feel free to browse the official TensorFlow documentation for those you are interested in: https://www.tensorflow.org/api_docs/.

You may also want to revisit some concepts related to learning and optimization such as Loss Functions, Stochastic Gradient Descent, and Cross Entropy.
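As a quick reference, the cross-entropy loss minimized above compares a one-hot label vector y with the predicted softmax probabilities over the classes c:

L(y, \hat{y}) = -\sum_{c} y_c \log \hat{y}_c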

[Step 3] Training time:

Now we just need to define some hyperparameters and let the trainer fit the model to our training data. We'll display the model accuracy every 100 steps. Running the snippet below will show the training progress, as in Figure 5.

Snippet 8: Trainer

# Hyper-parameters
training_steps = 2000
batch_size = 500
display_step = 100

# Mini-batch Gradient Descent
training_accuracy = []
training_loss     = []

for i in range(training_steps):

    X, Y = getXYBatch(X_train, y_train, batch_size)

    # note: keep_prob=1.0 effectively disables dropout here; a value such
    # as 0.5 would activate it during training
    batch_accuracy, batch_loss, _ = sess.run([accuracy, loss, train_step],
                                             feed_dict={x: X, y: Y, train_phase: True, keep_prob: 1.0})
    training_accuracy.append(batch_accuracy)
    training_loss.append(batch_loss)
    # displaying info
    if (i+1) % display_step == 0 or i == 0:
        print("Step %05d: accuracy=%.4f\tloss=%.6f\tlearning rate=%.6f" %
              (i+1, batch_accuracy, batch_loss, learning_rate.eval()))

save_path = saver.save(sess, "./saved/model2K.ckpt")
print("Model saved in file: %s" % save_path)
        
plt.figure(figsize=(10,4))
plot_titles = ["Training accuracy", "Training Loss"]
for i, plot_data in enumerate([training_accuracy, training_loss]):
    plt.subplot(1, 2, i+1)
    plt.plot(plot_data)
    plt.title(plot_titles[i])
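Snippet 5 reserved a validation split, but the loop above only reports metrics on training batches. A quick sketch of a validation check, reusing the same getXYBatch helper (the batch size of 500 is an arbitrary choice):

X_val, Y_val = getXYBatch(X_test, y_test, 500)
val_accuracy = sess.run(accuracy, feed_dict={x: X_val, y: Y_val,
                                             train_phase: False, keep_prob: 1.0})
print("Validation accuracy: %.4f" % val_accuracy)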

Figure 5: Progress while training


We can also plot the accuracy and loss at each training step. In an ideal scenario, the accuracy increases over time, whereas the loss decreases.

Figure 6: Training accuracy and Loss


[Step 4] Using the model with different images:

Our final test consists of using the model with a completely new image that the model hasn't seen before. We can browse for cat or dog pictures on the internet and pass them to the classifier, or simply pick a random image from the held-out split, as the snippet below does:

Snippet 9: Using the model

# pick a random image from the held-out split
test_img = Image.open(X_test[int(np.random.uniform(0, len(X_test)))])

input_array = np.array(padImg(resizeImg(test_img))).reshape([-1]) / 255
input_array = input_array.reshape([-1, 64, 64, 3])

# reuse the softmax tensor from Snippet 7; reassigning y here would shadow
# the label placeholder
prediction = sess.run(tf.argmax(softmax, 1), feed_dict={x: input_array, train_phase: False, keep_prob: 1.0})
print("Predicted: " + ("Cat" if prediction[0] == 0 else "Dog"))
test_img

Figure 7: Model output with an unseen image:


Hopefully, the model will accurately predict the class (cat or dog) for each image we input. However, there are several other techniques we can apply from this point on in order to build a more accurate model.
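Recall from Step 1 that the unlabeled test1.zip images exist only to produce the Kaggle submission file. Here is a sketch of how one might generate it, assuming the competition's id,label CSV format and the numeric file names in test1 (both worth double-checking on the competition page):

import csv

test_images = glob.glob(os.path.join("data/test1", "*.jpg"))
pred_op = tf.argmax(softmax, 1)

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "label"])
    for path in test_images:
        img_id = os.path.splitext(os.path.basename(path))[0]
        arr = np.array(padImg(resizeImg(Image.open(path)))).reshape([-1]) / 255
        arr = arr.reshape([-1, 64, 64, 3])
        pred = sess.run(pred_op, feed_dict={x: arr, train_phase: False, keep_prob: 1.0})
        writer.writerow([img_id, int(pred[0])])  # 0 = cat, 1 = dog, per get_label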

Discussion:

In this post, we’ve built a convolutional neural network model capable of classifying images based on whether they contain a cat or dog. While we didn’t revisit all the terms and concepts required to fully understand what we coded and why, it’s a good starting point to see how these techniques can be used in real-life scenarios. Have you ever seen a captcha asking you to click on images containing, let’s say, cars, in order to verify you are not a bot? Can you think of other possible use cases for this type of binary classification?

Find the full code of this post at: https://github.com/gariem/samples/tree/master/meetup/santex-machinelearning/cats-dogs

Deep learning and vision: From simple manipulation to image classification – Part 1

Introduction:

When the MANIAC I computer defeated a human in a chess-like game for the first time in 1956, it was a monumental moment in history. The idea of machines being able to complete tasks by replicating how the human brain works started to gain hope and traction. Those, however, were tough times in which to achieve even modest performance in other tasks due to the lack of data and computing power available.

Since then, a series of so-called "AI Winters" took place one right after another, and the dream of computers performing at human-like levels had all but vanished. It wasn't until the beginning of 2005 that AI started to regain attention, with Deep Learning as the singular force propelling its growth.

Today, companies are pouring billions of dollars into AI development and intelligent machines continue to participate in real world activities every day.

In this series of posts, we will review basic concepts about image manipulation, convolutional neural networks, and Deep Learning. We will then dive deeper into the computer vision field and train a Convolutional Neural Network to recognize cats and dogs in arbitrary images, all of this using the Python programming language, TensorFlow, and several other convenience packages. If you are new to TensorFlow, you can browse through its site and examples at https://www.tensorflow.org.

[Getting ready] Setting up our environment

Throughout this tutorial we will make use of the Python programming language, along with several other packages and tools, including Anaconda (conda) as our environment manager. We will follow these instructions to get our environment ready:

  1. Download and install Anaconda. It is available for free download at https://www.anaconda.com/download/.
  2. Once you’ve installed Anaconda, you will need to create a conda environment and add some packages to it. Let’s call it “deeplearning.” The following commands will complete the task:
  • Make sure conda is registered in your PATH environment variable.
  • Update Anaconda packages: conda update --all
  • Create an Anaconda environment: conda create -n deeplearning python=3.6 jupyter scikit-learn scikit-image
  • Activate the environment: source activate deeplearning
  • Update setuptools: pip install --upgrade -I setuptools
  • Find the right tfBinaryURL for your platform in this URL and run: pip install --upgrade tfBinaryURL

Your environment should now be ready. Let’s test it by running the Jupyter Notebook and executing a simple “hello world” command:

  • Open a terminal and run: jupyter notebook
  • In the browser window opened by the Jupyter server, create a new python 3 notebook using the buttons at the top right section of the page.
  • In the new notebook input box, enter print("hello world") and then press the shift and enter keys to execute the command. You should see the “hello world” message printed right after the command input box.
  • You can also give a name to your notebook by editing the default “untitled” name in the top left section.

Figure 1: Notebook with “hello world” message


[Warm up] Image processing: Filters and Convolutions

As we will be training and using a “convolutional” neural network, it is a good idea to understand why those types of networks are named what they are. Before we build our CNN model, we will recap some of the concepts needed in image processing.

A convolution is basically a mathematical operation on two functions that produces a third one as its result. Convolutions are applied in several fields, including image processing and computer vision.

In the field of image processing, a convolution matrix is used for image manipulation like blurring, sharpening, or edge detection. The original image is treated as a matrix with values from 0 to 255, according to the color intensity in each pixel. For grayscale images, this matrix will have only two dimensions: WxH (Width x Height). However, for color images using the RGB scheme there will be a third dimension. The matrix will become a structure with shape WxHx3 (Width x Height x 3 RGB Channels).
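To make those shapes concrete, here is a quick check with PIL and NumPy (the file name is a placeholder):

import numpy as np
from PIL import Image

img = Image.open("images/example.jpg")      # hypothetical file
print(np.array(img.convert("L")).shape)     # grayscale: (height, width)
print(np.array(img.convert("RGB")).shape)   # color: (height, width, 3)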

Image manipulation and convolutions in practice

Despite the formal definition and all the complicated math behind convolutions for image processing, we can understand them as a relatively simple operation: similar, but not identical, to matrix multiplication. Let's see two classic examples with a grayscale image:

Image Equalization:
Before we start with convolutions, we’ll warm up by doing some basic image manipulation by equalizing a grayscale image.

The image on the left has been acquired with a sensor (a camera or telescope) and suffers from over-exposure, so the whole image looks too light. We need to enhance it to the point where it looks like the one on the right side below:

Figure 2: Galaxy image before and after equalization


Performing exploration of the data we have is a recommended practice in almost any discipline. As we are given an image, calculating and visualizing its histogram seems to be the most common task to start with. To obtain our grayscale image histogram, we need to count how many pixels have an intensity of 0, 1, 2 and so on, up to 255, where 0 is a totally black pixel and 255 is a completely white one. Let’s code it:

import imageio
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# read the image (assumed to already be single-channel grayscale, mode "L")
img_matrix = imageio.imread('images/galaxia.jpg')
img = Image.fromarray(img_matrix, 'L')

# count pixels at each value in the 0-255 range
rows, cols = img_matrix.shape
histogram = np.zeros(256)
for x in range(0, cols):
    for y in range(0, rows):
        histogram[img_matrix[y, x]] = histogram[img_matrix[y, x]] + 1

# plot the histogram using pyplot
plt.figure(figsize=(14, 4))
ax = plt.subplot(1, 2, 1)
ax.bar(range(256), histogram)
plt.title("Image histogram")

After running the script above, we can see that the histogram shows a notable deviation to the right: virtually all of the pixels have values higher than 100. Our goal is to make the histogram look more evenly distributed or, more properly, to equalize it.

Figure 3: Image histograms before and after equalization


The snippet below shows a simple algorithm that can be used to achieve the histogram equalization:

# normalize the histogram and build its cumulative distribution
histogram_eq = histogram / (rows * cols)
accum = np.zeros(256)
accum[0] = histogram_eq[0]
for i in range(1, 256):
    accum[i] = accum[i - 1] + histogram_eq[i]

# map each pixel through the cumulative distribution
image_new = np.zeros((rows, cols), dtype=np.uint8)
for x in range(rows):
    for y in range(cols):
        image_new[x, y] = np.abs(np.floor(255 * accum[img_matrix[x, y]]))

Now, to visualize the new equalized image, we need to convert the image_new array back to a grayscale image. Try it yourself!
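If you want to verify your result, one way back to an image is a sketch like this, using the PIL and matplotlib imports from before:

# convert the equalized array back to a grayscale image and display it
equalized = Image.fromarray(image_new, 'L')
plt.figure(figsize=(10, 5))
plt.imshow(equalized, cmap='gray')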

Want a challenge? You can try applying a similar algorithm to equalize the following color image. While the principle is similar, one cannot just compute and equalize the three RGB channels independently.

Figure 4: Overexposed colorful image


Test the process yourself and show us what approach you used to address this challenge. We’d love to see your code and discuss your solution!
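As a hint, one common approach, sketched below under the assumption that equalizing only the luminance is acceptable: convert the image to YCbCr, equalize the Y channel with the same cumulative-histogram technique, and convert back (the file name is a placeholder):

# sketch: equalize luminance only, leaving the chrominance channels untouched
color_img = Image.open("images/overexposed.jpg")   # hypothetical file
ycbcr = np.array(color_img.convert("YCbCr"))
y_channel = ycbcr[:, :, 0]

# cumulative distribution of the luminance histogram
hist, _ = np.histogram(y_channel, bins=256, range=(0, 256))
cdf = hist.cumsum() / y_channel.size

ycbcr[:, :, 0] = np.floor(255 * cdf[y_channel]).astype(np.uint8)
result = Image.fromarray(ycbcr, "YCbCr").convert("RGB")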

Image convolutions:

In our previous exercise we transformed an image by simply modifying how the gray tones were distributed. However, that's not all we can do with an image. Another modification we can make is replacing each pixel with the median value of its neighbors, which is called a median filter. A median filter is a non-linear filter often used to reduce noise in images. You can read more about the median filter in this article in Wikipedia.

As we previously stated, a convolution is an operation between two functions to obtain a third one. In the image processing domain, the first function will be our original image and the second one will be a convolution matrix (also called kernel or filter matrix) with shape NxN, where N is an odd number, frequently 3, 5, or 7.

The animation below shows how we compute an output matrix as a result of performing a convolution with an input image matrix and a 3×3 kernel:

Animation 1: Convolution with a 3×3 Kernel


To obtain the modified image in the shape of a numerical matrix containing grayscale values, we start by taking a subsection of the input image with the same shape as our Kernel. Then, we perform element-wise multiplications between our input sample and kernel. Finally, we add the nine products and divide the result by the sum of all values in the kernel. The initial 320 value for our output matrix is, therefore, the result of the following operation:

output[i, j] = ( Σx,y patch[x, y] · kernel[x, y] ) / Σx,y kernel[x, y]

Why do we have a top row and first column with zero values? The answer is that, in order to be able to perform the element-wise operation described before for the input elements at the border of our input matrix, we need to pad the original image with as many rows and columns as the size of our kernel matrix minus one, divided by two. So, in our example, our image will be padded with one column and one row because our kernel matrix size is three: (3-1)/2.

An important element is also introduced here and must be remembered for later: the strides. As we perform the operation on each pixel, we need to visit all the pixels in the original image like a sliding window. The stride determines how many pixels we move to the right and down at each step. Most of the time the stride is one for both horizontal and vertical displacements.
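A standard rule worth keeping at hand (a general result, not something introduced in this post): for an input of width W, a kernel of size N, padding P, and stride S, the output width is

O = \frac{W - N + 2P}{S} + 1

With W = 5, N = 3, P = 1, and S = 1 this gives O = 5, which is why a 3×3 kernel with one pixel of padding and stride 1 preserves the image size.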

You may also be wondering how one determines the values a convolution kernel should have. There are several well-known kernels you can use for different purposes, such as box blur, sharpen, and edge detection. Below you can see some of them and the result they produce when applied:

[Image: well-known kernels and the results they produce when applied]

Now, let’s code it. Try it for yourself with this simple function for padding an image and performing a convolution with any kernel matrix:

def filter_simple(source, kernel, mask_rows, mask_cols):
    # pad the image with (kernel size - 1) / 2 rows and columns of zeros per side
    padding_rows = int((mask_rows - 1) / 2)
    padding_cols = int((mask_cols - 1) / 2)
    rows, cols = source.shape
    padded = np.zeros((rows + 2 * padding_rows, cols + 2 * padding_cols), dtype=np.uint8)
    padded[padding_rows:padding_rows + rows, padding_cols:padding_cols + cols] = source
    result = np.zeros((rows + 2 * padding_rows, cols + 2 * padding_cols), dtype=np.uint8)
    # slide the kernel over every pixel of the original image (stride 1)
    for i in range(padding_rows, padding_rows + rows):
        for j in range(padding_cols, padding_cols + cols):
            aux = padded[i - padding_rows:i + padding_rows + 1, j - padding_cols:j + padding_cols + 1]
            out_value = 0
            for x in range(mask_rows):
                for y in range(mask_cols):
                    out_value = out_value + (aux[x, y] * kernel[x, y]) / np.sum(kernel)
            result[i, j] = out_value
    result = result[padding_rows:padding_rows + rows, padding_cols:padding_cols + cols]
    return result

Using the function is as simple as:

# define our kernel
kernel_blur = np.matrix([[1,1,1],[1,1,1],[1,1,1]])
# kernel_edge = np.matrix([[-1,-1,-1],[-1,9,-1],[-1,-1,-1]])
image_new = filter_simple(img_matrix,kernel_blur, 3, 3)

Figure 5: Simple edge detection with convolution filters


There are a couple of places where you can improve this code. Have you also noticed that the execution isn't exactly fast? Do you think you can improve it? Motivate yourself and read about optimization for convolution operations. Again, we'd love to see your code and discuss any questions you may have.

Most image processing and numeric libraries for languages such as Python also offer ready-to-use, optimized functions for 2D and 3D convolutions. Check out this example in the SciPy documentation!
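For instance, SciPy's signal module can replace the whole filter_simple loop with a single call (note that convolve2d does not normalize by the kernel sum, so a blur kernel must be normalized beforehand):

from scipy import signal
import numpy as np

kernel_blur = np.ones((3, 3)) / 9.0   # box blur, normalized up front
image_new = signal.convolve2d(img_matrix, kernel_blur, mode='same', boundary='symm')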

[Finally] Discussion

Convolutions are a key concept to understand before we move on to convolutional neural networks, as the kernel, strides, and other parameters are very important when dealing with them. In the next post, we will see how a neural network can learn its own kernels instead of using predefined ones. The ability to learn filters that perform transformations such as noise reduction and edge detection is probably one of the biggest reasons CNNs have become so popular and accurate.

References:

  1. Bengio, Y. (2016). Machines Who Learn. Scientific American, 314(6), 46-51. doi:10.1038/scientificamerican0616-46
  2. https://www.tensorflow.org/
  3. https://en.wikipedia.org/wiki/Kernel_(image_processing)