Replace a normal convolutional layer with a depthwise separable convolutional layer

quangngoc

A depthwise separable convolutional layer (often called a depthwise separable convolution) is a type of convolutional layer designed to use fewer parameters than a traditional convolutional layer. It achieves this by splitting the convolution into two distinct stages: a depthwise convolution followed by a pointwise convolution. Strictly speaking, "depthwise convolution" names only the first stage, although the term is sometimes used loosely for the whole layer. Let's break down how the reduction occurs with an example:

Consider a traditional convolutional layer with the following specifications:

Input feature maps: 64 channels
Kernel size: 3x3
Number of filters: 128

In this scenario, each filter in the traditional convolutional layer spans the full depth of the input, covering all 64 input channels. This results in 3x3x64 = 576 learnable parameters per filter (bias terms are ignored throughout this example).
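The per-filter count above can be checked with a few lines of Python. This is a minimal sketch, not part of the original post: the function name `standard_conv_params` is made up for illustration, and biases are assumed to be left out of the count.

```python
def standard_conv_params(in_channels, out_channels, kernel_size):
    """Weight count of a standard 2D convolution (hypothetical helper, biases ignored)."""
    # Each filter covers a kernel_size x kernel_size window across every input channel.
    per_filter = kernel_size * kernel_size * in_channels
    # The layer has one such filter per output channel.
    total = per_filter * out_channels
    return per_filter, total


per_filter, total = standard_conv_params(in_channels=64, out_channels=128, kernel_size=3)
print(per_filter)  # 576
print(total)       # 73728
```

Multiplying the per-filter count by the 128 filters gives the size of the whole layer, which is the number the separable version has to beat.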

Now, let's replace this traditional convolutional layer with a depthwise separable convolutional layer with the same specifications:

Input feature maps: 64 channels
Kernel size: 3x3
Number of filters: 128

A depthwise separable convolutional layer consists of two parts:

Depthwise Convolution: In the depthwise convolution step, each input channel is convolved separately with its own single-channel filter. In this example, there are 64 input channels, so there are 64 depthwise filters of size 3x3. Since each filter has 3x3 = 9 parameters, the total number of parameters for the depthwise convolution is 64 * (3x3) = 576 parameters. The output of this step still has 64 channels.
Pointwise Convolution: After the depthwise convolution, a 1x1 pointwise convolution combines the 64 depthwise outputs into the desired number of output channels (in this case, 128). Each output channel is a linear combination of the depthwise-filtered channels, so it needs 64 * 1x1 = 64 weights. Since there are 128 output channels, the total number of parameters for the pointwise convolution is 64 * 1x1 * 128 = 8192 parameters.
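To make the two stages concrete, here is a rough pure-Python sketch of the forward pass. It is an illustration under stated assumptions rather than how any framework implements the layer: the function names are hypothetical, tensors are plain nested lists, and the code assumes stride 1, no padding, and no biases.

```python
import random


def depthwise_conv(x, dw):
    """x: [C][H][W] input; dw: [C][k][k], one kernel per channel.
    Stride 1, no padding, so the output is [C][H-k+1][W-k+1]."""
    k = len(dw[0])
    h_out = len(x[0]) - k + 1
    w_out = len(x[0][0]) - k + 1
    out = []
    for c, kernel in enumerate(dw):
        # Channel c is filtered only by its own kernel; channels are not mixed here.
        plane = [[sum(x[c][i + di][j + dj] * kernel[di][dj]
                      for di in range(k) for dj in range(k))
                  for j in range(w_out)]
                 for i in range(h_out)]
        out.append(plane)
    return out


def pointwise_conv(x, pw):
    """x: [C][H][W]; pw: [C_out][C] weights of the 1x1 convolution.
    Each output channel is a weighted sum of the input channels at each pixel."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(weights[c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)]
             for i in range(h)]
            for weights in pw]


def depthwise_separable_conv(x, dw, pw):
    # Spatial filtering first, then channel mixing.
    return pointwise_conv(depthwise_conv(x, dw), pw)


rng = random.Random(0)
C_IN, C_OUT, K, H, W = 64, 128, 3, 5, 5  # H and W are arbitrary small sizes
x = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C_IN)]
dw = [[[rng.random() for _ in range(K)] for _ in range(K)] for _ in range(C_IN)]
pw = [[rng.random() for _ in range(C_IN)] for _ in range(C_OUT)]

y = depthwise_separable_conv(x, dw, pw)

# Count the weights actually stored in each stage.
dw_params = sum(len(row) for kernel in dw for row in kernel)
pw_params = sum(len(weights) for weights in pw)

print(len(y), len(y[0]), len(y[0][0]))  # 128 3 3
print(dw_params, pw_params)             # 576 8192
```

The depthwise stage never mixes channels and the pointwise stage never looks at neighbouring pixels, which is why their weight counts add rather than multiply.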

Now, let's compare the total number of parameters in the depthwise separable convolutional layer with the traditional convolutional layer:

Traditional Convolutional Layer: 3x3x64x128 = 73,728 parameters
Depthwise Separable Convolutional Layer: 576 (depthwise) + 8192 (pointwise) = 8,768 parameters
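The comparison can be reproduced for any layer shape with a small helper. This is a sketch with a made-up function name (`compare_params`); it counts weights only and assumes no bias terms. Multiplying out 3x3x64x128 gives 73,728 for the traditional layer.

```python
def compare_params(in_channels, out_channels, kernel_size):
    """Return (standard, separable, ratio) weight counts, biases ignored (hypothetical helper)."""
    standard = kernel_size ** 2 * in_channels * out_channels
    depthwise = kernel_size ** 2 * in_channels   # one k x k kernel per input channel
    pointwise = in_channels * out_channels       # 1x1 convolution mixing the channels
    separable = depthwise + pointwise
    return standard, separable, standard / separable


standard, separable, ratio = compare_params(in_channels=64, out_channels=128, kernel_size=3)
print(standard)         # 73728
print(separable)        # 8768
print(round(ratio, 2))  # 8.41
```

In general the separable layer needs a fraction 1/(number of output channels) + 1/(kernel size squared) of the standard layer's weights, so for 3x3 kernels the saving approaches a factor of 9 as the number of output channels grows.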

As you can see, the depthwise separable convolutional layer needs 8,768 parameters where the traditional convolutional layer needs 73,728, roughly 8.4 times fewer. It still captures both kinds of information: the depthwise step handles the spatial filtering and the pointwise step handles the mixing across channels. This reduction in parameters can be advantageous where model size, computational cost, and memory usage are critical considerations, such as on mobile or edge devices or when designing lightweight neural networks.