Residual Learning on CIFAR-10 with Lasagne

After reading this paper, I decided to implement the network and run some experiments on CIFAR-10, which is small enough. This is my first time doing vision tasks (besides MNIST, which doesn't require much knowledge of vision).

After reading this issue and this issue, I decided to use the Lasagne library, because it seems there are still some bugs in Keras.

My first implementation was based on this code. It is quite simple, and I'm not sure why the softmax layer isn't used at the end. The implemented model is NIN (from Network in Network). I changed the model to use residual blocks:

def convLayer(l, num_filters, filter_size=(1, 1), stride=(1, 1),
              nonlinearity=nonlinearity, pad='same',
              W=lasagne.init.GlorotUniform(gain='relu')):
    # conv -> batchnorm; the nonlinearity is applied inside the conv layer
    l = conv(
        l, num_filters=num_filters,
        filter_size=filter_size, stride=stride,
        nonlinearity=nonlinearity, pad=pad, W=W)
    l = batchnorm(l)
    return l

# Bottleneck architecture as described in the paper
def bottleneck(l, num_filters, stride=(1, 1), nonlinearity=nonlinearity):
    l = convLayer(
        l, num_filters=num_filters, stride=stride, nonlinearity=nonlinearity)
    l = convLayer(
        l, num_filters=num_filters, filter_size=(3, 3), nonlinearity=nonlinearity)
    l = convLayer(
        l, num_filters=num_filters*4, nonlinearity=None)
    return l

# Simply stacks the bottlenecks; the int n controls the depth of the architecture
def bottlestack(l, n, num_filters):
    for i in range(n):
        l = sumlayer([bottleneck(l, num_filters=num_filters), l])
        l = NonlinearityLayer(l)
    return l

The function bottleneck implements the architecture shown in the right part of Figure 5 (a "bottleneck" building block).
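The core idea that sumlayer implements is the identity shortcut y = F(x) + x. As a toy illustration (plain NumPy, not Lasagne), the shortcut is just an elementwise sum, which is why F must preserve the shape of its input:

```python
import numpy as np

def residual_apply(F, x):
    """y = F(x) + x: the identity shortcut that sumlayer realizes above.
    F must return an array of the same shape as x, otherwise the
    elementwise sum fails (this is why the channel counts must match)."""
    return F(x) + x

x = np.ones((2, 4))
y = residual_apply(lambda t: 2 * t, x)
print(y)  # every entry is 2*1 + 1 = 3
```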

Unfortunately, the accuracy was really low (less than 60%). I changed the training part to use a flexible learning rate schedule. It improved the result, but it was still unacceptable.
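A "flexible learning rate" here means a step-decay schedule. The post doesn't give my exact milestones, so the epochs and rates below are illustrative assumptions, not the actual configuration:

```python
def learning_rate(epoch):
    """Hypothetical step-decay schedule: start at 0.1 and drop by 10x
    at epochs 40 and 60 (milestones are assumed for illustration)."""
    if epoch < 40:
        return 0.1
    elif epoch < 60:
        return 0.01
    return 0.001

print([learning_rate(e) for e in (0, 45, 70)])  # [0.1, 0.01, 0.001]
```

In Lasagne this value would be fed into the update rule (e.g. the learning_rate argument of the SGD/momentum updates) as a shared variable that is reset each epoch.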

Then I started to implement the data preprocessing part, which is described in the original paper:

We follow the simple data augmentation in [24] for training: 4 pixels are padded on each side, and a 32×32 crop is randomly sampled from the padded image or its horizontal flip.

Although the report by benanne, the winner of the Galaxy Challenge on Kaggle, mentioned the importance of image preprocessing, I had never read the original paper or its implementation. I referred to Alex Krizhevsky's implementation on Google Code. The following is my implementation:

def crop(X):
    # reference: https://code.google.com/p/cuda-convnet/source/browse/trunk/convdata.py
    ret = []
    for c in xrange(X.shape[0]):
        startY, startX = np.random.randint(0, CROP_SIZE * 2 + 1), np.random.randint(0, CROP_SIZE * 2 + 1)
        endY, endX = startY + 32, startX + 32
        pic = np.pad(X[c], ((0, 0), (4, 4), (4, 4)), 'constant')
        pic = pic[:, startY:endY, startX:endX]
        if np.random.randint(2) == 0:
            pic = pic[:, :, ::-1]  # horizontal flip with probability 0.5
        ret.append(pic)
    return lasagne.utils.floatX(np.array(ret))

def train_epoch(X, y):
    num_samples = X.shape[0]
    num_batches = int(np.ceil(num_samples / float(BATCH_SIZE)))
    costs = []
    correct = 0
    for i in range(num_batches):
        idx = range(i*BATCH_SIZE, np.minimum((i+1)*BATCH_SIZE, num_samples))
        X_batch = crop(X[idx])
        y_batch = y[idx]
        #print X_batch.shape, y_batch.shape, y_batch.max()
        cost_batch = train_fn(X_batch, y_batch)
        costs += [cost_batch]

    return np.mean(costs)#, correct / float(num_samples)
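To sanity-check the augmentation in isolation, here is a self-contained NumPy-only version of the same pad-crop-flip routine (the Lasagne dependency replaced by a plain float32 cast). The key invariant: padding by 4 and cropping 32×32 returns a batch with exactly the original shape.

```python
import numpy as np

CROP_SIZE = 4  # pixels of zero-padding on each side, as in the paper

def crop_numpy(X):
    """NumPy-only sketch of the random pad-crop-flip augmentation above.
    X has shape (batch, channels, 32, 32)."""
    ret = []
    for c in range(X.shape[0]):
        startY = np.random.randint(0, CROP_SIZE * 2 + 1)
        startX = np.random.randint(0, CROP_SIZE * 2 + 1)
        # pad only the spatial dimensions, then take a random 32x32 window
        pic = np.pad(X[c], ((0, 0), (CROP_SIZE, CROP_SIZE), (CROP_SIZE, CROP_SIZE)), 'constant')
        pic = pic[:, startY:startY + 32, startX:startX + 32]
        if np.random.randint(2) == 0:
            pic = pic[:, :, ::-1]  # horizontal flip with probability 0.5
        ret.append(pic)
    return np.array(ret, dtype=np.float32)

batch = np.ones((8, 3, 32, 32), dtype=np.float32)
out = crop_numpy(batch)
print(out.shape)  # (8, 3, 32, 32): shape is preserved after pad + crop
```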

Realtime augmentation seems to be useful, and I'm planning to integrate it in the future.

Adding the preprocessing led to an accuracy of 70%-76%. The main reason is the small number of parameters (about 40k for 32 layers) in the bottleneck blocks. I tried adjusting the number of filter maps from n/4 to n/2 in the bottleneck block, and the accuracy improved from 75% to 85%. So far I haven't found the right way to apply the bottleneck block to CIFAR-10; in the original paper it is used only on the ImageNet dataset.
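To see where the parameter shortage comes from, here is a quick back-of-the-envelope count of the weights in one 1×1 → 3×3 → 1×1 bottleneck (my own sketch; biases and batchnorm parameters are ignored, and the in/out channel counts are illustrative). Widening the middle from 16 to 32 filters roughly quadruples the dominant 3×3 term:

```python
def bottleneck_params(in_ch, mid):
    """Weight count of a 1x1 -> 3x3 -> 1x1 bottleneck that expands
    back to 4*mid output channels, matching the code above."""
    reduce_ = in_ch * mid * 1 * 1       # 1x1 reduction
    conv3 = mid * mid * 3 * 3           # 3x3 convolution
    expand = mid * (4 * mid) * 1 * 1    # 1x1 expansion to 4*mid channels
    return reduce_ + conv3 + expand

print(bottleneck_params(64, 16))  # 1024 + 2304 + 1024 = 4352
print(bottleneck_params(64, 32))  # 2048 + 9216 + 4096 = 15360
```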

The final result (32 layers) on CIFAR-10:
92.64% on validation set
92.38% on test set

Audun Mathias Øygard claimed to achieve a 6.88% error rate on the validation set with some configuration, but I could not reproduce that performance using his code. The result really depends on the random seed.