Image rotation: model for angle detection using pytorch

Basically, I am trying to create a model that detects the angle by which a given image has been rotated. I have a dataset of 1500 documents, producing images rotated by angles drawn from

random.sample([0, 90, -90, 180], 2)

and each of these base angles gets an additional variation of

random.uniform(-10, 10)

resulting in ~4k rotated images.
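
For reference, here is a minimal sketch of how such a dataset could be generated. The generation code is not shown in the question, so the helper name and the use of PIL are assumptions on my part:

import random

from PIL import Image

def make_rotated_variants(image: Image.Image):
    # Pick two of the four base orientations per document (without replacement),
    # then jitter each by up to +-10 degrees, as described above.
    variants = []
    for base_angle in random.sample([0, 90, -90, 180], 2):
        angle = base_angle + random.uniform(-10, 10)
        # expand=True grows the canvas so the rotated page is not cropped
        variants.append((image.rotate(angle, expand=True), angle))
    return variants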

So I've come up with the following model to predict the sin and cos of the desired angle:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CnnRotateRegression(nn.Module):
    def __init__(self):
        super(CnnRotateRegression, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3)
        self.conv4 = nn.Conv2d(256, 512, kernel_size=3)
        self.conv5 = nn.Conv2d(512, 512, kernel_size=3)
        
        self.bn1 = nn.BatchNorm2d(64)
        self.bn2 = nn.BatchNorm2d(128)
        self.bn3 = nn.BatchNorm2d(256)
        self.bn4 = nn.BatchNorm2d(512)
        self.bn5 = nn.BatchNorm2d(512)
        
        self.activation = nn.ReLU()
        self.pool = nn.AvgPool2d(kernel_size=2)
        self.pool2 = nn.AdaptiveAvgPool2d((8,8))

        self.linear_l1 = nn.Linear(512*8*8, 512)
        self.linear_l2 = nn.Linear(512, 256)
        self.linear_l3 = nn.Linear(256, 2) # sin + cos

    def forward(self, x):
        x = self.activation(self.pool(self.bn1(self.conv1(x))))
        x = self.activation(self.pool(self.bn2(self.conv2(x))))
        x = self.activation(self.pool(self.bn3(self.conv3(x))))
        x = self.activation(self.pool(self.bn4(self.conv4(x))))
        x = self.activation(self.pool(self.bn5(self.conv5(x))))
        
        x = self.pool2(x)
        x = x.view(x.size(0), -1)
        
        x = self.activation(self.linear_l1(x))
        x = self.activation(self.linear_l2(x))
        x = self.linear_l3(x)

        x = F.normalize(x, p=2, dim=1)  # project onto the unit circle so the output is a valid (cos, sin) pair

        return x
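
A quick shape check (the 224x224 input size here is arbitrary; the adaptive pooling fixes the feature map at 8x8 regardless of input resolution):

model = CnnRotateRegression()
dummy = torch.randn(4, 3, 224, 224)  # batch of 4 RGB images
out = model(dummy)
print(out.shape)  # torch.Size([4, 2]), unit-norm (cos, sin) pairs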

training part:

import torch.optim as optim
from tqdm import tqdm

model = CnnRotateRegression()
model = model.to(device)

loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
num_of_epochs = 11


for epoch in range(num_of_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in tqdm(train_Loader, desc="training loop"):

        images, labels = images.to(device), labels.to(device).float()

        angles = angle_to_sin_cos(labels)
        norm_angles = F.normalize(angles, p=2, dim=1)

        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_function(outputs, norm_angles)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    train_loss = running_loss / len(train_Loader)
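
The question does not show an evaluation loop. For measuring the small-angle error specifically, a sketch along these lines can help; val_loader is assumed (not shown in the question), and the angle difference is wrapped into [-180, 180) so wraparound cases like 179 vs -179 count as 2 degrees:

model.eval()
abs_errors = []
with torch.no_grad():
    for images, labels in val_loader:  # val_loader is assumed, not shown in the question
        images, labels = images.to(device), labels.to(device).float()
        preds = sin_cos_to_angle(model(images))  # predicted angles in degrees
        # wrap the difference into [-180, 180) to handle wraparound
        diff = (preds - labels + 180) % 360 - 180
        abs_errors.append(diff.abs())
print(f"mean absolute angular error: {torch.cat(abs_errors).mean().item():.2f} deg")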

functions to convert sin and cos to angle and vice versa:

def angle_to_sin_cos(angle):
    tensor_angle = angle.clone().detach()
    radian = tensor_angle * torch.pi / 180.0
    # note the order: (cos, sin), matching what sin_cos_to_angle expects below
    return torch.stack([torch.cos(radian), torch.sin(radian)], dim=1)

def sin_cos_to_angle(outputs):
    cos_val, sin_val = outputs[:, 0], outputs[:, 1]
    angle_rad = torch.atan2(sin_val, cos_val)
    angle_deg = angle_rad * (180 / torch.pi)
    return angle_deg
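
A quick round-trip check of the two helpers (180 may come back as -180, since atan2 returns values in (-pi, pi], but that is the same orientation):

angles = torch.tensor([0.0, 90.0, -90.0, 180.0, 7.5])
encoded = angle_to_sin_cos(angles)   # shape (5, 2), already unit-norm
decoded = sin_cos_to_angle(encoded)  # approximately [0, 90, -90, +/-180, 7.5]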

My model performs poorly at determining small angles in the ±10 degree range. What would you suggest to improve/enhance to achieve better "small-degree" prediction? Thank you in advance!

Answer

It is very hard to say what exactly is happening here without knowing the exact images, but when it comes to smaller angle rotations some data is going to be lost. Square images lose no data on multiples of 90 degree rotations, but anything off of 90 degrees will necessarily lose information around the corners. I do not know how you are filling in this data, or whether you are simply resizing the existing images until they are square again by some sort of zooming or padding function.

Having trouble with small angles might mean either that the whole image is being memorized, so that angle calculation becomes harder when part of it is missing, OR that however the missing data is being filled in is throwing off the ability to recognize angle changes. If I am understanding the torch code correctly, it is just a stack of conv layers followed by some linear layers. That should mean every part of every training image gets the same amount of attention, which may be what is hurting small-angle detection: the corners get over-weighted even though the same corner data is not present at test time.

Potential Solution:

If you consider for a moment a square image being freely rotated from 0 to 360 degrees, you will see that the inscribed circle of that square is always in frame. The radius of that circle is half the side length of the square. The area outside that circle is in frame only part of the time, ranging from roughly 1% (the far corners, which sit right on the edge of the circumscribed circle) up to roughly 99% (the spots just outside the inscribed circle, furthest from the corners). So the circle itself is always in frame, and regions outside it are in frame to varying degrees.

To improve your approach, I would suggest retraining the model with 100% attention on that inner circle and progressively less attention towards the edges, with a weighting based on how much of a full 0-360 degree rotation a given pixel location spends in frame. When the corners are out of frame they are not available for the NN to evaluate, so giving them progressively lower attention trains the model on the area that is always available, the inner circle. If the NN can focus its learning on that inner circle, I would expect it to get better at evaluating smaller angle changes!
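
A minimal sketch of one way to realize that weighting: precompute a per-pixel mask whose value is the fraction of a full rotation for which that pixel location stays in frame, and multiply each input image by it before the forward pass. The closed-form fraction below is my own derivation, and multiplying the input is just one option (a weighted loss or an attention layer would be alternatives):

import torch

def in_frame_weight_mask(size: int) -> torch.Tensor:
    # Per-pixel weight = fraction of a full 0-360 degree rotation for which
    # that pixel location stays inside the square frame. Pixels inside the
    # inscribed circle get weight 1; the weight decays to 0 at the corners.
    coords = torch.linspace(-1.0, 1.0, size)  # normalized so the inscribed circle has radius 1
    yy, xx = torch.meshgrid(coords, coords, indexing="ij")
    r = torch.sqrt(xx**2 + yy**2)

    mask = torch.ones(size, size)
    outside = r > 1.0
    ro = r[outside].clamp(max=2**0.5)  # corners sit at radius sqrt(2)
    # A point at radius ro stays inside the unit square when |cos(theta)| <= 1/ro
    # and |sin(theta)| <= 1/ro; integrating over theta gives this fraction:
    frac = (2.0 / torch.pi) * (torch.asin(1.0 / ro) - torch.acos(1.0 / ro))
    mask[outside] = frac.clamp(min=0.0)
    return mask

# Usage inside the training loop above (images, model, device as defined there;
# 224 is a hypothetical input size):
#   mask = in_frame_weight_mask(224).to(device)
#   outputs = model(images * mask)  # mask broadcasts over batch and channels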
