Author of the expert article: Michał Kędra
Face recognition has become a hot topic in recent years. Whether it is used to unlock our smartphones, enhance security at airports or aid forensic investigations, face recognition technology is both widely used and widely discussed. Nowadays, in the majority of workplaces, access to secure areas is granted using access cards or PINs. Automatic face recognition is likely to replace these solutions in the future for our convenience. Siamese Networks, which have been gaining popularity recently, are one of many possible approaches to the face recognition problem. In this article, I will describe what a Siamese Network is and how it can help us with identity verification.
[https://recfaces.com/articles/what-is-facial-recognition-used-for]
[https://www.baeldung.com/cs/siamese-networks]
Now I will explain how a Siamese Network works in the context of face recognition. First, it takes two face images as inputs and passes them through two identical sub-networks that share the same weights. In our case, both sub-networks are Convolutional Neural Networks (CNNs), because they are very effective at extracting features from images. Let’s treat a CNN as a magic black box: we put image data in and get back a vector describing the features of that image (its shapes, colours etc.). The sub-networks extract these features from the two face images and produce two image embeddings, which are simply vectors containing information about image features. To keep things simple, imagine an embedding with only 3 elements, where consecutive elements refer to selected parts of the face (hair, nose, mouth). For instance:
– vector [0.13, 2.21, 1.13] can indicate that a person has no hair, a big nose and a small mouth,
– vector [1.33, 1.03, 2.98] can indicate that a person has black straight hair, a small nose and a big mouth.
These output vectors are then compared using a distance metric to determine how similar or different the face images are. The most commonly used distance metric is the Euclidean distance. Finally, we use a sigmoid function to convert the calculated distance into a similarity score between 0 and 1. A score of 0 indicates no similarity, a score of 1 indicates full similarity, and values in between are interpreted accordingly. In this way, we can predict whether two face images belong to the same person.
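The comparison step can be sketched in a few lines of Python. This is a minimal illustration that uses the two example vectors from the list above instead of real CNN outputs. The `similarity_score` helper and its `scale` and `offset` parameters are hypothetical (in a trained network such parameters would be learned); they are chosen here only so that small distances map to scores near 1 and large distances to scores near 0.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sigmoid(z):
    """Standard logistic function, returns a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def similarity_score(distance, scale=4.0, offset=1.0):
    """Map a distance to (0, 1). scale and offset are illustrative assumptions."""
    return sigmoid(scale * (offset - distance))


# The two toy embeddings (hair, nose, mouth) from the text.
person_a = [0.13, 2.21, 1.13]
person_b = [1.33, 1.03, 2.98]

distance = euclidean_distance(person_a, person_b)
print(round(distance, 3))           # 2.501
print(similarity_score(distance))   # close to 0: probably different people
print(similarity_score(0.0))        # close to 1: identical embeddings
```

With these assumed parameters, the two example faces end up far apart and receive a score near 0, while two identical embeddings receive a score near 1.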
Now we know how to use the Siamese Network to make predictions, but how can we train it to make correct ones? Let’s imagine that we have face images of several people that we want to distinguish. During the learning phase, we need to pass both positive and negative pairs to the network. Positive pairs are those where both images belong to the same person. In this case, we tell the network that the similarity on its output should be high (more precisely, 1). In negative pairs, the images belong to two different people, and we tell the network that the similarity on its output should be 0. Given images of several people, we can build positive pairs for each person and many negative pairs between each pair of people. If we have a dataset containing 5 classes where each class contains only 20 observations, we have 100 images in total and can generate up to 4950 unique pairs for the training phase of the Siamese Network: 950 positive pairs and 4000 negative pairs.
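The pair count can be checked with a short script. The sketch below does not load any real images: each image is represented only by a hypothetical `(person_id, image_id)` tuple, and the label convention (1 for a positive pair, 0 for a negative pair) follows the description above.

```python
from itertools import combinations

num_classes = 5
images_per_class = 20

# Placeholder "images": only the identity and an index, no pixel data.
images = [
    (person_id, image_id)
    for person_id in range(num_classes)
    for image_id in range(images_per_class)
]

positive_pairs = []  # both images show the same person, label 1
negative_pairs = []  # images show different people, label 0
for first, second in combinations(images, 2):
    if first[0] == second[0]:
        positive_pairs.append((first, second, 1))
    else:
        negative_pairs.append((first, second, 0))

total_pairs = len(positive_pairs) + len(negative_pairs)
print(len(positive_pairs), len(negative_pairs), total_pairs)  # 950 4000 4950
```

Negative pairs outnumber positive ones by roughly four to one here, so in practice it may be worth sampling them to keep training batches balanced.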
Based on the data provided, the Siamese Network is trained by minimizing a chosen loss function, which determines the parameters of the final model. The loss function tells us how good our model is by measuring the difference between predicted and actual values. Contrastive loss is a popular choice for Siamese Networks and is given by the formula:
$$L = Y \cdot D^2 + (1 - Y) \cdot \max(\text{margin} - D,\ 0)^2$$
where:
– Y is either 0 or 1: it equals 0 if the input images belong to different people and 1 if they belong to the same person,
– D is the Euclidean distance between the two image embeddings calculated by the Siamese Network,
– margin is a constant (greater than 0) that caps the influence of pairs from different classes: once the distance between two such images reaches the margin, the pair no longer contributes to the loss.
Usually, in machine learning, we want to minimize the loss function, and that applies here as well. Suppose that we have two images belonging to the same person, so Y=1. The second term of the equation is then 0, and the loss comes entirely from the first term: it is equal to D². The loss is therefore low (which is what we want, since we minimize it) when the Siamese Network returns a small distance between images of the same person. Conversely, for images of different people (Y=0) the first term vanishes, and the loss is low when the distance is large; it drops to 0 once the distance reaches the margin. Once the Siamese Network is trained, we can use it to verify identities.
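The behaviour described above can be reproduced with a few lines of Python. This sketch assumes the common form of contrastive loss, L = Y·D² + (1 − Y)·max(margin − D, 0)², evaluated for a single pair. The default `margin=1.0` and the sample distances are arbitrary values chosen for illustration.

```python
def contrastive_loss(y, distance, margin=1.0):
    """Contrastive loss for one pair.

    y = 1 means both images show the same person, y = 0 means different people.
    margin=1.0 is an illustrative default, not a recommended value.
    """
    same_person_term = y * distance ** 2
    different_people_term = (1 - y) * max(margin - distance, 0.0) ** 2
    return same_person_term + different_people_term


# Same person: the loss grows with the distance.
print(contrastive_loss(1, 0.2))  # small loss, embeddings are close
print(contrastive_loss(1, 1.5))  # 2.25, embeddings are too far apart

# Different people: the loss shrinks as the distance approaches the margin.
print(contrastive_loss(0, 0.2))  # noticeable loss, embeddings are too close
print(contrastive_loss(0, 1.5))  # 0.0, distance already exceeds the margin
```

Minimizing this value over many pairs pulls embeddings of the same person together and pushes embeddings of different people apart until they are at least one margin away from each other.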
Siamese Networks have some advantages over a traditional single neural network. Let’s imagine that we want to build an attendance system for a company and that we have trained a traditional neural network to predict multiple classes, one per employee. This poses a problem when a new employee joins the company, because we need to take their face images into account (we need to add a new class). In this case, we have to update the neural network and train it again on the whole dataset, which is far from ideal. A Siamese Network, instead of assigning a face image to one of the classes, takes a reference image of the person as a second input and calculates a similarity score, which tells us whether the two input images represent the same person. What’s more, Siamese Networks can usually work well with a small amount of data, whereas traditional neural networks require much more. We can also use the trained Siamese Network as a feature extractor: one of the sub-networks converts images into the previously mentioned image embeddings, and an algorithm such as Support Vector Machines (SVM) or K-Nearest Neighbors (KNN) is then trained on these embeddings to create a classification model.
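The attendance scenario can be sketched as follows. The example assumes that a trained sub-network has already turned each face image into an embedding; the employee names, the three-element vectors and the `threshold` value are all made up for illustration. Identification is a simple nearest-neighbour search over the reference embeddings, combined with a distance threshold.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical reference embeddings, one per enrolled employee.
gallery = {
    "alice": [0.13, 2.21, 1.13],
    "bob": [1.33, 1.03, 2.98],
}


def identify(embedding, gallery, threshold=0.5):
    """Return the closest enrolled name, or None if nobody is close enough.

    threshold is an assumed value; in practice it would be tuned on real data.
    """
    name, reference = min(
        gallery.items(),
        key=lambda item: euclidean_distance(embedding, item[1]),
    )
    if euclidean_distance(embedding, reference) <= threshold:
        return name
    return None


# Embedding of a face photographed at the entrance (made-up values).
probe = [2.05, 0.40, 0.95]
print(identify(probe, gallery))  # None: this person is not enrolled yet

# A new employee joins: we only store one more reference embedding.
gallery["carol"] = [2.10, 0.35, 0.90]
print(identify(probe, gallery))  # carol
```

Enrolling the new employee required no retraining: the network stays fixed and only the set of reference embeddings grows.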
Hopefully, you found the concept of the Siamese Network interesting. Its usefulness in face recognition and many other areas makes it a versatile tool.
Thanks for reading!