Survey Authors: Shervin Minaee, Amirali Abdolrashidi, Hang Su, Mohammed Bennamoun, David Zhang.
Original Article: https://arxiv.org/pdf/1912.00271.pdf
Have you ever wondered how biometric recognition works? How do our laptops and mobile devices get unlocked using fingerprint and face unlock features? Let us first understand why biometric recognition holds a unique place in authentication and security applications such as cellphone authentication, airport security, and forensic science.
Biometrics are human attributes that can be used to reliably recognize an individual in order to grant access to systems, devices, or data. Unlike token-based and knowledge-based credentials such as ID cards, keys, passwords, and answers to security questions, they cannot be forgotten or lost, and they are very difficult to duplicate. A few examples of biometric features are fingerprints, palmprints, facial features, ears, irises, voice, and retinas. A person can also be recognized by behavioral features such as signatures, gait, and keystroke dynamics.
Deep learning based models have achieved state-of-the-art results in many computer vision, natural language processing, and biometric recognition problems. These models use deep neural networks (DNNs), which provide an end-to-end learning framework that learns feature representations at multiple levels to uncover the underlying patterns in the data. In this article, we give an overview of deep learning based work on biometric recognition.
Deep Neural Network Overview
Below is an overview of some of the most promising deep learning architectures used in biometric recognition by the computer vision community.
1. Convolutional Neural Networks (CNNs):
Convolutional neural networks are one of the most widely used architectures in deep learning. A CNN comprises three kinds of layers: convolutional layers, where a sliding kernel (filter) is applied to the image to extract features; nonlinear layers, which apply an activation function to the feature maps so that the network can model non-linear functions; and pooling layers, which take a small neighborhood of the feature map and replace it with a summary statistic of that neighborhood, such as its maximum or mean.
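To make the three layer types concrete, here is a minimal pure-Python sketch of one convolution, one ReLU nonlinearity, and one max-pooling step on a tiny image. The function names, the 5x5 image, and the edge-detecting kernel are illustrative assumptions, not taken from the survey; real systems use a deep learning framework and learn the kernel weights from data.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Nonlinear layer: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Pooling layer: replace each size x size block with its maximum."""
    h = len(feature_map) // size
    w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(w)
        ]
        for i in range(h)
    ]


# Toy image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1], [-1, 1]]

features = max_pool(relu(conv2d(image, kernel)))
```

Only the left column of the pooled output is non-zero, because that is where the vertical edge sits in the toy image.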
2. Recurrent Neural Networks (RNNs) and LSTM:
RNNs are typically used for processing sequential data such as time series and speech. At each time step, the model receives the input at the current time and the hidden state from the previous step, and produces a new hidden state as the output of the current step. The hidden state from the last time step can then be used to perform a task.
LSTM is a variant of the RNN that mitigates the vanishing gradient problem faced by plain RNNs. The LSTM architecture consists of three gates (an input gate, an output gate, and a forget gate) and a memory cell. The cell remembers values over time intervals, and the three gates regulate the flow of information into and out of the cell.
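The gate mechanics can be sketched for a single scalar LSTM unit as follows. This uses the commonly used LSTM update equations; the `params` layout and the weight values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM.

    params maps a gate name to a (w_x, w_h, bias) tuple (illustrative layout).
    Returns the new hidden state and the new cell state.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("input"))        # how much new content to write
    f = sigmoid(pre_activation("forget"))       # how much old memory to keep
    o = sigmoid(pre_activation("output"))       # how much memory to expose
    g = math.tanh(pre_activation("candidate"))  # proposed new content

    c = f * c_prev + i * g                      # memory cell update
    h = o * math.tanh(c)                        # hidden state for this step
    return h, c


# Run a short sequence through the cell with arbitrary weights.
params = {name: (0.5, 0.5, 0.0)
          for name in ("input", "forget", "output", "candidate")}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, params)
```

Because the cell state is updated additively (`f * c_prev + i * g`), a forget gate close to 1 lets information persist across many steps, which is what helps against vanishing gradients.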
3. Auto-Encoders:
Auto-encoders are unsupervised learning algorithms used to learn efficient encodings of data. They compress the input into a latent-space representation and then reconstruct the output from that representation. An auto-encoder is composed of two parts: an encoder and a decoder.
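As a rough illustration of the encoder/decoder idea, the sketch below trains a tiny linear auto-encoder that compresses 2-D points to a single latent number and reconstructs them. Everything here (the toy data, the linear layers, the learning rate, the step count) is an assumption made for the example; practical auto-encoders are deep, nonlinear, and trained with a framework.

```python
def reconstruct(x, enc, dec):
    """Encode a 2-D point to one latent number, then decode it back to 2-D."""
    z = enc[0] * x[0] + enc[1] * x[1]        # encoder: 2-D input -> 1-D code
    return z, [dec[0] * z, dec[1] * z]       # decoder: 1-D code -> 2-D output


def total_loss(data, enc, dec):
    """Sum of squared reconstruction errors over the dataset."""
    loss = 0.0
    for x in data:
        _, x_hat = reconstruct(x, enc, dec)
        loss += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return loss


def train(data, enc, dec, lr=0.02, steps=500):
    """Plain gradient descent on the reconstruction error."""
    enc, dec = list(enc), list(dec)
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z, x_hat = reconstruct(x, enc, dec)
            r = [x_hat[0] - x[0], x_hat[1] - x[1]]          # residual
            back = 2.0 * (r[0] * dec[0] + r[1] * dec[1])    # d(loss)/dz
            for k in range(2):
                g_dec[k] += 2.0 * r[k] * z
                g_enc[k] += back * x[k]
        for k in range(2):
            enc[k] -= lr * g_enc[k]
            dec[k] -= lr * g_dec[k]
    return enc, dec


# Toy data lying on a line, so one latent number is enough to describe it.
data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.5, 1.0)]
init_enc, init_dec = [0.5, 0.5], [0.5, 0.5]

enc, dec = train(data, init_enc, init_dec)
loss_before = total_loss(data, init_enc, init_dec)
loss_after = total_loss(data, enc, dec)
```

The reconstruction error drops as training proceeds, since the 2-D points can be described exactly by one number along the line they lie on.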
4. Generative Adversarial Networks (GAN):
GANs are unsupervised learning algorithms used for generative modeling. A GAN consists of two networks, a generator and a discriminator. The generator is trained to produce new examples, while the discriminator tries to classify examples as real or fake (generated).
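The opposing objectives of the two networks can be sketched with their loss functions. The article does not specify a loss; the code below assumes the standard binary cross-entropy formulation with the non-saturating generator loss, and it takes the discriminator's output probabilities as plain lists rather than computing them with an actual network.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: probabilities the discriminator assigns to real samples being real.
    d_fake: probabilities it assigns to generated samples being real.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are judged to be real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# A confident, correct discriminator has a low loss...
confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])
# ...while one that outputs 0.5 everywhere cannot tell real from fake.
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

Training alternates between lowering the discriminator loss and lowering the generator loss, so each network improves against the other.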
5. Transfer Learning Approach:
In transfer learning, a model trained on one task is reused as the starting point for another, related task. There are two main ways in which a pre-trained model is used for a different task. In one approach, the pre-trained model is treated as a feature extractor, and a classifier is trained on top of it. In the other approach, the whole network is fine-tuned on the new task.
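The first approach (frozen feature extractor plus a new classifier) can be sketched as follows. `pretrained_features` is a hypothetical stand-in for a frozen pre-trained network, and a nearest-centroid rule stands in for the classifier trained on top; both are assumptions made to keep the example self-contained.

```python
def pretrained_features(x):
    """Hypothetical stand-in for a frozen pre-trained network (never updated)."""
    return [x[0] + x[1], x[0] - x[1]]


def fit_centroids(samples, labels):
    """'Train' the classifier: average the extracted features per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        f = pretrained_features(x)
        acc = sums.setdefault(y, [0.0] * len(f))
        for k, v in enumerate(f):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose feature centroid is closest."""
    f = pretrained_features(x)
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(f, centroids[y])),
    )


# Small labeled set for the new task.
samples = [[1, 1], [2, 2], [1, -1], [2, -2]]
labels = ["a", "a", "b", "b"]
centroids = fit_centroids(samples, labels)
```

Only the centroids are learned from the new data; the feature extractor is left untouched. In the fine-tuning approach, the extractor's weights would be updated as well.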
Deep Learning Based Works on Biometric Recognition
1. Face Recognition:
Face recognition is one of the most popular biometrics used for authentication. Facial features such as the distance between the chin and forehead, ear height, nose width, the distance between the pupils, and the triangle formed by the eyes and nose vary from person to person.
- Face Datasets: Yale and Yale Face Database B, CMU Multi-PIE, Labeled Faces in the Wild (LFW), PolyU NIR Face Database, YouTube Faces, VGGFace2, CASIA-WebFace, MS-Celeb, CelebA, IJB-C, MegaFace.
- Work performed: DeepFace is one of the earliest deep learning works on face recognition, and it achieved state-of-the-art accuracy. Several successive versions of DeepID features were then proposed, in which the features were taken from the last hidden layer of a CNN. VGGNet and GoogLeNet were also used as architectures. Later, the VGGFace model was proposed, in which VGGNet was trained on a large public dataset and fine-tuned via a triplet loss function, obtaining a high accuracy of 98.95%.
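The triplet loss mentioned above can be sketched in a few lines. It pulls an anchor embedding toward a positive (same identity) and pushes it away from a negative (different identity) by at least a margin. The squared Euclidean distance and the margin of 0.2 are common choices assumed here for illustration, not values given in this article.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther than the positive by at least `margin`."""
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )


anchor = [0.0, 0.0]
# Easy triplet: same-identity face is close, different identity is far.
easy = triplet_loss(anchor, [0.1, 0.0], [1.0, 0.0])
# Hard triplet: the embeddings are the wrong way round, so the loss is large.
hard = triplet_loss(anchor, [1.0, 0.0], [0.1, 0.0])
```

Easy triplets contribute no gradient, which is why practical training pipelines usually spend effort on selecting hard or semi-hard triplets.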
2. Fingerprint Recognition:
The fingerprint is the most commonly used physiological biometric feature. It consists of ridges and valleys, which form unique patterns. Important features in a fingerprint include ridge endings, bifurcations, islands, bridges, crossovers, and dots, which are used to uniquely identify a person.
- Fingerprint Datasets: FVC Fingerprint Database, PolyU High-resolution Fingerprint Database, CASIA Fingerprint Dataset, NIST Fingerprint Dataset.
- Work performed: A deep learning based fingerprint minutiae extraction algorithm, called MENet, was proposed and achieved promising results on fingerprint images from the FVC datasets. Another work proposed a multi-view deep representation for contactless and partial 3D fingerprint recognition, which includes a fully convolutional network for fingerprint segmentation and three Siamese networks to learn multi-view 3D fingerprint feature representations. This model achieved high accuracy on various datasets. Further, a fingerprint texture learning model was proposed using a deep learning framework. It was evaluated on several benchmarks and achieved verification accuracies of 100%, 98.65%, 100%, and 98% on the PolyU2D, IITD, CASIA-BLU, and CASIA-WHT databases, respectively.
3. Iris Recognition:
Iris recognition has gained a lot of popularity in recent years in different security-related fields. The iris contains a rich set of features, such as rings, corona, ciliary processes, freckles, and the striated trabecular meshwork of chromatophore and fibroblast cells, that are embedded in its texture and patterns and do not change over time.
- Iris Datasets: CASIA-Iris-1000 Database, UBIRIS Dataset, IIT Delhi Iris Dataset, ND Datasets, MICHE Dataset
- Works Performed: In one of the first works on iris recognition, features were extracted from a CNN pre-trained on ImageNet. The features were derived from different layers of VGGNet, and a multi-class SVM was trained on top of them. The resulting model achieved state-of-the-art accuracy on two iris recognition benchmarks, the CASIA-1000 and IIT Delhi databases. In another work, an iris recognition system was developed based on deep features extracted from AlexNet, followed by a multi-class classifier, and achieved high accuracy rates on the CASIA-Iris-V1, CASIA-Iris-1000, and CASIA-Iris-V3 Interval databases.
4. Palmprint Recognition:
The palmprint is another widely used biometric. Each part of a palmprint has different features, including texture, ridges, lines, and creases. Palmprint features also include geometry-based features, delta points, principal lines, and wrinkles.
- Palmprint Datasets: PolyU Multispectral Palmprint Dataset, CASIA Palmprint Database, IIT Delhi Touchless Palmprint Database.
- Works performed: In the very first work on palmprint recognition, a deep belief net was built by top-down unsupervised training, and the model parameters were tuned toward robust accuracy on the validation set. In another work, a palmprint recognition algorithm using a Siamese network was proposed. Two VGG-16 networks were used to extract features from the two input palmprint images, and another network was used on top of them to directly obtain the similarity of the two palmprints from their convolutional features. This method achieved an Equal Error Rate (EER) of 0.2819% on the PolyU dataset.
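The Equal Error Rate quoted above is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine users rejected). A minimal sketch of how it can be estimated from similarity scores is shown below. The scores are made up for illustration, and the simple threshold sweep is an assumption; evaluation toolkits typically interpolate between thresholds.

```python
def far_frr(genuine, impostor, threshold):
    """Rates at one threshold, where a score >= threshold means 'accept'."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # false accepts
    frr = sum(s < threshold for s in genuine) / len(genuine)     # false rejects
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep the observed scores as thresholds; return the rate where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0)
    return best[1]


# Made-up similarity scores: higher means "more likely the same palm".
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

In this toy example one genuine score falls below one impostor score, so no threshold separates the two groups perfectly and the EER is non-zero.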
5. Ear Recognition:
Ear recognition is a more recent problem that researchers are exploring. Each person has a unique ear shape, so ears can be used to identify a person from an image.
- Ear Datasets: IIT Ear Database, AWE Ear Dataset, Multi-PIE Ear Dataset, USTB Ear Database, UERC Ear Dataset, AMI Ear Dataset, CP Ear Dataset, WPUT Ear Dataset.
- Works performed: As ear recognition is not as popular as face, iris, and fingerprint recognition, the datasets available for it are limited in size. To overcome this issue, the first work used few-shot learning methods, in which the network uses the limited training data and quickly learns to recognize the images. In another work, a model using transfer learning with deep networks was proposed for unconstrained ear recognition. Further, a model was proposed that fuses CNNs and handcrafted features for ear recognition. It outperformed other state-of-the-art CNN-based works, leading to the conclusion that handcrafted features can complement deep learning methods.
6. Voice Recognition:
Voice recognition is also known as speaker recognition. In Voice Recognition, we determine a person’s identity using the characteristics of a person’s voice. Voice recognition includes both behavioral and physiological features, such as accent and pitch respectively.
- Voice Datasets: NIST SRE, SITW, VoxCeleb, Switchboard dataset, Librispeech dataset, TIMIT dataset.
- Works performed: Before the era of deep learning, most voice recognition systems were built using the i-vector approach. In the first approach using deep learning, a DNN-based model was incorporated into the i-vector framework. A DNN acoustic model trained for Automatic Speech Recognition (ASR) was used to gather speaker statistics for i-vector model training. This method reduced the equal error rate by 30%. Further, a complementary optimization goal called intra-class loss was proposed to improve speaker embeddings learned with triplet loss. The model trained using intra-class loss yielded a 30% reduction in error rate compared to triplet loss. This model was evaluated on the VoxCeleb and VoxForge datasets.
7. Signature Recognition:
The signature is considered a behavioral biometric. It is the most common method of checking a user’s identity for security purposes, in both traditional and digital formats. Features of a written signature include the thickness of a stroke and the speed of the pen during signing.
- Signature Datasets: ICDAR 2009 SVC, SVC 2004, Offline GPDS-960 Corpus.
- Works performed: In one of the early deep learning works on signature recognition, a Restricted Boltzmann Machine (RBM) was used both to identify a signature’s owner and to distinguish an authentic signature from a fake. In other works, an embedding-based writer-independent (WI) offline signature verification model was proposed, in which the input signatures were embedded in a high-dimensional space using a specific training pattern, and the Euclidean distance between the input and the embedded signatures was used to determine the outcome. More recently, various models using GANs, recurrent neural networks (RNNs), long short-term memory (LSTM), and gated recurrent units (GRUs) have been proposed for signature recognition.
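The distance-based decision used by embedding approaches can be sketched as follows. The embeddings are assumed to come from some already-trained network, and the accept rule (closest enrolled reference within a threshold) and the threshold value are hypothetical choices for illustration, not the specific rule used in the cited works.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(query_embedding, reference_embeddings, threshold):
    """Accept the query if it is within `threshold` of the closest enrolled reference."""
    closest = min(euclidean(query_embedding, r) for r in reference_embeddings)
    return closest <= threshold


# Made-up 2-D embeddings of a user's enrolled reference signatures.
references = [[0.0, 0.0], [0.1, 0.1]]

accepted = verify([0.05, 0.05], references, threshold=0.2)  # near the references
rejected = verify([3.0, 4.0], references, threshold=0.2)    # far from all of them
```

In practice the threshold is tuned on held-out data, for example at the equal error rate operating point.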
8. Gait Recognition:
Gait is a person’s manner of walking. Researchers from many communities, including machine learning, forensics, robotics, computer vision, and biomedicine, are working on gait recognition. It is one of the more challenging behavioral biometrics because it is possible to mimic someone’s gait. A person’s gait also changes with factors such as injuries, carried load, clothing, viewing angle, walking speed, and weather conditions.
- Gait Datasets: CASIA Gait Database, Osaka Treadmill Dataset, Osaka University Large Population (OULP) Dataset.
- Works Performed: In one of the older works, a gait recognition system using 3D convolutional neural networks was proposed that learns the gait from multiple viewing angles. This model consists of multiple layers of 3D convolutions, max pooling, and ReLUs, followed by fully connected layers. In another work, a hybrid CNN-RNN network was proposed that uses data from smartphone sensors, particularly the accelerometer and the gyroscope, for gait recognition, with no restrictions on how the subjects walk.
The table below lists the successful methods and datasets used for biometric recognition, along with the accuracy achieved.
Although great progress has been made in biometric recognition with deep learning models over the past few years, many challenges remain, such as more challenging datasets, model interpretability, biometric fusion, self-supervised learning, and memory-efficient models.
In this article, I have provided a high-level overview of biometric recognition using deep learning, the common deep neural networks used in biometric recognition, and a summary of the work performed in this area. For more detailed information about these techniques and works, please refer to the original article linked above.