Facial recognition @Dyos
Between 2019 and 2021, I held the position of Lead Machine Learning Scientist at Dyos Technology GmbH, later known as AICOR Verwaltungs GmbH. My role involved leading a multidisciplinary team of scientists, machine learning engineers, and developers in creating Eli-Ident, an automated KYC product. Our objective was to create a product that enables “strong customer authentication” through video for financial institutions like payment providers or banks, as specified by BaFin. This product was developed exclusively using proprietary algorithms and ML-based models and evolved into qundo.de.
Goal
Develop a facial recognition system designed to match images to a specific individual.
Context
An automated KYC process is designed to verify that the ID document presented belongs to the user who is submitting it. A crucial part of this process involves comparing the photo on the ID document with a selfie taken by the user.
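The comparison step can be illustrated with a minimal sketch. It assumes each face has already been turned into a fixed-length embedding vector by some model. The function names, the toy vectors, and the 0.6 threshold are illustrative assumptions, not the Eli-Ident implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_same_person(
    id_embedding: Sequence[float],
    selfie_embedding: Sequence[float],
    threshold: float = 0.6,  # assumed value; a real one is tuned on validation data
) -> bool:
    """Accept the pair when the two embeddings are similar enough."""
    return cosine_similarity(id_embedding, selfie_embedding) >= threshold


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real face embeddings.
    print(is_same_person([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]))
    print(is_same_person([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

In practice the threshold trades false accepts against false rejects, so it would be chosen on held-out data rather than fixed by hand.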
Challenges
We need to develop three specialized models for our facial recognition system. The initial phase involves training a highly accurate face detection model. This model is responsible for precisely cropping faces from images and eliminating background noise. The main challenge here is acquiring a sufficiently large and diverse dataset of face images. Once we have this dataset, we can proceed to train the facial recognition model using the cropped face images.
The second model detects whether the face is live or spoofed. Finally, the face-matching model performs the final classification task: deciding whether the two faces belong to the same person.
Deep FR system architecture. Source: M. Wang, W. Deng, Deep face recognition: A survey, Neurocomputing, Volume 429, 2021
However, compiling these datasets is both costly and time-consuming. It requires a dedicated team of annotators to label the images accurately, ensuring the data’s quality and relevance for training purposes.
Beyond the typical challenges encountered in computer vision, such as varying image quality and environmental conditions, facial recognition presents additional complexities. These include variations in appearance due to age changes, the presence or absence of facial hair, changes in hairstyle, and more. Crucially, the model must be trained on a wide-ranging collection of images representing diverse identities across different ages, genders, and ethnic backgrounds to ensure its effectiveness and fairness.
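The three-stage flow described above (face detection, liveness check, face matching) can be sketched as follows. This is only an outline under stated assumptions: the three models are passed in as plain callables, and every name here (`verify_identity`, `VerificationResult`, the stub lambdas) is hypothetical rather than taken from the actual product.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class VerificationResult:
    accepted: bool
    reason: str


def verify_identity(
    id_image: Any,
    selfie: Any,
    detect_face: Callable[[Any], Optional[Any]],  # returns a cropped face or None
    is_live: Callable[[Any], bool],               # liveness / anti-spoofing model
    match_faces: Callable[[Any, Any], bool],      # face-matching model
) -> VerificationResult:
    """Run detection, then liveness, then matching, stopping at the first failure."""
    id_face = detect_face(id_image)
    selfie_face = detect_face(selfie)
    if id_face is None or selfie_face is None:
        return VerificationResult(False, "no face detected")
    if not is_live(selfie_face):
        return VerificationResult(False, "liveness check failed")
    if not match_faces(id_face, selfie_face):
        return VerificationResult(False, "faces do not match")
    return VerificationResult(True, "ok")


if __name__ == "__main__":
    # Stub models standing in for the trained networks.
    result = verify_identity(
        "id.png",
        "selfie.png",
        detect_face=lambda image: image,
        is_live=lambda face: True,
        match_faces=lambda a, b: True,
    )
    print(result)
```

Passing the models as callables keeps the control flow testable without loading any trained weights.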
Tasks and contribution
- Project Management: I oversaw the entire project lifecycle, which involved initial planning, detailed cost estimation, and managing multiple development iterations. As with most ML projects, the goal was to obtain a POC quickly and then iterate, efficiently curating our proprietary training dataset.
- Enhanced data curation:
  - I managed the data collection process, which required implementing various APIs.
  - I organized the annotation process (extensive labeling requirement definition and quality control) by setting up an external team.
- I managed CI/CD and the model life cycle.
We achieved over 99.9% accuracy on the LFW benchmark, resulting in a highly reliable face-matching model.
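For context, LFW is commonly evaluated as a pair-verification task: each test item is a pair of face images labeled as the same or a different person. The sketch below shows how accuracy could be computed from similarity scores at a fixed threshold. The scores, labels, and threshold are made-up toy values, and the cross-validated threshold selection of the standard protocol is deliberately left out.

```python
from typing import Sequence


def verification_accuracy(
    scores: Sequence[float],
    same_person: Sequence[bool],
    threshold: float,
) -> float:
    """Fraction of pairs where (score >= threshold) agrees with the label."""
    if len(scores) != len(same_person) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    correct = sum(
        (score >= threshold) == bool(label)
        for score, label in zip(scores, same_person)
    )
    return correct / len(scores)


if __name__ == "__main__":
    # Toy similarity scores for four image pairs (not real benchmark data).
    toy_scores = [0.9, 0.8, 0.3, 0.7]
    toy_labels = [True, True, False, False]
    print(verification_accuracy(toy_scores, toy_labels, threshold=0.5))
```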