tl;dr No amount of talking and reading trumps hands-on experience when learning to work with AI. Find out what really goes into building a face recognition application – even a simple one.
So, you’ve finished MNIST, you’re ready to conquer the world, and you ask yourself, “What’s next?” I say, let’s build a crappy face recognition app for your very own countenance! This project will give you a sobering perspective, providing instructive insights into the following key aspects of computer vision and deep learning:
- Data labelling: To kick things off, you will experience the pain and suffering involved in manually preparing data and drawing hundreds of bounding boxes around your own face. You’re going to use the popular program Labelme for that.
- Data augmentation: Bravo! The dataset you’ve just carefully created is so pointless that it couldn’t even be used to learn how to distinguish faces from pieces of toast. To rectify that, we’re going to use data augmentation with the Python library Albumentations to increase the volume of data we have available.
- Transfer learning: Finally, to demonstrate the simplicity, elegance, and power of deep learning, we’ll fit a simple neural network with sigmoid activation to our training data. Just kidding, lol. We’re going to use a convoluted contraption with a gazillion layers and an almost infinite number of hyperparameters, which the learned scholars determined by a sophisticated, very advanced process of trial and error. Of course, that thing has to be pretrained, because deep learning sucks so much that you literally can’t do anything from scratch. You’re going to fine-tune a VGG16.
- PyTorch: To make things even more opaque, we’re not just going to do some NumPy linear algebra to implement our deep learning routines. Instead, we’re going to use PyTorch, the deep learning masochist’s favorite tensor computation library, which abstracts away just the right amount of detail to guarantee that you have no idea which part of your data is going to which device, when, or what it’s doing there. It achieves this feat by using interfaces that selectively hide or expose internal state, apparently at random. Plus, its math interface closely resembles NumPy’s, but with subtle differences sprinkled everywhere to make sure that nothing ever just works as intended on the first try. You’re going to love this!
- Deployment: Piecing all of this together, you’re going to load your trained model from disk and have it predict bounding boxes on a live camera feed of yourself. You will be disappointed by the mediocrity of the results, and that lingering thought in the back of your mind, that this project was an unmitigated waste of your time, will harden into painful certainty. Congratulations!
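To make the first two bullets a little less abstract, here is a minimal sketch of what a labelled face looks like as data and why augmentation has to touch the labels too. It is not taken from the notebook. The JSON below is a made-up annotation that follows Labelme's export layout as I understand it (`shapes`, `points`, `imageWidth`, `imageHeight`), so check the field names against your own files. The helpers `labelme_to_box` and `hflip_box` are hypothetical names, and the flip is a hand-rolled stand-in for what a bounding-box-aware library such as Albumentations does for you, not its API.

```python
import json

# Made-up Labelme-style annotation for one 640x480 photo (an assumption,
# not a file from the project).
LABELME_JSON = """
{
  "imagePath": "me_001.jpg",
  "imageWidth": 640,
  "imageHeight": 480,
  "shapes": [
    {"label": "face", "shape_type": "rectangle",
     "points": [[400.0, 300.0], [160.0, 120.0]]}
  ]
}
"""


def labelme_to_box(annotation, label="face"):
    """Return the first rectangle with `label` as normalized (x_min, y_min, x_max, y_max)."""
    width = annotation["imageWidth"]
    height = annotation["imageHeight"]
    for shape in annotation["shapes"]:
        if shape["label"] == label and shape["shape_type"] == "rectangle":
            (x1, y1), (x2, y2) = shape["points"]
            # The two corners may be stored in either order, so sort them.
            x_min, x_max = sorted((x1, x2))
            y_min, y_max = sorted((y1, y2))
            return (x_min / width, y_min / height, x_max / width, y_max / height)
    return None  # no matching rectangle in this image


def hflip_box(box):
    """Mirror a normalized box left to right, matching a horizontally flipped image."""
    x_min, y_min, x_max, y_max = box
    return (1.0 - x_max, y_min, 1.0 - x_min, y_max)


box = labelme_to_box(json.loads(LABELME_JSON))
flipped = hflip_box(box)
print("original:", box)
print("flipped: ", flipped)
```

Normalizing the coordinates to the 0 to 1 range keeps the targets independent of image size, which is convenient once the images get resized for the network. The flip shows the general point: every geometric augmentation applied to the image has to be applied to the box as well, or the label silently stops matching the picture.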
Can’t wait to start? Jump right into the project. We’ve prepared a Jupyter notebook just for you!