Multimodal gesture recognition

Using unsupervised learning via a sparse autoencoder

This project implements a general-purpose classification system called Minerva. Its focus is on achieving robust classification accuracy with minimal human interaction. Unlike most classification systems, which rely on domain-specific heuristics for feature selection, Minerva uses an unsupervised learning technique called sparse autoencoding to learn important features automatically, without human intervention.

The sparse autoencoder is implemented as a convolutional neural network that takes raw data as input and produces a set of features intended to capture the essential information in that data. It is trained by streaming through massive unlabeled data sets. Once the sparse autoencoder has learned useful features, it is connected to a more traditional classification system trained with supervised learning. This system is also implemented as a neural network, and it attempts to discover complex relationships between the generated features and the output classes.

Although Minerva is designed to handle arbitrary input data, this project includes additional supporting modules for performing classification on video data. For the first case study, Minerva performs automatic gesture recognition. The labeled dataset used for training was obtained from http://www.kaggle.com/c/multi-modal-gesture-recognition.
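To make the two-stage pipeline concrete, here is a minimal PyTorch sketch, not Minerva's actual implementation: a convolutional autoencoder is trained on unlabeled frames with a sparsity penalty on its hidden code, then the frozen encoder feeds a small supervised classifier. The layer sizes, the L1 form of the sparsity penalty (the classic formulation uses a KL-divergence term), and names like SparseConvAutoencoder are assumptions chosen for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConvAutoencoder(nn.Module):
    """Stage 1: learn features from raw, unlabeled frames."""
    def __init__(self, channels=3, code_channels=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv2d(16, code_channels, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(code_channels, 16, 5, stride=2,
                               padding=2, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, channels, 5, stride=2,
                               padding=2, output_padding=1),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

def unsupervised_step(model, opt, batch, sparsity_weight=1e-3):
    """One autoencoder update on an unlabeled batch: reconstruction
    error plus an L1 penalty that pushes the hidden code toward sparsity."""
    recon, code = model(batch)
    loss = F.mse_loss(recon, batch) + sparsity_weight * code.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

class GestureClassifier(nn.Module):
    """Stage 2: a supervised head on top of the frozen learned features."""
    def __init__(self, encoder, code_channels=32, num_classes=20):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # keep the unsupervised features fixed
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(code_channels, num_classes),
        )

    def forward(self, x):
        return self.head(self.encoder(x))

In use, the autoencoder would first be trained by streaming unlabeled video frames through unsupervised_step; the encoder would then be wrapped in GestureClassifier and trained with an ordinary cross-entropy loss on the labeled gesture data.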

Link to the source code