Understanding templates

For the longest time, I felt shaky about using templates in my code. I don’t mean the Standard Template Library. I mean code with

template <typename T>

instead of concrete data types.

Additionally, C++11 introduced variadic templates, so I decided it was finally time to solidify my understanding of templates. This post is an effort to test whether I understand them well enough to explain them to others.

What?

Templates let programmers write a series of instructions that manipulate types, much as ordinary programs are a series of instructions that manipulate data. One of the simplest (and most commonly used) of these instructions is copying a type. For example, the type ‘int’ specified in std::vector<int> is copied into the implementation of the std::vector class template. Templates are much more powerful than this, though, and can perform far more complex operations on types; in fact, the set of operations on types expressible with C++11 templates is Turing complete. In this post, we will look at a more involved example that uses templates to build a list of types and iterate over it at compile time. But before that, let’s cover the basics. Templates can be categorized as:

function templates

These use a placeholder type, called a template type parameter, for some or all of their parameters. A function template is essentially a family of functions supporting various data types for those parameters. When the function is used with a specific type, the compiler takes this stencil, replaces the placeholder type with the actual type from the call site, and calls the resulting function template instance.

class templates

These are essentially a family of classes. The compiler instantiates the appropriate class depending on the type the template is used with. The classic examples are the containers in the Standard Template Library, e.g. std::vector<int> vs. std::vector<float> vs. std::vector<char>.
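The classic compile-time factorial gives a taste of the ‘Turing complete’ claim above. It computes a value rather than building a type, but the machinery (recursive instantiation plus a specialization as the base case) is exactly what we will use later in this post:

template <unsigned N>
struct Factorial {
    static const unsigned value = N * Factorial<N - 1>::value;
};

// Base case: a specialization stops the recursion.
template <>
struct Factorial<0> {
    static const unsigned value = 1;
};

// Factorial<5>::value is 120, computed entirely by the compiler.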

Why?

The intent of templates is to decouple algorithms from data, i.e., algorithms can be developed independently of the data types they manipulate. Templates enable classes and functions to operate on different data types without replicating code for each type. A simple example is a function getMax that returns the maximum of two numbers. These numbers could have different precisions: they could be integers, doubles or floats. Without templates, we’d have to replicate the code for getMax for each of these types.
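Here is a minimal sketch of one possible shape for such a getMax (the name comes from the example above; the exact signature is just an illustration):

#include <iostream>

// One stencil serves every type that supports operator>.
template <typename T>
T getMax(const T& a, const T& b) {
    return (a > b) ? a : b;
}

int main() {
    std::cout << getMax(3, 7) << "\n";      // instantiates getMax<int>
    std::cout << getMax(2.5, 1.5) << "\n";  // instantiates getMax<double>
    return 0;
}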

When?

Templates are used whenever a piece of code manipulates various kinds of data types, and the operations in that code do not depend on the data type.

How?

I’m going to try a different format for this part: let’s walk through a code example to understand the how.

Here is the problem statement for the code example:

Create a type new_type_list that can hold a list of types. Provide the following operations:

– append_type, which adds a given data type to the list
– print_types, which traverses the list and prints the data type of each entry

Let’s assume that new_type_list is a linked list for simplicity.
Here is how the calling code looks:

int main(int argc, char** argv) {

    // Start from the empty list ('typename' is not needed here,
    // since we are not inside a template).
    typedef new_type_list::type empty_list;

    // Each append produces a brand new list type.
    typedef append_type<empty_list, int>::type temp1;
    typedef append_type<temp1, float>::type temp2;
    typedef append_type<temp2, float>::type temp3;
    typedef append_type<temp3, float>::type temp4;
    typedef append_type<temp4, char>::type complete_list;

    // Construct a print_types temporary and invoke its operator().
    print_types<complete_list>()();

    return 0;
}

Let’s start by defining new_type_list. From its usage in the calling code, we know that each entry holds information about a type (the payload) and about the next entry (which can be null when the current entry is the last one in the list). So we define a node in terms of a payload type and a next-node type:

// Sentinel type marking an empty slot / the end of the list.
class NullType {};

// A node: the payload type it carries and the type of the next node.
template<typename Payload, typename Next>
class TypeList {
public:
    typedef Payload payload_type;
    typedef Next    next_type;
};

Our new_type_list now simply exposes an empty TypeList as its type:

class new_type_list {
public:
    typedef TypeList<NullType, NullType> type;
};

Now let’s look at append_type. It can be used to append a node to

- a list of nodes or
- a node or
- nothing (first node of a linked list)

Looking at this more closely, each of these cases builds on the simpler one, so let’s go from the simplest to the most complex.

The simplest case is appending the first node to nothing (null). In this case, the TypeList we are attaching to is the empty one, TypeList<NullType, NullType>. The Value we append becomes the first entry in our list:

// Partial specialization: appending to the empty list.
// (In real code, the primary template shown further below must be
// declared before these specializations.)
template<typename Value>
class append_type<TypeList<NullType, NullType>, Value> {
public:
    typedef TypeList<Value, NullType> type;
};

To append a node to a single node, we take the target node we are attaching to, which carries some Payload and has next_type NullType (since it was the last node until now), and hang a new node holding Value off of it:

// Partial specialization: appending to the last node of a list.
template<typename Payload, typename Value>
class append_type<TypeList<Payload, NullType>, Value> {
public:
    typedef TypeList<Payload, TypeList<Value, NullType>> type;
};

Next comes the most general case: the primary template. When append_type is instantiated, the compiler picks the most specialized version that matches. The two partial specializations above are more specific; the primary template below is chosen only when neither of them matches, i.e. when the list still has at least two nodes to walk through.

template<typename List, typename Value>
class append_type {
    // A static_assert that Value is not NullType could be added here.
public:
    // Keep the current payload and recursively append Value to the rest.
    typedef TypeList<typename List::payload_type,
                     typename append_type<typename List::next_type, Value>::type> type;
};
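To see the recursion at work, here is a small compile-time check (my own addition, assuming the declarations above are in scope and C++11’s static_assert):

#include <type_traits>

typedef TypeList<int, TypeList<float, NullType>> two_items;

// The primary template keeps 'int' and recurses into the tail, where the
// second specialization attaches 'char' after 'float'.
static_assert(std::is_same<
    append_type<two_items, char>::type,
    TypeList<int, TypeList<float, TypeList<char, NullType>>>
>::value, "append_type walks to the end and attaches the new type");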

Now let’s write print_types. It takes a list built this way and prints its contents. For each entry, it has to print the Payload type and then recurse into the Next type. We can write it as:

template<typename T>
class print_types {
public:
    void operator()() const {
        typedef typename get_first_type<T>::type first;
        typedef typename get_second_type<T>::type second;

        print_types<first>()();   // print this node's payload
        print_types<second>()();  // recurse into the rest of the list
    }
};

We have helper metafunctions to get the first (Payload) and second (Next) types:

template<typename T>
class get_first_type {
public:
    typedef typename T::payload_type type;
};

template<typename T>
class get_second_type {
public:
    typedef typename T::next_type type;
};

Now we define what print_types does for each of the payload types int, float and char, and for NullType (more types can easily be added if required).

template<>
class print_types<int> {
public:
    void operator()() const {
        std::cout << "int ";
    }
};

template<>
class print_types<float> {
public:
    void operator()() const {
        std::cout << "float ";
    }
};

template<>
class print_types<char> {
public:
    void operator()() const {
        std::cout << "char\n";
    }
};

template<>
class print_types<NullType> {
public:
    void operator()() const {
        // intentionally blank: end of the list
    }
};
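With all the pieces in place (and the primary append_type declared before its specializations), the calling code at the top of this section should compile and print: int float float float char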

To summarize, we talked about what templates are and why they’re useful, and we saw a detailed example where we created a list that holds types, appended new types to it and printed all the types it contains. One application of such a type list is code generation for various types. Imagine a matrix operation that evaluates the dot product of two matrices, where each matrix could have a different precision. We could generate this dot product for each combination of precisions by iterating over a list of types, as in our example.

Design Patterns – An introduction

Design patterns are a quintessential topic that intimidates many entry-level software developers. This post is my attempt at explaining the concepts in a simple way. I will avoid going into detailed examples, since there are a lot of great links, videos and books that do that well. I’d like to explain the core ideas behind design patterns in a concise way. This post provides a high-level summary of the types of design patterns and when they are used.


What?

A design pattern is a general, reusable solution to a commonly occurring problem in software design.
There are three main categories design patterns can be divided into:
  1. Creational
  2. Structural
  3. Behavioral

Why?

They speed up development by reusing existing paradigms that are tested and proven to work.

How?

Drawing a picture of a few objects takes time. But if we use stencils of objects to compose the drawing, it can be done much faster.
A design pattern is like a template (stencil) to solve problems in software development.


Creational patterns

are used for creating instances of classes and objects. They embody two main ideas:
  1. Encapsulate the knowledge of which concrete class to use at creation time
  2. Hide the nitty-gritty details of creating and composing the concrete classes and objects
Object creational patterns delegate object creation to another object.
Class creational patterns use inheritance to delegate object creation to subclasses. Some patterns that fall in the category of creational patterns are:
  • Factory method pattern allows a class to delegate instantiation to its subclasses
  • Abstract factory pattern provides an interface to create a family of objects without specifying the concrete classes
  • Builder pattern encapsulates object construction from object representation
  • Prototype pattern clones from a fully initialized instance
  • Singleton pattern ensures that a class has only one instance during program execution and provides a global point of access to that instance
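Although this post avoids detailed examples, the singleton is compact enough to sketch in a few lines. This is one common way to write it in C++ (the Logger class here is a made-up example):

#include <iostream>

class Logger {
public:
    // The single instance is created on first use and reused afterwards.
    static Logger& instance() {
        static Logger the_instance;
        return the_instance;
    }

    void log(const char* message) { std::cout << message << "\n"; }

private:
    Logger() {}                         // clients cannot construct their own
    Logger(const Logger&);              // nor copy the instance
    Logger& operator=(const Logger&);
};

int main() {
    Logger::instance().log("same instance everywhere");
    return 0;
}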


Structural patterns

simplify composition of entities (their relationships and functionality). Some common patterns in this category are:
  • Adapter pattern matches interface of a class to what the client expects
  • Bridge pattern decouples the interface from implementation to provide ‘loose coupling’
  • Composite pattern is a tree structure of objects with the same interface. Some objects are simple, while others are complex but they all have a uniform interface
  • Decorator pattern provides a way to extend functionality of a class during runtime without changing the interface
  • Facade pattern provides a higher level simplified interface to a set of interfaces in a subsystem
  • Flyweight pattern enables sharing by allowing a large number of objects to share common properties
  • Proxy pattern provides a placeholder for another object to control access


Behavioral patterns

provide flexibility in communication across objects. Some common patterns in this category are:

  • Chain of responsibility passes the request along a chain of receivers until an object handles it
  • Command pattern encapsulates a request into an object
  • Iterator pattern provides a way to access the elements of a collection of objects without exposing the underlying details
  • Mediator pattern defines how different objects interact without referring to each other explicitly
  • Null object pattern provides a default object to avoid null references
  • Observer pattern defines a one-to-many dependency across objects so that a state change in one object causes a notification to be sent to all its dependents
  • State pattern allows an object to change behavior when there is an internal state change
  • Strategy pattern defines a family of algorithms that are encapsulated to be interchangeable. Thus the algorithm can vary independently of the clients that use them
  • Template method pattern lets subclasses redefine (override) specific steps of an algorithm without affecting the high-level structure of that algorithm
  • Visitor pattern allows a new operation to be defined without changing the class definition of the entities on which it operates

Deep Learning

Recently, the term deep learning has become a buzzword in the field of technology. People are excited about using deep learning to develop products that solve problems (including those in the field of Artificial Intelligence, a.k.a. AI) that seemed impossible (or too futuristic) in the past.
A few days ago, news that a computer system trained to play Go using deep learning had beaten a top human Go player caught my attention. Until now, Go (unlike chess) was a game at which computers could not reliably beat strong human players. So I was intrigued: what is this deep learning? I set out to figure it out using some papers, lectures and this book.

“If you can’t explain it simply, you don’t understand it well enough.”

Albert Einstein.


So this post is an attempt to explain some of these concepts and to ensure that I understand them well.
Let’s start by concisely stating what deep learning is and then go into further details.

What?

Deep Learning is an approach to Machine Learning that builds on knowledge of
  • learning algorithms
  • statistics
  • applied mathematics

Why now?

Deep Learning has recently been revived in popularity due to the increased compute capability of CPUs and general-purpose GPUs, the availability of huge datasets (quite a bit of them labeled) and new techniques for training deeper neural networks.

How?

Let’s take a step back to understand how deep learning fits with AI.
  • Classical algorithms in Computer Science map representations to output.
  • The goal of AI is to learn how to generate these representations in order to convert them to output, without human intervention.
  • There are many factors that humans use to understand and classify the observed input data.
  • Usually, humans are required to disentangle these features and remove the redundant ones in order to specify them as representations in classical algorithms.
Deep Learning organizes these features in a hierarchy where simpler concepts combine to generate a complex one.
Deep learning uses multiple layered neural networks to understand inputs and extract abstract concepts (a.k.a. features) from it. These abstract concepts are combined hierarchically to generate complex representations that in turn generate output.
It is also worth noting that Deep Learning does not attempt to simulate the human brain. Instead it takes the knowledge we have about learning hierarchical representations automatically from data and uses it to build computer systems that solve tasks requiring intelligence.
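To make ‘simpler concepts combining into complex ones’ a bit more concrete, here is a tiny sketch of a two-layer network’s forward pass (sizes and weights are made up for illustration):

#include <cmath>
#include <cstdio>
#include <vector>

// One fully connected layer: output[j] = sigmoid(sum_i weights[j][i] * input[i]).
std::vector<double> layer(const std::vector<std::vector<double> >& weights,
                          const std::vector<double>& input) {
    std::vector<double> output(weights.size());
    for (size_t j = 0; j < weights.size(); ++j) {
        double sum = 0.0;
        for (size_t i = 0; i < input.size(); ++i)
            sum += weights[j][i] * input[i];
        output[j] = 1.0 / (1.0 + std::exp(-sum));  // nonlinearity between layers
    }
    return output;
}

int main() {
    std::vector<double> pixels = {0.2, 0.9, 0.4};  // raw input, e.g. pixel values

    // Layer 1 turns raw inputs into simple features; layer 2 combines those
    // features into a more abstract representation.
    std::vector<std::vector<double> > w1 = {{0.5, -0.3, 0.8}, {-0.2, 0.7, 0.1}};
    std::vector<std::vector<double> > w2 = {{1.2, -0.6}};

    std::vector<double> features = layer(w1, pixels);
    std::vector<double> concept  = layer(w2, features);
    std::printf("output: %f\n", concept[0]);
    return 0;
}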

Minerva – an overview

This blog has not had any posts for a while. The main reasons are that I have been busy reading a lot (without having additional comments on the books I read that were worth blogging about) and that I have been working on a side project to get some experience building a machine learning framework from scratch. In the process of building it, I have learned a lot of things that could be useful to someone attempting a similar endeavor, so I have decided to blog about it.
The framework is called minerva (from the Roman goddess of wisdom). The source code can be found on my github page. My husband and I were looking for a project idea that had the following attributes:

  1. A generic framework for a classification system that could handle various types of data
  2. A high performance framework
  3. Easily scalable
  4. Implements some ideas that have gained interest in the field of ‘deep learning’, a.k.a. automatic feature selection

We decided to implement a configurable neural network library from scratch in C++. We wanted this neural network to be able to perform automatic feature selection. After reading some research papers on automatic feature selection, the sparse autoencoder technique seemed promising enough to try out. Originally, the dataset we used for this project came from a Kaggle competition on multi-modal gesture recognition from videos. We used this dataset to set up and pipe-clean the flow (which supports image and video inputs). We tried our best to design a system with clean interfaces, so that supporting different libraries (matrix, image, etc.) would be seamless. This came in handy when we decided to add support for running minerva on GPUs: we simply plugged in GPU matrix libraries.

So, how do you use minerva? Well, it isn’t as easy as pressing a button or running a single executable or script (although I plan to add a simpler wrapper script in the near future). For now, there are four main steps.

  1. Creation of the neural network – In this step, we create an initial model of a neural network.
  2. Unsupervised Learning – In this step, we run the model from previous step with a lot of input data in ‘unsupervised learning’ mode. The result of this step is a feature selector neural network. More details follow.
  3. Supervised Learning – Run the feature selector model generated in the above step in ‘supervised learning’ mode by feeding it labeled training data. The result of this step is a classifier neural network.
  4. Classify (Test) – In this step, we run test data through the classifier neural network and generate output labels for it.

1. Creation of the neural network model
The neural network is created with a configuration specified via the command line. It is then serialized and written to disk as a compressed tgz file. The file contains a json file describing attributes such as the number of layers and neurons in the neural network, plus a set of binary files containing the randomly initialized matrices that together form the neural network.

2. Unsupervised Learning
Unsupervised learning (the ‘deep learning’ part of our flow) is the process of automatically discovering patterns in large data sets without human interaction. Without unsupervised learning, we would need to manually select features that indicate how a particular input should be classified. That would mean manually labeling thousands or millions of inputs (video frames) to perform classification (identify what an image contains). Additionally, the result would need to be robust w.r.t. variations in color, alignment, and other noise. It is not feasible for most projects to devote the resources required to gather such a large number of labeled inputs.

In order to deal with this problem, many projects rely on building domain-specific feature selection systems. Such feature selectors preprocess the input data and produce a set of features which capture the essential information from the input data set in a significantly smaller representation. Feature selection reduces the dependence on labeled data by simplifying the problem: the class of an input is determined using only the most relevant information instead of all information. SIFT is an example of a widely used feature selection system. Most manually crafted feature selection systems have a fatal flaw: they are designed by developers and tailored to specific problems. As a consequence, they are often brittle (because they are built on developer / domain-expert intuition), and they require a complete redesign when moving from one type of input data to another (e.g. from video to text).

We decided to implement the sparse autoencoder technique for our unsupervised learning step. This technique attempts to address the shortcomings of hand-crafted feature selection systems by providing a framework to generate features automatically. At a high level, it uses an artificial neural network trained with unlabeled data and configured in a specific topology. The topology guides the inner layers of the network to respond to patterns in the input data such that the majority of the information in the data is captured (the essence of the data, so to speak). After training on unlabeled data, the inner layers of this network can be used directly as features; in other words, the network itself becomes a feature selection system.
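As a rough sketch of the idea (my own simplified illustration, not Minerva’s actual code), a sparse autoencoder is trained to reproduce its input through a hidden layer while keeping the hidden units mostly inactive:

#include <cmath>
#include <vector>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Cost of a sparse autoencoder on one input: reconstruction error plus a
// penalty pushing hidden activations toward a small target rho.
// (Weights, sizes and parameter values are illustrative only.)
double sparse_autoencoder_cost(const std::vector<double>& input,
                               const std::vector<std::vector<double> >& encode_w,
                               const std::vector<std::vector<double> >& decode_w,
                               double rho, double beta) {
    // Encode: hidden[j] = sigmoid(encode_w[j] . input)
    std::vector<double> hidden(encode_w.size());
    for (size_t j = 0; j < encode_w.size(); ++j) {
        double s = 0.0;
        for (size_t i = 0; i < input.size(); ++i) s += encode_w[j][i] * input[i];
        hidden[j] = sigmoid(s);
    }

    // Decode: reconstruct the input from the hidden representation.
    double reconstruction_error = 0.0;
    for (size_t i = 0; i < input.size(); ++i) {
        double s = 0.0;
        for (size_t j = 0; j < hidden.size(); ++j) s += decode_w[i][j] * hidden[j];
        double diff = sigmoid(s) - input[i];
        reconstruction_error += 0.5 * diff * diff;
    }

    // Sparsity penalty: KL divergence between the target activation rho and
    // each hidden unit's actual activation.
    double sparsity_penalty = 0.0;
    for (size_t j = 0; j < hidden.size(); ++j) {
        double rho_hat = hidden[j];  // would be averaged over many inputs in practice
        sparsity_penalty += rho * std::log(rho / rho_hat) +
                            (1 - rho) * std::log((1 - rho) / (1 - rho_hat));
    }

    // Training minimizes this cost; the hidden layer then acts as a feature selector.
    return reconstruction_error + beta * sparsity_penalty;
}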

The following image includes a visualization of some of the low-level features (inputs that a pooling-layer neuron maximally responds to) that Minerva learned after being presented 10,000 random images.

[Image: low-level features learned by Minerva]

These compare favorably to the Gabor filter responses in the top-left of the following figure; similar responses have been found in the first layers of the visual cortex in mammals, and they perform well at visual classification tasks:

[Image: Gabor filter responses (top-left) and their spatial-domain impulse responses (top-right)]
The top-right of the figure shows the corresponding spatial-domain representation (impulse response) of each of the Gabor filters.

We mainly referred to the following resources on deep learning during the course of this project:

  • Sparse autoencoders
  • Automatic Feature Selection
  • Learning Feature Hierarchies
  • Convolutional and Pooling Networks

3. Supervised Learning
In this step, we use the neural network from the earlier unsupervised learning stage as the starting point (instead of a randomly initialized neural network). At the start, this neural network is already capable of selecting features. This training step uses data labeled with gesture names for images (which are snapshots of successive video frames). We feed this labeled data into the network and calibrate the network using back propagation against the expected output labels. Once the neural network output is within the specified threshold (tolerance), the resulting classifier neural network is written out to a file. It is now ready to be used for testing with images.
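As a toy illustration of ‘calibrate until the output is within tolerance’ (again my own sketch, far simpler than a real multi-layer network), here is gradient descent on a single sigmoid neuron:

#include <cmath>
#include <cstdio>

int main() {
    // One labeled example: input 1.0 should produce output 1.0.
    const double input = 1.0, label = 1.0;
    const double learning_rate = 0.5, tolerance = 0.05;

    double weight = 0.1;  // randomly initialized in a real network
    double output = 0.0;

    // Repeat forward pass + weight update until the error is within tolerance.
    for (int step = 0; step < 10000; ++step) {
        output = 1.0 / (1.0 + std::exp(-weight * input));            // forward pass
        double error = output - label;
        if (std::fabs(error) < tolerance) break;                     // calibrated
        double gradient = error * output * (1.0 - output) * input;   // backprop
        weight -= learning_rate * gradient;
    }
    std::printf("trained output: %f\n", output);
    return 0;
}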

The following image shows the neural network input that produces the maximum response for a neuron trained using Minerva to recognize images of cats.

[Image: input that produces the maximum response for a cat-recognizing neuron]

4. Classification
In this step, we use both neural networks, viz. the feature selector from the unsupervised learning step and the classifier from the supervised learning step, to classify test images.

In the next few posts I plan to explain the details and rationale behind the design decisions we made to accomplish the four goals I mentioned at the beginning of this post. Here is how I tentatively plan to partition the topics:

  • Sparsity for unsupervised learning
  • Support for configurable convolutional neural networks (tiling the pixels to capture spatial locality)
  • Various principles to ensure that we truly built a high performance architecture from the start
  • Evaluation of various optimizers for the cost function calculation (required in calibrating the network in back propagation)
  • The various supporting libraries, viz. libraries for linear algebra, video, image handling and serialization

What & why: machine learning?

I’ve been thinking of a topic/theme for this blog for a while now. For the last few months I have become interested in machine learning, and so have been trying to learn more about the topic. I figured it would be a good exercise to summarize things I learn on this blog. They say a good understanding is a prerequisite to explaining.

What is machine learning?

Machine learning is the science of making inferences from data using statistical methods. These inferences can then be used for forecasting or prediction.

There are different types of forecasting, viz. classification and regression.

Classification:

Classification is the task of determining which one of a set of available classes a variable belongs to. Think of these values as enumerated types; usually the classes are disjoint. E.g., a system that detects which blood group a given sample belongs to is a classification system: the sample can take one of the values {A, B, AB, O} (let’s ignore positive and negative for simplicity).

Regression:

Regression is the task of determining the exact value a target variable takes. This variable can take any value from the set of real numbers. E.g., a system that predicts the premium on your car insurance, the cost of a house, or the interest rate on a loan.
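As a tiny illustration of the difference (a toy example with hand-written rules and made-up numbers, where a real system would learn these from data), a classifier returns one of a fixed set of labels while a regressor returns a real value:

#include <cstdio>

// Classification: map a measurement to one of a fixed set of classes.
const char* classify_temperature(double celsius) {
    if (celsius < 10.0) return "cold";
    if (celsius < 25.0) return "mild";
    return "hot";
}

// Regression: predict a real-valued target, here with a toy linear model.
double predict_house_price(double square_meters) {
    const double base_price = 50000.0, price_per_sqm = 2000.0;
    return base_price + price_per_sqm * square_meters;
}

int main() {
    std::printf("18C is %s\n", classify_temperature(18.0));
    std::printf("120 sqm -> %.0f\n", predict_house_price(120.0));
    return 0;
}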

Why machine learning?

Traditionally, machines have been used to follow instructions, which means we still need to tell them what to do. The next step in the evolution of computing is to enable machines to ‘learn’, just like humans do. We can leverage the recent explosion of data sets (big data) and the increased compute capability of processors to ‘teach’ machines to ‘think, predict, estimate, classify’ instead of just following our instructions (executing algorithms we write).

There are a number of ways to ‘teach’ machines. They are broadly categorized as ‘supervised learning’, ‘unsupervised learning’ and ‘reinforcement learning’.

  • Supervised learning: this is analogous to classroom learning for humans.
  • Unsupervised learning: this has no external intervention, the system ingests a lot of training samples and figures out what features matter. I will talk about this in depth in another post.
  • Reinforcement learning: this is a game-theory-based approach where the system is given rewards or punishments depending on its correct or incorrect predictions. The system modifies its prediction algorithm according to this feedback.

PS: I haven’t blogged for many years. I hope that the ‘flow’ of my posts improves as I continue blogging.

What are we rushing to?

We rush from the constant stream of emails to reading feeds, checking notifications, watching video clips, sharing them, clicking photos, and checking the emails that notify us about reactions to what we shared. Why are we hurriedly sprinting through life like there is no tomorrow? Why are we consuming unfiltered media at a rate faster than ever before? Simply because it is available? Why do we share & consume a lot more than we create?

What happened to the days when we sat back & reflected on our thoughts, actions & feelings? When we wrote down thoughts, essays & poems during moments of random pondering. We ‘created’ them, didn’t we?

Well, this blog is my attempt at bringing those contemplative times back & creating something (be it a trivial blog post) instead of just being guilty of aforementioned mindless consumption.