Predicting race from face

We present a framework for estimating race from faces in movie data. Please refer to a detailed survey “Learning race from face: A survey”, Fu S. et. al., 2014 for a discussion on studies on recognizing race from faces. The definitions and taxonomy of race from a computer vision perspective are borrowed from this survey. The list of sources we used can be found here.

Data preparation
Sample size
Model architecture
Performance evaluation
i. Confusion matrix and accuracy
ii. ROC curves
What is the network learning?

Data and model architecture

We compiled face data across opensource databases which have clear definitions of labeling races and are consistent with our nomenclature. Additionally, we annotated race for IMDb identities as described in Ramakrishna et. al., 2017, ACL. Sources which required institutional EULA agreements were excluded due to the processing delays involved.

Data preparation

All face images were resized to 100x100 and were converted to grayscale. Since a good number of images in our database were grayscale images, we converted all images to grayscale images

Skin color has been shown to be one of the least important factors in race prediction (See survey)

All images were aligned to achieve in-plane rotation to align landmarks of eyes and nose

Left-right flipping of the images was performed for data augmentation

Test images for evaluation were chosen to have no overlap of the person’s identity with the training data as well as to be variable with respect to pose, illumination, background and occlusions

Due to lack of data from the nativeamerica/pacificislander class, we only considered the other five classes in our prediction model

Summary of the face database

race num_faces num_images

eastasian 14472 28133

caucasian 267478 529741

latino/hispanic 10912 21374

asian-indian 28132 55473

african/african-american 22646 44627

nativeamerican/pacificislander 580 1069

TOTALs 344220 680417

race	num_faces	num_images
eastasian	14472	28133
caucasian	267478	529741
latino/hispanic	10912	21374
asian-indian	28132	55473
african/african-american	22646	44627
nativeamerican/pacificislander	580	1069
TOTALs	344220	680417

Average faces for each race

Model architecture

A simple modified VGG-like architecture was adopted for a 5-class race classification model. The architecture is as shown below.
racenet

Performance Evaluation

Class-wise system performance is tabulated below. Overall system performance accuracy was 82.6% for a training setup of mini-batch size 32, trained for 50 epochs.

Race	african	asianindian	caucasian	eastasian	latino
accuracy (%)	83.2	85.9	93.2	87.4	65.6

Confusion matrix

Note that the model is most confused between latino and caucasian faces which is consistent with previous work as well the structural similarity of the faces between these two races.

Receiver operating characteristics (ROC) curves for race prediction

What is the network learning

We were inspired by visualizing filter activations in a CNN as described here and extended the concept to identify the input (or distorted input) that minimizes the output loss for each class. In this, we initialize the input with an average face and minimize the categorical cross entropy loss for each class and visualize the changes to the input that achieves this minimization. The resulting modified inputs are shown below.
Not surprisingly, the modified inputs resemble the per-class average faces and are quite distinct. The results were similar when the input was initialized by average face from a specific class, rather than across all images.

under the hood

Table of contents

Data and model architecture

Data preparation

Model architecture

Performance Evaluation

What is the network learning