Methods for Machine Recognition of Faces. Face Detection and Feature Extraction
As illustrated in Fig. 4, the problem of automatic face recognition involves three key steps/subtasks:
1. Detection and coarse normalization of faces
2. Feature extraction and accurate normalization of faces
3. Identification and/or verification
Sometimes, different subtasks are not totally separated. For example, facial features (eyes, nose, mouth) are often used for both face recognition and face detection. Face detection and feature extraction can be achieved simultaneously as indicated in Fig. 4.
Figure 4. Multiresolution search from a displaced position using a face model
Depending on the nature of the application, e.g., the sizes of the training and testing databases, clutter and variability of the background, noise, occlusion, and speed requirements, some subtasks can be very challenging. A fully automatic face recognition system must perform all three subtasks, and research on each subtask is critical. This is not only because the techniques used for the individual subtasks need to be improved, but also because they are critical in many different applications (Fig. 3).
For example, face detection is needed to initialize face tracking, and extraction of facial features is needed for recognizing human emotion, which in turn is essential in human-computer interaction (HCI) systems. When feature locations are not considered, face detection is declared successful if the presence and rough location of a face have been correctly identified.
Face Detection and Feature Extraction. Segmentation/Detection. Up to the mid-1990s, most work on segmentation was focused on single-face segmentation from a simple or complex background. These approaches included using a whole-face template, a deformable feature-based template, skin color, and a neural network.
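The skin-color cue mentioned above can be sketched as a simple per-pixel classifier in YCbCr space. This is a minimal illustration, not a method from the cited work: the conversion follows the ITU-R BT.601 coefficients, and the Cb/Cr ranges are common heuristic values.

```python
import numpy as np

def skin_mask_ycbcr(rgb):
    """Classify pixels as skin using fixed Cb/Cr thresholds.
    The ranges (77-127 for Cb, 133-173 for Cr) are a widely used
    heuristic, assumed here for illustration."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # RGB -> chroma channels of YCbCr (BT.601, full-range approximation)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (77 <= cb) & (cb <= 127) & (133 <= cr) & (cr <= 173)
```

In practice such a mask only yields candidate regions; morphological cleanup and shape tests follow before a face hypothesis is accepted.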
Significant advances have been made in recent years in achieving automatic face detection under various conditions. Compared with feature-based and template-matching methods, appearance-based (or image-based) methods (2, 23) that train machine systems on large numbers of samples have achieved the best results (refer to Fig. 4). This may not be surprising: faces are complex objects that differ markedly from non-face objects, yet individual faces are quite similar to one another, so through extensive training computers can become good at detecting them.
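The "train on many samples" idea behind appearance-based detection can be caricatured with a linear classifier learned directly on raw patch intensities. This is a deliberately minimal sketch with invented function names; real appearance-based systems use far richer features, boosting, or neural networks.

```python
import numpy as np

def train_patch_classifier(face_patches, nonface_patches, n_iter=500, lr=0.1):
    """Toy appearance-based detector: logistic regression on raw patch
    intensities, trained by gradient descent. Illustrates only the
    'learn face vs. non-face from many samples' idea."""
    X = np.array([p.ravel() for p in list(face_patches) + list(nonface_patches)],
                 dtype=np.float64)
    y = np.array([1.0] * len(face_patches) + [0.0] * len(nonface_patches))
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted face probability
        grad = p - y                              # logistic-loss gradient term
        w -= lr * X.T @ grad / len(y)
        b -= lr * float(grad.mean())
    return w, b
```

A detector built this way would be run over a sliding window at multiple scales, scoring each window as face or non-face.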
Feature Extraction. The importance of facial features for face recognition cannot be overstated. Many face recognition systems need facial features in addition to the holistic face, as suggested by studies in psychology. It is well known that even holistic matching methods, e.g., eigenfaces (15) and Fisherfaces (16), need accurate locations of key facial features such as eyes, nose, and mouth to normalize the detected face (24-26).
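The normalization step that holistic methods rely on can be sketched as a similarity transform mapping the detected eye centers to canonical positions in a fixed-size crop. The canonical coordinates below (`eye_y`, `eye_dx`) are illustrative assumptions, not values from the cited works.

```python
import numpy as np

def normalize_by_eyes(left_eye, right_eye, out_size=64, eye_y=0.35, eye_dx=0.25):
    """Build a 2x3 similarity warp (rotation + scale + translation) that
    maps the two detected eye centers to canonical positions in an
    out_size x out_size crop. Canonical positions are assumed values."""
    src = np.array([left_eye, right_eye], dtype=np.float64)
    dst = np.array([[out_size * (0.5 - eye_dx), out_size * eye_y],
                    [out_size * (0.5 + eye_dx), out_size * eye_y]])
    # recover scale and rotation from the vector between the two eyes
    d_src, d_dst = src[1] - src[0], dst[1] - dst[0]
    s = np.linalg.norm(d_dst) / np.linalg.norm(d_src)
    theta = np.arctan2(d_dst[1], d_dst[0]) - np.arctan2(d_src[1], d_src[0])
    c, sn = s * np.cos(theta), s * np.sin(theta)
    R = np.array([[c, -sn], [sn, c]])
    t = dst[0] - R @ src[0]
    return np.hstack([R, t[:, None]])   # 2x3 matrix for an affine warp
```

The resulting matrix can be handed to any affine image-warping routine to produce the shape-normalized face crop that eigenface-style matching expects.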
Three types of feature extraction methods can be distinguished:
1. Generic methods based on edges, lines, and curves
2. Feature-template-based methods that are used to detect facial features such as eyes
3. Structural matching methods that take into consideration geometrical constraints on the features
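The feature-template idea (category 2) can be illustrated with a brute-force normalized cross-correlation search; this is a minimal sketch, and practical implementations use FFT-based correlation and image pyramids instead.

```python
import numpy as np

def match_template_ncc(image, template):
    """Slide a feature template (e.g., an average eye patch) over a
    grayscale image and return the (row, col) position and score of the
    best normalized cross-correlation match."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    tn = np.linalg.norm(t)
    best, best_pos = -np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            w = image[r:r + th, c:c + tw]
            w = w - w.mean()
            denom = np.linalg.norm(w) * tn
            # score is in [-1, 1]; flat windows get a sentinel score
            score = float((w * t).sum() / denom) if denom > 0 else -1.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best
```

Mean subtraction and normalization make the score invariant to linear intensity changes, which is why such templates tolerate moderate lighting variation.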
Early approaches focused on individual features; for example, a template-based approach is described in Ref. 27 to detect and recognize the human eye in a frontal face. These methods have difficulty when the appearance of a feature changes significantly, e.g., closed eyes, eyes with glasses, or an open mouth. To detect the features more reliably, recent approaches use structural matching methods, for example, the active shape model (ASM), which represents any face shape (a set of landmark points) via a mean face shape and principal components obtained through training (3).
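The core of the ASM, a point distribution model built from a mean shape plus principal components, can be sketched as follows. The sketch assumes the training shapes are already aligned (the Procrustes alignment step is omitted), and the function names are my own.

```python
import numpy as np

def train_shape_model(shapes, var_kept=0.98):
    """Point-distribution-model sketch: stack aligned landmark sets
    (each a (k, 2) array) as vectors, compute the mean shape, and keep
    the principal components explaining var_kept of the shape variance."""
    X = np.array([s.ravel() for s in shapes], dtype=np.float64)  # (N, 2k)
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD of the centered data matrix
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2 / max(len(shapes) - 1, 1)
    keep = int(np.searchsorted(np.cumsum(var) / var.sum(), var_kept)) + 1
    return mean, Vt[:keep], np.sqrt(var[:keep])

def synthesize_shape(mean, components, b):
    """Any plausible shape is the mean plus a weighted sum of components."""
    return (mean + b @ components).reshape(-1, 2)
```

Constraining each weight in `b` to a few standard deviations of its mode is what keeps the model from generating implausible face shapes during search.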
Compared with earlier methods, these recent statistical methods are much more robust in terms of handling variations in image intensity and in feature shape. The advantages of using the so-called ‘‘analysis through synthesis’’
approach come from the fact that the solution is constrained by a flexible statistical model. To account for texture variation, the ASM model has been expanded to statistical appearance models, including a flexible appearance model (28) and an active appearance model (AAM) (29). In Ref. 29, the proposed AAM combined a model of shape variation (i.e., ASM) with a model of the appearance variation of shape-normalized (shape-free) textures.
A training set of 400 face images, each labeled manually with 68 landmark points and approximately 10,000 intensity values sampled from facial regions, was used. To match a given image with the model, an optimal vector of parameters (displacement parameters between the face region and the model, parameters for linear intensity adjustment, and the appearance parameters) is sought by minimizing the difference between the synthetic image and the given image.
After matching, a best-fitting model is constructed that gives the locations of all the facial features so that the original image can be reconstructed. Figure 4 illustrates the optimization/search procedure to fit the model to the image.
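The "analysis through synthesis" loop can be caricatured as iterative residual minimization over model parameters. This toy sketch updates appearance parameters only, by plain gradient steps; the actual AAM search also updates pose and intensity parameters and uses a pre-learned regression from residuals to parameter updates.

```python
import numpy as np

def fit_appearance(image_vec, mean, modes, n_iter=20, step=0.5):
    """Toy analysis-through-synthesis loop: synthesize an image from the
    current parameters b, measure the residual against the target, and
    take a gradient step on ||residual||^2 to reduce it.
    modes is a (k, n) matrix of appearance variation modes."""
    b = np.zeros(modes.shape[0])
    for _ in range(n_iter):
        synth = mean + b @ modes        # current model reconstruction
        residual = image_vec - synth    # difference image
        b += step * (modes @ residual)  # descent step on the squared error
    return b, mean + b @ modes
```

When the modes are orthonormal (as PCA modes are), each step moves `b` a fixed fraction of the way toward the least-squares solution, so the loop converges geometrically.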