An instance can easily be depicted from different views in pattern recognition, and it is desirable to exploit the information in these views so that they complement each other. However, most metric learning and similarity learning methods developed over the past two decades are designed for single-view feature representations and are not suitable for dealing with multi-view data directly. In this paper, we propose a multi-view cosine similarity learning (MVCSL) approach to efficiently utilize multi-view data and apply it to face verification. The proposed MVCSL method leverages both the common information of multi-view data and the private information of each view: it jointly learns a cosine similarity for each view in the transformed subspace and integrates the cosine similarities of all the views in a unified framework. Specifically, MVCSL employs the constraint that the joint cosine similarity of positive pairs is greater than that of negative pairs. Experiments on fine-grained face verification and kinship verification tasks demonstrate the superiority of our MVCSL approach.
Face verification is a representative task of pattern recognition and computer vision. Its purpose is to decide whether a pair of facial images belongs to the same subject or not. Face verification has received wide attention since unconstrained face image datasets were released to the public, for example, Labeled Faces in the Wild (LFW), MegaFace and other benchmark face image datasets. A variety of metric learning-based face verification methods have been introduced in the literature to advance the performance of face verification. Guillaumin et al. proposed a logistic discriminant method and a nearest neighbor method to learn a distance metric for calculating the similarity of two face images. Koestinger et al. introduced a large-scale metric learning method that computes the Mahalanobis distance between images from a statistical inference perspective and achieved state-of-the-art performance. Schroff et al. exploited deep convolutional neural networks for face verification and clustering and proposed the FaceNet method to measure the distance between face images in Euclidean space. More detailed introductions to and developments in face verification can be found in the survey paper.
Most of the metric learning approaches in face verification are developed for single-view data, so they are not suitable for exploiting multi-view data efficiently. Multi-view data are very common in practical applications, and they usually describe the examples more comprehensively than single-view data. For instance, we can use different feature representations to depict a face image, e.g., scale-invariant feature transform (SIFT), local binary pattern (LBP) and histogram of oriented gradients (HOG). Multi-view learning aims to improve the performance of classification or recognition tasks by making use of multi-view representations of data. To utilize multi-view data, many multi-view learning methods have been introduced in the last decade. However, only a small number of them are developed from the multi-view metric learning perspective, and the existing multi-view metric learning methods are mainly formulated in the framework of Mahalanobis distance metric learning.
The target of metric learning or similarity learning is, broadly speaking, to learn a proper similarity measure that increases the dissimilarity of inter-class samples and increases the similarity of intra-class samples. A variety of metric learning approaches have been proposed over the past decade, and they have been used in various applications of pattern recognition, including face recognition, image search and fine-grained recognition. For example, Xing et al. formulated metric learning as a convex optimization problem by adopting a semidefinite programming formulation of similarity side information. Weinberger et al. introduced the classical large margin nearest neighbor (LMNN) algorithm, which encourages the k nearest neighbors of each sample to come from the same class while samples from different classes are separated by an appropriate margin.
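The multi-view cosine similarity idea described above (one learned transform per view, per-view cosine similarities combined into a joint score, and positive pairs scoring higher than negative pairs) can be illustrated with a minimal sketch. The text does not give the actual MVCSL objective, so everything concrete here is an assumption made for illustration: the function names, the uniform view weights, the weighted-sum combination, and the hinge form of the pair constraint with its `margin` parameter.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def project(W: Matrix, x: Vector) -> List[float]:
    """Apply a linear transform W (one row per output dimension) to x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def joint_cosine(
    transforms: Sequence[Matrix],
    x_views: Sequence[Vector],
    y_views: Sequence[Vector],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-view cosine similarities in the transformed subspaces.

    Assumption: views are combined by a weighted sum, uniform by default.
    """
    k = len(transforms)
    if weights is None:
        weights = [1.0 / k] * k
    return sum(
        w * cosine(project(W, x), project(W, y))
        for w, W, x, y in zip(weights, transforms, x_views, y_views)
    )


def pair_constraint_loss(sim_pos: float, sim_neg: float, margin: float = 0.0) -> float:
    """Hinge penalty (an assumed form): zero once sim_pos exceeds sim_neg by `margin`."""
    return max(0.0, margin - (sim_pos - sim_neg))


# Toy data: two views, identity transforms standing in for learned ones.
identity = [[1.0, 0.0], [0.0, 1.0]]
transforms = [identity, identity]
anchor = [[1.0, 0.0], [0.0, 2.0]]
same_subject = [[2.0, 0.0], [0.0, 1.0]]   # parallel to the anchor in both views
other_subject = [[0.0, 1.0], [3.0, 0.0]]  # orthogonal to the anchor in both views

sim_pos = joint_cosine(transforms, anchor, same_subject)
sim_neg = joint_cosine(transforms, anchor, other_subject)
loss = pair_constraint_loss(sim_pos, sim_neg, margin=0.5)
```

In a learning setting, the per-view transforms (and possibly the view weights) would be optimized so that this penalty is small over many positive and negative pairs; the sketch only evaluates the score and the constraint for fixed transforms.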