Self-supervised Auxiliary Learning for Texture and Model-based Hybrid Robust and Fair Featuring in Face Analysis

Authors: Shukesh Reddy, Nishit Poddar, Srijan Das, Abhijit Das

Published: 2024-09-29 07:03:05+00:00

AI Summary

This paper proposes a self-supervised auxiliary learning approach to improve face analysis by integrating texture-based local descriptors into feature modeling. It uses a masked autoencoder (MAE) as an auxiliary task alongside the primary task to reconstruct texture features, leading to more robust and unbiased face analysis across various paradigms.

Abstract

In this work, we explore self-supervised learning (SSL) as an auxiliary task to blend texture-based local descriptors into feature modelling for efficient face analysis. Combining a primary task with a self-supervised auxiliary task is beneficial for robust representation. We therefore use the SSL task of the masked autoencoder (MAE) as an auxiliary task that reconstructs texture features such as local patterns alongside the primary task, for robust and unbiased face analysis. We test our hypothesis on three major paradigms of face analysis: face attribute analysis, face-based emotion analysis, and deepfake detection. Our experimental results show that the proposed model yields better feature representations for fair and unbiased face analysis.

Key findings

The proposed method, especially the RRLnRC variant, outperforms other methods in deepfake detection and achieves better results in facial attribute and emotion recognition. The integration of local pattern features leads to more robust and fair performance compared to methods relying solely on RGB information.

Approach

The authors propose a hybrid approach that combines texture-based local descriptors (LDP/LBP) with a model-based approach (Vision Transformer). Self-supervised learning, specifically masked autoencoders (MAE/VideoMAE), is used as an auxiliary task to reconstruct texture features, improving the robustness and fairness of the resulting feature representations for face analysis tasks.
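
The way a primary task and a masked-reconstruction auxiliary task can be combined may be easier to see in code. The following is a minimal NumPy sketch, not the authors' implementation: the function names, the 0.75 mask ratio, the patch count, and the `aux_weight` term are all assumptions chosen for illustration, and the summary above does not say how the two losses are actually weighted.

```python
import numpy as np

def random_patch_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Return a boolean vector marking which patches are hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_reconstruction_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error over masked patches only.

    pred and target have shape (num_patches, patch_dim). Here target is assumed
    to hold patches of a texture map (for example LBP codes), not raw RGB.
    """
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())

def joint_loss(primary_loss: float, aux_loss: float, aux_weight: float = 1.0) -> float:
    """Hypothetical weighted sum of the primary and auxiliary objectives."""
    return primary_loss + aux_weight * aux_loss

# Toy usage with made-up shapes: 196 patches of 16 values each.
rng = np.random.default_rng(0)
mask = random_patch_mask(196, 0.75, rng)
texture_target = np.zeros((196, 16))
texture_pred = np.ones((196, 16))
aux = masked_reconstruction_loss(texture_pred, texture_target, mask)
total = joint_loss(primary_loss=0.5, aux_loss=aux, aux_weight=0.5)
```

In this sketch the reconstruction loss is computed only on the hidden patches, which follows the usual MAE convention; whether the paper does the same for its texture targets is not stated in this summary.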

Datasets

FaceForensics++, DFDC, CelebA, AffectNet

Model(s)

Vision Transformer (ViT-B) with Masked Autoencoder (MAE) and VideoMAE as auxiliary tasks. Local Directional Patterns (LDP) and Local Binary Patterns (LBP) are used as texture descriptors.
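
For readers unfamiliar with Local Binary Patterns, here is a minimal sketch of the basic 3x3 LBP operator in NumPy. It is an illustration of the standard descriptor only; the neighbour ordering, the `>=` comparison, and the function name `lbp_8neighbour` are assumptions, and the paper may use a different LBP or LDP variant.

```python
import numpy as np

def lbp_8neighbour(gray: np.ndarray) -> np.ndarray:
    """Compute a basic 3x3 LBP code for every interior pixel of a grayscale image.

    Each of the 8 neighbours contributes one bit: 1 if the neighbour is at
    least as bright as the centre pixel, otherwise 0.
    """
    gray = np.asarray(gray, dtype=np.int32)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Clockwise neighbour offsets (dy, dx), starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes.astype(np.uint8)

# Toy usage: only the top-left neighbour is brighter than the centre.
patch = np.array([[9, 0, 0],
                  [0, 5, 0],
                  [0, 0, 0]])
code = lbp_8neighbour(patch)
```

The output is one pixel smaller on each border than the input because border pixels lack a full 3x3 neighbourhood. A texture map of this kind could serve as a reconstruction target in place of raw RGB patches.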

Author countries

India, United States
