Authors: Lord Sen, Shyamapada Mukherjee

Published: 2026-02-22 11:33:23+00:00

Comment: 10 pages

AI Summary

This paper introduces "Mapping Networks" to address the challenge of escalating parameter counts in deep learning models by replacing high-dimensional weight space with a compact, trainable latent vector. Based on the hypothesis that trained parameters reside on low-dimensional manifolds, Mapping Networks generate target network parameters from this latent space. This approach significantly reduces trainable parameters by approximately 500x, mitigates overfitting, and achieves comparable or superior performance across various tasks.

Abstract

The escalating parameter counts in modern deep learning models pose a fundamental challenge to efficient training and to controlling overfitting. We address this by introducing \emph{Mapping Networks}, which replace the high-dimensional weight space with a compact, trainable latent vector, based on the hypothesis that the trained parameters of large networks reside on smooth, low-dimensional manifolds. The Mapping Theorem, enforced by a dedicated Mapping Loss, establishes the existence of a mapping from this latent space to the target weight space, both theoretically and in practice. Mapping Networks significantly reduce overfitting and achieve comparable or better performance than the target network across complex vision and sequence tasks, including image classification and deepfake detection, with a $\mathbf{99.5\%}$ (around $500\times$) reduction in trainable parameters.


Key findings
Mapping Networks achieved a 99.5% (around 500x) reduction in trainable parameters while maintaining comparable or better performance across tasks. For deepfake detection, accuracy on Celeb-DF improved to 85.90% with 2,048 trainable parameters, outperforming a baseline CNN2 (79.03% with 108,618 parameters). The approach also significantly reduced overfitting, showing only a 1.8% drop from train to test accuracy compared to baseline models.
Approach
The proposed Mapping Networks employ a meta-learning architecture where a compact, trainable latent vector, modulated by fixed mapping weights, generates the parameters for a target neural network. The target network then performs standard feed-forward operations, while gradients propagate exclusively through the mapping network to update the latent vector. A novel Mapping Loss function is introduced to jointly optimize task performance and ensure structural regularity of the parameter manifold.
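The core mechanism described above can be sketched in a few lines: a trainable latent vector is pushed through fixed mapping weights to produce the target network's parameters, and the gradient flows back only into the latent vector. This is a minimal NumPy illustration with made-up dimensions and a tiny linear layer standing in for the target network; the sizes, the single mapping matrix `M`, and the squared-error loss are assumptions for clarity, not the paper's actual architecture or Mapping Loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration (not taken from the paper):
latent_dim = 8            # size of the compact trainable latent vector
out_dim, in_dim = 2, 3    # a tiny linear layer stands in for the target network

z = rng.normal(size=latent_dim) * 0.1                # trainable latent vector
M = rng.normal(size=(out_dim * in_dim, latent_dim))  # fixed (frozen) mapping weights

x = rng.normal(size=in_dim)    # one input sample
y = np.array([1.0, -1.0])      # its target

def loss(z):
    W = (M @ z).reshape(out_dim, in_dim)  # mapping network generates target weights
    pred = W @ x                          # target network: standard feed-forward
    return 0.5 * np.sum((pred - y) ** 2)  # stand-in task loss

loss_before = loss(z)

# Backward pass: the gradient reaches only z, via the fixed mapping M;
# neither M nor the generated weights W are trained directly.
W = (M @ z).reshape(out_dim, in_dim)
err = W @ x - y                  # dL/dpred
grad_W = np.outer(err, x)        # dL/dW
grad_z = M.T @ grad_W.ravel()    # chain rule through the mapping network
z = z - 0.01 * grad_z            # update the latent vector only

loss_after = loss(z)
```

After one step the task loss drops while only `latent_dim` parameters were updated, which is the source of the parameter reduction: the trainable count is the latent size, not the target network's weight count.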
Datasets
MNIST, Fashion MNIST, Celeb-DF, FF++, Cityscapes, air pollution dataset
Model(s)
UNKNOWN
Author countries
India