InnerSelf: Designing Self-Deepfaked Voice for Emotional Well-being

Authors: Guang Dai, Pinhao Wang, Cheng Yao, Fangtian Ying

Published: 2025-03-18 13:45:22+00:00

AI Summary

InnerSelf is a voice system that combines speech synthesis and a large language model to let users hold supportive dialogues with a deepfake of their own voice, aiming to improve emotional well-being through guided positive self-talk.

Abstract

One's own voice is one of the most frequently heard voices. Studies have found that hearing and talking to oneself have positive psychological effects. However, the design and implementation of self-voice for emotional regulation in HCI have yet to be explored. In this paper, we introduce InnerSelf, an innovative voice system based on speech synthesis technologies and a large language model. It allows users to engage in supportive and empathic dialogue with their deepfake voice. By manipulating positive self-talk, our system aims to promote self-disclosure and emotional regulation, reshaping negative thoughts and improving emotional well-being.


Key findings
UNKNOWN. The paper presents a system design only and reports no user-study results; the authors plan to evaluate the system's effectiveness in future work.
Approach
InnerSelf uses a multimodal emotion recognition module (combining audio features from Wav2Vec 2.0 and text features) to understand user emotion. It then employs GPT-4 to generate empathetic responses based on emotion and context, which are synthesized into the user's deepfake voice using the SV2TTS model.
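The three-stage pipeline described above (multimodal emotion recognition, LLM response generation, voice-cloned synthesis) can be sketched as follows. The paper does not publish code, so every function name and the stub logic here are illustrative assumptions; a real implementation would call Wav2Vec 2.0, GPT-4, and SV2TTS where the stubs stand.

```python
# Hypothetical sketch of the InnerSelf pipeline; names and logic are
# illustrative, not the authors' actual API.
from dataclasses import dataclass

@dataclass
class Turn:
    text: str     # transcript of the user's utterance
    emotion: str  # label from the multimodal emotion recognizer

def recognize_emotion(audio_features, text: str) -> str:
    """Stand-in for the multimodal recognizer (Wav2Vec 2.0 audio + text features).
    A real system would fuse both modalities; here we keyword-match the text."""
    negative = {"sad", "tired", "anxious", "worried"}
    return "negative" if any(w in text.lower() for w in negative) else "neutral"

def generate_response(emotion: str, context: list) -> str:
    """Stand-in for the GPT-4 call that produces an empathetic reply
    conditioned on the detected emotion and the dialogue history."""
    if emotion == "negative":
        return "That sounds hard. You have gotten through days like this before."
    return "I'm glad to hear that. What went well today?"

def synthesize_self_voice(reply: str, speaker_embedding) -> bytes:
    """Stand-in for SV2TTS: clone the user's voice from a speaker
    embedding and speak `reply`. Returns placeholder waveform bytes."""
    return reply.encode("utf-8")

def inner_self_turn(audio_features, transcript: str, history: list,
                    speaker_embedding) -> tuple:
    """One dialogue turn: recognize emotion, generate reply, synthesize it."""
    emotion = recognize_emotion(audio_features, transcript)
    history.append(Turn(transcript, emotion))
    reply = generate_response(emotion, history)
    return reply, synthesize_self_voice(reply, speaker_embedding)
```

The key design point the sketch preserves is that the reply is conditioned on both the detected emotion and the accumulated dialogue context before being rendered in the user's own cloned voice.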
Datasets
UNKNOWN
Model(s)
Wav2Vec 2.0, GPT-4, SV2TTS, and a speech-to-text model
Author countries
China