CardioLive: Empowering Video Streaming with Online Cardiac Monitoring

Authors: Sheng Lyu, Ruiming Huang, Sijie Ji, Yasar Abbas Ur Rehman, Lan Ma, Chenshu Wu

Published: 2025-02-02 07:26:05+00:00

AI Summary

CardioLive is the first online cardiac monitoring system integrated into video streaming platforms, leveraging both audio and video streams. It uses CardioNet, a novel audio-visual network, to learn cardiac series, achieving a Mean Absolute Error of 1.79 BPM, significantly outperforming video-only and audio-only methods.

Abstract

Online Cardiac Monitoring (OCM) emerges as a compelling enhancement for next-generation video streaming platforms. It enables various applications including remote health, online affective computing, and deepfake detection. Yet the physiological information encapsulated in video streams has long been neglected. In this paper, we present the design and implementation of CardioLive, the first online cardiac monitoring system in video streaming platforms. We leverage the naturally co-existing video and audio streams and devise CardioNet, the first audio-visual network to learn the cardiac series. It incorporates multiple unique designs to extract temporal and spectral features, ensuring robust performance under realistic video streaming conditions. To enable Service-On-Demand online cardiac monitoring, we implement CardioLive as a plug-and-play middleware service and develop systematic solutions to practical issues including changing FPS and unsynchronized streams. Extensive experiments demonstrate the effectiveness of our system. We achieve a Mean Absolute Error (MAE) of 1.79 BPM, outperforming the video-only and audio-only solutions by 69.2% and 81.2%, respectively. Our CardioLive service achieves average throughputs of 115.97 and 98.16 FPS when implemented in Zoom and YouTube. We believe our work opens up new applications for video stream systems. We will release the code soon.
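The abstract notes systematic solutions to changing FPS and unsynchronized streams. A minimal sketch of one standard way to handle a varying frame rate, resampling an irregularly-timed signal onto a uniform grid via linear interpolation; the helper name and its interface are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def resample_to_fixed_rate(values, timestamps, target_fps, duration):
    # Map samples that arrived at irregular times (e.g. frames under a
    # fluctuating FPS) onto a uniform time grid so downstream models
    # always see a fixed rate. np.interp clamps to the endpoint values
    # outside the observed time range.
    grid = np.arange(0.0, duration, 1.0 / target_fps)
    return grid, np.interp(grid, timestamps, values)
```

Aligning unsynchronized audio and video streams can reuse the same idea: resample both onto one shared clock before fusion.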


Key findings
CardioLive achieved a MAE of 1.79 BPM, outperforming video-only and audio-only methods by 69.2% and 81.2%, respectively. The system demonstrated robustness across various conditions (different distances, angles, noise levels, body motions, lighting, and devices) and achieved high throughputs (115.97 FPS on Zoom and 98.16 FPS on YouTube).
Approach
CardioLive uses CardioNet, an audio-visual deep learning network, to extract temporal and spectral features from video (using a Temporal Differential Block and Frequency-Aware Block) and raw audio (emulating natural body filtering). A multi-head temporal attention mechanism fuses these features for robust heart rate estimation.
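The two ingredients named above, temporal differencing of video features and multi-head attention fusing video with audio, can be sketched in NumPy. This is a simplified illustration under stated assumptions (video features as queries attending over audio features along the time axis); the actual CardioNet wiring, blocks, and dimensions are not specified here:

```python
import numpy as np

def temporal_diff(frames):
    # Temporal Differential Block (simplified): frame-to-frame differences
    # emphasize the subtle periodic changes tied to the cardiac pulse.
    return np.diff(frames, axis=0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_temporal_attention(video_feat, audio_feat, num_heads=4):
    # Hypothetical fusion layout: per-time-step video features (T, D)
    # attend over audio features (T, D) with scaled dot-product attention,
    # split across `num_heads` heads along the feature dimension.
    T, D = video_feat.shape
    assert D % num_heads == 0
    d = D // num_heads
    q = video_feat.reshape(T, num_heads, d).transpose(1, 0, 2)   # (H, T, d)
    k = audio_feat.reshape(T, num_heads, d).transpose(1, 0, 2)   # (H, T, d)
    v = k
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)               # (H, T, T)
    fused = softmax(scores) @ v                                  # (H, T, d)
    return fused.transpose(1, 0, 2).reshape(T, D)                # (T, D)
```

In the real system a regression head would map the fused sequence to the cardiac series; here the sketch only shows how the two modalities meet.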
Datasets
A self-collected dataset (10 users, 8 different devices), plus the public PURE and MMPD datasets
Model(s)
CardioNet (a custom audio-visual network incorporating Temporal Differential Blocks, Frequency-Aware Blocks, and a multi-head temporal attention mechanism)
Author countries
China, USA