Subject: Cutting-Edge Research in Wireless Communication, Sensing, and Medical AI
Hi Elman,
This newsletter covers recent preprints exploring the exciting intersection of signal processing, machine learning, and AI in wireless communication, sensing, and medical domains.
This collection of preprints showcases the growing influence of AI, deep learning, and advanced signal processing techniques across diverse fields. Several papers focus on enhancing the performance and efficiency of wireless systems. Klioui (2025) introduces THADMM-Net, a deep unfolded network for direction-of-arrival estimation that leverages Toeplitz-Hermitian and positive semi-definite constraints to reduce the parameter count while improving accuracy. Yin et al. (2025) propose a generative video semantic communication framework that uses large language models for efficient video reconstruction at ultra-low channel bandwidth ratios. Xu et al. (2025) derive the intrinsic Cramér-Rao bound for 6D localization and tracking in 5G/6G systems, addressing the challenges posed by rotation matrices in Lie groups and proposing two Kalman filter-based tracking methods. Jabbar et al. (2025) present a practical implementation of a DMA-assisted mmWave communication testbed, demonstrating its potential for ISAC applications.
Another prominent theme is the application of novel signal processing techniques. Mohades et al. (2025) introduce a manifold optimization approach for constructing binary deterministic sensing matrices with low coherence and constant column weight for compressed sensing. Ni et al. (2025) leverage satellite remote sensing data and neural networks to improve the accuracy of air-sea interface flux calculations for maritime communication. Yun et al. (2025) propose a coordinated pilot design method for 1-bit massive MIMO systems in correlated channels, minimizing channel estimation MSE using fractional programming. Sun et al. (2025) develop deep-unfolded algorithms for joint active user detection, channel estimation, and data detection in grant-free cell-free massive MIMO. Hanon et al. (2025) introduce Herglotz-NET, a novel implicit neural representation architecture for spherical data that uses harmonic positional encoding based on complex Herglotz mappings.
Several papers explore the intersection of AI and healthcare. Shi et al. (2025) present Fundus2Globe, a generative AI framework that creates 3D eye globe models from 2D fundus photographs and metadata, enabling personalized myopia management. Nie et al. (2025) develop a deep learning model that estimates vascular age from PPG signals, demonstrating its potential as a digital biomarker for cardiovascular health. Stiehl et al. (2025) apply dimension reduction methods, persistent homology, and machine learning to analyze EEG signals for detecting interictal epileptic discharges.
Further contributions include work on UAV coverage optimization and radar-based gait monitoring. Vavoulas et al. (2025) propose an efficient method for UAV coverage of large convex quadrilateral areas with elliptical footprints using circle packing and homography transformation. López-Delgado et al. (2025) develop and validate a radar network for gait monitoring, comparing different configurations and algorithms for healthy subjects and Parkinson's disease patients. Finally, several papers explore semantic communication and generative AI for various tasks. Cao et al. (2025) propose a task-oriented semantic communication framework for stereo-vision 3D object detection. Wang et al. (2025) introduce a GenAI-enabled robust data augmentation scheme for wireless sensing in ISAC networks. Kafle et al. (2025) explore one-bit compressed sensing using generative models. The emphasis on data-driven methods, semantic communication, and generative AI suggests promising directions for future research.
Fundus2Globe: Generative AI-Driven 3D Digital Twins for Personalized Myopia Management by Danli Shi, Bowen Liu, Zhen Tian, Yue Wu, Jiancheng Yang, Ruoyu Chen, Bo Yang, Ou Xiao, Mingguang He https://arxiv.org/abs/2502.13182
Fundus2Globe presents a significant advancement in personalized myopia management by leveraging the power of generative AI. Myopia, a prevalent vision impairment, poses increasing global health concerns, especially its severe form, pathological myopia. This condition is characterized by abnormal eye shape changes linked to vision-threatening complications. While MRI provides detailed 3D insights into these changes, its cost and limited accessibility hinder its widespread clinical use. Fundus2Globe addresses this challenge by generating patient-specific 3D eye globes from readily available 2D color fundus photographs (CFPs) and routine metadata (axial length and spherical equivalent), effectively bypassing the need for MRI.
The framework's ingenuity lies in its integration of a 3D morphable eye model (3DMM), a pre-trained retinal foundation model (EyeFound), and a denoising diffusion probabilistic model (DDPM). The 3DMM, trained on a dataset of myopic eyes, efficiently encodes and decodes 3D eye globes, transforming them between 3D space and a lower-dimensional latent space. EyeFound extracts clinically relevant features from CFPs, while the DDPM performs cross-modal generation in the latent space, conditioned on the combined input of CFP embeddings and metadata. Furthermore, label distribution learning (LDL) is employed to handle the non-uniform distribution of continuous metadata attributes, enhancing the model's ability to capture subtle shape variations crucial for accurate diagnosis and personalized treatment.
Evaluations on a dataset of 197 eyes with paired CFP-MRI data showcased Fundus2Globe's remarkable accuracy, achieving submillimeter precision in reconstructing posterior ocular anatomy. Using chamfer distance as the primary metric, the model demonstrated superior performance when guided by both CFPs and metadata compared to CFPs alone. The incorporation of LDL further improved performance, reducing chamfer distance by 18% compared to using raw continuous metadata inputs. The model's robustness extends to accurately preserving the distribution of the aspheric descriptor (Q value) across different staphyloma subtypes, a key indicator of pathological myopia. External validation on the PALM dataset, comprising 1200 fundus images with pathological myopia labels, further confirmed the model's generalizability and its capacity to distinguish between eyes with and without pathological myopia based on Q-value distributions.
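Chamfer distance, the primary evaluation metric above, can be sketched in a few lines of pure Python. The squared-distance and per-direction averaging conventions below are common choices and may differ slightly from the paper's exact definition:

```python
def chamfer_distance(a, b):
    """Symmetric chamfer distance between two 3D point sets.

    For each point in one set, find the squared Euclidean distance to its
    nearest neighbour in the other set; average within each direction and
    sum both directions. O(|a|*|b|) brute force, fine for small sets.
    """
    def one_way(src, dst):
        total = 0.0
        for p in src:
            total += min(
                sum((pi - qi) ** 2 for pi, qi in zip(p, q)) for q in dst
            )
        return total / len(src)

    return one_way(a, b) + one_way(b, a)
```

Identical point sets give a distance of zero, and the metric grows with geometric mismatch, which is why it suits comparing reconstructed and ground-truth eye globe surfaces.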
Beyond static reconstruction, Fundus2Globe offers the unique capability of generating plausible counterfactuals by manipulating input metadata, effectively creating personalized digital twins of the eye. This feature allows clinicians to simulate hypothetical scenarios and explore the impact of refractive shifts on posterior segment changes, aiding in personalized myopia management decisions. For example, the model demonstrated increased asphericity (lower Q values) in generated 3D eye globes with smaller spherical equivalent and longer axial length, with staphyloma-positive eyes exhibiting greater shape sensitivity to these parameter changes. This dynamic modeling capability opens new avenues for predicting disease progression, optimizing treatment plans, and developing novel shape-based biomarkers.
Artificial Intelligence-derived Vascular Age from Photoplethysmography: A Novel Digital Biomarker for Cardiovascular Health by Guangkun Nie, Qinghao Zhao, Gongzheng Tang, Yaxin Li, Shenda Hong https://arxiv.org/abs/2502.12990
Caption: This Kaplan-Meier curve illustrates the cumulative incidence of major adverse cardiovascular and cerebrovascular events (MACCE) stratified by vascular age gap groups (G1 representing a gap >9 years and G4 representing a gap between -9 and 9 years). Individuals with a vascular age gap greater than 9 years (G1) exhibited a significantly higher risk of MACCE compared to those with a gap within the reference range (G4), with a hazard ratio of 5.53 (p<0.005).
This study presents a significant advancement in cardiovascular health assessment by introducing a deep learning model that estimates vascular age directly from PPG signals. This readily accessible and non-invasive approach offers a potential game-changer for scalable risk stratification and personalized health monitoring. The researchers tackled the challenge of imbalanced age distributions in training data by implementing a novel distribution-aware loss function, the Dist Loss. This loss function combines sample-wise MAE with a distributional MAE, effectively aligning the predicted age distribution with the true label distribution: L<sub>Dist</sub> = MAE(Y, Ŷ) + MAE(S<sub>L</sub>, S<sub>P</sub>), where Y and Ŷ represent ground truth labels and model predictions, and S<sub>L</sub> and S<sub>P</sub> are pseudo-label and pseudo-prediction sequences reflecting label and prediction distributions. This innovative approach enhances prediction accuracy, particularly in underrepresented age groups.
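The Dist Loss above can be sketched compactly. This is a simplified version in which the pseudo-label and pseudo-prediction sequences are approximated by simply sorting the labels and predictions; the paper derives them from the (kernel-smoothed) label distribution, so the sorting step is an assumption, not the authors' exact construction:

```python
def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def dist_loss(labels, preds):
    """Sample-wise MAE plus a distributional MAE, in the spirit of Dist Loss.

    The distributional term compares sorted labels against sorted
    predictions, a crude stand-in for the paper's pseudo-label /
    pseudo-prediction sequences.
    """
    sample_term = mae(labels, preds)
    dist_term = mae(sorted(labels), sorted(preds))
    return sample_term + dist_term
```

Note that predictions with the right overall distribution but wrong per-sample ordering are penalized only by the first term, which is exactly how the distributional term nudges the model toward covering underrepresented age groups.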
A key contribution of this work is the introduction of the vascular age gap, defined as the difference between AI-predicted vascular age and chronological age. Analysis of the UK Biobank cohort revealed a strong correlation between vascular age gap and cardiovascular risk. A vascular age gap exceeding 9 years was significantly associated with an increased risk of MACCE (HR = 2.37, p < 0.005), while a negative gap below -9 years indicated a significantly lower risk. Furthermore, the vascular age gap proved to be a robust predictor of secondary outcomes, including diabetes, hypertension, coronary heart disease, heart failure, myocardial infarction, stroke, and all-cause mortality. Positive gaps consistently correlated with increased risk, while negative gaps suggested a protective effect. The model's utility extends to longitudinal applications, as demonstrated by analyzing serial PPG data, where changes in vascular age gap classification over time further stratified risk. External validation on the MIMIC-III dataset corroborated these findings, confirming a significant association between vascular age gap and in-hospital mortality (OR = 1.02, p < 0.005 per year increase in gap).
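The gap-based stratification can be expressed directly. The three-way banding below uses only the ±9-year thresholds reported in the study; the finer G1–G4 grouping from the Kaplan-Meier analysis is not reproduced here:

```python
def vascular_age_gap(predicted_age, chronological_age):
    """Vascular age gap: AI-predicted vascular age minus chronological age."""
    return predicted_age - chronological_age

def risk_band(gap):
    """Coarse risk band from the reported +/-9-year thresholds:
    gap > 9 years -> elevated MACCE risk, gap < -9 -> reduced risk,
    otherwise the reference range."""
    if gap > 9:
        return "elevated"
    if gap < -9:
        return "reduced"
    return "reference"
```

For example, a 45-year-old whose PPG-derived vascular age is 60 has a gap of 15 years and falls in the elevated-risk band.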
The researchers also explored the model's interpretability using saliency maps, which revealed consistent focus on the diastolic and systolic peaks of the PPG waveform across different age groups. This observation aligns with known physiological changes associated with vascular aging, reinforcing the model's ability to capture relevant features related to vascular health. Performance evaluation across multiple datasets demonstrated consistent results, with a Pearson correlation of 0.49 and an MAE of 7.57 years in the UKB hold-out set. Variations in performance across datasets likely reflect differences in patient populations and data acquisition methods.
Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model by Hang Yin, Li Qiao, Yu Ma, Shuo Sun, Kan Li, Zhen Gao, Dusit Niyato https://arxiv.org/abs/2502.13838
Caption: This image compares different video compression methods, showcasing the performance of a novel Generative Video Semantic Communication (GVSC) framework against traditional codecs (H.264, H.265) and DVST. The GVSC framework, particularly the "First Frame+Desc." method (highlighted with red circles), demonstrates superior perceptual quality reconstruction at ultra-low bitrates, even under challenging SNR conditions (0dB), by prioritizing semantic information transmission and leveraging GenAI diffusion models. The comparison highlights the effectiveness of transmitting key semantic information (first frame and textual descriptions) for video reconstruction, achieving high perceptual quality with minimal bandwidth.
This paper introduces a paradigm shift in video transmission with its Generative Video Semantic Communication (GVSC) framework. By leveraging the power of GenAI large models, specifically diffusion models, this innovative approach achieves ultra-low bitrate video transmission by prioritizing the transmission of high-level semantic information over raw pixel data, moving beyond traditional syntactic communication methods. At the transmitter, the GVSC framework extracts key visual and textual semantics. Visual data, such as sketches or the first frame, is transmitted using deep joint source-channel coding (DJSCC), while textual descriptions, generated by video understanding models like Video-LLaVA, are encoded using turbo coding and QAM. The receiver then utilizes a pre-trained GenAI diffusion model to fuse these multimodal semantics and reconstruct the video, focusing on perceptual quality rather than strict pixel accuracy.
The researchers introduce adaptive transmission strategies tailored to different channel conditions and semantic modalities. These include "Sketches+Desc.," where sketches of each frame are transmitted alongside descriptions; "Sketch+Desc.," concentrating resources on the initial frame's sketch; and "First Frame+Desc.," transmitting the first RGB frame along with descriptions. The optimization of sketch transmission employs a weighted loss function combining MSE and LPIPS: L(x, x₀) = k · MSE(x, x₀) + (1 − k) · LPIPS(x, x₀), where k balances the importance of each loss. This balanced approach preserves crucial semantic details while minimizing bandwidth consumption.
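The weighted sketch-transmission loss can be sketched as follows, with a plain MSE and a caller-supplied `perceptual` callable standing in for the LPIPS network (a stand-in only; the actual LPIPS implementation is a learned deep feature distance):

```python
def mse(x, x0):
    """Mean squared error between two flattened signals."""
    return sum((a - b) ** 2 for a, b in zip(x, x0)) / len(x)

def sketch_loss(x, x0, perceptual, k=0.5):
    """Weighted loss L = k * MSE(x, x0) + (1 - k) * perceptual(x, x0).

    `perceptual` stands in for LPIPS; k (here an assumed default of 0.5)
    trades pixel fidelity against perceptual similarity.
    """
    return k * mse(x, x0) + (1 - k) * perceptual(x, x0)
```

Larger k pushes reconstructions toward exact pixel agreement, smaller k toward perceptual plausibility, which is precisely the semantic-detail-versus-bandwidth trade-off the scheme exploits.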
Performance evaluation using metrics like CLIP score (frame-level semantic similarity), BERT score (video-level semantic consistency), and traditional PSNR and SSIM (pixel-level fidelity) demonstrated the GVSC framework's effectiveness in capturing and reconstructing video content aligned with human perception. Simulations across various SNR conditions and comparisons against traditional codecs (H.264/H.265+LDPC) and other semantic communication schemes highlighted the framework's superior performance at ultra-low channel bandwidth ratios (CBR) down to 10⁻². The "First Frame+Desc." scheme consistently achieved a CLIP score exceeding 0.92 at CBR = 0.0057 for SNR > 0 dB, showcasing robust performance even under challenging transmission conditions. This approach holds immense potential for bandwidth-constrained applications like streaming media, where real-time interaction is less critical.
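The CLIP score used above is, at its core, a cosine similarity between CLIP embeddings of a reconstructed frame and its reference (or caption). A minimal sketch of that similarity on stand-in embedding vectors, with the actual CLIP encoders omitted:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors.

    In CLIP-style scoring, u and v would be encoder outputs; here they are
    plain lists of floats standing in for those embeddings.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

A score near 1.0 indicates the reconstruction preserves the reference's semantics even when pixel-level metrics like PSNR are low, which is why the paper reports it alongside traditional fidelity measures.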
This newsletter highlights a convergence of AI, deep learning, and advanced signal processing, driving innovation across diverse fields. Fundus2Globe showcases the transformative potential of generative AI in healthcare, offering a cost-effective and accessible solution for personalized myopia management. The AI-driven vascular age prediction from PPG signals provides a novel, non-invasive digital biomarker for cardiovascular health assessment, paving the way for scalable risk stratification and early intervention. Finally, the Generative Video Semantic Communication framework revolutionizes video transmission by prioritizing semantic information and achieving ultra-low bitrates, promising significant advancements in bandwidth-constrained communication scenarios. These innovative approaches represent a paradigm shift towards data-driven, semantic-aware solutions with far-reaching implications for future research and applications.