This collection of papers explores diverse topics within wireless communications, ranging from fundamental channel modeling and characterization to innovative applications of intelligent surfaces and advanced signal processing techniques. Maric and Njemcevic (2025a, 2025b) delve into the intricacies of two-wave with diffuse power (TWDP) fading channels. Their first paper introduces a novel statistical simulator for TWDP channels and derives key correlation properties, including autocorrelation and cross-correlation functions of quadrature components, as well as autocorrelation of the complex and squared envelope. Their second contribution focuses on deriving closed-form and infinite-series expressions for the conditional phase distribution of the TWDP process, offering valuable insights into how channel conditions influence phase behavior.
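To make the TWDP model concrete, here is a minimal sketch of a TWDP envelope generator: two specular components with independent uniform phases plus a complex Gaussian diffuse term. The parameterization via K (specular-to-diffuse power ratio) and Δ (specular-power imbalance) follows the standard TWDP literature; the function name and interface are illustrative, not the simulator proposed by Maric and Njemcevic.

```python
import numpy as np

def twdp_envelope(n, K=10.0, delta=0.5, sigma=1.0, rng=None):
    """Draw n i.i.d. samples of a TWDP fading envelope (illustrative sketch).

    K: total specular-to-diffuse power ratio, (V1^2 + V2^2) / (2*sigma^2)
    delta: specular-power imbalance, 2*V1*V2 / (V1^2 + V2^2)
    """
    rng = np.random.default_rng(rng)
    total_spec = 2 * K * sigma**2                      # V1^2 + V2^2
    # Recover the two specular amplitudes from K and delta
    v1 = np.sqrt(total_spec * (1 + np.sqrt(1 - delta**2)) / 2)
    v2 = np.sqrt(total_spec * (1 - np.sqrt(1 - delta**2)) / 2)
    phi1 = rng.uniform(0, 2 * np.pi, n)                # independent uniform phases
    phi2 = rng.uniform(0, 2 * np.pi, n)
    diffuse = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return np.abs(v1 * np.exp(1j * phi1) + v2 * np.exp(1j * phi2) + diffuse)

r = twdp_envelope(200_000, K=10.0, delta=0.5, sigma=1.0, rng=0)
mean_power = np.mean(r**2)   # should approach V1^2 + V2^2 + 2*sigma^2 = 22
```

Note that this draws independent samples and therefore captures only first-order statistics; reproducing the autocorrelation properties derived in the papers requires a correlated-fading generator.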
Several papers focus on enhancing wireless system performance through intelligent surface deployment and signal processing innovations. Gu, Park, and Choi (2025) propose ScNeuGM, a scalable neural graph modeling framework for optimizing resource allocation in Wi-Fi 7 networks. Their approach leverages an evolution strategy and deep hashing function to manage contention and interference, achieving significant reductions in slot usage and packet losses. Ilgac and Sezgin (2025) introduce a novel single-antenna terahertz sensing method using preconfigured metasurfaces to enable angular estimation, overcoming a key limitation of single-antenna radar systems. Zheng (2025) explores dual-polarized intelligent omni-surfaces (IOS) for independent reflective-refractive transmission, demonstrating enhanced performance compared to traditional IOS systems. Marini et al. (2025) present experimental evidence of perfect matching of reactive loads through complex frequencies, showcasing a novel approach to impedance matching without resistive elements.
The application of advanced signal processing and machine learning techniques is a recurring theme. Ma et al. (2025) analyze the asymptotic performance of one-bit quantized box-constrained precoding in large-scale multi-user systems, employing Gordon's inequality and a novel Gaussian Min-Max Theorem. Ngorima, Helberg, and Davel (2025) propose a data pilot-aided temporal convolutional network (TCN) for channel estimation in vehicle-to-vehicle communications, showing significant BER performance improvements. Wang et al. (2025) investigate antenna position optimization for movable antenna-empowered near-field sensing, deriving Cramér-Rao bounds and optimizing array geometry for enhanced AoA and distance estimation. Cao (2025) proposes a hybrid near-field and far-field localization method with multiple holographic MIMO surfaces, addressing the challenges of expanded near-field regions and interference in large-scale HMIMO systems.
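The link between array geometry and estimation accuracy that Wang et al. exploit can be illustrated with a toy Cramér-Rao bound computation. The sketch below assumes a single far-field source, a single snapshot, and a deterministic known-amplitude signal model (so the Fisher information reduces to 2·SNR·‖∂a/∂θ‖²); it is not the movable-antenna derivation from the paper, just a demonstration that widening the aperture tightens the AoA bound.

```python
import numpy as np

def aoa_crb(positions, theta, snr_linear, wavelength=1.0):
    """CRB on the angle of arrival for one far-field source (sketch).

    Assumes a deterministic, known-amplitude signal and a single snapshot,
    so the Fisher information is 2 * SNR * ||d a(theta)/d theta||^2.
    """
    k = 2 * np.pi / wavelength
    # Derivative of the steering vector a(theta) = exp(j*k*p*sin(theta))
    d_a = (1j * k * positions * np.cos(theta)
           * np.exp(1j * k * positions * np.sin(theta)))
    fisher = 2 * snr_linear * np.sum(np.abs(d_a) ** 2)
    return 1.0 / fisher

theta = np.deg2rad(20)
narrow = aoa_crb(np.arange(8) * 0.5, theta, snr_linear=10)  # half-wavelength ULA
wide = aoa_crb(np.arange(8) * 1.0, theta, snr_linear=10)    # doubled aperture
```

Doubling every element position scales ‖∂a/∂θ‖² by four, so the wide array's CRB is exactly a quarter of the narrow one; this is the geometric leverage that antenna-position optimization exploits.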
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training by Yuancheng Wang, Jiachen Zheng, Junan Zhang, Xueyao Zhang, Huan Liao, Zhizheng Wu https://arxiv.org/abs/2502.03128
Caption: This diagram illustrates the two-stage generation process within Metis, a unified speech generation model. First, SSL tokens are generated based on inputs like text (shown here). These tokens are then converted into acoustic representations by the Speech Acoustic Codec, resulting in the generated speech waveform.
Metis represents a significant leap forward in speech generation. Unlike previous models limited to specific tasks or struggling with multi-tasking, Metis leverages a pre-training and fine-tuning approach, similar to successful foundation models in NLP and computer vision. This allows Metis to learn from massive amounts of unlabeled speech data, acquiring a deep understanding of speech patterns before specializing in particular tasks.
The key innovation lies in its masked generative pre-training on self-supervised learning (SSL) tokens. These tokens capture semantic and prosodic information extracted from 300,000 hours of diverse speech data. This pre-training imbues Metis with a strong foundation, facilitating efficient adaptation to downstream tasks with minimal task-specific data.
Metis utilizes a two-stage generation process. First, it generates SSL tokens conditioned on task-specific inputs like text, noisy speech, or visual features. Subsequently, it transforms these SSL tokens into acoustic representations for waveform synthesis. This modular design allows Metis to handle a wide range of tasks by simply adjusting the input conditions. During fine-tuning, task-specific conditions are integrated as additional inputs, enabling efficient adaptation with limited data and parameters. Remarkably, the model even supports multimodal conditional inputs, such as text, audio, and video, enhancing its versatility. The pre-training objective is represented by the formula p_θ(x_ssl | x_ssl^m, x_prompt), where x_ssl is the SSL token sequence, x_ssl^m is the partially masked version of that sequence, and x_prompt is an optional prompt sequence.
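The masked-prediction objective above can be sketched in a few lines: mask a random subset of SSL token ids, then score the model only on the masked positions. The helper names, the toy codebook size, and the random "logits" standing in for a real transformer are all illustrative assumptions, not Metis's actual implementation.

```python
import numpy as np

def mask_tokens(tokens, mask_ratio, mask_id, rng):
    """Replace a random subset of SSL token ids with a [MASK] id (sketch)."""
    tokens = tokens.copy()
    n_mask = max(1, int(mask_ratio * len(tokens)))
    idx = rng.choice(len(tokens), size=n_mask, replace=False)
    tokens[idx] = mask_id
    return tokens, idx

def masked_nll(logits, targets, masked_idx):
    """Cross-entropy over masked positions only: -log p(x_ssl | x_ssl^m)."""
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -np.mean(logp[masked_idx, targets[masked_idx]])

rng = np.random.default_rng(0)
vocab, seq_len, mask_id = 32, 50, 32        # mask id sits outside the codebook
x_ssl = rng.integers(0, vocab, seq_len)     # toy SSL token sequence
x_masked, idx = mask_tokens(x_ssl, mask_ratio=0.5, mask_id=mask_id, rng=rng)
logits = rng.standard_normal((seq_len, vocab))  # stand-in for model output
loss = masked_nll(logits, x_ssl, idx)
```

Because the loss is computed only where tokens were hidden, the model is trained to reconstruct speech content from partial context, which is what makes the pre-trained representation transferable across the downstream tasks.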
Metis excels across five diverse speech generation tasks: zero-shot text-to-speech (TTS), voice conversion, target speaker extraction, speech enhancement, and lip-to-speech, outperforming state-of-the-art task-specific and multi-task systems. For instance, in zero-shot TTS, Metis achieved comparable or superior performance to models trained on substantially larger datasets. In voice conversion, it significantly surpassed previous benchmarks. The researchers also explored multi-task fine-tuning, creating Metis-Omni, a model fine-tuned jointly on four of these tasks, demonstrating the potential for a single unified model to handle diverse speech generation tasks.
ScNeuGM: Scalable Neural Graph Modeling for Coloring-Based Contention and Interference Management in Wi-Fi 7 by Zhouyou Gu, Jihong Park, Jinho Choi https://arxiv.org/abs/2502.03300
Caption: This diagram illustrates the ScNeuGM framework for optimizing Wi-Fi 7 RTWT slot assignment. It shows the process of generating a colored graph representing STA contention and interference, using a neural network to assign slots (colors), computing the loss based on network performance, and updating the model parameters to minimize the number of slots while maintaining reliability. This iterative process leverages an evolution strategy and deep hashing function for scalability and efficiency in large IIoT networks.
ScNeuGM addresses the persistent challenges of contention and interference in Wi-Fi, particularly within the demanding Industrial Internet of Things (IIoT) environment. Building upon Wi-Fi 7's Restricted Target Wake Time (RTWT) mechanism, ScNeuGM introduces a scalable neural graph modeling framework to optimize slot assignments, minimizing latency while ensuring reliability.
ScNeuGM models the Wi-Fi network as a binary-directed graph, where stations (STAs) are vertices and contention/interference impacts are represented as edges. A neural network (NN) is trained to construct an optimal graph model, and the coloring of this graph corresponds to the optimal RTWT slot assignments. This approach combines the flexibility of neural networks with the structural benefits of graph coloring.
Two key challenges in large Wi-Fi networks are addressed: the absence of explicit STA-pairwise feedback and the quadratic complexity of processing all STA pairs. ScNeuGM employs an evolution strategy (ES) to directly optimize NN parameters based on a network-wide reward signal, eliminating the need for edge-wise feedback. Additionally, a deep hashing function (DHF) groups likely contending/interfering STA pairs, drastically reducing computational complexity by limiting NN training and inference to these groups. The objective is to minimize the number of slots $Z$ while guaranteeing that each STA $k$'s reliability $r_k$ exceeds a threshold $\tau$, formulated as: $\min_{\mathbf{z}, Z} Z$, s.t. $\mathbb{E}[r_k \mid \hat{S}, \mathbf{z}] \ge \tau$, $z_k \in \{1, \dots, Z\}$, $\forall k$.
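Once the contention/interference graph is built, the slot assignment is a graph-coloring problem: conflicting STAs must land in different slots, and the objective is to use as few slots as possible. The sketch below uses a plain greedy coloring on a hypothetical conflict graph as a stand-in for the NN-constructed graph and its coloring step; it is not ScNeuGM's actual assignment algorithm.

```python
def greedy_slot_assignment(num_stas, conflict_edges):
    """Assign each STA the smallest RTWT slot not used by any conflicting
    neighbor (greedy graph coloring; illustrative stand-in for ScNeuGM)."""
    neighbors = {k: set() for k in range(num_stas)}
    for a, b in conflict_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    slots = {}
    for k in range(num_stas):       # ordering by degree would tighten Z further
        used = {slots[j] for j in neighbors[k] if j in slots}
        slot = 1
        while slot in used:
            slot += 1
        slots[k] = slot
    return slots

# Hypothetical contention graph: STAs 0, 1, 2 mutually conflict; 3 conflicts with 0
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
slots = greedy_slot_assignment(4, edges)
num_slots = max(slots.values())     # Z = 3 here: the triangle forces three slots
```

The triangle among STAs 0-2 forces three distinct slots, while STA 3 reuses one of them; ScNeuGM's contribution is learning which edges belong in this graph at scale, without explicit pairwise feedback.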
Simulations demonstrate ScNeuGM's effectiveness, reducing the number of slots by 25% compared to heuristic graph models. The ES-trained NN is significantly more likely to produce near-optimal graphs than algorithms requiring edge-wise feedback. The DHF dramatically accelerates the neural graph modeling (NGM) pipeline, significantly reducing training and inference times, as well as online slot assignment time. In dynamic scenarios, ScNeuGM achieves considerably fewer packet losses due to timely graph regeneration.
LEAD: Large Foundation Model for EEG-Based Alzheimer's Disease Detection by Yihe Wang, Nan Huang, Nadia Mammone, Marco Cecchi, Xiang Zhang https://arxiv.org/abs/2502.01678
Caption: The image illustrates the LEAD training pipeline for EEG-based Alzheimer's Disease detection. Self-supervised pre-training utilizes 7 non-AD and 4 AD datasets, followed by unified fine-tuning on 5 AD datasets. This leads to subject-level detection, classifying individuals as either AD or HC (Healthy Control).
LEAD introduces a groundbreaking approach to Alzheimer's disease (AD) detection using electroencephalogram (EEG) data. Addressing the limitations of existing methods, LEAD presents the first large foundation model for EEG-based AD detection, accompanied by the world's largest EEG-AD corpus, comprising 813 subjects.
LEAD tackles the data scarcity challenge by leveraging this curated corpus and addresses inter-subject variations through a novel pipeline. This pipeline includes channel alignment for standardization, self-supervised contrastive pre-training, and unified fine-tuning. The pre-training phase involves both sample-level and subject-level contrastive learning on a combined dataset of AD and non-AD EEG data. Sample-level contrasting utilizes the InfoNCE loss: $L_{sam} = \mathbb{E}\left[-\log\frac{\exp(\mathrm{sim}(z_i, z_j)/T)}{\sum_{k \neq i}\exp(\mathrm{sim}(z_i, z_k)/T)}\right]$, where $(i, j)$ indexes a positive pair of samples in a batch $B$, the sum runs over all other samples $k$ in the batch, and $T$ is a temperature parameter. Subject-level contrasting employs a similar loss, treating samples from the same subject as positives. The final loss function is a weighted sum: $L = \lambda_1 L_{sam} + \lambda_2 L_{sub}$. Unified fine-tuning is then performed on multiple channel-aligned AD datasets.
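The InfoNCE loss above can be written compactly with batched cosine similarities: each anchor is scored against its positive with every other sample in the batch serving as a negative. The function below is a generic sketch of that loss, not LEAD's implementation; the toy embeddings and the within-batch pairing are assumptions for illustration.

```python
import numpy as np

def info_nce(z, pos, temperature=0.1):
    """InfoNCE loss over a batch of embeddings z (sketch).

    pos[i] is the batch index of anchor i's positive; every other sample
    in the batch acts as a negative. Embeddings are L2-normalised first.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature            # temperature-scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)         # an anchor never pairs with itself
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(logp[np.arange(len(z)), pos])

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))           # toy batch of 8 EEG-segment embeddings
pos = np.array([1, 0, 3, 2, 5, 4, 7, 6])   # positives paired within the batch
loss = info_nce(z, pos)
```

Subject-level contrasting reuses the same machinery with a different pairing rule: `pos` would point at another segment from the same subject rather than an augmented view of the same sample.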
LEAD's architecture incorporates a backbone encoder with temporal and spatial branches to capture comprehensive EEG features. This design, coupled with the pre-training strategy, enables LEAD to achieve state-of-the-art performance, showing significant improvements in F1 scores at both the sample and subject levels compared to existing methods. Ablation studies validate the effectiveness of key components, such as the inclusion of diverse EEG data in pre-training and the use of subject-level contrastive learning. Unified fine-tuning across multiple AD datasets consistently outperforms single-dataset fine-tuning.
This newsletter highlights significant advancements across several domains. From fundamental channel modeling with TWDP analysis to applications of intelligent surfaces and advanced signal processing, the wireless research presented offers valuable insights and innovative solutions. The development of Metis, a unified speech generation model, shows how large-scale pre-training and fine-tuning can deliver broad versatility and strong performance across speech tasks. ScNeuGM addresses the critical challenges of contention and interference in Wi-Fi 7 networks, offering a scalable and efficient solution for optimizing resource allocation in demanding IIoT environments. Finally, LEAD advances Alzheimer's disease detection through a foundation model and a comprehensive EEG corpus, demonstrating the power of self-supervised learning in biomedical applications. These contributions collectively represent significant progress in wireless communications, speech generation, and biomedical signal processing, paving the way for future innovations and impactful real-world applications.