ArXiv Pulse - Stay updated with the latest research papers

General Overview

This collection of preprints explores diverse applications of signal processing and machine learning in communication systems, medical imaging, and sensor networks. Several papers focus on enhancing wireless communication performance through new antenna designs and signal processing techniques. Prod’homme et al. (2025) demonstrate, both theoretically and experimentally, that mutual coupling in dynamic metasurface antennas (DMAs) can be leveraged to improve radiation pattern control, challenging the conventional practice of minimizing coupling. Ruah et al. (2025) introduce context-aware doubly-robust learning, a semi-supervised approach that combines real-world data with synthetic data from a network digital twin (NDT) for improved beamforming and adapts how much it relies on the NDT according to the twin's fidelity. Addressing practical challenges in Time Difference of Arrival (TDoA) positioning, Garcia-Fernandez (2025) proposes using Differential Transmitter Bias (DTB) to mitigate node-specific hardware biases, analogous to the Differential Code Bias (DCB) used in Global Navigation Satellite Systems (GNSS). Furthermore, Mu et al. (2025) analyze the performance of fluid antenna systems for LoRa, focusing on pilot sequence overhead and placement for improved Symbol Error Rate (SER). Finally, Kato et al. (2025) propose a Bayesian optimization approach for designing data-carrying reference signals that maximize spectral efficiency across a wider SNR range.

Another prominent theme is the application of deep learning for signal analysis and interpretation. Shen et al. (2025) employ U-Net architectures to estimate phase-aberrated point spread functions (PSFs) in ultrasound imaging, using synthetically generated phase aberration data for training. Chen et al. (2025) introduce FaultGPT, a vision-language model for industrial fault diagnosis, generating reports directly from vibration signals and text-based supervision. For real-time MRI video segmentation, Tholan et al. (2025) investigate the impact of pretraining and adaptation data sizes on SegNet and UNet performance for air-tissue boundary segmentation. Huang et al. (2025) propose IncepFormerNet, a novel hybrid architecture combining Inception and Transformer networks, for improved Steady-State Visually Evoked Potential (SSVEP) classification in Brain-Computer Interface (BCI) systems. Lastly, Masrur & Guvenc (2025) develop a 3D clustering-based deep learning model for Unmanned Aerial Vehicle (UAV)-based RF source localization, bridging the gap between simulation and reality with an enhanced two-ray propagation model.

Several works address optimization problems in resource allocation and signal processing. Liu & Khosravi (2025) propose probabilistic frameworks, including Maximum Likelihood (ML) and Expectation-Maximization (EM) approaches, for system identification of linear dynamics with bilinear observation models. Quan et al. (2025) investigate online resource management for the uplink of a wideband hybrid beamforming system, proposing sequential solutions for beam selection, user selection, and power allocation. Jiang et al. (2025) present a model-driven learning approach for joint waveform and beamforming design in Reconfigurable Intelligent Surface - Integrated Sensing and Communication (RIS-ISAC) systems, unfolding the iterative Alternating Direction Method of Multipliers (ADMM) algorithm for radar target detection and Direction of Arrival (DoA) estimation. Chen et al. (2025) optimize antenna position and beamforming for movable antenna-enabled ISAC, deriving optimal solutions and efficient algorithms for various channel scenarios.

Beyond these core themes, several papers explore specialized applications. Valdivia et al. (2025) introduce a novel audio signal interpolation method based on optimal transportation of spectrograms, leveraging the unbalanced transport framework. Kalbasi et al. (2025) apply wavelet transform for classifying local field potential data from rat brains in a conditioned place preference paradigm. Qian et al. (2025) present an optimized design of a current mirror in 150 nm GaAs technology, focusing on compactness and energy-efficient operation. Agaronyan et al. (2025) develop a graph-based deep learning model on stereo EEG data for predicting seizure freedom in epilepsy patients. Dong & Noh (2025) explore continual person identification using footstep-induced floor vibrations, addressing the challenge of data variability across different floor structures.

Finally, several papers focus on theoretical advancements and system-level design. Sheemar et al. (2025) investigate holographic joint communications and sensing, deriving Cramér-Rao bounds and proposing a Majorization-Minimization (MM)-based optimization framework. Ouyang et al. (2025) analyze the electromagnetic degrees of freedom for continuous-aperture array systems. Zhang et al. (2025) develop a theoretical framework for Affine Frequency-Division Multiplexing (AFDM)-enabled integrated sensing and communication, proposing a novel pilot design for enhanced performance. Meng et al. (2025) propose a variational Bayesian approach for near-field motion parameter estimation. Hong & Tsakiris (2025) study the problem of unlabeled sensing for subspaces with Toeplitz bases. Bhargav et al. (2025) explore robust information selection for hypothesis testing, introducing a misclassification penalty framework. Parchekani et al. (2025) investigate the use of reconfigurable intelligent surfaces for Orthogonal Frequency-Division Multiplexing (OFDM) radar interference mitigation. Gillani et al. (2025) leverage the error resilience of iterative algorithms for energy efficiency in radio astronomy calibration. Xu et al. (2025) study post-stroke rehabilitative mechanisms in individualized fatigue level-controlled treadmill training. Cui et al. (2025) evaluate multi-sensor placement and neural network architectures for physical activity level classification. Yu et al. (2025) propose a multi-task adaptive ray-tracing platform for 6G digital twin networks. Xie et al. (2025) propose a low-complexity placement design of pinching-antenna systems. Wang et al. (2025) introduce a framework for 3D radar sequence prediction using spatiotemporal coherent Gaussian representation. Krishne Gowda et al. (2025) discuss enhancing pavement sensor data acquisition for AI-driven transportation research. These diverse contributions highlight the ongoing efforts to push the boundaries of signal processing and machine learning across various domains.

Paper Highlights

Road to 6G Digital Twin Networks: Multi-Task Adaptive Ray-Tracing as a Key Enabler

Road to 6G Digital Twin Networks: Multi-Task Adaptive Ray-Tracing as a Key Enabler by Li Yu, Yinghe Miao, Jianhua Zhang, Shaoyi Liu, Yuxiang Zhang, Guangyi Liu https://arxiv.org/abs/2502.14290

Caption: The MART-6G platform architecture for 6G digital twin networks features three modules: an environment twin, a ray-tracing engine, and a channel twin, working together to provide accurate and adaptable channel modeling. The platform supports both offline and online tasks, adapting its simulation parameters based on specific requirements, including environment sensing, propagation modeling, and hardware acceleration. This adaptability enables a balance between accuracy and computational efficiency for real-time DTN applications.

6G is expected to bring the Internet of Everything (IoE) and far denser connectivity, but designing and optimizing such networks is inherently complex. Digital twin networks (DTNs), virtual replicas of their physical counterparts, address this complexity by supporting real-time monitoring, prediction, and control. Realizing a DTN requires accurate wireless channel models, a task that has traditionally relied on statistical analysis. This paper argues for deterministic ray-tracing (RT) as a more precise and adaptable method for 6G DTN channel modeling and introduces a multi-task adaptive ray-tracing platform for 6G, named MART-6G.

MART-6G tackles the limitations of existing RT simulators by incorporating key 6G features and offering task-specific customization. The platform's architecture comprises three core modules: an environment twin module responsible for sensing and replicating dynamic environments, an RT engine module incorporating propagation algorithms, accelerations, calibrations, and 6G-specific features, and a channel twin module dedicated to generating multipath parameters and statistical distributions. Crucially, MART-6G's adaptability shines through its ability to tailor itself to specific DTN tasks, both online and offline, by judiciously selecting appropriate sensing methods, antenna and material libraries, propagation models, and calibration strategies. This inherent flexibility allows for a delicate balance between model accuracy and computational complexity, a crucial requirement for real-time DTN applications.

The paper evaluates MART-6G in two real-world case studies. In an offline network planning scenario on a university campus at 14.8 GHz, calibration reduced the path loss (PL) prediction error from 7.7 dB to 4.5 dB. In a separate online vehicle-to-vehicle (V2V) scenario at 6 GHz, MART-6G reproduced the multipath component (MPC) distribution with a channel similarity index (SI) of 86% relative to actual measurements. The average update time for this online simulation was 70 ms, which supports the platform's use in real-time tasks.

Larger-scale testing across both scenarios further validated MART-6G's robust performance. Calibration significantly enhanced the accuracy of both PL and delay spread (DS) predictions, reducing the normalized mean square error (NMSE) for PL from 0.005 to 0.002 and for DS from 0.008 to 0.005. While calibration inevitably increased simulation time, the platform maintained real-time performance for online tasks (0.07s per point) and achieved reasonable performance for offline tasks (249s per 1000 points). The paper concludes by outlining future challenges for RT-enabled DTN, including large-scale model validation, seamless integration with existing communication systems, and considerations for hardware and energy consumption. These findings underscore MART-6G's potential as a pivotal enabler for the realization of autonomous and intelligent 6G networks.
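The error figures above can be made concrete with a small sketch. It assumes the common NMSE definition (squared error energy divided by the energy of the measured values); the summary does not state the exact normalization the authors use. The path-loss samples and the function names `nmse` and `rmse` are hypothetical and chosen only for illustration.

```python
import math


def nmse(predicted, measured):
    """Normalized mean square error: error energy over measured-signal energy.

    Assumed definition; the paper's normalization may differ.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    error_energy = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    reference_energy = sum(m ** 2 for m in measured)
    if reference_energy == 0:
        raise ValueError("measured values must not all be zero")
    return error_energy / reference_energy


def rmse(predicted, measured):
    """Root mean square error, in the same unit as the inputs (dB here)."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(measured))


# Hypothetical path-loss samples in dB, not taken from the paper.
measured_pl = [100.0, 105.0, 110.0, 120.0]
uncalibrated_pl = [108.0, 98.0, 117.0, 112.0]
calibrated_pl = [104.0, 101.0, 114.0, 116.0]

for name, prediction in [("uncalibrated", uncalibrated_pl),
                         ("calibrated", calibrated_pl)]:
    print(name,
          "RMSE [dB]:", round(rmse(prediction, measured_pl), 3),
          "NMSE:", round(nmse(prediction, measured_pl), 5))
```

Because NMSE divides by the energy of the measured values, path-loss values near 100 dB yield small NMSE numbers even when the dB error is several units, which is one plausible reason the reported NMSE values are on the order of 0.001.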

Continual Person Identification using Footstep-Induced Floor Vibrations on Heterogeneous Floor Structures

Continual Person Identification using Footstep-Induced Floor Vibrations on Heterogeneous Floor Structures by Yiwen Dong, Hae Young Noh https://arxiv.org/abs/2502.15632

Caption: Visualization of Feature Transformation for Person Identification Using Footstep Vibrations

Person identification plays a vital role in smart buildings, enabling personalized services such as health monitoring, activity tracking, and personnel management. However, traditional person identification systems rely on pre-collected data from every individual, a requirement that proves impractical in many buildings and public facilities where visitors are common. This necessitates the development of continual person identification systems capable of learning and adapting to new individuals on the fly. While existing camera-based solutions offer a potential avenue, they raise privacy concerns and require direct line-of-sight, limiting their applicability. Other modalities like wearables and pressure mats suffer from limitations related to device-carrying requirements or the need for dense deployment. Footstep-induced structural vibration sensing emerges as a promising alternative, offering a non-intrusive and privacy-friendly approach. However, this method faces a significant challenge: the inherent variability in vibration data stemming from structural heterogeneity and variations in human gait. This paper introduces a novel approach to tackle this variability and enable accurate online person identification using footstep vibrations.

The researchers meticulously characterized the variability in footstep-induced structural vibration data by decomposing it into two primary sources: footstep variability, which encompasses natural variations in human gait, and structural variability, arising from variations in wave generation and propagation within the structure. They quantified these sources using covariance analysis of frequency-domain footstep features, revealing that structural variability is the dominant factor. To mitigate this, they devised a physics-guided linear transformation function, $X_{\text{transformed}} = w^{T} X_{\text{original}}$, grounded in the wave propagation and attenuation characteristics of the structure. This transformation maps footstep features into a new feature space where within-person variability is minimized, and between-person separability is maximized. The transformation parameters, $w$, are optimized by maximizing the ratio of between-person covariance ($S_B$) to within-person covariance ($S_W$): maximize $J(w) = \dfrac{w^{T} S_B w}{w^{T} S_W w}$.
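The ratio J(w) above has the same form as the classical Fisher discriminant criterion. The sketch below illustrates that criterion for two people and two-dimensional features, where the maximizer has the closed form w proportional to the inverse of S_W times the difference of the per-person means. It is not the paper's physics-guided parameterization of w, and the feature values and function names are hypothetical.

```python
def mean_vector(samples):
    """Component-wise mean of a list of 2-D feature vectors."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(2)]


def scatter_matrix(samples, mean):
    """2x2 scatter of one person's samples around that person's mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in samples:
        d = [x[0] - mean[0], x[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Return (w, S_W, mean difference) for the two-class Fisher criterion."""
    m_a, m_b = mean_vector(class_a), mean_vector(class_b)
    s_a, s_b = scatter_matrix(class_a, m_a), scatter_matrix(class_b, m_b)
    s_w = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = s_w[0][0] * s_w[1][1] - s_w[0][1] * s_w[1][0]
    if det == 0:
        raise ValueError("within-person scatter is singular")
    diff = [m_a[0] - m_b[0], m_a[1] - m_b[1]]
    # w = S_W^{-1} (m_a - m_b), using the explicit 2x2 inverse.
    w = [(s_w[1][1] * diff[0] - s_w[0][1] * diff[1]) / det,
         (-s_w[1][0] * diff[0] + s_w[0][0] * diff[1]) / det]
    return w, s_w, diff


def fisher_ratio(w, s_w, diff):
    """J(w) = (w^T S_B w) / (w^T S_W w) with S_B = diff diff^T."""
    between = (w[0] * diff[0] + w[1] * diff[1]) ** 2
    within = sum(w[i] * s_w[i][j] * w[j] for i in range(2) for j in range(2))
    return between / within


# Hypothetical features: axis 0 varies strongly within each person (standing in
# for structural variability), axis 1 mostly separates the two people.
person_a = [(1.0, 2.0), (3.0, 2.2), (5.0, 1.8), (7.0, 2.0)]
person_b = [(1.5, 3.0), (3.5, 3.2), (5.5, 2.8), (7.5, 3.0)]

w, s_w, diff = fisher_direction(person_a, person_b)
print("w:", w)
print("J(w):", fisher_ratio(w, s_w, diff))
print("J along raw axis 0:", fisher_ratio([1.0, 0.0], s_w, diff))
```

With more than two people or higher-dimensional features, the same criterion is usually solved as a generalized eigenvalue problem on the pair (S_B, S_W) rather than with this closed form.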

For online person identification, the transformed footstep features are modeled using a Dirichlet Process Mixture Model (DPMM), a non-parametric Bayesian model that does not fix the number of individuals in advance. As new footstep data is observed, the DPMM updates its posterior probabilities, so the system can keep learning and identifying new individuals without a predefined limit. Person identification is then performed by selecting the individual with the largest posterior probability. Field experiments were conducted on both wood and concrete structures with 20 participants. The proposed method achieved a 70% reduction in feature variability compared to the original data for both structures. The system also reached 90% average accuracy in online person identification of 10 individuals on each structure, starting with data from only one person. These results support the proposed variability reduction and online learning approach for person identification in real-world settings.
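The online behavior described above can be sketched with a greedy, one-pass approximation to DPMM inference. Each new observation is scored against every known person (count times predictive likelihood) and against a "new person" option (concentration parameter times prior predictive likelihood), then assigned to the highest score. This sketch assumes one-dimensional features, a Gaussian likelihood with known variance, and a conjugate Gaussian prior on each person's mean. The class name `OnlineDPMM`, the hyperparameter values, and the data stream are hypothetical; the paper's actual inference procedure may differ.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


class OnlineDPMM:
    """Greedy sequential assignment under a Chinese-restaurant-process prior."""

    def __init__(self, alpha=1.0, obs_var=0.05, prior_mean=0.0, prior_var=10.0):
        self.alpha = alpha            # concentration: tendency to open a new cluster
        self.obs_var = obs_var        # assumed known within-person variance
        self.prior_mean = prior_mean  # prior on each person's mean feature
        self.prior_var = prior_var
        self.counts = []              # observations assigned to each person
        self.sums = []                # running feature sum for each person

    def _predictive(self, x, count, total):
        """Posterior predictive density of x for a cluster with the given stats."""
        precision = 1.0 / self.prior_var + count / self.obs_var
        post_mean = (self.prior_mean / self.prior_var + total / self.obs_var) / precision
        return gaussian_pdf(x, post_mean, self.obs_var + 1.0 / precision)

    def observe(self, x):
        """Assign x to the best-scoring person, creating a new one if needed."""
        scores = [n * self._predictive(x, n, s)
                  for n, s in zip(self.counts, self.sums)]
        scores.append(self.alpha * self._predictive(x, 0, 0.0))  # new person
        label = max(range(len(scores)), key=scores.__getitem__)
        if label == len(self.counts):
            self.counts.append(0)
            self.sums.append(0.0)
        self.counts[label] += 1
        self.sums[label] += x
        return label


# Hypothetical 1-D transformed footstep features from two walkers.
stream = [1.0, 1.1, 0.9, 4.0, 4.1, 1.05, 3.9]
model = OnlineDPMM()
labels = [model.observe(x) for x in stream]
print(labels, "people discovered:", len(model.counts))
```

Hard assignment keeps the update cheap, which suits online use, but it cannot revise earlier decisions; sampling-based or variational inference would be the usual choice when that matters.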

FaultGPT: Industrial Fault Diagnosis Question Answering System by Vision Language Models

FaultGPT: Industrial Fault Diagnosis Question Answering System by Vision Language Models by Jiao Chen, Ruyi Huang, Zuohong Lv, Jianhua Tang, Weihua Li https://arxiv.org/abs/2502.15481

Caption: The architecture of FaultGPT, a conversational model for fault diagnosis question answering (FDQA), is illustrated. It processes time-frequency images of vibration signals through a visual encoder and a multi-scale cross-modal image decoder (MCID) to capture localized fault features. These features are then integrated with learnable prompt embeddings by a prompt learner, enabling a large language model (LLM) to generate detailed diagnostic reports.

Traditional fault diagnosis methods often fall short of providing comprehensive insights: they rely heavily on single data sources such as vibration signals and output little more than classification scores. FaultGPT, a novel conversational model, addresses these shortcomings by using large vision-language models (LVLMs) for end-to-end fault diagnosis question answering (FDQA). This approach lets FaultGPT generate detailed diagnostic reports directly from raw vibration signals, covering fault type, severity, and location, and so moves beyond simple classification.

FaultGPT's architecture consists of three key components. A visual encoder, based on the pre-trained CLIP model, extracts features from time-frequency images of vibration signals. A multi-scale cross-modal image decoder (MCID) then captures fine-grained fault semantics, addressing the limitations of global feature alignment in existing models. The MCID employs a scale path mechanism, inspired by ClipSAM, to capture features at different scales: $V_1 = \mathrm{Conv}_{3\times 3}(\mathrm{AvgPool}_{s_1 \times s_1}(F_{\text{patch}}))$ and $V_2 = \mathrm{Conv}_{3\times 3}(\mathrm{AvgPool}_{s_2 \times s_2}(F_{\text{patch}}))$. These multi-scale features are subsequently merged and processed to generate localized fault features $M$. Finally, a prompt learner integrates these localized features with learnable base prompt embeddings, enabling the LLM to generate accurate and context-aware diagnostic reports. The model is trained using a combined loss function encompassing cross-entropy, focal, and dice losses: $L = \alpha L_{\text{ce}} + \beta L_{\text{focal}} + \gamma L_{\text{dice}}$.
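The combined loss can be illustrated with a minimal sketch. It assumes the three terms are computed over a flattened binary localization mask with per-pixel probabilities, which the summary does not confirm. The focal exponent is called `focusing` to keep it distinct from the weight gamma in the formula. The function names, default weights, and example values are hypothetical.

```python
import math

EPS = 1e-7  # keeps log() finite when a probability is exactly 0 or 1


def _prob_of_truth(p, y):
    """Clamped probability the model assigns to the true label of one pixel."""
    p_t = p if y == 1 else 1.0 - p
    return min(max(p_t, EPS), 1.0 - EPS)


def cross_entropy_loss(probs, targets):
    """Mean binary cross-entropy over all pixels."""
    return -sum(math.log(_prob_of_truth(p, y))
                for p, y in zip(probs, targets)) / len(probs)


def focal_loss(probs, targets, focusing=2.0):
    """Cross-entropy with easy pixels down-weighted by (1 - p_t)^focusing."""
    total = 0.0
    for p, y in zip(probs, targets):
        p_t = _prob_of_truth(p, y)
        total -= (1.0 - p_t) ** focusing * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """One minus the soft Dice overlap between prediction and target mask."""
    overlap = sum(p * y for p, y in zip(probs, targets))
    return 1.0 - (2.0 * overlap + smooth) / (sum(probs) + sum(targets) + smooth)


def combined_loss(probs, targets, alpha=1.0, beta=1.0, gamma=1.0):
    """L = alpha * L_ce + beta * L_focal + gamma * L_dice."""
    return (alpha * cross_entropy_loss(probs, targets)
            + beta * focal_loss(probs, targets)
            + gamma * dice_loss(probs, targets))


# Hypothetical 4-pixel mask: the first two pixels belong to the fault region.
targets = [1, 1, 0, 0]
good_prediction = [0.9, 0.8, 0.1, 0.2]
poor_prediction = [0.4, 0.3, 0.6, 0.7]

print("good:", combined_loss(good_prediction, targets))
print("poor:", combined_loss(poor_prediction, targets))
```

The three terms complement each other: cross-entropy scores every pixel equally, the focal term emphasizes hard pixels, and the Dice term rewards overlap directly, which helps when fault regions occupy only a small part of the image.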

The researchers evaluated FaultGPT on three bearing fault datasets: CWRU, SCUT-FD, and Ottawa. They compared its performance against several open-source language models, including GPT-Neo, Mistral, and LLaMA-2, using metrics such as Accuracy, BLEU, ROUGE, CIDEr, and Match. The results demonstrated that LLMs significantly outperformed smaller language models, with LLaMA-2 and Mistral excelling in cross-modal alignment tasks. On the SCUT-FD dataset, FaultGPT achieved 92.8% Pixel-AUC and 85.6% Token Accuracy, which the authors attribute to the MCID and prompt learner. Furthermore, the model exhibited promising few-shot and zero-shot learning capabilities, suggesting strong generalization potential.

Ablation studies confirmed the importance of each component in FaultGPT's architecture. Removing the MCID or prompt learner resulted in substantial performance degradation, highlighting their essential roles in capturing and aligning fault features. The study also validated the effectiveness of the combined loss function and demonstrated the superiority of instruction tuning over direct fine-tuning for improved generalization. FaultGPT's user-friendly interface, designed for non-experts, further enhances its practical applicability in real-world industrial settings. Future research will focus on extending FaultGPT to compound fault diagnosis and predicting remaining useful life, broadening its impact on industrial asset management.

Conclusion

This newsletter highlights a convergence of advanced signal processing and machine learning techniques across diverse applications. From optimizing antenna designs for 6G communication networks to developing sophisticated fault diagnosis systems for industrial machinery, the research presented here pushes the boundaries of what's possible. The development of MART-6G, a multi-task adaptive ray-tracing platform, promises to revolutionize the design and optimization of future 6G networks by enabling the creation of highly accurate and adaptable digital twins. Meanwhile, the innovative approach to continual person identification using footstep vibrations opens up exciting possibilities for non-intrusive and privacy-preserving monitoring in smart buildings. Finally, FaultGPT exemplifies the transformative potential of vision-language models in industrial settings, enabling automated and comprehensive fault diagnosis through conversational AI. These advancements collectively underscore the increasing synergy between signal processing, machine learning, and real-world applications, paving the way for smarter, more efficient, and more connected systems.