This collection of preprints explores the exciting intersection of signal processing, communication systems, and machine learning, with a particular focus on applications in healthcare and wireless networks. Deep learning emerges as a powerful tool for enhanced signal analysis and system optimization. For example, Haldar (2024) introduces Shannon-like interpolators optimized for weighted Hilbert spaces, allowing the incorporation of spectral priors for improved sub-Nyquist data interpolation.
In wireless communications, several advancements are presented. Demirkol and Kucur (2024) analyze the outage performance of a NOMA system with energy harvesting and Alamouti/MRC. Wang et al. (2024) propose a deep residual network for enhanced channel estimation in near-field IRS-aided MIMO systems. Xie et al. (2024) introduce KANsformer, a novel architecture combining transformers and Kolmogorov-Arnold networks for scalable beamforming.
Data-driven approaches also play a crucial role in optimizing system performance and resource allocation. Mancini et al. (2024) employ federated deep reinforcement learning (FedDRL) for joint channel selection in V2X networks. Huan et al. (2024) explore semi-supervised learning with multi-modal data (CSI and RGB images) for vehicle positioning. Soleymani et al. (2024) analyze the rate region of RIS-aided URLLC broadcast channels, comparing different RIS architectures. Tang et al. (2024) investigate jamming mitigation with a movable antenna array using a deep learning framework, showcasing the advantages of adaptable antenna positioning.
Healthcare and environmental monitoring benefit significantly from these advancements. Barua Soumma et al. (2024) develop LIFT-PD, a self-supervised learning framework for real-time freezing of gait detection in Parkinson's disease. Zhang and Han (2024) investigate optimal sensor placement for TDOA-based source localization, accounting for sensor location errors. Pop et al. (2024) present a wireless system integrated with a DNN model for structural health monitoring of CFRP structures.
Further notable contributions include an experimental realization of reconfigurable intelligent surfaces (RIS) using space-time coded metasurfaces by Gholami et al. (2024), a Gaussian adaptive selective outlier-rejecting smoother for trajectory reconstruction by Majal and Chughtai (2024), and age-of-information-oriented probabilistic link scheduling for D2D networks by Wang et al. (2024). Zhu and Bhandari (2024) present a novel modulo ADC hardware implementation for the Unlimited Sensing Framework, enabling high-dynamic-range signal capture. Zhang et al. (2024) propose a channel-wise attention-based PLS-1D-CNN model for SARS-CoV-2 screening using infrared signatures. Son and Quynh (2024) introduce ISDNN, a deep neural network for channel estimation in massive MIMO systems.
Finally, deep learning finds application in diverse domains, including seizure prediction (Saeizadeh et al., 2024), battery end-of-life prediction (Park et al., 2024), and ECG analysis (Han & Ding, 2024). Fu et al. (2024) investigate multi-IRS deployment optimization for enhanced wireless coverage. Pjanić et al. (2024) propose a dynamic user grouping method based on location and heading in 5G NR systems. Brüsch et al. (2024) introduce contrastive random lead coding for channel-agnostic self-supervision of biosignals. Gideoni et al. (2024) explore non-invasive neural decoding in source-reconstructed brain space.
EEGPT: Unleashing the Potential of EEG Generalist Foundation Model by Autoregressive Pre-training by Tongtian Yue, Shuning Xue, Xuange Gao, Yepeng Tang, Longteng Guo, Jie Jiang, Jing Liu https://arxiv.org/abs/2410.19779
Caption: The image illustrates the architecture of EEGPT, a generalist EEG foundation model. The left side depicts the single-electrode generative pre-training stage, while the right side shows the multi-electrode and multi-task fine-tuning process, incorporating a task-shared electrode graph for transfer learning across diverse EEG datasets and tasks. This architecture enables EEGPT to unify data from various electrode configurations and achieve state-of-the-art performance on a range of downstream tasks.
EEGPT stands out as the first generalist EEG foundation model, marking a significant advancement in the field. Traditionally, EEG research has been limited by specialized models designed for specific tasks, datasets, or even individual subjects. This fragmented approach hinders transfer learning and the development of versatile, robust models. EEGPT addresses these challenges head-on, offering broad compatibility with various EEG acquisition devices, subjects, and tasks, accommodating signals from up to 138 electrodes in any combination.
The model's versatility stems from several key innovations. First, it employs an electrode-wise modeling strategy, treating each electrode as a fundamental unit. This allows for the integration of diverse EEG datasets, regardless of the number or arrangement of electrodes, effectively unifying data formats and enabling training on a massive dataset of 37.5M pre-training samples.
Second, EEGPT pioneers the use of autoregressive pre-training in the EEG domain, moving away from the traditional masked autoencoder approach. This shift to a next signal prediction task allows the model to better capture the sequential and temporal dependencies inherent in EEG data. The authors explore scaling laws with models up to 1.1B parameters, the largest in EEG research to date, demonstrating the potential for even greater performance with increased scale.
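The shift from masked reconstruction to next-signal prediction can be illustrated with a toy objective. The following is a minimal sketch under simplifying assumptions, not the EEGPT architecture: a single linear map `W` predicts each fixed-length patch of one electrode's signal from the preceding patch, and the training signal is the mean-squared prediction error. All names, shapes, and the patch length are illustrative.

```python
import numpy as np

def next_patch_prediction_loss(signal, patch_len, W):
    """MSE loss for predicting each patch from its predecessor --
    a toy stand-in for an autoregressive next-signal objective."""
    n_patches = len(signal) // patch_len
    patches = signal[: n_patches * patch_len].reshape(n_patches, patch_len)
    preds = patches[:-1] @ W          # predict patch t+1 from patch t
    targets = patches[1:]
    return float(np.mean((preds - targets) ** 2))

rng = np.random.default_rng(0)
sig = rng.standard_normal(1024)       # one electrode's raw signal (hypothetical)
W = np.eye(64)                        # identity "model": copy the last patch forward
loss = next_patch_prediction_loss(sig, patch_len=64, W=W)
```

In a real model, `W` would be replaced by a deep transformer decoder and the loss minimized over the 37.5M-sample corpus; the point here is only the direction of prediction, from past patches to the next one.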
For transfer learning, EEGPT introduces a learnable electrode graph network shared across multiple tasks. This network, with electrodes as nodes, integrates spatial information, complementing the temporal representations learned during pre-training. The task-specific node activation patterns adapt to the input data format, enabling the model to handle diverse tasks within a single framework. This multi-task learning approach not only confirms multi-task compatibility but also reveals a synergistic effect, with tasks demonstrating mutual enhancement.
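A shared electrode graph of this kind can be sketched as one message-passing step over a learnable adjacency matrix, with a per-task mask selecting which electrode nodes are active. This is an illustrative sketch, not EEGPT's exact layer; the masking, normalization, and dimensions are assumptions.

```python
import numpy as np

def electrode_graph_layer(H, A, W, active):
    """One message-passing step over a shared electrode graph.
    H: (n_electrodes, d) per-electrode features; A: (n, n) learnable
    adjacency; active: boolean mask of electrodes present in this
    task's montage. Inactive nodes contribute nothing, so the shared
    graph adapts to each dataset's electrode layout."""
    A = A * np.outer(active, active)           # restrict edges to present electrodes
    deg = A.sum(axis=1, keepdims=True) + 1e-8  # row-normalize aggregation
    return np.maximum((A / deg) @ H @ W, 0.0)  # mean aggregation + linear map + ReLU

rng = np.random.default_rng(1)
n, d = 138, 16                                 # 138 electrodes, as in EEGPT's coverage
H = rng.standard_normal((n, d))
A = np.abs(rng.standard_normal((n, n)))        # stand-in for learned edge weights
active = np.zeros(n, dtype=bool)
active[:32] = True                             # e.g. a 32-channel montage
out = electrode_graph_layer(H, A, rng.standard_normal((d, d)), active)
```

Masking the adjacency rather than the input lets a single learned graph serve every montage: datasets with different electrode subsets simply activate different nodes.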
EEGPT's effectiveness was rigorously evaluated on 12 benchmarks spanning 5 distinct downstream tasks: emotion recognition, motor imagery classification, mental workload detection, sleep stage classification, and cross-modality tasks. EEGPT consistently outperformed existing specialist models across all tasks, demonstrating significant accuracy improvements. These results, coupled with extensive ablation studies, validate the effectiveness of the proposed training recipe, highlighting the benefits of autoregressive pre-training and confirming the emergence of scaling laws for both model size and training data volume. EEGPT represents a significant leap forward, offering improved scalability, transferability, and adaptability, and paving the way for a new era of generalist EEG models.
UniMTS: Unified Pre-training for Motion Time Series by Xiyuan Zhang, Diyan Teng, Ranak Roy Chowdhury, Shuheng Li, Dezhi Hong, Rajesh K. Gupta, Jingbo Shang https://arxiv.org/abs/2410.19818
Caption: This image illustrates the challenges in generalizing motion time series models across different devices, locations, activities, and mounting orientations due to insufficient real-world data. It highlights UniMTS's ability to address these challenges by enabling generalization across 1) devices and locations (e.g., smartwatch on wrist vs. smartphone on upper leg), 2) mounting orientations (depicted by rotated coordinate systems), and 3) activities performed during training (e.g., lying down, sitting) versus deployment (e.g., walking, cycling). The lock icon symbolizes the privacy-preserving nature of the UniMTS framework.
Motion time series, collected from wearable devices and smartphones, hold vast potential for understanding human behavior. However, privacy concerns and the difficulty of data labeling have limited the development of robust, generalizable models. Existing models, often trained and tested on the same dataset, struggle with variations in device placement, orientation, and activity type. UniMTS introduces a unified pre-training framework to address these challenges, achieving state-of-the-art performance in zero-shot, few-shot, and full-shot settings.
UniMTS utilizes a contrastive learning approach, aligning motion time series with text descriptions enriched by large language models (LLMs). This allows the model to learn the semantic meaning of the time series, enabling generalization across different activities. To overcome the scarcity of large-scale motion data, UniMTS synthesizes time series from existing motion skeleton data, encompassing a wide range of body locations. Spatio-temporal graph networks model the relationships between joints, facilitating generalization across diverse device placements. A rotation-invariant augmentation technique, applying a random rotation matrix $R_{J_i} \sim \mathrm{Uniform}(SO(3))$ to each joint $J_i$ at each timestep $t$ ($\tilde{x}_{J_i}^t = R_{J_i} x_{J_i}^t$), ensures robustness to varying device orientations.
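The rotation augmentation can be sketched as follows. This is a minimal illustration, not the UniMTS implementation: it samples one approximately Haar-uniform rotation per joint (via QR decomposition of a Gaussian matrix, with a sign correction) and applies it to every timestep of that joint's trajectory.

```python
import numpy as np

def random_rotation(rng):
    """Sample a rotation matrix approximately uniform over SO(3):
    QR decomposition of a Gaussian matrix gives Haar measure on O(3)
    after a sign correction; a column flip forces det = +1."""
    A = rng.standard_normal((3, 3))
    Q, R = np.linalg.qr(A)
    Q *= np.sign(np.diag(R))          # make the distribution Haar-uniform
    if np.linalg.det(Q) < 0:          # reflect into SO(3)
        Q[:, 0] = -Q[:, 0]
    return Q

def augment(joint_series, rng):
    """Apply one random rotation per joint to all of its timesteps.
    joint_series: (n_joints, T, 3) sensor trajectories (hypothetical layout)."""
    out = np.empty_like(joint_series)
    for j in range(joint_series.shape[0]):
        Rj = random_rotation(rng)
        out[j] = joint_series[j] @ Rj.T   # rotate each 3-vector
    return out

rng = np.random.default_rng(42)
x = rng.standard_normal((5, 100, 3))      # 5 joints, 100 timesteps
x_aug = augment(x, rng)                   # norms are preserved by rotation
```

Because rotations are orthogonal, signal magnitudes are untouched; only the sensor frame changes, which is exactly the variation a device's mounting orientation introduces.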
Evaluated on 18 real-world motion time series datasets, UniMTS demonstrates exceptional performance. In the zero-shot setting, it achieves a remarkable average accuracy of 43.5% and macro-F1 score of 34.3%, outperforming the best baseline by a significant margin. This improvement holds true in few-shot and full-shot settings as well, demonstrating consistent gains over existing methods. Analyses further reveal that the learned embeddings effectively cluster time series according to their semantic meanings, even for activities unseen during pre-training. The framework's efficiency is also noteworthy, with a smaller model size and faster fine-tuning time compared to the best-performing baseline. UniMTS represents a significant step forward in motion time series analysis, paving the way for more robust and generalizable models while preserving privacy.
PaPaGei: Open Foundation Models for Optical Physiological Signals by Arvind Pillai, Dimitris Spathis, Fahim Kawsar, Mohammad Malekzadeh https://arxiv.org/abs/2410.20542
Caption: The image illustrates the PaPaGei framework for PPG signal analysis. It shows the process of pre-training a foundation model on a large dataset of PPG signals from diverse users, followed by its application to various downstream health-related tasks. The framework leverages a morphology-aware approach to learn robust and generalizable representations from PPG data.
PPG signals, readily obtainable from various devices, offer a valuable window into cardiovascular health and other physiological indicators. However, current machine learning models for PPG analysis often lack generalizability due to limited data, noise, and individual variability in signal morphology. PaPaGei, the first open foundation model for PPG signals, addresses these challenges through large-scale pre-training and novel representation learning.
Trained on over 57,000 hours of PPG data from publicly available datasets, PaPaGei employs a self-supervised learning framework. A key innovation is its morphology-aware approach (PaPaGei-S), focusing on agreements between signals with similar blood volume changes, captured by metrics like stress-induced Vascular Response Index (SVRI: $\frac{\sum_{i=sys}^{n} x_i}{\sum_{i=1}^{n - sys} x_i}$), Inflection Point Area ratio (IPA: $\frac{\sum_{i=1}^{dn} x_i}{\sum_{i=dn}^{n} x_i}$), and Signal Quality Index (SQI: $\frac{1}{W}\sum_{w=1}^{W} \frac{m_3}{m_2^{3/2}}$). This contrasts with the patient contrastive approach (PaPaGei-P), which maximizes agreement between signals from the same patient.
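Two of these morphology metrics can be sketched on a synthetic pulse. This is a simplified illustration, not the PaPaGei code: the integration limits are reduced to "area after vs. before the split point", the systolic peak is taken as the argmax, and the dicrotic notch index is supplied by hand rather than detected.

```python
import numpy as np

def svri(x, sys_idx):
    """Stress-induced Vascular Response Index: area of the pulse after
    the systolic peak over the area before it (simplified limits)."""
    return float(np.sum(x[sys_idx:]) / np.sum(x[:sys_idx]))

def ipa(x, dn_idx):
    """Inflection Point Area ratio: area before the dicrotic notch
    over the area after it (simplified limits)."""
    return float(np.sum(x[:dn_idx]) / np.sum(x[dn_idx:]))

# Toy PPG-like pulse: a sharp systolic wave plus a smaller diastolic wave.
t = np.linspace(0.0, 1.0, 200)
pulse = np.exp(-((t - 0.2) / 0.08) ** 2) + 0.4 * np.exp(-((t - 0.55) / 0.15) ** 2)
sys_idx = int(np.argmax(pulse))           # systolic peak sample
metrics = (svri(pulse, sys_idx), ipa(pulse, dn_idx=110))
```

In PaPaGei-S, scalar descriptors like these summarize a pulse's blood-volume morphology, so signals with similar metric values can be treated as positives during contrastive pre-training even when they come from different patients.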
Evaluated on 20 diverse downstream tasks across 10 datasets, PaPaGei demonstrates significant improvements over existing time-series foundation models. It boosts classification performance by an average of 6.3% and regression performance by 2.9%, outperforming competitors in a majority of tasks. Notably, PaPaGei achieves these results with a relatively compact model size, highlighting its data and parameter efficiency. Further analysis shows that PaPaGei-S consistently outperforms PaPaGei-P, emphasizing the effectiveness of the morphology-aware approach. PaPaGei performs well across a range of skin tones, though results are strongest on lighter tones, and further research is needed to improve robustness on darker skin tones. The release of PaPaGei as an open-source model marks a significant step forward, opening up new opportunities for developing more robust and widely applicable health monitoring tools.
This newsletter highlights the growing influence of deep learning in signal processing and communication systems, particularly within healthcare and wireless networks. The showcased papers present impactful advancements, from EEG analysis with EEGPT's generalist foundation model to the robust motion time series analysis of UniMTS and the open-source PPG foundation model PaPaGei. These models address critical challenges in their respective domains, offering improved performance, generalizability, and efficiency. The trend toward data-driven approaches built on large-scale datasets is evident throughout, promising continued progress in these exciting fields. The open-source nature of models like PaPaGei further encourages community involvement and accelerates the development of innovative applications. These advancements collectively underscore the potential of signal processing and machine learning to revolutionize healthcare, wireless communications, and other diverse fields.