This collection of preprints explores diverse applications of signal processing and machine learning in communication systems, biomedical engineering, and human-computer interfaces. Several papers focus on enhancing the efficiency and robustness of wireless communication systems. Sarkis, Jiang, and Poggiolini (Sarkis et al., 2024) introduce a faster algorithm for spatial power profile calculation in backward-Raman-amplified ultra-wideband (UWB) systems, addressing the challenges of inter-channel stimulated Raman scattering (ISRS). Hashempour, Berardinelli, Adeogun, and Jorswieck (Hashempour et al., 2024) investigate power-efficient cooperative communication in Industrial IoT (IIoT) subnetworks, comparing relay-based and reconfigurable intelligent surface (RIS)-based protocols and employing SPCA for power minimization. Delbari, Wang, Gholian, Asadi, and Jamali (Delbari et al., 2024) propose a temperature-aware phase-shift design for liquid-crystal RIS (LC-RIS) to enhance secure communication. Focusing on the THz band, Yin, Gao, and Han (Yin et al., 2024) propose a dual-functional frequency-modulated continuous-wave (FMCW) waveform for space debris detection and inter-satellite communication, while Gao and Han (Gao & Han, 2024) model THz wave propagation in charged dust using extended Mie scattering theory. Together, these works demonstrate ongoing efforts to optimize the performance of wireless communication systems using advanced signal processing techniques.
Another prominent theme is the application of deep learning to improve signal quality and extract meaningful information. Naseri, De Poorter, Moerman, Poor, and Shahid (Naseri et al., 2024) explore U-Net CNNs with quantization and depthwise separable convolutions for high-throughput blind co-channel interference cancellation on resource-constrained edge devices. Du, Lu, Ai, and Ling (Du et al., 2024) introduce a neural denoising vocoder based on amplitude and phase prediction for clean waveform generation from noisy mel-spectrograms. Dong, Cao, Gui, and Zhang (Dong et al., 2024) develop a robust distributed deep joint source-channel coding scheme for image transmission with imperfect channel state information (CSI). These contributions highlight the growing trend of leveraging deep learning for signal enhancement and efficient data transmission in communication systems.
Several papers focus on biomedical signal processing and human-computer interfaces. Ariza, Tardon, Barbancho, De-Torres, and Barbancho (Ariza et al., 2024) utilize a Bi-LSTM network for EEG-based error detection in musicians' performance. Tahery, Akhlaghi, Amirsoleimani, and Farzi (Tahery et al., 2024) introduce HeartBERT, a self-supervised ECG embedding model for efficient medical signal analysis. Wang-Nöth, Heiler, Huang, et al. (Wang-Nöth et al., 2024) investigate optimized data collection strategies for artifact detection in EEG recordings. Jiang, Meng, Chen, Xu, and Wu (Jiang et al., 2024) propose CSP-Net, integrating common spatial pattern (CSP) filters with CNNs for EEG-based motor imagery classification. These studies showcase the potential of machine learning and deep learning in analyzing complex biomedical signals for various applications, including performance monitoring, disease diagnosis, and brain-computer interfaces.
The intersection of artificial intelligence and communication networks is further explored by Shao and Li (Shao & Li, 2024), who introduce AI Flow, a framework for distributing AI inference across network edge devices, and by Muth, Schmidt, Chimmalgi, and Schmalen (Muth et al., 2024), who investigate resolution improvement in OFDM-based joint communication and sensing. Hasabelnaby, Obeed, Saif, Chaaban, and Hossain (Hasabelnaby et al., 2024) provide a comprehensive survey on the evolution of distributed antenna systems, from centralized RAN to Open RAN. These works highlight the growing interest in integrating AI capabilities into communication networks to improve efficiency, performance, and adaptability.
Finally, several papers address specific challenges in signal processing and communication. Villena, Tardon, Barbancho, et al. (Villena et al., 2024) propose preprocessing techniques to mitigate eye artifacts in EEG analysis. Li, Shin, and Yin (Li et al., 2024) present a personalized continual EEG decoding framework. Lee and Yin (Lee & Yin, 2024) explore a network expansion approach for reliable brain-computer interfaces. Islam, Bentahar, Cohen, and Rjoub (Islam et al., 2024) introduce a multi-modal unsupervised learning approach for biomedical signal processing in cardiopulmonary resuscitation (CPR). These diverse contributions demonstrate ongoing efforts to develop innovative solutions for specific challenges in signal processing and communication across various domains. Collectively, these preprints offer valuable insights into the latest advancements and future directions in these fields.
AI Flow at the Network Edge by Jiawei Shao, Xuelong Li https://arxiv.org/abs/2411.12469
Caption: This figure illustrates the cooperative inference process within the AI Flow framework. A small model on an edge device processes initial data (t0-t4) and transmits relevant information to a larger model on an edge server. The server verifies the initial processing (green checkmarks for t1-t2) and corrects any errors (red Xs for t3-t4), demonstrating the distributed intelligence approach.
The proliferation of large language models (LLMs) and their multimodal counterparts has revolutionized AI, offering unprecedented capabilities across diverse domains. However, deploying these powerful models at the network edge, closer to the data source, presents significant challenges. Resource-constrained edge devices struggle with the computational demands of model inference, while limited bandwidth hinders the transmission of raw data to powerful cloud servers. This paper introduces AI Flow, a novel framework designed to address these challenges by efficiently distributing intelligence across devices, edge nodes, and cloud servers.
AI Flow streamlines the inference process by leveraging the heterogeneous computational resources available across the network. It shifts the paradigm from transmitting a raw information flow to an intelligence flow, in which communication is task-oriented and integrated into the inference process: edge devices extract and transmit only task-critical features, minimizing communication overhead. The framework supports three cooperative inference schemes: on-device inference for simple tasks, device-edge cooperative inference for more complex tasks, and device-edge-cloud cooperative inference for resource-intensive tasks requiring external knowledge. This hierarchical approach adapts to dynamic network conditions and task requirements, ensuring efficient, low-latency responses.
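The paper describes the three-tier hierarchy but does not prescribe a concrete dispatch rule, so the sketch below is purely an illustrative assumption: the function `select_scheme`, its inputs, and its thresholds are invented for exposition, not taken from AI Flow.

```python
from enum import Enum, auto

class Scheme(Enum):
    ON_DEVICE = auto()          # simple tasks, handled locally
    DEVICE_EDGE = auto()        # offload features to an edge server
    DEVICE_EDGE_CLOUD = auto()  # escalate when external knowledge is needed

def select_scheme(task_complexity: float,
                  needs_external_knowledge: bool,
                  uplink_mbps: float,
                  complexity_threshold: float = 0.3,
                  min_uplink_mbps: float = 1.0) -> Scheme:
    """Pick a cooperative inference scheme (thresholds are illustrative)."""
    if needs_external_knowledge:
        return Scheme.DEVICE_EDGE_CLOUD
    if task_complexity <= complexity_threshold or uplink_mbps < min_uplink_mbps:
        # Simple task, or too little bandwidth to offload features: stay local.
        return Scheme.ON_DEVICE
    return Scheme.DEVICE_EDGE

print(select_scheme(0.7, needs_external_knowledge=False, uplink_mbps=10.0))
```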
Two key enabling techniques underpin AI Flow. Cooperative inference minimizes communication overhead by identifying and transmitting only task-relevant information, guided by the information bottleneck principle. Ideally, the minimal information Z transmitted satisfies I(X; Z) = I(Y; Z) = I(X; Y), where X is the input data and Y is the target output. Model inference speedup addresses the computational bottleneck of large models through techniques like pruning, low-rank factorization, quantization, knowledge distillation, and dynamic neural networks. These methods reduce model size and complexity without significantly sacrificing performance.
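To make the ideal condition concrete, here is a toy numerical check in Python. The source X, label Y, and transmitted feature Z are invented for illustration: with X uniform on {0, 1, 2, 3} and Y = X mod 2, the feature Z = X mod 2 discards the task-irrelevant half of X yet keeps everything relevant to Y, so all three mutual informations coincide at 1 bit.

```python
import numpy as np

def mutual_information(joint: np.ndarray) -> float:
    """I(A;B) in bits, computed from a joint distribution p(a, b)."""
    pa = joint.sum(axis=1, keepdims=True)   # marginal p(a)
    pb = joint.sum(axis=0, keepdims=True)   # marginal p(b)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

px = np.full(4, 0.25)                       # X uniform on {0,1,2,3}
joint_xy = np.zeros((4, 2))
joint_xz = np.zeros((4, 2))
joint_yz = np.zeros((2, 2))
for x in range(4):
    y = z = x % 2                           # label and sufficient feature
    joint_xy[x, y] += px[x]
    joint_xz[x, z] += px[x]
    joint_yz[y, z] += px[x]

print(mutual_information(joint_xy))  # I(X;Y) = 1.0 bit
print(mutual_information(joint_xz))  # I(X;Z) = 1.0 bit
print(mutual_information(joint_yz))  # I(Y;Z) = 1.0 bit
```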
A case study on image captioning demonstrates the effectiveness of AI Flow. Using speculative decoding, a small model on the edge device generates draft tokens, which are then verified and corrected by a larger model on the edge server. Experiments on the Vehicles-OpenImage dataset show that with an optimal draft token length of 4, AI Flow achieves approximately double the inference speed compared to server-only inference, measured by time per output token (TPOT). However, performance degrades when the draft token length is set too high or too low, highlighting the importance of careful parameter tuning. The paper concludes by highlighting future research directions for AI Flow, including security and privacy considerations through techniques like homomorphic encryption and differential privacy. Further optimization through software-hardware co-design and strategies for enhanced scalability and stability in handling concurrent requests are also crucial areas for future development.
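To make the speculative decoding case study concrete, here is a minimal greedy sketch. `draft_next` and `target_next` are placeholder callables standing in for the small on-device model and the large server-side model; real systems verify all draft positions in one batched forward pass and use sampling-based acceptance rules rather than this token-by-token loop.

```python
def speculative_decode(draft_next, target_next, prompt, gamma=4,
                       max_tokens=32, eos=None):
    """Greedy speculative decoding sketch: a cheap draft model proposes
    `gamma` tokens; the large model checks them and keeps the longest
    agreeing prefix, replacing the first mismatch with its own token."""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_tokens:
        # Draft phase (edge device): propose gamma tokens autoregressively.
        draft, ctx = [], list(tokens)
        for _ in range(gamma):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # Verify phase (edge server): accept until the first disagreement.
        accepted, ctx = [], list(tokens)
        for t in draft:
            t_star = target_next(ctx)
            if t_star != t:
                accepted.append(t_star)  # correction from the large model
                break
            accepted.append(t)
            ctx.append(t)
        tokens.extend(accepted)
        if eos is not None and tokens[-1] == eos:
            break
    return tokens
```

With a well-matched draft model most proposals are accepted, so each expensive server step emits several tokens instead of one, which is what drives the roughly 2x TPOT improvement reported at draft length 4.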
Brain-to-Text Decoding with Context-Aware Neural Representations and Large Language Models by Jingyuan Li, Trung Le, Chaofei Fan, Mingfei Chen, Eli Shlizerman https://arxiv.org/abs/2411.10657
Caption: Figure A illustrates the DCoND (Divide-and-Conquer Neural Decoder) architecture, which decodes phonemes by marginalizing over diphone probabilities, effectively incorporating contextual dependencies. Figure B shows how Large Language Models (LLMs) are integrated with DCoND using in-context learning (LI) and fine-tuning (LIFT) to refine transcriptions by leveraging the relationship between phonemes and words, leading to improved accuracy in brain-to-text decoding.
A new study has achieved a significant leap forward in brain-to-text decoding of attempted speech, offering a potential game-changer for individuals with severe speech impairments. Researchers have developed a novel framework called DCoND (Divide-and-Conquer Neural Decoder) that leverages the context-dependent nature of phoneme representation in the brain. Instead of decoding single phonemes directly from neural activity, DCoND utilizes diphones – sequences of two adjacent phonemes – as the modeling target. This approach addresses the phenomenon of coarticulation, where the neural representation of a phoneme is influenced by surrounding phonemes. DCoND infers the phoneme probability distribution by marginalizing over the diphone distribution, effectively capturing contextual dependencies while maintaining a manageable number of classes for decoding. The divide-and-conquer strategy is adaptable to various neural decoder architectures, enhancing their performance in brain-to-text applications. Formally, the method decodes a phoneme from neural activity by marginalizing over the distribution of diphones:
p(Z = cᵢ | X) = Σⱼ p(cⱼ, cᵢ | X)
where p(cⱼ, cᵢ | X) is the probability of neural activity X encoding the diphone cⱼ → cᵢ, and the sum runs over all possible preceding phonemes cⱼ.
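In code, this marginalization is a softmax over all P × P diphone classes followed by a sum over the preceding-phoneme axis. The sketch below uses random logits and illustrative shapes; in DCoND such scores come from the neural decoder itself.

```python
import numpy as np

def phoneme_posteriors(diphone_logits: np.ndarray) -> np.ndarray:
    """Collapse diphone scores to phoneme posteriors.

    diphone_logits: (T, P, P) scores for the diphone c_j -> c_i at each of
    T time steps (axis 1 = preceding phoneme j, axis 2 = current phoneme i).
    Returns a (T, P) array of p(Z = c_i | X).
    """
    T = diphone_logits.shape[0]
    flat = diphone_logits.reshape(T, -1)
    flat = flat - flat.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(flat) / np.exp(flat).sum(axis=1, keepdims=True)
    probs = probs.reshape(diphone_logits.shape)     # back to (T, P, P)
    return probs.sum(axis=1)                        # marginalize over c_j

# Example: 100 frames, 40 phoneme classes -> 40 * 40 diphone classes.
logits = np.random.randn(100, 40, 40)
p_phoneme = phoneme_posteriors(logits)
assert np.allclose(p_phoneme.sum(axis=1), 1.0)
```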
The study further enhances the decoding pipeline by incorporating Large Language Models (LLMs) to refine the transcriptions generated from the decoded phonemes. The researchers propose a novel ensembling method that provides both transcription candidates and their corresponding phoneme sequences as inputs to GPT-3.5, allowing the LLM to leverage the relationship between phonemes and words and produce more accurate, contextually sound transcriptions. Two modes of LLM utilization were explored: fine-tuning (DCoND-LIFT) and in-context learning (DCoND-LI). DCoND-LI offers a faster, gradient-free alternative for resource-constrained settings, while DCoND-LIFT leverages the full training data for optimal performance.
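The paper does not publish its exact prompt, so the following is only a plausible sketch of how transcription candidates and their phoneme sequences might be assembled into an in-context query for GPT-3.5; the wording and format are assumptions.

```python
def build_icl_prompt(candidates, phoneme_sequences):
    """Pair each transcription candidate with its decoded phoneme
    sequence and ask the LLM to pick the most plausible transcription
    (format is an illustrative guess, not the paper's prompt)."""
    lines = [
        "You are given candidate transcriptions of attempted speech, each",
        "with the phoneme sequence decoded from neural activity. Using both,",
        "output the single most plausible transcription.",
        "",
    ]
    for k, (text, phones) in enumerate(zip(candidates, phoneme_sequences), 1):
        lines.append(f"Candidate {k}: {text}")
        lines.append(f"Phonemes {k}: {' '.join(phones)}")
    lines += ["", "Best transcription:"]
    return "\n".join(lines)

prompt = build_icl_prompt(
    ["i want to eat", "i went to eat"],
    [["AY", "W", "AA", "N", "T"], ["AY", "W", "EH", "N", "T"]],
)
print(prompt)
```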
The effectiveness of the proposed methods was evaluated on the Brain-to-Text 2024 benchmark. DCoND-LIFT achieved a state-of-the-art Word Error Rate (WER) of 5.77%, significantly outperforming the leading benchmark method, which reports a WER of 8.93%. DCoND-L, which combines DCoND with a 5-gram language model and OPT, also outperformed existing methods, achieving a WER of 8.06%. Furthermore, DCoND-LI, using in-context learning with GPT-3.5, achieved a WER of 7.29%, demonstrating the potential of in-context learning for efficient adaptation in brain-to-text systems. The study also showed that DCoND achieved a Phoneme Error Rate (PER) of 15.34%, compared to 16.62% for monophone-based decoding, further validating the effectiveness of the diphone representation. This research demonstrates the potential of context-aware neural representations and LLM integration for significantly improving brain-to-text decoding performance. The proposed methods pave the way for more robust and accurate communication systems for individuals with speech impairments, bringing us closer to restoring their ability to communicate effectively. The adaptability of the DCoND framework and the exploration of both fine-tuning and in-context learning with LLMs offer valuable insights for future research and development in this critical area.
HeartBERT: A Self-Supervised ECG Embedding Model for Efficient and Effective Medical Signal Analysis by Saedeh Tahery, Fatemeh Hamid Akhlaghi, Termeh Amirsoleimani, Saeed Farzi https://arxiv.org/abs/2411.11896
Researchers have introduced HeartBERT, a self-supervised ECG embedding model inspired by the RoBERTa architecture used in natural language processing. The model aims to address the challenges of limited labeled data, high computational costs, and generalizability issues in existing ECG analysis methods. HeartBERT leverages the inherent similarities between natural language and ECG signals, both being periodic time series data with meaningful patterns. This innovative approach translates raw ECG signals into a synthetic language representation through a process of resampling, normalization, quantization, windowing, and numerical-to-ASCII conversion. This transformed data is then fed into a SentencePiece Byte Pair Encoding (BPE) tokenizer, which dynamically segments the data into meaningful tokens based on frequency, capturing intricate cardiac nuances more effectively than traditional fixed-size windowing methods.
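A minimal sketch of this signal-to-text pipeline is shown below. The stages (resample, normalize, quantize, map to ASCII) follow the paper's description, but the target rate, number of quantization levels, and ASCII mapping are assumed values chosen for illustration.

```python
import numpy as np
from scipy.signal import resample

def ecg_to_text(signal: np.ndarray, fs: int, target_fs: int = 100,
                n_levels: int = 64, ascii_offset: int = 33) -> str:
    """Turn a raw ECG trace into a character string for BPE tokenization."""
    # 1) Resample so all source datasets share one time base.
    x = resample(signal, int(len(signal) * target_fs / fs))
    # 2) Normalize amplitudes to [0, 1].
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)
    # 3) Quantize into n_levels discrete amplitude bins.
    q = np.minimum((x * n_levels).astype(int), n_levels - 1)
    # 4) Map each bin to a printable ASCII character.
    return "".join(chr(ascii_offset + v) for v in q)

# Toy signal: 10 s of a 1 Hz sinusoid sampled at 360 Hz (MIT-BIH's rate).
t = np.arange(0, 10, 1 / 360)
text = ecg_to_text(np.sin(2 * np.pi * t), fs=360)
print(text[:60])
```

A SentencePiece BPE tokenizer trained on such strings then merges frequently co-occurring characters, so recurring amplitude patterns become multi-character "words" rather than fixed-size windows.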
The HeartBERT model architecture is based on the encoder-only RoBERTa framework. Tokenized ECG data is embedded into a high-dimensional vector space, augmented with positional encodings to preserve temporal information. The model utilizes six transformer blocks with multi-head self-attention mechanisms to capture dependencies and relationships within the ECG signal. Pre-training is performed using masked language modeling (MLM) on a combination of three publicly available datasets (MIT-BIH Arrhythmia Database, PTB-XL, and European ST-T Database), totaling over 314,000 data points. This self-supervised approach allows the model to learn contextual embeddings from unlabeled ECG data, reducing the reliance on costly labeled datasets. The vocabulary size is set to 52,000, the maximum sequence length to 512, and the embedding size to 768. The model is trained for 1000 epochs with a batch size of 64, a learning rate of 5e-5, and using the AdamW optimizer.
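Using the reported hyperparameters, the backbone can be sketched with Hugging Face Transformers as follows. Only the vocabulary size, sequence length, depth, and embedding size come from the paper; the attention-head count and every other setting are assumptions or library defaults.

```python
from transformers import RobertaConfig, RobertaForMaskedLM

config = RobertaConfig(
    vocab_size=52_000,                # reported BPE vocabulary size
    max_position_embeddings=512 + 2,  # HF RoBERTa reserves 2 extra positions
    num_hidden_layers=6,              # six transformer blocks
    hidden_size=768,                  # reported embedding size
    num_attention_heads=12,           # assumed; standard for hidden_size=768
)
model = RobertaForMaskedLM(config)    # pre-trained with masked language modeling
print(f"{model.num_parameters():,} parameters")
```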
The effectiveness of HeartBERT is evaluated on two downstream tasks: sleep-stage classification and heartbeat classification. For sleep-stage classification using the MIT-BIH Polysomnographic Database, HeartBERT achieved an F1-score of approximately 75% for three-stage classification (Wake, REM, NREM) and around 62% for five-stage classification (Wake, REM, S1, S2, S3), outperforming a baseline deep convolutional recurrent (DCR) model. For heartbeat classification using the Icentia11k dataset, HeartBERT achieved an F1-score of about 88%, surpassing a multimodal image fusion (MIF) baseline model despite being trained on a significantly smaller subset of the data (20,000 heartbeats vs. over 150,000 for the MIF model). Experiments with freezing different layers of HeartBERT during fine-tuning showed that unfreezing only one layer yielded the best performance, suggesting that minimal fine-tuning lets the model leverage the pre-trained features while adapting to the specific downstream task. These results demonstrate the versatility, generalizability, and efficiency of HeartBERT: it performs well with smaller training datasets and fewer learning parameters while achieving superior performance compared to rival models, highlighting its potential for practical applications in ECG analysis. The self-supervised learning approach reduces the dependency on labeled data, a significant advantage in the medical field, and the model's ability to capture intricate ECG signal patterns through contextual embeddings makes it a promising tool for various cardiovascular-related tasks. Future work may explore applying HeartBERT to other downstream tasks and further optimizing the model architecture and training process.
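The single-unfrozen-layer fine-tuning recipe can be sketched as follows. The paper reports that unfreezing one layer works best but the choice of the last encoder layer, the three-class head, and the config values repeated from the sketch above are assumptions here.

```python
from transformers import RobertaConfig, RobertaForSequenceClassification

config = RobertaConfig(vocab_size=52_000, max_position_embeddings=514,
                       num_hidden_layers=6, hidden_size=768,
                       num_attention_heads=12,
                       num_labels=3)  # e.g. three sleep stages: Wake, REM, NREM
clf = RobertaForSequenceClassification(config)
# In practice the pre-trained HeartBERT weights would be loaded here.

for param in clf.roberta.parameters():      # freeze the entire backbone
    param.requires_grad = False
for param in clf.roberta.encoder.layer[-1].parameters():
    param.requires_grad = True              # unfreeze a single encoder layer
# The classification head remains trainable by default.

trainable = sum(p.numel() for p in clf.parameters() if p.requires_grad)
print(f"{trainable:,} trainable parameters")
```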
This newsletter highlights significant advancements in applying AI and signal processing across diverse fields. The AI Flow framework demonstrates a paradigm shift in edge computing, moving from information flow to intelligence flow for efficient resource utilization and improved latency. This approach addresses the challenges of deploying complex AI models on resource-constrained edge devices, paving the way for ubiquitous AI-powered services. In biomedical signal processing, HeartBERT showcases the power of self-supervised learning in ECG analysis. By leveraging the Transformer architecture and masked language modeling, HeartBERT achieves impressive results in sleep stage and heartbeat classification, even with limited labeled data. This advancement offers promising potential for improved diagnostics and personalized healthcare. Finally, the innovative DCoND framework for brain-to-text decoding demonstrates a significant improvement in accuracy by incorporating contextual information through diphone representations and leveraging the power of LLMs. This breakthrough brings us closer to restoring effective communication for individuals with speech impairments. These highlighted papers collectively represent significant progress in their respective fields, pushing the boundaries of what's possible with AI, signal processing, and their intersection. They also open up exciting avenues for future research, from optimizing model architectures and training processes to addressing security and privacy concerns in distributed AI systems.