ArXiv Pulse - Stay updated with the latest research papers

General Overview

Recent advances in wireless communications and sensing have spurred innovations in signal processing and system design, with applications ranging from respiration monitoring to retail inventory management. For instance, Xiong et al. (2025) https://arxiv.org/abs/2502.12114 introduced BS-Breath, an approach for respiration pattern estimation using cell-free massive MIMO systems that reports improved correlation through Weighted Antenna Combining. Complementing this work, Zhang et al. (2025) https://arxiv.org/abs/2502.12093 developed WeVibe, a weight change estimation system that uses audio-induced shelf vibrations, showcasing the versatility of vibration-based sensing in retail.

The integration of artificial intelligence with wireless systems is also a key focus. Qiao et al. (2025) https://arxiv.org/abs/2502.12096 proposed TokCom, a unified framework for cross-modal context-aware semantic communications. This framework leverages generative foundation models to improve bandwidth efficiency by 70.8%. Meanwhile, Jiang et al. (2025) https://arxiv.org/abs/2502.11965 developed CSI-CLIP, a MIMO wireless channel foundation model that uses contrastive learning and demonstrates superior performance in positioning and beam management.

Addressing challenges in massive MIMO and integrated sensing and communication (ISAC) systems remains crucial. Lee & Hong (2025) https://arxiv.org/abs/2502.10836 introduced CIRCLE, a CSIT-free MIMO precoding method that uses circulant permutation of the DFT matrix to enable interference-free signal combining. Hernangómez et al. (2025) https://arxiv.org/abs/2502.10371 advanced ISAC with CISSIR, a beam codebook design approach that reduces self-interference while maintaining communication performance (Liu et al., 2025) https://arxiv.org/abs/2502.09929.

Finally, machine learning continues to show promise in biomedical signal processing. You et al. (2025) https://arxiv.org/abs/2502.11023 presented DT4ECG, a dual-task learning framework achieving 99.12% accuracy in ECG-based human identity recognition. Complementing this, Odonga et al. (2025) https://arxiv.org/abs/2502.09626 explored bias mitigation in wearable-based Freezing of Gait detection systems, demonstrating how transfer learning improves fairness and performance across demographics. These advances highlight the growing importance of considering both technical performance and ethical implications in medical sensing.

Paper Highlights

Token Communications: Revolutionizing Semantic Communication with Generative AI

Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications by Li Qiao, Mahdi Boloursaz Mashhadi, Zhen Gao, Rahim Tafazolli, Mehdi Bennis, Dusit Niyato https://arxiv.org/abs/2502.12096

Caption: This diagram illustrates the Token Communications (TokCom) framework, showing how multimodal data (text, image, audio) is tokenized at the transmitter, processed by GFMs/MLLMs, and transmitted over a communication channel. The receiver then de-tokenizes the received data, utilizing cross-modal attention and context to reconstruct the original information, even with potential token loss during transmission. The different colored blocks represent different modalities, while the crossed boxes indicate tokens masked for compression or lost in communication.

Semantic communication (SemCom) aims to transmit meaning rather than just bits, a crucial shift for emerging applications like the metaverse and extended reality (XR). While current SemCom systems leverage AI, they often operate at a low level (pixels, samples), overlooking higher-level semantic context. This paper introduces TokCom, a novel framework leveraging generative foundation models (GFMs) and multimodal large language models (MLLMs) to enhance SemCom by processing information as tokens. These tokens, compressed representations of multimodal data, enable efficient semantic processing and context awareness by capturing relationships between different data modalities (text, image, audio). TokCom integrates transformer-based token processing, allowing the system to predict missing or corrupted tokens based on surrounding context, thus reducing the need for retransmissions.

TokCom offers several advantages over traditional SemCom. It uses a pre-trained token codebook as a shared knowledge base between transmitter and receiver, reducing overhead. The discrete nature of tokens makes TokCom inherently digital and compatible with existing network architectures. TokCom also leverages cross-modal relationships, which enables ultra-low-rate communication, and it adapts to various tasks such as reconstruction and generation. The paper outlines four basic TokCom setups: semantic source compression, semantic channel coding, semantic multiple access, and semantic network protocols. For example, in semantic source compression, GFMs/MLLMs predict token probabilities, enabling efficient compression by minimizing log-likelihood loss. In semantic channel coding, the modulation and coding scheme is adapted based on both channel quality and token predictability derived from context.
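To make the link between predicted token probabilities and compression concrete, here is a minimal sketch in Python. It assumes an ideal entropy coder, for which a token predicted with probability p costs about -log2(p) bits. The function names, the probabilities, and the codebook size of 1024 are illustrative assumptions, not values from the paper.

```python
import math


def ideal_code_length_bits(token_probs):
    """Ideal entropy-coded length: sum of -log2(p) over the tokens that occurred."""
    return sum(-math.log2(p) for p in token_probs)


def fixed_length_bits(num_tokens, codebook_size):
    """Length when every token is sent as a raw index of log2(Q) bits."""
    return num_tokens * math.log2(codebook_size)


# Hypothetical probabilities that a predictive model assigned to the
# four tokens that actually appeared (illustrative values only).
probs = [0.5, 0.25, 0.125, 0.125]

predicted_bits = ideal_code_length_bits(probs)                     # 1 + 2 + 3 + 3 bits
baseline_bits = fixed_length_bits(len(probs), codebook_size=1024)  # 4 * 10 bits

print(predicted_bits, baseline_bits)
```

The better the model predicts the next token, the closer each probability is to 1 and the fewer bits the sequence needs, which is why a lower log-likelihood loss corresponds to a shorter encoding.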

A case study on generative image transmission demonstrates TokCom's effectiveness. The setup involves tokenizing images, grouping tokens into packets, and transmitting them after channel coding and modulation. A cross-modality TokCom scheme is proposed, incorporating cross-modality information (CMI), such as image class labels, to improve the prediction accuracy of lost tokens. The Token Communication Bandwidth Efficiency (TCE), defined as $\frac{h \times w}{T \times N \times \log_2(Q)}$, is used as a key performance metric, where $h$ and $w$ are the image dimensions, $T$ is the average number of retransmissions, $N$ is the number of tokens per image, and $Q$ is the codebook size. Results show significant performance gains with TokCom. At a 10 dB SNR and a packet error rate (PER) of 19%, TokCom achieves a 23.8% TCE improvement while maintaining comparable semantic quality (measured by CLIP score) to the conventional scheme with retransmissions. Even at a lower SNR of 6 dB (PER of 41%), TokCom maintains robustness with only a slight 4.5% drop in CLIP score. The inclusion of CMI further enhances performance, particularly in preserving semantic quality under challenging channel conditions. The study highlights the importance of semantic and perceptual quality metrics like CLIP and LPIPS in evaluating generative SemCom, as they better reflect the quality of reconstructed content compared to traditional metrics like PSNR. The paper also identifies open research directions, including efficient tokenizer design, addressing computational complexity through collaborative inference, and ensuring privacy and security in TokCom systems.
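The TCE definition above is simple enough to evaluate directly. The following Python sketch computes it for made-up inputs; the image size, token count, codebook size, and values of T are illustrative assumptions and do not reproduce the paper's experiment.

```python
import math


def tce(h, w, avg_retransmissions, tokens_per_image, codebook_size):
    """Token Communication Bandwidth Efficiency: (h*w) / (T * N * log2(Q))."""
    bits_sent = avg_retransmissions * tokens_per_image * math.log2(codebook_size)
    return (h * w) / bits_sent


# Hypothetical setup: 256x256 image, N = 256 tokens, Q = 1024 codebook entries.
# T = 1.25 stands in for a scheme that resends lost packets, T = 1.0 for a
# scheme that predicts lost tokens from context instead of resending them.
tce_with_resend = tce(256, 256, 1.25, 256, 1024)
tce_no_resend = tce(256, 256, 1.0, 256, 1024)
relative_gain = tce_no_resend / tce_with_resend - 1.0

print(tce_with_resend, tce_no_resend, relative_gain)
```

Because T sits in the denominator, every retransmission avoided raises TCE proportionally, provided the reconstructed image quality stays acceptable.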

Unveiling the Power of Complex-Valued Transformers in Wireless Communications

Unveiling the Power of Complex-Valued Transformers in Wireless Communications by Yang Leng, Qingfeng Lin, Long-Yin Yung, Jingreng Lei, Yang Li, Yik-Chung Wu https://arxiv.org/abs/2502.11151

Caption: This diagram illustrates the architecture of a complex-valued transformer for wireless communication tasks. It shows the embedding layer, parallel encoder with complex-valued multi-head attention (CMHA), CLN, and CFCN(2) blocks, and the aggregation decoder leading to the final output p. The use of complex-valued operations like CLinear and CAtt allows for more efficient processing of complex-valued signals like channel state information.

Complex-valued neural networks (CVNNs) have gained traction in wireless communications due to their ability to naturally represent complex-valued signals like channel state information (CSI). However, existing work primarily focuses on simpler CVNN architectures, neglecting the potential of more advanced techniques like transformers. This paper introduces a comprehensive framework for complex-valued transformers in wireless communications, exploring both the theoretical advantages of CVNNs and their practical application in diverse tasks. The authors theoretically prove that CVNNs require fewer layers than their real-valued counterparts to achieve the same approximation error for a continuous complex-valued function. This translates to lower computational complexity, a crucial factor in real-world deployments.

The paper details the core operations within CVNNs, including the complex-valued linear transformation (CLinear), complex-valued layer normalization (CLN), and the crucial complex-valued attention mechanism (CAtt). The CLinear operation, expressed as $\mathbf{R}(y) + j\mathbf{I}(y) =(\mathbf{R}(W^c) + j\mathbf{I}(W^c))(\mathbf{R}(x) +j\mathbf{I}(x)) + \mathbf{R}(b) + j\mathbf{I}(b)$, leverages the inherent relationship between the real and imaginary components of complex numbers, requiring fewer trainable parameters than real-valued implementations. The CAtt mechanism, defined as $\text{CAtt}(Q^c, K^c, V^c) = \text{Softmax}(\mathbf{R}((Q^c)^H K^c)/\sqrt{d})(V^c)^T$, utilizes cosine similarity to measure the relationship between complex vectors, a more appropriate metric than the dot product used in real-valued transformers.
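As a rough illustration of the two formulas above, the sketch below implements CLinear and CAtt with Python's built-in complex numbers. It treats Q, K, and V as lists of token vectors, so entry (i, j) of the score matrix is the real part of the Hermitian inner product between query i and key j. The function names and test values are assumptions for illustration; this is not the authors' implementation.

```python
import math


def clinear(W, x, b):
    """Complex linear map y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def clinear_real_form(W, x, b):
    """The same map written out with separate real and imaginary parts."""
    out = []
    for row, b_i in zip(W, b):
        re = sum(w.real * v.real - w.imag * v.imag for w, v in zip(row, x)) + b_i.real
        im = sum(w.real * v.imag + w.imag * v.real for w, v in zip(row, x)) + b_i.imag
        out.append(complex(re, im))
    return out


def softmax(row):
    """Numerically stable softmax over a list of real scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def catt(Q, K, V):
    """Complex attention: softmax(Re(q_i^H k_j) / sqrt(d)) applied to the values."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [
            sum(qc.conjugate() * kc for qc, kc in zip(q, k)).real / math.sqrt(d)
            for k in K
        ]
        weights = softmax(scores)
        out.append([sum(wt * v[c] for wt, v in zip(weights, V)) for c in range(len(V[0]))])
    return out


# Illustrative values only.
W = [[1 + 2j, 0.5 - 1j], [-1j, 2 + 0j]]
x = [1 - 1j, 2 + 0.5j]
b = [0.1 + 0.2j, -0.3j]
y_complex = clinear(W, x, b)
y_real_form = clinear_real_form(W, x, b)

tokens = [[1 + 1j, 0j], [0j, 1 - 1j]]
attended = catt(tokens, tokens, tokens)
```

Here `clinear` and `clinear_real_form` return the same values, which shows how a single complex weight ties together the real and imaginary parts of the output.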

The authors demonstrate the effectiveness of complex-valued transformers in three wireless communication applications: channel estimation, user activity detection, and the joint design of pilot sequence, feedback quantization, and precoder. In channel estimation, the proposed ComplexLight architecture achieves the lowest mean squared error (MSE) across all tested SNRs and Doppler shifts, even with limited training data. For user activity detection, the complex-valued transformer improves the trade-off between the probability of missed detection (PM) and the probability of false alarm (PF) compared to real-valued versions and other benchmarks, reaching an order of magnitude lower probability of error than real-valued counterparts when trained on limited data. In the joint design task, it achieves higher downlink sum rates than both real-valued counterparts and a DNN-based baseline across different feedback capacities and SNR levels, including a 20% performance gain over the real-valued transformer with the same number of feedback bits. These findings underscore the potential of complex-valued transformers to enhance the performance and efficiency of wireless communication systems.

ELAA-ISAC: Mapping Indoor Environments Using Wireless Signals

ELAA-ISAC: Environmental Mapping Utilizing the LoS State of Communication Channel by Jiuyu Liu, Chunmei Xu, Yi Ma, Rahim Tafazolli, Ahmed Elzanaty https://arxiv.org/abs/2502.10091

Caption: This figure showcases the performance of the ELAA-ISAC method, demonstrating the impact of the number of service antennas (M) and user locations (N) on mapping accuracy, measured by average Intersection over Union (IoU). As both M and N increase, the IoU improves, highlighting the benefit of denser antenna arrays and more extensive mobile terminal (MT) exploration for accurate indoor environment mapping.

Researchers have proposed a novel method, dubbed ELAA-ISAC, for mapping indoor environments using the line-of-sight (LoS) state information of extremely large aperture array (ELAA) channels. This technique leverages the high spatial resolution offered by ELAAs and the mobility of a mobile terminal (MT) to infer the presence and location of obstacles, effectively turning wireless communication signals into environmental probes. Unlike traditional ISAC approaches that often require a trade-off between communication and sensing performance, this method relies on information already being estimated for communication purposes, thus avoiding any performance compromise.

The core of the ELAA-ISAC method lies in its LoS state estimation, formulated as a binary hypothesis testing problem ($H_0$: NLoS, $H_1$: LoS). The optimal decision rule is derived from the likelihood ratio test. The theoretical error probability for LoS estimation is given by $\epsilon = Q\left(\sqrt{\frac{K p_m}{2 p_m + 2(K + 1)\sigma_\epsilon}}\right)$, where $K$ is the Rician K-factor, $p_m$ is the path loss, and $\sigma_\epsilon$ is the channel estimation error variance. This formula highlights the importance of a strong LoS component (high K-factor) and accurate channel estimation for reliable LoS state determination.
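The error-probability expression can be explored numerically with a short Python sketch. It assumes the square root spans the whole fraction (the reading that keeps the argument of Q dimensionless), that K and the path loss are given in linear scale, and that Q is the standard Gaussian tail function. The numeric inputs are illustrative and not taken from the paper.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def los_error_probability(k_factor, path_loss, est_err_var):
    """Evaluate Q(sqrt(K*p / (2*p + 2*(K+1)*sigma))) under the stated assumptions."""
    ratio = k_factor * path_loss / (2.0 * path_loss + 2.0 * (k_factor + 1.0) * est_err_var)
    return q_function(math.sqrt(ratio))


def db_to_linear(value_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)


# Hypothetical operating points with path loss normalized to 1.
strong_los = los_error_probability(db_to_linear(15.0), 1.0, 0.01)
weak_los = los_error_probability(db_to_linear(5.0), 1.0, 0.01)
noisy_estimate = los_error_probability(db_to_linear(15.0), 1.0, 0.1)

print(strong_los, weak_los, noisy_estimate)
```

Under these assumptions the error probability falls as K grows and rises as the estimation error variance grows, matching the qualitative reading given in the text.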

The environmental mapping algorithm progressively outlines the layout by combining LoS state information from multiple MT locations. The underlying principle is that LoS links between the MT and ELAA antennas signify obstacle-free paths. As the MT moves through the environment, the algorithm updates the map by marking areas with LoS links as explored and free of obstacles. Simulation results demonstrate the effectiveness of this approach, showing that the mapping accuracy improves with increasing numbers of service antennas and MT locations. In a LoS-dominated environment (K-factor > 15 dB) with 256 service antennas and 18 MT locations, the proposed method achieved an average intersection over union (IoU) exceeding 80%. The results also confirmed the theoretical error probability analysis and highlighted the impact of channel estimation error and NLoS components on mapping quality. Future research directions include exploring wideband signal utilization for enhanced LoS estimation, optimizing MT trajectories, and extending the framework to 3D environments. This research opens promising avenues for leveraging the rich spatial information embedded within wireless communication signals for accurate and efficient environmental mapping.
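The map-update principle described above can be sketched on a 2D occupancy grid in Python. The sketch assumes a square grid, marks cells by sampling points along each LoS segment (an approximation, not exact ray traversal), and computes IoU between two sets of cells. The grid layout, function names, and positions are hypothetical and are not the paper's algorithm or evaluation setup.

```python
def cells_on_segment(p0, p1, cell_size, samples_per_cell=4):
    """Grid cells touched by the straight segment p0 -> p1 (sampled approximation)."""
    (x0, y0), (x1, y1) = p0, p1
    length = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    steps = max(1, int(length / cell_size * samples_per_cell))
    cells = set()
    for s in range(steps + 1):
        t = s / steps
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        cells.add((int(x // cell_size), int(y // cell_size)))
    return cells


def update_free_map(free_cells, mt_pos, antennas, los_states, cell_size):
    """Mark every cell along an MT-antenna link in LoS state as obstacle-free."""
    for antenna_pos, is_los in zip(antennas, los_states):
        if is_los:
            free_cells |= cells_on_segment(mt_pos, antenna_pos, cell_size)
    return free_cells


def iou(estimated, reference):
    """Intersection over union of two sets of grid cells."""
    union = estimated | reference
    return len(estimated & reference) / len(union) if union else 1.0


# Hypothetical scene: two antennas, one MT location, 1 m cells.
antennas = [(0.5, 0.5), (3.5, 0.5)]
los_states = [True, False]  # the second link is assumed blocked
free = update_free_map(set(), (0.5, 3.5), antennas, los_states, cell_size=1.0)

print(sorted(free))
```

Calling `update_free_map` once per MT location accumulates free cells, so the map fills in as the MT explores; links in NLoS state contribute nothing in this simplified version.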

Conclusion

This newsletter highlights the convergence of advanced wireless communication technologies with cutting-edge AI techniques. TokCom showcases the potential of generative AI and tokenization for achieving unprecedented bandwidth efficiency in semantic communication, paving the way for richer and more efficient data transmission in applications like the metaverse. The introduction of complex-valued transformers demonstrates a fundamental shift in neural network architectures for wireless systems, offering improved performance and reduced complexity in tasks like channel estimation and user activity detection. Finally, ELAA-ISAC presents a novel approach to environmental mapping by leveraging the inherent sensing capabilities of ELAA systems, blurring the lines between communication and sensing and opening exciting possibilities for future applications. These advances collectively point towards a future where wireless systems are not just about transmitting bits, but about understanding and interacting with the world around us.