This collection of preprints explores diverse applications of signal processing and machine learning, with a notable emphasis on wireless communications and sensing. Several papers focus on enhancing the performance and efficiency of future wireless systems, particularly in the context of 6G. For instance, Fischer et al. (2024) investigate Kolmogorov-Arnold networks for nonlinear equalization in high-speed optical networks, demonstrating superior performance compared to traditional linear equalizers and convolutional neural networks. Similarly, Tinnerberg et al. (2024) analyze the trade-off between spectral efficiency and processing latency in panel-based large intelligent surfaces (LIS), highlighting the impact of antenna distribution and detection algorithms. The development of novel antenna architectures is further explored by Prod'homme & del Hougne (2024), who demonstrate the potential of leveraging mutual coupling in dynamic metasurface antennas to improve radiation pattern control. Complementing these hardware-focused investigations, Guimarães et al. (2024) provide a comprehensive survey of machine learning techniques for spectrum sharing, covering various aspects from sensing and allocation to access and handoff.
Beyond communication systems, several contributions address specific signal processing challenges. Jiménez-Galindo et al. (2024) propose a modified shuffled frog-leaping algorithm for designing digital filters, achieving performance comparable to or exceeding other bio-inspired optimization methods. In biomedical signal processing, Liu et al. (2024) introduce MSEMG, a lightweight sEMG denoising model based on the Mamba State Space Model, demonstrating improved performance with fewer parameters. Meanwhile, García-Fernández & Särkkä (2024) develop Gaussian multi-target filtering algorithms for continuous-time target dynamics with discrete-time measurements, incorporating realistic birth, motion, and lifespan models. Addressing the challenge of limited labeled data, Wang et al. (2024) propose a graph-enhanced EEG foundation model that integrates both temporal and inter-channel information through a combination of graph neural networks and a masked autoencoder.
The theme of leveraging data-driven approaches extends to other domains as well. Xu et al. (2024) introduce RelCon, a self-supervised relative contrastive learning approach for building a motion foundation model from wearable sensor data. In RF propagation modeling, Chen et al. (2024) present RFScape, a framework that combines neural object representation with ray tracing to accurately model complex RF-object interactions. Furthermore, Das et al. (2024) propose a Kalman filter-based algorithm using B-splines for fusing acceleration and strain data to estimate full-field dynamic displacement, offering a promising approach for structural health monitoring.
Several papers explore the potential of reconfigurable intelligent surfaces (RIS) and related technologies. An et al. (2024) provide a comprehensive overview of emerging intelligent metasurface technologies, including 2D RIS, 3D stacked intelligent metasurfaces (SIM), and flexible intelligent metasurfaces (FIM). Di Renzo (2024) focuses on the state of the art of SIM for wireless communications, sensing, and computing. Di Renzo & del Hougne (2024) advocate for the use of multiport network theory for modeling and optimizing reconfigurable metasurfaces, emphasizing electromagnetic consistency. These contributions highlight the growing interest in RIS and its potential to revolutionize wireless systems.
Finally, several papers address practical challenges in diverse application areas. Song et al. (2024) investigate energy harvesting using reconfigurable holographic surfaces mounted on miniature UAVs for THz cooperative networks. Zhang et al. (2024) propose a visual-inertial localization algorithm for GNSS-denied environments, combining visual place recognition, pedestrian dead reckoning, and Kalman filtering. Parker et al. (2024) explore scaling transformers for low-bitrate high-quality speech coding, achieving state-of-the-art performance. These diverse contributions demonstrate the breadth and depth of ongoing research in signal processing and machine learning across various domains.
Emerging Technologies in Intelligent Metasurfaces: Shaping the Future of Wireless Communications by Jiancheng An, Mérouane Debbah, Tie Jun Cui, Zhi Ning Chen, Chau Yuen https://arxiv.org/abs/2411.19754
The world of wireless communications is undergoing a dramatic transformation, thanks to the advent of intelligent metasurfaces. These engineered surfaces, composed of subwavelength-scale meta-atoms, manipulate electromagnetic waves to an unprecedented degree, offering significant improvements in network performance. This paper provides a comprehensive overview of the latest advancements in intelligent metasurface technology, focusing on three key areas: Reconfigurable Intelligent Surfaces (RIS), Stacked Intelligent Metasurfaces (SIM), and Flexible Intelligent Metasurfaces (FIM).
RIS, typically 2D structures, have already demonstrated their potential in real-world deployments. Experiments with RIS prototypes have shown significant gains in signal strength and coverage, even in challenging environments with blockages. For instance, in an outdoor test scenario, an RIS provided a power gain of over 6 dB over a 35 m propagation path with a blockage. Mathematically, the impact of RIS on wave propagation can be modeled using techniques like the Rayleigh-Sommerfeld diffraction integral, where the propagation coefficient between meta-atoms is given by:
w<sub>n,n'</sub> = (A cos χ<sub>n,n'</sub> / 2πr<sub>n,n'</sub>) * e<sup>j2πr<sub>n,n'</sub>/λ</sup>
where A is the area of each meta-atom, r<sub>n,n'</sub> is the propagation distance, χ<sub>n,n'</sub> is the angle between the propagation direction and the normal of the metasurface, and λ is the wavelength. Current research focuses on extending RIS capabilities from passive reflection to simultaneous transmission and reflection (STAR-RIS), and from channel estimation to more efficient channel training methods.
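As a rough illustration, the propagation coefficient can be evaluated numerically for a pair of meta-atoms. The sketch below follows the simplified expression exactly as written above; the function names, the coordinate convention (metasurfaces parallel to the x-y plane, so the normal is the z axis), and the example dimensions are assumptions made for illustration, not details taken from the paper.

```python
import cmath
import math


def propagation_coefficient(src, dst, area, wavelength):
    """Coefficient w between two meta-atoms, per the simplified expression in the text.

    src, dst: (x, y, z) positions in metres. The metasurfaces are assumed to be
    parallel to the x-y plane, so the surface normal is the z axis (assumption).
    """
    r = math.dist(src, dst)  # propagation distance r
    if r == 0:
        raise ValueError("source and destination meta-atoms must not coincide")
    cos_chi = abs(dst[2] - src[2]) / r  # cosine of the angle to the surface normal
    amplitude = area * cos_chi / (2 * math.pi * r)
    phase = cmath.exp(1j * 2 * math.pi * r / wavelength)
    return amplitude * phase


def propagation_matrix(src_atoms, dst_atoms, area, wavelength):
    """Matrix of coefficients: one row per destination atom, one column per source atom."""
    return [
        [propagation_coefficient(s, d, area, wavelength) for s in src_atoms]
        for d in dst_atoms
    ]


if __name__ == "__main__":
    wavelength = 0.01                # hypothetical 1 cm wavelength (30 GHz)
    area = (wavelength / 2) ** 2     # hypothetical half-wavelength square meta-atom
    w = propagation_coefficient((0, 0, 0), (0, 0, wavelength), area, wavelength)
    print(abs(w), cmath.phase(w))
```

Two atoms facing each other along the normal give cos χ = 1, so the magnitude reduces to A / (2πr); off-axis pairs are attenuated both by the larger distance and by the cosine factor.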
Taking the concept of intelligent metasurfaces a step further, SIM introduces a third dimension. By stacking multiple layers of programmable metasurfaces, SIMs create an analog electromagnetic neural network capable of performing complex signal processing tasks at the speed of light. Prototypes have demonstrated impressive capabilities in applications like MIMO beamforming, radar sensing, and image classification. For example, a six-layer SIM with only four RF chains achieved the same channel estimation accuracy as a fully digital estimator with 64 RF chains. SIMs offer significant advantages in terms of power efficiency and reduced hardware complexity, particularly for applications like cell-free massive MIMO and low-earth orbit satellite communications.
Finally, FIM brings adaptability and conformability to intelligent metasurfaces. These surfaces can morph their 3D shape in response to dynamic wireless channels, achieving diversity gain and improved signal quality. Prototypes have demonstrated shape morphing using various methods, including electrical currents, liquid metal networks, and ionic actuators. For example, in a MIMO scenario, FIMs with a morphing range of 0.1λ enhanced weak eigenchannels by over 40 dB compared to rigid arrays. FIMs hold great promise for millimeter-wave and terahertz communications, where channel coherence distance is small. However, challenges remain in areas like channel estimation, efficient surface-shape morphing, and practical deployment. The convergence of these three emerging technologies – RIS, SIM, and FIM – promises to revolutionize wireless communications, enabling unprecedented levels of performance and efficiency. Further research and development in these areas will be crucial for realizing the full potential of intelligent metasurfaces and shaping the future of wireless connectivity.
Machine Learning for Spectrum Sharing: A Survey by Francisco R. V. Guimarães, José Mairton B. da Silva Jr., Charles Casimiro Cavalcante, Gabor Fodor, Mats Bengtsson, Carlo Fischione https://arxiv.org/abs/2411.19032
Caption: This visualization depicts the interconnected landscape of research topics related to machine learning for spectrum sharing in wireless communication networks. Key areas like reinforcement learning, deep learning, spectrum sensing, resource allocation, and 5G mobile communication are highlighted, showcasing the convergence of these technologies. The density of each topic reflects its prevalence in the literature, emphasizing the growing importance of machine learning for optimizing spectrum usage.
The ever-increasing demand for wireless services, driven by 5G and the nascent 6G, necessitates efficient spectrum sharing to accommodate diverse applications and the sheer volume of connected devices. Traditional model-based approaches, like optimization and game theory, struggle with the complexity of modern communication environments. This has spurred significant interest in data-driven methods, particularly machine learning (ML), to manage the intricacies of spectrum sharing. This survey offers a comprehensive overview of the state-of-the-art in ML for spectrum sharing, mapping prominent methods and their applications across various dimensions, including spectrum sensing, allocation, access, and handoff.
The survey begins with a concise introduction to the three major ML categories: supervised learning, unsupervised learning, and reinforcement learning (RL). For supervised learning, the survey highlights algorithms like support vector machines (SVMs), k-nearest neighbors (k-NN), and random forests, often used for spectrum sensing and classification tasks. In unsupervised learning, methods like K-means clustering and mixture models are employed for tasks such as channel availability estimation and primary user (PU) detection. Reinforcement learning, with its focus on maximizing cumulative rewards in dynamic environments, is presented as a powerful tool for spectrum allocation and access. The core objective of maximizing the expected discounted return, g<sub>t</sub> = Σ<sub>k=0</sub><sup>∞</sup> γ<sup>k</sup> r<sub>t+k+1</sub> (where γ is the discount rate and r<sub>t+k+1</sub> is the reward received k+1 steps after time t), is explained along with key RL algorithms like Q-learning, SARSA, and deep Q-learning.
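To make the RL objective concrete, the sketch below computes a discounted return, taken here in its standard form as the sum of γ<sup>k</sup> r<sub>t+k+1</sub> over k, and applies one tabular Q-learning update. The state and action names ("idle", "busy", "transmit", "wait") are hypothetical placeholders for a spectrum-access setting and do not come from the survey.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], where rewards[k] plays the role of r_{t+k+1}."""
    g = 0.0
    # Accumulate from the last reward backwards: g_t = r_{t+1} + gamma * g_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_learning_update(q, state, action, reward, next_state, alpha, gamma):
    """One tabular Q-learning step; q maps state -> {action: value}."""
    best_next = max(q[next_state].values())   # greedy value of the next state
    td_target = reward + gamma * best_next    # bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))

    # Hypothetical two-state channel model, for illustration only.
    q = {
        "idle": {"transmit": 0.0, "wait": 0.0},
        "busy": {"transmit": 0.0, "wait": 1.0},
    }
    print(q_learning_update(q, "idle", "transmit", 1.0, "busy", alpha=0.5, gamma=0.9))
```

SARSA differs only in the target: it uses the value of the action actually taken in the next state instead of the maximum.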
The survey then delves into the specific applications of ML in spectrum sharing. For spectrum sensing, supervised learning using SVMs and CNNs is shown to be effective, particularly for classifying channel occupancy based on energy statistics or covariance matrices. Unsupervised learning, employing K-means and mixture models, offers faster training but lower accuracy. For spectrum allocation, RL-based methods, especially Q-learning and deep Q-learning, are highlighted for their ability to adapt to dynamic channel conditions and optimize resource utilization. The survey discusses how these methods can be enhanced by incorporating techniques like experience replay, target networks, and federated learning. Similarly, for spectrum access, RL algorithms are shown to be effective in controlling channel access and minimizing interference. The survey also explores the use of DDPG and PPO for handling continuous state and action spaces in spectrum access problems.
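The unsupervised sensing idea can be sketched with a two-cluster K-means over per-window energy statistics: windows with low average energy are grouped as "noise only" and windows with high energy as "occupied". This is a minimal one-dimensional illustration with a deterministic initialization (minimum and maximum energy); the function names and the toy energy values are assumptions, not the survey's setup.

```python
def average_energy(samples):
    """Mean energy of a window of real or complex baseband samples."""
    return sum(s.real ** 2 + s.imag ** 2 for s in samples) / len(samples)


def kmeans_two_clusters(values, iters=20):
    """1-D K-means with k=2; centroids start at the min and max of the data."""
    centroids = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: attach each value to the nearest centroid.
        labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical per-window energies: about 1.0 for noise, about 5.0 when a PU transmits.
    energies = [0.9, 1.1, 1.0, 4.8, 5.2, 1.05]
    centroids, labels = kmeans_two_clusters(energies)
    print(centroids, labels)  # label 1 marks the high-energy (occupied) cluster
```

No labeled training data is needed, which is the appeal of the unsupervised route; the cost is that the decision boundary depends entirely on how well the two energy populations separate.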
Beyond the core spectrum sharing mechanisms, the survey also covers the application of ML in spectrum handoff, beamforming, and security. For handoff, ML classifiers like k-NN and SVMs are used for beam selection and handover decisions. In beamforming, ML techniques, particularly DNNs and CNNs, are employed for designing adaptive beamforming weights to maximize spectral efficiency and minimize interference. Finally, the survey discusses how ML can be used to enhance security in spectrum sharing networks by addressing threats such as eavesdropping, jamming, spoofing, and intrusion. For instance, supervised learning algorithms like SVMs are used for detecting abnormal network behavior and identifying potential attackers.
The survey concludes by highlighting several open research challenges and future trends in ML for spectrum sharing. These include the management of 6G THz frequency bands, dimension reduction for beamforming in massive MIMO systems, the coexistence of communication and radar systems, real-time ML applications, and the integration of ML services within the network. The survey emphasizes the growing importance of ML in addressing the complex challenges of spectrum sharing in future wireless systems and provides a valuable roadmap for future research in this area.
Scaling Transformers for Low-Bitrate High-Quality Speech Coding by Julian D Parker, Anton Smirnov, Jordi Pons, CJ Carr, Zack Zukowski, Zach Evans, Xubo Liu https://arxiv.org/abs/2411.19842
Caption: The architecture of the Transformer Audio AutoEncoder (TAAE) for neural speech coding. The model uses a Finite Scalar Quantization (FSQ) bottleneck and incorporates strided convolutions, transformer blocks with self-attention and feedforward layers, along with pre/post processing steps for patching and unpatching the audio input. The dashed lines indicate the flow of information for calculating the pre-training and fine-tuning loss (Lpre/Lfine) and the discriminator loss (Ldisc).
A new study from Stability AI introduces a novel transformer-based architecture, dubbed Transformer Audio AutoEncoder (TAAE), for neural speech coding, achieving state-of-the-art performance at ultra-low bitrates. The model challenges the conventional reliance on convolutional or recurrent networks in codec design, demonstrating the scalability and effectiveness of transformers in this domain. Instead of using traditional Residual Vector Quantizers (RVQs), TAAE employs a modified Finite Scalar Quantization (FSQ) approach. This tackles issues like inconsistent codebook utilization and complexities in generative modeling associated with RVQs. The model also incorporates a novel post-hoc method to decompose FSQ into low-order residuals, further enhancing its performance.
TAAE's architecture mirrors a standard transformer setup, featuring multiple blocks operating at different temporal resolutions. Each block includes a strided 1D dense convolution layer for downsampling in the encoder (and its transposed equivalent for upsampling in the decoder), followed by a chain of transformer blocks. These blocks consist of self-attention and feedforward sections, incorporating pre-norm layer normalization, QK-norm, and LayerScale for training stability. The self-attention mechanism uses a sliding window and Rotary Positional Embeddings (RoPE). Unlike convolutional architectures, TAAE performs the majority of temporal downsampling/upsampling at the input/output, minimizing the sequence length and avoiding small dimension embeddings within the transformer blocks. A modified FSQ bottleneck quantizes the latent representation using the formula:
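Q<sub>L</sub>(x) = (2/(L−1)) · ⌊(L−1) · (tanh x + 1)/2 + 1/2⌋ − 1
where L is the number of quantization levels, x is the scalar input, and ⌊·⌋ denotes the floor operation. The tanh bounds the input, adding 1/2 before flooring rounds to the nearest of the L levels, and the final scaling maps the result to L evenly spaced values in [−1, 1].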
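A scalar quantizer of this kind can be sketched in a few lines. The version below assumes the common tanh-bounded, round-to-nearest form of finite scalar quantization with L evenly spaced levels in [−1, 1]; it is an illustrative sketch, not the authors' implementation, and the per-dimension level counts in the usage example are hypothetical.

```python
import math


def fsq_quantize(x, levels):
    """Quantize scalar x to one of `levels` evenly spaced values in [-1, 1].

    Returns (quantized_value, level_index). Assumes levels >= 2.
    """
    bounded = (math.tanh(x) + 1.0) / 2.0                # squash x into (0, 1)
    index = math.floor((levels - 1) * bounded + 0.5)    # round to the nearest level
    value = 2.0 * index / (levels - 1) - 1.0            # rescale the index to [-1, 1]
    return value, index


def bits_per_token(levels_per_dim):
    """Bits needed to index one token when each latent dimension has its own level count."""
    return sum(math.log2(n) for n in levels_per_dim)


if __name__ == "__main__":
    for x in (-3.0, -0.4, 0.0, 0.4, 3.0):
        print(x, fsq_quantize(x, levels=5))
    # Hypothetical configuration: six latent dimensions with five levels each.
    print(bits_per_token([5] * 6))
```

Because every dimension is quantized independently against a fixed grid, there is no learned codebook to collapse, which is the property the text credits for consistent codebook utilization.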
Trained on the Libri-Light (60k hours) and English portion of Multilingual LibriSpeech (45k hours) datasets, TAAE was evaluated against several state-of-the-art codecs, including DAC, SpeechTokenizer, SemantiCodec, Encodec, and Mimi. Objective metrics like SI-SDR, Mel Distance, STFT Distance, PESQ, and STOI were used, alongside a subjective MUSHRA test with 24 participants.
The results show TAAE significantly outperforming baselines across all objective metrics at bitrates of 400 and 700 bps. The subjective tests confirmed these findings, with TAAE achieving near-ground-truth quality ratings. Further analysis revealed near-optimal codebook utilization and competitive real-time performance despite the larger parameter count. Scaling experiments showed performance improving with increasing parameter count (up to 1B). A causal version of TAAE, suitable for streaming applications, showed minimal performance degradation compared to the non-causal version and outperformed the streaming codec Mimi in objective metrics. Furthermore, TAAE demonstrated strong generalization to unseen languages, outperforming multilingual codecs even though it was trained only on English data. These findings highlight the potential of scaling transformer-based architectures for achieving new benchmarks in speech quality and compression.
While promising, TAAE has some limitations. The training dataset is limited to 16 kHz English speech, primarily audiobook recordings, which might affect performance on speech from different settings or with significant background noise. The large parameter count also requires more computational resources than some baselines. Future work will focus on scaling to larger and more diverse datasets at higher sampling rates.
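Of the objective metrics listed in the evaluation, SI-SDR (scale-invariant signal-to-distortion ratio) is simple enough to sketch directly. The snippet below uses the standard definition: project the estimate onto the reference, then compare the energy of that projection with the energy of the residual. It is a plain-Python illustration on short lists, not the evaluation code used in the study.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB between two equal-length real-valued signals."""
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference signal must have non-zero energy")
    # Optimal scaling of the reference that best matches the estimate.
    alpha = sum(r * e for r, e in zip(reference, estimate)) / ref_energy
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0:
        return math.inf       # estimate is an exact scaled copy of the reference
    if target_energy == 0:
        return -math.inf      # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)


if __name__ == "__main__":
    ref = [1.0, 0.0]
    print(si_sdr(ref, [1.0, 1.0]))   # equal target and residual energy
    print(si_sdr(ref, [2.0, 2.0]))   # same value: rescaling the estimate changes nothing
```

The scale invariance matters for codecs because a decoder that reproduces the waveform at a slightly different gain should not be penalized for it.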
This newsletter showcases the exciting advancements happening at the intersection of signal processing and machine learning. From revolutionizing wireless communications with intelligent metasurfaces to optimizing spectrum utilization with data-driven approaches and achieving breakthroughs in low-bitrate speech coding with scaled transformers, these preprints highlight the transformative potential of these technologies. The common thread weaving through these diverse research areas is the power of intelligent algorithms and innovative hardware designs to address complex challenges and unlock new possibilities. The research on intelligent metasurfaces promises to reshape the wireless landscape, enabling more efficient and adaptable communication systems. Meanwhile, the application of machine learning to spectrum sharing offers a crucial pathway to managing the growing demand for wireless connectivity. Finally, the success of scaled transformers in speech coding opens up exciting new avenues for achieving high-quality audio at ultra-low bitrates, with potential implications for a wide range of applications. These advancements underscore the rapid pace of innovation in these fields and point towards a future where intelligent systems play an increasingly central role in our connected world.