Subject: Cutting-Edge Advancements in Signal Processing, Communication, and Sensing
Hi Elman,
This newsletter explores critical challenges and advancements in signal processing, communication, and sensing, with a particular focus on integrated sensing and communication (ISAC) systems and novel machine learning applications. Several papers delve into the complexities of beamforming optimization within ISAC frameworks. Zhao et al. (2024) propose a joint beamforming scheme for multi-target detection and multi-user communication, tackling the inherent performance trade-off by maximizing the weakest target's detection probability under communication SINR constraints. Similarly, Liu et al. (2024) introduce a parametric scattering model for extended target sensing, optimizing beamforming to minimize the Cramér-Rao Bound (CRB) while maintaining communication quality. Zhu et al. (2024) investigate discrete RIS-enhanced SSK MIMO, optimizing reflecting beamforming to minimize the average bit error probability. These works collectively highlight the ongoing effort to refine beamforming techniques for enhanced performance in increasingly complex ISAC environments.
The impact of hardware limitations and resource constraints is another recurring theme. Chowdary et al. (2024) investigate the trade-off between sensing accuracy and communication rate in quantized hybrid radar fusion systems, employing the quantized CRB and communication rate as metrics. Xiu et al. (2024) tackle power minimization in RIS-aided ISCPT-NOMA systems, jointly optimizing various system parameters under QoS, sensing accuracy, and energy harvesting constraints. Sun et al. (2024) propose a power-measurement-based channel autocorrelation estimation method for IRS-assisted wideband communications, circumventing the need for extensive pilot transmission. These contributions underscore the importance of developing efficient algorithms and system designs that account for practical limitations.
Beyond ISAC, several papers explore novel applications of signal processing and machine learning. Moebus et al. (2024) introduce Nightbeat, a frequency-based method for IMU-only heart rate estimation during sleep, validated on a new dataset of wrist-worn accelerometer recordings. Li and Príncipe (2024) propose a kernel operator-theoretic Bayesian filter for nonlinear dynamical systems, leveraging the theory of reproducing kernel Hilbert spaces. Gonzalez-Martinez et al. (2024) improve snore detection using harmonic/percussive source separation and convolutional neural networks. These diverse applications showcase the versatility of signal processing techniques in addressing real-world problems.
Theoretical foundations and fundamental concepts are also addressed. Khan and Chirikjian (2024) investigate parameter estimation on homogeneous spaces, leveraging group-theoretic structures to characterize the Fisher Information Metric and CRB. Ji (2024) explores the relationship between edge centrality and the total variation of graph distributional signals. These works contribute to a deeper understanding of the underlying mathematical principles governing signal processing and network analysis.
Finally, several papers focus on specific applications and technological advancements. Puts et al. (2024) demonstrate an all-optical excitable spiking laser neuron, paving the way for all-optical photonic spiking neural networks. Kokaram et al. (2024) demystify the use of compression in virtual production, assessing the impact of lossy compression standards on in-camera picture quality. Tu et al. (2024) propose Parameterized TDOA, a novel method for instantaneous TDOA estimation and localization of mobile targets. These contributions highlight the ongoing innovation in diverse areas, pushing the boundaries of signal processing and communication technologies.
Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure by Xiang Li, Yixiang Dai, Qing Qu https://arxiv.org/abs/2410.24060
Caption: Figures (a) and (b) illustrate the effect of dataset size on score field approximation error and generated images, showing increased generalization with larger datasets. Figures (c) and (d) demonstrate the impact of model scale, revealing that smaller models, relative to the dataset size, exhibit stronger generalization and align more closely with the Gaussian inductive bias. These findings suggest that diffusion models leverage the Gaussian structure of training data for generalization, with linearity playing a key role, particularly in early training and smaller models.
Diffusion models have achieved remarkable success in image generation, but their ability to generalize from limited data remains somewhat mysterious. This paper delves into the mechanics of diffusion models, examining the learned score functions, which are essentially a series of denoisers trained at various noise levels. The authors discovered that as these models shift from memorizing training data to generalizing and creating novel images, their denoisers exhibit increasing linearity. This observation led them to investigate the linear counterparts of these complex nonlinear models.
To understand this linearity, the researchers trained linear models to mimic the function mappings of the nonlinear diffusion denoisers, creating simplified versions. Surprisingly, these linear denoisers proved to be near-optimal for a multivariate Gaussian distribution characterized by the training data's empirical mean and covariance. This suggests a key finding: diffusion models possess an inductive bias towards capturing and utilizing the Gaussian structure (specifically, the covariance information) of the training data for image generation. This bias is mathematically captured by the optimal Gaussian denoiser: D<sub>G</sub>(x; σ(t)) = μ + UΛ<sub>σ(t)</sub>U<sup>T</sup>(x - μ), where μ and Σ are the training data's empirical mean and covariance, Σ = UΛU<sup>T</sup> is the eigendecomposition of Σ (equivalently, the SVD, since Σ is symmetric positive semidefinite), and Λ<sub>σ(t)</sub> is the diagonal matrix with entries λ<sub>i</sub>/(λ<sub>i</sub> + σ(t)<sup>2</sup>), which shrinks each eigen-direction according to the noise level.
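This Gaussian denoiser is the classical posterior-mean (Wiener) filter for a Gaussian prior, and it can be sketched in a few lines of NumPy. The per-eigenvalue shrinkage λ<sub>i</sub>/(λ<sub>i</sub> + σ<sup>2</sup>) below follows from the standard Gaussian posterior; treat it as an illustration of the formula rather than the paper's implementation.

```python
import numpy as np

def gaussian_denoiser(x, data, sigma):
    """Posterior-mean denoiser for a Gaussian fit to `data`, at noise level sigma.

    Implements D_G(x; sigma) = mu + U diag(l_i / (l_i + sigma^2)) U^T (x - mu),
    where Sigma = U diag(l) U^T is the eigendecomposition of the empirical
    covariance. High noise shrinks the output toward the mean; low noise
    leaves x nearly untouched.
    """
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)      # empirical covariance Sigma
    lam, U = np.linalg.eigh(cov)          # Sigma = U diag(lam) U^T
    shrink = lam / (lam + sigma**2)       # diagonal of Lambda_sigma
    return mu + U @ (shrink * (U.T @ (x - mu)))

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 4))
x = rng.normal(size=4)
d_hi = gaussian_denoiser(x, data, sigma=100.0)  # collapses toward the data mean
d_lo = gaussian_denoiser(x, data, sigma=1e-4)   # stays close to x itself
```

The two extremes make the inductive bias concrete: at high noise the denoiser can only fall back on the dataset's mean and covariance, while at low noise it trusts the input.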
The strength of this Gaussian inductive bias is influenced by the model's capacity relative to the size of the training dataset. It's most prominent when the model is relatively small compared to the dataset. However, even in overparameterized models, this bias appears during the early training stages before memorization dominates. This implies that early stopping could be a valuable technique for promoting generalization in large models. The authors also suggest that the recently observed "strong generalization" phenomenon, where models trained on different datasets generate similar images from the same initial noise, can be partially explained by this underlying Gaussian structure. They hypothesize that datasets of the same class may exhibit similar Gaussian structures, even with relatively few samples.
The researchers quantified the linearity of diffusion denoisers using a linearity score (LS) based on cosine similarity. They observed LS values exceeding 0.96 in well-trained models, indicating significant linearity. They also used score field approximation error (RMSE) to measure the difference between the learned denoisers (D<sub>θ</sub>) and the Gaussian denoisers (D<sub>G</sub>). In the generalization regime, this error was found to be low, particularly in the high and low noise regimes.
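A cosine-similarity-based linearity score can be illustrated by comparing a denoiser's response to a convex combination of inputs against the same combination of its responses; the paper's exact LS definition may differ, so this is a plausible instantiation only.

```python
import numpy as np

def linearity_score(denoiser, x1, x2, alpha=0.5):
    """Cosine similarity between D(a*x1 + (1-a)*x2) and a*D(x1) + (1-a)*D(x2).

    A score near 1 means the denoiser acts like an affine map on this pair
    (affine maps commute exactly with convex combinations).
    """
    lhs = denoiser(alpha * x1 + (1 - alpha) * x2)
    rhs = alpha * denoiser(x1) + (1 - alpha) * denoiser(x2)
    return float(lhs @ rhs / (np.linalg.norm(lhs) * np.linalg.norm(rhs)))

rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=8), rng.normal(size=8)
A = rng.normal(size=(8, 8))
ls_linear = linearity_score(lambda z: A @ z, x1, x2)           # exactly linear map
ls_nonlin = linearity_score(lambda z: np.tanh(A @ z), x1, x2)  # nonlinear map
```

For the linear map the score is 1 up to floating-point error, mirroring the paper's observation that well-trained denoisers with LS above 0.96 are close to linear.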
Generative AI-Powered Plugin for Robust Federated Learning in Heterogeneous IoT Networks by Youngjoon Lee, Jinu Gong, Joonhyuk Kang https://arxiv.org/abs/2410.23824
Caption: This figure illustrates the process of using generative AI to augment data on edge devices, transitioning from a non-IID to a more IID-like data distribution. The bar graphs represent data distribution across different classes (C1-C6) on various devices, with the dark bars showing original data and the hatched bars representing synthetic data generated by the AI. This data augmentation aims to balance class representation across devices, facilitating more effective federated learning.
Federated learning (FL) offers a privacy-preserving method for training machine learning models across decentralized devices, but it faces challenges with Non-IID data distributions and device heterogeneity. This paper introduces a plugin designed to tackle these issues by leveraging generative AI and balanced sampling to enhance convergence speed and model robustness. The central idea is to use generative AI on edge devices to synthesize data for underrepresented classes, effectively approximating IID conditions. This is coupled with a balanced sampling strategy at the central server, prioritizing devices with data distributions closest to the IID target for model aggregation. This approach aims to reduce the number of communication rounds and improve the quality of model updates.
The proposed plugin integrates seamlessly with existing FL algorithms such as FedAvg, FedProx, and FedRS. It operates in two phases. The first phase is data augmentation with generative AI: each edge device n with dataset D<sub>n</sub> uses a generative model G to create synthetic data for each underrepresented class y. The amount of synthetic data generated is determined by a deficiency ratio A<sub>y</sub>, which compares the class size |D<sup>y</sup><sub>n</sub>| with the size |D<sub>max</sub>| of the largest class:
A<sub>y</sub> = (|D<sub>max</sub>| - |D<sup>y</sup><sub>n</sub>|) / |D<sup>y</sup><sub>n</sub>|, if |D<sup>y</sup><sub>n</sub>| > 0
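Reading |D<sub>max</sub>| as the largest class size on the device, the rule amounts to topping each non-empty class up to that size, since A<sub>y</sub> ⋅ |D<sup>y</sup><sub>n</sub>| = |D<sub>max</sub>| - |D<sup>y</sup><sub>n</sub>|. A minimal sketch under that reading:

```python
import numpy as np

def synthetic_counts(class_counts):
    """Synthetic samples to generate per class on one edge device.

    With A_y = (|D_max| - |D_n^y|) / |D_n^y| for non-empty classes, the device
    synthesizes A_y * |D_n^y| = |D_max| - |D_n^y| samples, topping every
    non-empty class up to the size of its largest class. Empty classes are
    left untouched here (an assumption; the summary does not cover them).
    """
    counts = np.asarray(class_counts, dtype=int)
    d_max = counts.max()                        # |D_max|, largest class size
    return np.where(counts > 0, d_max - counts, 0)

# A device holding class counts [50, 10, 0, 25] tops classes 2 and 4 up to 50.
print(synthetic_counts([50, 10, 0, 25]))
```

After augmentation every represented class has the same count, which is how the plugin pushes each device's local distribution toward IID.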
The second phase involves balanced sampling at the central server. After local training, each device calculates the proportion of data for each class p<sub>n</sub>(y). The central server then calculates a weighted global distribution p<sub>global</sub>(y) and compares each device's distribution to the target IID distribution p<sub>IID</sub>(y) using a distance function. The K devices with distributions closest to p<sub>IID</sub>(y) are selected for aggregation, ensuring a more balanced representation in the global model update.
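The server-side selection step can be sketched as follows, taking p<sub>IID</sub>(y) to be uniform over classes and the (otherwise unspecified) distance function to be the L1 norm; both choices are assumptions for illustration.

```python
import numpy as np

def select_devices(device_dists, k):
    """Indices of the K devices whose class distributions are closest to IID.

    device_dists: (N, C) array whose row n is p_n(y). The IID target is the
    uniform distribution p_IID(y) = 1/C, and closeness is measured with the
    L1 norm here.
    """
    dists = np.asarray(device_dists, dtype=float)
    p_iid = np.full(dists.shape[1], 1.0 / dists.shape[1])
    gaps = np.abs(dists - p_iid).sum(axis=1)   # L1 distance to p_IID per device
    return np.argsort(gaps)[:k]                # K closest devices

dists = np.array([
    [0.25, 0.25, 0.25, 0.25],  # perfectly balanced
    [0.70, 0.10, 0.10, 0.10],  # heavily skewed
    [0.30, 0.20, 0.30, 0.20],  # mildly skewed
])
chosen = select_devices(dists, k=2)  # the balanced and mildly skewed devices
```

Only the selected devices' updates enter the aggregation round, which biases the global model toward balanced class representation.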
Experimental results on a medical text classification task using Intel's Gaudi 2 AI accelerator demonstrated significant performance improvements. Using a Non-IID data distribution across 100 edge devices and selecting 10 devices (K=10) for each aggregation round, the plugin improved accuracy across various text classification models and FL algorithms. BioBERT saw its accuracy increase from 42.1% to 53.4%, while BioMedBERT improved from 28.7% to 52.0%. General-purpose models like BERT and DistilBERT also benefited, with gains of 17.5% and 14.4%, respectively. Convergence speed also improved significantly, with the plugin-enhanced configurations requiring about 35% fewer training epochs to reach peak performance compared to baseline FL algorithms.
Semantic Knowledge Distillation for Onboard Satellite Earth Observation Image Classification by Thanh-Dung Le, Vu Nguyen Ha, Ti Ti Nguyen, Geoffrey Eappen, Prabhu Thiruvasagam, Hong-fu Chou, Duc-Dung Tran, Luis M. Garces-Socarras, Jorge L. Gonzalez-Rios, Juan Carlos Merlano-Duncan, Symeon Chatzinotas https://arxiv.org/abs/2411.00209
This paper introduces DualKD, a dynamic weighting knowledge distillation (KD) framework designed for efficient Earth observation (EO) image classification in resource-constrained environments, such as onboard satellite processing. The framework distills knowledge from powerful Vision Transformer (ViT) teachers, EfficientViT and MobileViT, into lightweight student models, ResNet8 and ResNet16, which achieve over 90% accuracy, precision, and recall. Unlike traditional KD methods that rely on static weight distribution, the dynamic weighting mechanism adapts based on the confidence of each teacher model, allowing student models to prioritize the more reliable knowledge source and generalize better.
The DualKD framework's core lies in its dynamic weighting mechanism. For an input x, softened probability distributions are calculated for both teacher models (P<sub>T<sub>1</sub></sub>(x), P<sub>T<sub>2</sub></sub>(x)) and the student model (P<sub>S</sub>(x)) using a temperature parameter T. Confidence scores for each teacher (C<sub>T<sub>1</sub></sub>, C<sub>T<sub>2</sub></sub>) are computed based on the average maximum probabilities of their softened outputs. Weights α and β are then dynamically assigned to each teacher in the distillation loss (KD<sub>loss</sub>), prioritizing the more confident teacher or ignoring both if confidence is low. The KD<sub>loss</sub> is a weighted Kullback-Leibler (KL) divergence between the student's and each teacher's softened probabilities: KD<sub>loss</sub> = α ⋅ D<sub>KL</sub>(P<sub>S</sub>(x) || P<sub>T<sub>1</sub></sub>(x)) + β ⋅ D<sub>KL</sub>(P<sub>S</sub>(x) || P<sub>T<sub>2</sub></sub>(x)). This is combined with a classification loss (CE<sub>loss</sub>) to form the total loss (L<sub>total</sub>): L<sub>total</sub> = (1 - (α+β)/2) ⋅ CE<sub>loss</sub> + ((α+β)/2) ⋅ KD<sub>loss</sub>.
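A minimal NumPy sketch of the loss combination follows. The summary specifies the loss formulas but not the exact confidence-to-weight rule, so the proportional split and the low-confidence threshold `tau` below are assumptions, not the paper's method.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened probabilities for a logit vector z."""
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def kl(p, q):
    """KL divergence D_KL(p || q) for strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

def dualkd_loss(student_logits, t1_logits, t2_logits, labels_onehot,
                T=3.0, tau=0.3):
    """Total DualKD loss for a single example (sketch).

    Teacher confidence is the max of its softened output; alpha and beta are
    split proportionally to confidence (an assumption), and a teacher below
    threshold tau is ignored, matching the low-confidence behavior described.
    """
    p_s = softmax(student_logits, T)
    p_t1, p_t2 = softmax(t1_logits, T), softmax(t2_logits, T)
    c1, c2 = p_t1.max(), p_t2.max()
    alpha = c1 / (c1 + c2) if c1 >= tau else 0.0
    beta = c2 / (c1 + c2) if c2 >= tau else 0.0
    # KD_loss = alpha * KL(P_S || P_T1) + beta * KL(P_S || P_T2)
    kd = alpha * kl(p_s, p_t1) + beta * kl(p_s, p_t2)
    # CE_loss on the unsoftened student output
    ce = -float(np.sum(labels_onehot * np.log(softmax(student_logits))))
    # L_total = (1 - (alpha+beta)/2) * CE_loss + ((alpha+beta)/2) * KD_loss
    w = (alpha + beta) / 2.0
    return (1 - w) * ce + w * kd

loss = dualkd_loss(np.array([2.0, 0.0, 0.0]),   # student logits
                   np.array([3.0, 0.0, 0.0]),   # confident teacher T1
                   np.array([0.5, 0.5, 0.5]),   # uncertain teacher T2
                   np.array([1.0, 0.0, 0.0]))   # one-hot label
```

The structure makes the trade-off visible: as combined teacher confidence (α+β) grows, the total loss leans on distillation; when both teachers are unconfident it reduces to plain cross-entropy.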
Experiments on the EuroSAT dataset demonstrated substantial performance improvements. ResNet8 with DualKD achieved 92.88% accuracy, a 5.12% improvement over the baseline model, while ResNet16 with DualKD reached 96.46% accuracy, a 3.56% improvement. While the student models didn't quite match the teacher models' performance (EfficientViT at 98.76% and MobileViT at 99.09%), the significant reduction in complexity makes them highly suitable for onboard deployment. ResNet8, in particular, showed remarkable efficiency gains with 97.5% fewer parameters, 96.7% fewer FLOPs, 86.2% lower power consumption, and 63.5% faster inference time compared to MobileViT.
This newsletter highlights a convergence of trends in leveraging sophisticated techniques to improve performance and efficiency in various domains. From understanding the theoretical underpinnings of diffusion models and their surprising reliance on Gaussian structure for generalization to developing innovative knowledge distillation frameworks for resource-constrained environments like onboard satellite image classification, the research showcased here pushes the boundaries of what's possible. The application of generative AI to address Non-IID data challenges in federated learning further emphasizes the growing importance of AI in optimizing complex distributed systems. Collectively, these advancements represent significant steps towards more robust, efficient, and practical solutions in signal processing, communication, and sensing.