Subject: Methodological Advancements in Statistical Modeling and Causal Inference
Hi Elman,
This newsletter covers recent preprints showcasing diverse methodological advancements across various domains, with a particular focus on statistical modeling and causal inference.
Several papers introduce novel approaches to improve the efficiency and robustness of existing methods. Zhou et al. (2024) propose a generalized framework for basket trials using p-value combination tests, enabling flexibility in endpoint selection and analysis. Lin et al. (2024) investigate the application of Generalized Extreme Value (GEV) models for intraday risk assessment in financial markets, leveraging robust estimators and introducing novel risk indicators. Magirr et al. (2024) examine the efficiency of nonparametric superiority tests based on restricted mean survival time (RMST) relative to the log-rank test under proportional hazards, providing practical guidance for clinical trial design. Finally, Sudijono et al. (2024) propose optimizing returns from experimentation programs by reframing A/B testing as a constrained optimization problem and leveraging empirical Bayes methods.
Another prominent theme is the development of innovative Bayesian methods for complex data analysis. Carmona et al. (2024) introduce Semi-Modular Inference (SMI) for simultaneous reconstruction of spatial frequency fields and sample locations, addressing challenges in linguistic studies. Venu (2024) explores the analogy between Highest Posterior Density (HPD) intervals and profile likelihood ratio confidence intervals, offering insights into Bayesian estimation. Sun et al. (2024) employ Bayesian spatial functional concurrent regression to uncover the dynamics between SARS-CoV-2 wastewater concentrations and community infections. Akinfenwa et al. (2024) advocate for visualization techniques in exploratory modeling analysis of Bayesian hierarchical models, enhancing model selection and interpretation.
Several contributions focus on specific applications and domain-specific challenges. Sommer et al. (2024) develop methods for deriving uncertainty intervals for lifetime risks related to occupational radon exposure, with implications for radiation protection policies. Andorra & Göbel (2024) introduce the Soccer Factor Model (SFM) for skill evaluation, addressing the confounding factor of team strength in player performance assessment. Prashanth (2024) utilizes Convolutional Neural Networks (CNNs) for accurate early detection of Parkinson's disease from SPECT imaging. Bertsimas et al. (2024) propose a novel framework, R.O.A.D., for clinical trial emulation using observational data, addressing confounding bias and identifying heterogeneity in treatment effects. Finally, Cuellar (2024) discusses the overlooked risks of non-validated exclusions in forensic science, emphasizing the importance of empirical evidence. Many other methodological contributions and applications are also presented, spanning diverse fields like vector optimization, dose-finding designs, time series forecasting, earthquake ground shaking estimation, and radio interferometry. Several papers also tackle specific challenges in data analysis and modeling, such as removing spurious correlation from neural network interpretations, handling missing data in high-dimensional longitudinal studies, and exploring the use of surrogate endpoints in health technology assessment.
The R.O.A.D. to clinical trial emulation by Dimitris Bertsimas, Angelos G. Koulouras, Hiroshi Nagata, Carol Gao, Junki Mizusawa, Yukihide Kanemitsu, Georgios Antonios Margonis https://arxiv.org/abs/2412.03528
Caption: Two Optimal Policy Trees (OPTs) derived from a novel framework for emulating clinical trials with observational data are shown. These OPTs, trained on a colorectal liver metastasis dataset to emulate the JCOG0603 trial, identify subgroups with heterogeneous treatment effects and recommend whether to prescribe imatinib (1) or not (0) based on factors like primary mitotic count and maximum tumor size. The high concordance between the two OPTs and the accurate prediction of non-benefiting patient subgroups demonstrate the framework's ability to extract clinically relevant insights from real-world data.
This paper introduces the R.O.A.D. framework, a significant advancement in emulating clinical trials using observational data. The framework addresses a critical challenge in causal inference: mitigating both observed and unobserved confounding, a limitation that has hindered the effective use of real-world data for drawing reliable conclusions about treatment effectiveness. This work demonstrates the potential of transforming observational studies into valuable resources for generating clinically relevant insights, especially when RCTs are infeasible.
The R.O.A.D. framework operates through a series of well-defined steps. Initially, it applies the eligibility criteria of the target RCT to an observational cohort. This ensures that the selected population aligns with the characteristics of the trial participants. Then, a sophisticated prognostic matching algorithm is employed to create a matched cohort mirroring the covariate distribution and baseline prognosis of the control group in the target RCT. This algorithm minimizes the absolute differences in mean baseline risk and covariate means between the matched cohort and the trial, along with the total covariate distance across matched pairs:

min |Σᵢ∈S₁ Σⱼ∈S₀ wᵢzᵢⱼ/μ₁ − Σᵢ∈S₁ Σⱼ∈S₀ wⱼzᵢⱼ/μ₀| + Σₗ |Σᵢ∈S₁ Σⱼ∈S₀ xᵢₗzᵢⱼ/μ₁ˡ − Σᵢ∈S₁ Σⱼ∈S₀ xⱼₗzᵢⱼ/μ₀ˡ| + Σᵢ∈S₁ Σⱼ∈S₀ zᵢⱼ‖xᵢ − xⱼ‖₂

where zᵢⱼ indicates that unit i is matched to unit j, wᵢ denotes baseline risk, xᵢₗ the l-th covariate, and the μ terms are group-specific normalizing constants. This meticulous matching process ensures that the subsequent analysis focuses on comparable groups, minimizing the risk of confounding by measured covariates.
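To make the matching step concrete, here is a minimal sketch of optimization-based prognostic matching. It uses a simplified pairwise cost, baseline-risk discrepancy plus Euclidean covariate distance, solved as an assignment problem rather than the paper's full objective; the function name, the risk_weight knob, and the simulated data are illustrative assumptions.

```python
# Minimal sketch of prognostic matching: pair each trial-eligible observational
# patient with the closest available control on baseline risk and covariates.
import numpy as np
from scipy.optimize import linear_sum_assignment

def prognostic_match(risk_a, X_a, risk_b, X_b, risk_weight=1.0):
    """Return index pairs (i, j) matching group A units to group B units.

    risk_a, risk_b : 1-D arrays of estimated baseline risk (prognosis).
    X_a, X_b       : 2-D arrays of standardized covariates.
    risk_weight    : relative weight on the prognosis term (assumed knob).
    """
    # Pairwise cost echoes the two ingredients described above: discrepancy in
    # baseline risk plus Euclidean distance between covariate vectors.
    risk_cost = np.abs(risk_a[:, None] - risk_b[None, :])
    cov_cost = np.linalg.norm(X_a[:, None, :] - X_b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(risk_weight * risk_cost + cov_cost)
    return list(zip(rows, cols))

# Toy usage with simulated data.
rng = np.random.default_rng(0)
risk_a, risk_b = rng.uniform(0, 1, 30), rng.uniform(0, 1, 50)
X_a, X_b = rng.normal(size=(30, 4)), rng.normal(size=(50, 4))
pairs = prognostic_match(risk_a, X_a, risk_b, X_b)
```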
To further refine the emulation, cost-sensitive counterfactual models are trained. These models play a crucial role in addressing unobserved confounding by adjusting the prognosis estimates of the treated group to align with those observed in the trial. This step is essential for mitigating biases stemming from factors not captured in the available data. Finally, Optimal Policy Trees (OPTs) are trained on the counterfactual model probabilities to identify subgroups of patients exhibiting heterogeneity in treatment effects (HTE). This step moves beyond average treatment effects and delves into personalized medicine by identifying which patient characteristics are associated with differential treatment responses.
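As a rough illustration of this step (not the authors' pipeline, which relies on cost-sensitive counterfactual models and Optimal Policy Trees), the sketch below estimates counterfactual outcome probabilities with a simple T-learner and fits a shallow CART tree on the estimated benefit as a stand-in for an OPT; the models, feature names, and synthetic data are all assumptions.

```python
# Rough stand-in: T-learner counterfactual probabilities + shallow decision tree
# to expose subgroups with heterogeneous treatment effects.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))                      # illustrative covariates
T = rng.integers(0, 2, size=n)                   # treatment indicator
# Synthetic outcome: treatment helps only when the first covariate is high.
p = 0.3 + 0.25 * T * (X[:, 0] > 0) - 0.05 * (X[:, 1] > 1)
y = rng.binomial(1, np.clip(p, 0, 1))

# T-learner: one outcome model per arm, then contrast the predicted probabilities.
m1 = GradientBoostingClassifier().fit(X[T == 1], y[T == 1])
m0 = GradientBoostingClassifier().fit(X[T == 0], y[T == 0])
benefit = m1.predict_proba(X)[:, 1] - m0.predict_proba(X)[:, 1]

# Shallow tree on the estimated benefit -> an interpretable treat / don't-treat rule.
policy = DecisionTreeClassifier(max_depth=2).fit(X, (benefit > 0).astype(int))
print(export_text(policy, feature_names=["x0", "x1", "x2"]))
```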
The validation of the R.O.A.D. framework using the JCOG0603 trial cohort for colorectal liver metastasis provides compelling evidence of its effectiveness. The results demonstrate a remarkable alignment between the emulated cohort and the actual trial data in terms of baseline risk and covariate distribution. The weight-tuning method successfully addresses unobserved confounding, aligning the predicted treatment benefit in the observational cohort with that observed in the RCT. Notably, the OPTs identified subgroups with distinct benefits from adjuvant chemotherapy, achieving a high concordance rate between two different OPTs. The framework's ability to accurately predict the proportion of patients not benefiting from chemotherapy further underscores its clinical relevance. This ability to extract clinically meaningful insights from real-world data positions the R.O.A.D. framework as a powerful tool for advancing precision medicine and improving patient outcomes.
Using a Two-Parameter Sensitivity Analysis Framework to Efficiently Combine Randomized and Non-randomized Studies by Ruoqi Yu, Bikram Karmakar, Jessica Vandeleest, Eleanor Bimla Schwarz https://arxiv.org/abs/2412.03731
Caption: Performance of Combined RCT-OS Inference
This paper presents a novel methodology for integrating data from Randomized Controlled Trials (RCTs) and Observational Studies (OSs) to improve the robustness and generalizability of causal inferences. Recognizing the inherent strengths and limitations of each data source, the authors propose a method that leverages the internal validity of RCTs and the external validity of OSs to generate more reliable and broadly applicable causal estimates. This approach addresses a fundamental challenge in causal inference: balancing the need for rigorous internal validity with the desire for findings that can be generalized to broader populations.
The proposed method employs a triplet matching algorithm to align samples from the RCT and OS. This innovative approach matches each treated unit in the OS with a variable number of control units from both the OS and the RCT. The number of RCT units matched to each treated OS unit is determined by a generalization score, v(x) = π(x)*(1-e(x))/e(x), where π(x) is the propensity score for the OS within the overlapping domain and e(x) is the selection probability for the RCT. This matching strategy ensures that the matched OS control group and the RCT sample accurately reflect the covariate distribution of the treated OS population within the overlapping domain, facilitating a more accurate comparison between the treatment and control groups.
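As a small illustration of how the generalization score might be estimated in practice, the sketch below fits logistic regressions for the OS propensity score π(x) and the RCT selection probability e(x); the choice of estimator, the clipping of e(x), and the toy data are assumptions rather than the authors' exact implementation.

```python
# Sketch: estimate the generalization score v(x) = pi(x) * (1 - e(x)) / e(x),
# with pi(x) the OS propensity score and e(x) the probability of selection
# into the RCT. Logistic regression is used for both models here (an
# assumption, not necessarily the authors' choice).
import numpy as np
from sklearn.linear_model import LogisticRegression

def generalization_score(X_os, treated_os, X_rct):
    # pi(x): P(treated | x), estimated within the observational study.
    pi = LogisticRegression(max_iter=1000).fit(X_os, treated_os).predict_proba(X_os)[:, 1]

    # e(x): P(in RCT | x), estimated on the pooled OS + RCT sample.
    X_all = np.vstack([X_os, X_rct])
    in_rct = np.concatenate([np.zeros(len(X_os)), np.ones(len(X_rct))])
    e_model = LogisticRegression(max_iter=1000).fit(X_all, in_rct)
    e = np.clip(e_model.predict_proba(X_os)[:, 1], 1e-3, 1 - 1e-3)

    return pi * (1 - e) / e

# Toy usage; rounding v(x) would then set how many RCT controls to match
# to each treated OS unit (the paper's exact rule may differ).
rng = np.random.default_rng(3)
X_os, X_rct = rng.normal(size=(300, 4)), rng.normal(0.2, 1.0, size=(100, 4))
treated_os = rng.integers(0, 2, size=300)
v = generalization_score(X_os, treated_os, X_rct)
```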
To address potential biases inherent in both RCTs and OSs, the method incorporates a two-parameter sensitivity analysis framework. For the OS, a sensitivity analysis based on Rosenbaum's framework quantifies the robustness of inferences to unmeasured confounding, using a sensitivity parameter Γ. For the RCT, a new sensitivity analysis model accounts for generalizability bias using a sensitivity parameter Δ. This two-pronged approach allows researchers to assess the impact of both unmeasured confounding in the OS and limited generalizability of the RCT on the causal estimates. The combined inference is based on the product of the two sensitivity analysis p-values, pOS and pRCT, and the combined confidence interval is constructed by finding the supremum of β such that pOS * pRCT ≥ κα, where κα is a critical value.
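To make the combination step concrete, here is a small sketch: given callables that return the two sensitivity-analysis p-values for a candidate effect β (at fixed Γ and Δ), it retains the β values whose p-value product clears κα. The closed-form critical value used below, the exact threshold for the product of two independent uniform p-values, is an assumed stand-in for the paper's definition of κα.

```python
# Sketch of the combined inference: scan candidate effects beta and keep those
# whose sensitivity-analysis p-value product p_OS(beta) * p_RCT(beta) >= kappa_alpha.
import numpy as np
from scipy.optimize import brentq

def kappa(alpha):
    # Assumed form: critical value c solving P(U*V <= c) = c * (1 - ln c) = alpha
    # for the product of two independent Uniform(0,1) p-values.
    return brentq(lambda c: c * (1 - np.log(c)) - alpha, 1e-12, 1 - 1e-12)

def combined_confidence_set(betas, p_os, p_rct, alpha=0.05):
    """betas: candidate effect values; p_os, p_rct: callables beta -> p-value."""
    k = kappa(alpha)
    kept = [b for b in betas if p_os(b) * p_rct(b) >= k]
    return (min(kept), max(kept)) if kept else None

# Toy usage with placeholder p-value functions (stand-ins for the Rosenbaum-style
# OS analysis at Gamma and the RCT generalizability analysis at Delta).
betas = np.linspace(-2, 2, 401)
p_os = lambda b: float(np.exp(-abs(b - 0.3) * 4))
p_rct = lambda b: float(np.exp(-abs(b - 0.2) * 3))
print(combined_confidence_set(betas, p_os, p_rct))
```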
The combined approach demonstrates improved performance compared to using either data source alone. Simulation studies confirm the validity of the method, achieving approximately 95% coverage when sensitivity parameters are correctly specified. Importantly, the combined confidence intervals are generally shorter than those obtained from the OS alone and substantially shorter than those from the RCT alone, indicating greater precision in the causal estimates. Furthermore, the combined inference exhibits greater robustness to misspecified sensitivity parameters, highlighting its resilience to uncertainties about the extent of bias in the data.
The practical utility of the method is demonstrated through an application investigating the effects of lactation on postpartum maternal weight. Using a small RCT and a larger observational dataset, the analysis reveals a modest positive effect of lactation on three-month postpartum weight but no significant effect at six months. While the OS data alone yielded similar conclusions, the combined approach provides greater confidence in the findings due to its increased robustness to potential biases. This application underscores the practical value of integrating RCT and OS data for generating scientifically sound and actionable causal estimates.
Optimizing Returns from Experimentation Programs by Timothy Sudijono, Simon Ejdemyr, Apoorva Lal, Martin Tingley https://arxiv.org/abs/2412.05508
Caption: Comparison of p-value vs. Optimal Experimentation Strategies
This paper challenges the conventional reliance on p-values in A/B testing and proposes a shift towards a return-aware framework for optimizing experimentation programs. The authors argue that traditional null hypothesis significance testing, while useful for assessing statistical significance, often falls short when the objective is to maximize the cumulative returns from experimentation, especially in resource-constrained environments. This work offers a valuable perspective on how organizations can move beyond simply identifying statistically significant effects and focus on maximizing the overall impact of their experimentation efforts.
The core of the proposed framework is the A/B Testing Problem, which formulates experimentation as a constrained optimization problem. The goal is to maximize the total expected returns, represented by:
Σᵢ∈S E[u(Δᵢ)]
where Δᵢ represents the return associated with idea i, S is the subset of chosen ideas, and u is a utility function. This formulation explicitly incorporates the magnitude of the treatment effects and the opportunity costs associated with choosing one idea over another, factors often ignored in traditional p-value based approaches.
The paper demonstrates that the A/B Testing Problem can be efficiently solved using dynamic programming, regardless of the prior distribution of treatment effects (G) or the available resources. Furthermore, it establishes a connection between the optimal Bayesian solution and p-value based decision frameworks. By tuning the significance level (α) based on the prior G, organizations can achieve the optimal solution without abandoning their existing p-value based infrastructure. This finding offers a practical pathway for adopting the return-maximizing framework without requiring extensive changes to existing experimentation systems.
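As a minimal sketch of the dynamic-programming idea, under strong simplifying assumptions: each candidate idea comes with a known expected return E[u(Δᵢ)] under the prior G and an integer resource cost, so selecting the best subset within a budget reduces to a 0/1 knapsack. The paper's actual formulation also chooses allocation sizes and the launch decision after observing experiment results, which this sketch omits.

```python
# 0/1 knapsack sketch of the A/B Testing Problem under simplifying assumptions:
# returns[i] = E[u(Delta_i)] under the prior G, costs[i] = integer resource cost,
# budget = total experimentation resources available.
def optimize_program(returns, costs, budget):
    n = len(returns)
    best = [0.0] * (budget + 1)                  # best[b]: max return with budget b
    take = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n):
        for b in range(budget, costs[i] - 1, -1):
            cand = best[b - costs[i]] + returns[i]
            if cand > best[b]:
                best[b] = cand
                take[i][b] = True
    # Backtrack to recover the chosen subset S of ideas.
    chosen, b = [], budget
    for i in range(n - 1, -1, -1):
        if take[i][b]:
            chosen.append(i)
            b -= costs[i]
    return best[budget], sorted(chosen)

# Toy usage: five candidate ideas, six units of experimentation capacity.
value, subset = optimize_program([3.0, 1.2, 0.4, 2.5, 0.9], [3, 1, 1, 2, 2], 6)
print(value, subset)
```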
The application of this framework to real-world experimentation programs at Netflix reveals that traditional p-value driven practices are often suboptimal for maximizing returns. When experiments are assumed to be costless, the optimal strategy involves running a larger number of tests with smaller allocation sizes, a strategy the authors call "lean experimentation". Moreover, the implied optimal one-sided p-value thresholds are often considerably larger than the conventional 0.05, suggesting that current practices tend to be overly conservative. The paper also explores the impact of incorporating costs and risk aversion, showing that the optimal p-value threshold and allocation strategy are sensitive to these factors.
The framework is further extended to address more complex scenarios, such as managing multiple experimentation programs with shared resources and handling mutually exclusive treatments. These extensions provide valuable tools for managers to make informed investment decisions across various initiatives. The paper concludes by highlighting the limitations of the framework and suggesting directions for future research, including the estimation of the prior distribution G and the development of sequential frameworks. By challenging the conventional focus on p-values and emphasizing a return-aware approach, this work contributes significantly to the ongoing evolution of experimentation practices in industry and academia.
This newsletter highlights a convergence of methodological advancements in statistical modeling and causal inference. From emulating clinical trials with observational data to optimizing experimentation programs and combining RCTs with observational studies, these papers offer valuable tools and insights for researchers and practitioners. The common thread is a focus on extracting meaningful information from complex data while addressing the inherent limitations of different data sources and analytical approaches. The emphasis on robustness, generalizability, and practical application makes these contributions particularly relevant for real-world decision-making in various domains, including healthcare, public health, and online platforms. The innovative methods and frameworks presented in these papers have the potential to significantly impact how we design studies, analyze data, and draw causal inferences, ultimately leading to more informed and effective interventions.