Beyond Scaling Thesis In Biology: Why Instrumentation, Not Compute, Sets The Ceiling For AI Driven Biomedicine

Abstract

This article examines the scaling thesis in biology, the assumption that more compute and data will by themselves accelerate biomedical progress, through the lens of control theory and information theory. It proposes a three-loop framework for evaluating the rate of biomedical knowledge acquisition, distinguishing between improvements in data processing, experimental design, and physical measurement infrastructure. The analysis argues that the third loop, the instrumentation layer, sets the ceiling for the other two, and that most current AI-biology efforts operate below that ceiling. The binding constraint on AI-driven biomedicine is therefore not computational but observational: progress depends on instruments capable of recording multi-scale biological systems in real time.

Introduction

Recent developments in AI-driven biomedicine have produced a striking paradox. An engineer with no formal biology training used machine learning to design a personalized mRNA cancer vaccine for a terminally ill dog, leveraging open-source models and genomic data [1]. Meanwhile, despite billions of dollars invested in longevity research and drug development, some of the most potent regenerative interventions, such as sleep, remain non-pharmacological and poorly understood at a mechanistic level. This disconnect suggests that the bottleneck in biomedicine may not lie where most investment is currently directed.

Literature review

The scaling thesis for AI in biology rests on a well-documented precedent. Kaplan et al. (2020) demonstrated that test loss in large language models falls as a predictable power law of compute and data, a finding that has since been extended to vision, code generation, and multimodal systems [2]. Foundation language models collapsed multiple specialist models, for tasks such as legal drafting and speech recognition, into a single representation-learning problem, subsuming many pre-2021 modality-specific AI companies in the process.
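A power law of this kind appears as a straight line on log-log axes, which is what makes it easy to fit and extrapolate. The sketch below is a minimal illustration, assuming a loss of the form loss = a * compute^(-alpha); the function name `fit_power_law`, the synthetic data, and the exponent 0.05 are illustrative assumptions, not values taken from Kaplan et al.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-alpha) in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, alpha)

# Synthetic data generated from an assumed law, for illustration only.
compute = [10.0 ** k for k in range(1, 8)]
loss = [10.0 * c ** -0.05 for c in compute]

a, alpha = fit_power_law(compute, loss)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

The fit recovers the generating parameters only because the synthetic data contain no noise and follow the assumed form exactly; real loss curves would need error bars and a check that the power-law regime actually holds.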


In structural biology, AlphaFold predicted protein structures with near-experimental accuracy. However, Wong et al., working in the Collins lab at MIT, found that molecular docking simulations using AlphaFold structures performed little better than chance for predicting antibacterial mechanisms [3]. AlphaFold’s models are static and do not capture the protein dynamics critical to drug binding [3,4]. Accurate structure prediction cannot substitute for dynamic, multi-scale biological information.

The pharmaceutical industry’s own trajectory reinforces this. Scannell et al. (2012) documented Eroom’s Law: the inflation-adjusted R&D cost per approved drug has roughly doubled every nine years since 1950, despite advances in genomics, combinatorial chemistry, and computation [5]. This trend persists not because biology is unscalable, but because the experimental feedback loop between hypothesis, perturbation, and measurement has never been properly closed.
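To get a feel for what a nine-year doubling time implies, the arithmetic can be written out directly. This is an idealized sketch that assumes a perfectly constant doubling period; the function name `eroom_cost_multiplier` is hypothetical, and the result is a back-of-envelope figure rather than a number reported by Scannell et al.

```python
def eroom_cost_multiplier(start_year, end_year, doubling_years=9.0):
    """Relative cost factor if cost doubles every `doubling_years` years."""
    return 2.0 ** ((end_year - start_year) / doubling_years)

# Sixty years at one doubling per nine years is about 6.7 doublings.
multiplier = eroom_cost_multiplier(1950, 2010)
print(f"Cost multiplier 1950-2010: {multiplier:.1f}x")
```

Under this assumption, costs in 2010 come out roughly two orders of magnitude above those in 1950.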

Technical analysis

Biomedicine as a control systems problem

Across every scale of biological organization, the same architecture recurs: sensing, processing, error minimization, action, and feedback. A cell senses extracellular signals, adjusts its gene expression, and emits signals that alter neighboring cells. Those neighbors generate tissue-level functions that shift the systemic signals the original cell receives. Biology is a single control system running nested feedback loops from molecules to the whole organism.

Every domain that humans have engineered as a closed-loop control system has become predictably improvable. Autopilots closed the sensing-actuation loop in flight. Semiconductor fabs closed the lithography-inspection loop, and Moore’s Law emerged. In language models, scaling laws emerged once the train-evaluate-iterate cycle was standardized. The pattern is consistent: standardize the cycle of perturb, measure, reason, repeat, and a scaling law follows.
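The perturb, measure, reason, repeat cycle can be sketched as a minimal feedback loop. The example below assumes a toy linear system whose response is `system_gain * perturbation` and a simple integral-style update; all names and values are illustrative assumptions, not a model of any real biological or engineering process.

```python
def run_closed_loop(target, system_gain, feedback_gain=0.25, cycles=20):
    """Drive a toy linear system toward `target` using measured feedback.

    Converges when 0 < system_gain * feedback_gain < 2.
    """
    perturbation = 0.0
    errors = []
    for _ in range(cycles):
        measurement = system_gain * perturbation  # measure the response
        error = target - measurement              # reason: compare with the goal
        errors.append(abs(error))
        perturbation += feedback_gain * error     # perturb: adjust the input
    return perturbation, errors

final, errors = run_closed_loop(target=10.0, system_gain=2.0)
print(f"final perturbation = {final:.4f}, last error = {errors[-1]:.6f}")
```

With these values the error halves on every cycle. The controller never needs to know `system_gain` in advance, but it does need the measurement: remove that line and there is nothing to correct against.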

Three loops for improving biomedical learning rate

The rate of scientific learning is bounded by the causal knowledge extractable per experimental cycle. Three hierarchical loops govern improvement:

Loop 1 — Signal processing. Better models extract more information from data that existing hardware already produces.

Loop 2 — Experimental design. Model-driven selection of which perturbations, timepoints, and cell types to measure next, maximizing information gain from existing instruments.

Loop 3 — Measurement infrastructure. New instruments or configurations that capture signals no current hardware can observe.

The hierarchy is strict. Loops 1 and 2 cannot extract cross-scale causal information from sensors that were never built to capture it. A model trained on dissociated single-cell data cannot learn tissue-level signaling dynamics regardless of its parameter count, because those dynamics were destroyed during sample preparation.
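The information-theoretic version of this argument is that no downstream processing can recover information the measurement discarded. The toy example below assumes a hypothetical two-cell tissue in which the hidden state is which cell is signaling. An intact, position-resolved readout carries one bit about that state; a dissociated readout, modeled here by sorting away the positions, carries none.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Mutual information in bits between the items of equally likely (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Hypothetical toy tissue: two adjacent cells, exactly one of which is signaling.
tissue_states = ["left_signals", "right_signals"]
intact = {"left_signals": (1, 0), "right_signals": (0, 1)}  # position-resolved readout

def dissociate(readout):
    """Model dissociation as loss of position: only the sorted values survive."""
    return tuple(sorted(readout))

info_intact = mutual_information([(s, intact[s]) for s in tissue_states])
info_dissociated = mutual_information(
    [(s, dissociate(intact[s])) for s in tissue_states]
)
print(f"intact: {info_intact:.1f} bit, dissociated: {info_dissociated:.1f} bit")
```

Both tissue states yield the same dissociated readout, so any model trained on it, however large, sees identical inputs for different underlying states.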

Where the field stands today

The vast majority of AI-biology investment operates within Loops 1 and 2. Foundation models for genomics improve representation learning over existing datasets. Active learning frameworks optimize experiment selection. Robotic automation increases throughput of existing assays. These are genuine advances, but they optimize within the information ceiling set by current instrumentation.
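Experiment selection of the Loop 2 kind is often framed as choosing the assay with the highest expected information gain about competing hypotheses. The sketch below assumes two hypotheses and three hypothetical assays described only by their outcome likelihoods; the assay names and probabilities are invented for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(prior, likelihoods):
    """prior[h] = P(h); likelihoods[h][o] = P(outcome o | hypothesis h)."""
    gain = entropy(prior)
    for o in range(len(likelihoods[0])):
        p_o = sum(p * lik[o] for p, lik in zip(prior, likelihoods))
        if p_o == 0:
            continue
        posterior = [p * lik[o] / p_o for p, lik in zip(prior, likelihoods)]
        gain -= p_o * entropy(posterior)
    return gain

prior = [0.5, 0.5]
candidates = {  # hypothetical assays, illustrative likelihoods
    "informative_assay": [[0.9, 0.1], [0.1, 0.9]],
    "weak_assay": [[0.6, 0.4], [0.4, 0.6]],
    "blind_assay": [[0.5, 0.5], [0.5, 0.5]],
}
gains = {name: expected_information_gain(prior, lik) for name, lik in candidates.items()}
best = max(gains, key=gains.get)
print(best, {name: round(g, 3) for name, g in gains.items()})
```

The selection step can only rank the assays it is given. An assay whose outcomes do not depend on the hypothesis scores zero no matter how cleverly it is scheduled, which is the sense in which instrumentation sets the ceiling for design.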

Future directions

Several emerging technologies point toward closing the instrument gap. Spatial multi-omics platforms combine transcriptomic, proteomic, and epigenomic readouts within intact tissue [6]. Organ-on-chip systems reconstitute tissue-level dynamics in controllable microenvironments. Three-dimensional in situ sequencing techniques extend spatial profiling into thick tissue blocks [7]. None yet achieve real-time, multi-scale recording across molecular, cellular, and tissue levels simultaneously.

The ultimate objective is self-improving observation tools: instruments whose configurations are continuously optimized by the models they feed, closing the loop between observation and instrument design.

References

[1] Conyngham, P. (2026). AI-designed personalized mRNA cancer vaccine for a dog. Case reported in multiple outlets, March 2026. Conyngham used ChatGPT, AlphaFold, Grok, and Gemini to analyze tumor DNA from his rescue dog Rosie; the UNSW RNA Institute (Thordarson lab) manufactured the resulting mRNA-lipid-nanoparticle vaccine. See: The Australian (13 March 2026); Fortune (15 March 2026), https://fortune.com/2026/03/15/australian-tech-entrepreneur-ai-cancer-vaccine-dog-rosie-unsw-mrna/; AFP/France24 (30 March 2026), https://www.france24.com/en/live-news/20260330-one-man-his-dog-and-chatgpt-australia-s-ai-vaccine-saga.

[2] Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling laws for neural language models. arXiv preprint arXiv:2001.08361. https://arxiv.org/abs/2001.08361

[3] Wong, F., Krishnan, A., Zheng, E. J., Stärk, H., Manson, A. L., Earl, A. M., Jaakkola, T., & Collins, J. J. (2022). Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery. Molecular Systems Biology, 18(9), e11081. https://doi.org/10.15252/msb.202211081

[4] Trafton, A. (2022, September 6). Analyzing the potential of AlphaFold in drug discovery. MIT News. https://news.mit.edu/2022/alphafold-potential-protein-drug-0906

[5] Scannell, J. W., Blanckley, A., Boldon, H., & Warrington, B. (2012). Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews Drug Discovery, 11(3), 191–200. https://doi.org/10.1038/nrd3681

[6] Vandereyken, K., Sifrim, A., Thienpont, B., & Voet, T. (2023). Methods and applications for single-cell and spatial multi-omics. Nature Reviews Genetics, 24(8), 494–515. https://doi.org/10.1038/s41576-023-00580-2

[7] Wang, X., Allen, W. E., Wright, M. A., Sylwestrak, E. L., Samusik, N., Vesuna, S., Evans, K., Liu, C., Ramakrishnan, C., Liu, J., Nolan, G. P., Bava, F.-A., & Deisseroth, K. (2018). Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science, 361(6400), eaat5691. https://doi.org/10.1126/science.aat5691
