Building AI Neuroscience: From Atoms to Bits
Neuroscience is slow. How can we make it faster?
Neuroscience, driven by small-scale labs in universities, proceeds meticulously; it is not uncommon to see projects carried heroically by undaunted postdocs and graduate students for a decade. This research could provide the seeds for treating neurodegenerative disorders or for understanding our intelligence. How can we accelerate neuroscience?
If we could use AI scientist agents (systems that can read literature, generate hypotheses, read data, write analysis code, and design experiments) to study Brains and Behavior, either directly or compiled as atlases and digital twins, we could vastly accelerate neuroscience. Indeed, in Machines of Loving Grace, Dario Amodei described how advanced AI, acting as a country of geniuses in a datacenter, could make rapid advances towards building a science of intelligence and curing all neuropsychiatric disease. While these are lofty goals, the essay doesn’t tell us how we might reach them. Here, I paint a picture of how to build AI neuroscience and what funders should prioritize.
A working definition of AI scientists
AI neuroscientists are a special instance of AI scientist agents, which are already taking shape. The May 2026 issue of Nature featured three of these systems demonstrating their use for writing empirical software and testing hypotheses in the biomedical sciences.
In broad strokes, these systems are LLMs with agentic harnesses that manage context, memory, and access to skills. They are architected similarly to coding agents, like Claude Code or OpenAI Codex, with the caveat that the end product is not a piece of software or a website, but rather insight derived from an analysis.
Several groups are building AI scientists across the sciences, and more specialized agents for the biosciences, such as the ones from FutureHouse. We can already see glimpses of the relevance of these agents to neuroscience. For example, Aygün et al. (2026) demonstrate an auto-research agent that can perform state-of-the-art neural activity prediction, a fundamental building block towards a science of intelligence. AI scientists iteratively designing better proteins and treatments is already part of the OpenAI Foundation’s theory of change for how AI can advance Alzheimer’s disease research.
Currently, AI scientists have limited autonomy, though we expect them to become more capable as LLM capabilities grow, similar to what we’ve seen in coding agents. A fundamental bottleneck will remain, however, in building skills that are highly specialized to neuroscience. We expect that matching the throughput of the base model will require data and software engineering beyond the scope of a single lab.
Studying Brains and Behavior with AI scientists
While AI scientists are relatively generic across scientific disciplines, they differ in the subjects that they interrogate. The subjects of conventional and AI neuroscientists are Brains and Behavior. Unlike in verifiable domains such as code and math, where AI agents can test hypotheses rapidly and cheaply, running experiments on brains and behavior is expensive.
To make progress, we have to move as much of our study of brains and behavior as possible from the world of atoms to the world of bits: collecting atlases, building digital twins, and closing the loop with hypothesis-driven experiments on real subjects.
The shape of the subject in AI neuroscience
Atlases

For a static dataset to be useful for an AI neuroscientist, it has to rise to the level of an atlas: a high-coverage, high-entropy map of the brain that can answer many more questions than could have been anticipated by the original experiment designers. The Natural Scenes Dataset, the Allen Brain Cell Atlas, and FlyWire are recent examples of this idea across fMRI, transcriptomics, and connectomics, respectively. These datasets are collected with completeness in mind, containing a representative sample of their respective domains (or in FlyWire’s case, the complete domain). They are distributed according to FAIR principles: highly annotated, distributed on open platforms, with sample code and programmatic access.
Importantly, atlases must be generated in a virtuous cycle between experimental and computational teams: the best atlases are made in conjunction with computational teams that have clear ideas of how they will use the data. Tom Kalil, one of the architects of the BRAIN Initiative, says that researchers may not be able to identify up front the objective function that will result in an immediately useful model. “What is even more valuable than a static dataset is a data generation capability.” This is how we engineer serendipity.
It’s often the case that building atlases is simply impossible without the right neurotechnology, and it makes sense to co-design the atlas and the technology to acquire that atlas. The Enigma project at Stanford, which is building the largest functional atlas ever collected, was empowered by the creation of next-generation Neuropixels, increasing throughput by a factor of 4. A holistic AI strategy should weigh creating new tools against using existing tools to build atlases.
Digital Twins
Static datasets, by themselves, do not allow us to run what-if experiments. To do that, we have to compile the data into digital twins that allow us to predict how different inputs and conditions will nudge the system. In neuroscience, this digital twin is typically either a neural net trained directly to imitate neural data, or a biophysical simulation anchored by data. The key figure of merit here is predictive validity: the correlation between the twin’s predictions on a value of interest (e.g. the effect of a drug, the firing rate of a neuron in response to a stimulus) and the same phenomenon in the real organism, ideally measured out-of-distribution.
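To make the figure of merit concrete, here is a minimal sketch of scoring a twin by rank correlation against held-out measurements. It assumes predictive validity is computed as Spearman's ρ on paired values; the function names and the example numbers are hypothetical, not taken from any of the projects mentioned here.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def predictive_validity(twin_predictions, real_measurements):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(twin_predictions), ranks(real_measurements))


if __name__ == "__main__":
    # Made-up firing rates (Hz) for five held-out stimuli.
    twin = [2.0, 5.5, 9.1, 4.0, 7.3]
    real = [1.8, 6.0, 8.5, 6.9, 5.1]
    print(predictive_validity(twin, real))  # about 0.6 for these numbers
```

The score only means something if the held-out conditions differ from the training conditions, which is why the out-of-distribution split matters more than the particular correlation statistic.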
While cheap experimental proxies allow us to sift through countless interventions, they are only as good as their predictive validity. In drug discovery, for example, Scannell et al. (2022) argue that increasing predictive validity (Spearman’s ρ, which ranges from -1 to 1, with useful proxies falling between 0 and 1) by 0.1 beats scanning orders of magnitude more compounds. GSK screened 10 million compounds in vitro in the 1990s and found zero antibiotics; Domagk, working with rudimentary tools in the 1930s, tested a few hundred dyes in mice and found Prontosil. Mice are simply a better proxy of antibiotic response in humans than isolated cells.
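The trade-off between validity and throughput can be illustrated with a toy model. This is a sketch under strong assumptions, not a reproduction of Scannell et al.'s analysis: each candidate's true value and its proxy score are assumed to be jointly standard normal with correlation equal to the validity (a Pearson correlation, used here for tractability), and we keep only the single top-scoring candidate. Under those assumptions, the expected true value of the pick is the validity times the expected maximum of n standard normal draws. The function names are hypothetical.

```python
import math


def expected_max_standard_normal(n, lo=-8.0, hi=8.0, steps=20000):
    """E[max of n i.i.d. N(0, 1) draws], by trapezoidal integration.

    The density of the maximum is n * pdf(x) * cdf(x) ** (n - 1).
    """
    def integrand(x):
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
        cdf = 0.5 * (1 + math.erf(x / math.sqrt(2)))
        return x * n * pdf * cdf ** (n - 1)

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    return total * h


def expected_quality_of_top_pick(validity, n_screened):
    """Expected true value (in standard deviations) of the best-scoring candidate.

    Assumes E[true | proxy] = validity * proxy, which holds for the
    bivariate normal toy model described in the lead-in.
    """
    return validity * expected_max_standard_normal(n_screened)


if __name__ == "__main__":
    # Compare a weaker proxy with 10x the throughput against a better proxy.
    weak_but_big = expected_quality_of_top_pick(0.3, 10_000)
    better_but_small = expected_quality_of_top_pick(0.4, 1_000)
    print(f"validity 0.3, 10,000 screened: {weak_but_big:.2f}")
    print(f"validity 0.4,  1,000 screened: {better_but_small:.2f}")
```

In this toy model the expected maximum grows only slowly with the number of candidates screened, while the payoff scales linearly with validity, which is the intuition behind preferring a better proxy over a bigger screen.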
We should strive for models with high predictive validity. This can be achieved by fitting models with a large amount of high-entropy data that include causal manipulations. For visual neuroscience, that means collecting large amounts of data in response to diverse visual inputs that span naturalistic vision; for drug discovery, testing broad classes of drugs; for connectomics, collecting calibration data for compiling ultrastructure into ever more precise, valid simulations (Kording et al. 2026).
Real subjects for validation
The best material model for a cat is another, or preferably the same, cat
—Rosenblueth & Wiener
Replacing conventional neuroscientists with tireless AI neuroscientists running in the cloud won’t make conventional experiments on conventional subjects faster. Indeed, the bottleneck will move from data analysis to data collection. To see real speedups in running real subjects, we foresee two paths:
The AI neuroscientist comes up with better, more discriminative experiments to run on real subjects. That is a high bar to clear, as neuroscientists integrate implicit and procedural knowledge about running lab experiments, much of which is not reflected in the literature.
The AI neuroscientist runs experiments faster, e.g. through lab automation. We see this as feasible in the short term in bio-heavy neuroscience, and also for cognitive science that can be run through crowdsourcing platforms. There is clearly a gap, however, in the automation of systems neuroscience, which may ultimately require breakthroughs in soft-body robotics.
For the next few years, then, running hypothesis-driven experiments on real subjects will remain a bottleneck. We should focus these precious resources on validating predictions made by an AI neuroscientist based on atlases and digital twins. This will enable the creation of better models in a virtuous loop.
The right shape of projects to fund
Given this model for how AI neuroscience can advance over the next few years, funders should direct their attention to high-leverage points in the value chain:
Data that rise to the level of atlases. Atlases will rise in value as AI agents automate analyses. That means we’ll need more connectomes, transcriptomes, protein annotations, and functional recordings, especially in species that are phylogenetically closer to humans. These atlases should be complete within their measurement domain, high entropy, highly annotated with metadata, and multimodal to make them easier to join with other atlases.
Better neurotechnology for building atlases. We are still limited by the paucity of tools in neuroscience. Ideally, we should be able to record spike trains of every neuron in the brain; the synaptic weights of an entire dendritic arbor; and neuropeptide and neuromodulator identity and concentration. Funders should focus on scalable technologies that make building atlases cheaper and faster, and on projects that co-design neurotechnologies and atlases.
Better digital twins. Where applicable, atlas data should be distilled into digital twins to answer what-if questions. This makes data more accessible; much like LLMs compress trillions of tokens, digital twins can distill petabytes of data into models that can run, ideally, on a single GPU. Making evaluation cheap means industrial-scale training runs, and projects should allocate compute accordingly, taking into account improvements in training recipes.
Better benchmarks. Critically, a good project should strive to quantify and refine KPIs for predictive validity in humans. In addition to first-party benchmarks, third-party evaluations and open competitions remain critically underfunded. A good benchmark can create new games to catalyze neuroscience, much as the ImageNet competition catalyzed modern deep learning.
Planning for the peri-AGI future means factoring in a broad range of outcomes, as different sections of the jagged frontier will fall in an unpredictable sequence. We’ll thus see projects that co-exist along a broad range of autonomy classes, from AI scientists primarily operating on static datasets and relying heavily on human experimenters in the loop, to fully autonomous research where automated data acquisition capabilities exist today.
Above all, the best projects have a clear theory of what they unblock: a specific reason that answering their question removes a constraint for everyone else, the way ImageNet unblocked deep learning or FlyWire unblocked fly neuroscience. With capabilities changing month-to-month, this long-term vision, guided by what some call research taste, can mean the difference between a good project and a transformative one.
How do we make this happen?
AI neuroscience holds great promise, but we must not fall prey to the fallacy that sprinkling AI on top of the current system will address systemic bottlenecks. For a country of geniuses in a datacenter to have a shot at compressing a century of neuroscience progress into a decade (a tall order!), we must build tools, datasets and projects today. This is especially true for our goal of building a deep science of intelligence to align AI: the most critical time period for alignment is in the next few years. Building AI neuroscience means distilling atlases, collected using new neurotechnology, into digital twins that let us ask what-if questions about the origins of intelligence and pro-sociality, and then validating the answers on real subjects. Doing so will require a coalition of forward-thinking funders.
The third wave of American philanthropy could be the catalyst. Driven by IPOs across frontier AI labs, it could bring up to ~$100B in new yearly philanthropic capital from principals who are believers in both the promise and the dangers of AGI. Neuroscience could be vastly accelerated, and countless QALYs saved, by allocating these funds to A(G)I neuroscience.
Capital should be allocated to focused and frontier research organizations (FROs): nimble, mission-driven groups that create critical tools and datasets. One of our most celebrated institutions in neuroscience, the Allen Institute for Brain Science, started out as a proto-FRO, tasked with building what became the Allen Mouse Brain Atlas. Projects that we’ve supported, including E11 Bio, Forest Neurotech, the Enigma Project, and Neuropixels 3.0, are building new tools for connectomics, functional ultrasound imaging, and large-scale electrophysiological recordings to make AI neuroscience a reality.
At the Amaranth Foundation, we fund ambitious, end-to-end projects in neuroscience. We also help other philanthropists and venture capitalists allocate capital towards the highest leverage projects, stringently evaluating existing and future projects with subject matter experts to separate fact from fiction. We’re always on the lookout for other funders to join us in funding the next wave of neuroscience. Let’s get to work.
Many thanks to Sophia Sanborn for feedback and review.




