Why Longevity Trials Struggle With Endpoints — and What Researchers Are Using Instead

The most seductive endpoint in longevity science is also the least practical one: living longer. In mice, that is manageable. In humans, it is close to impossible as a primary trial endpoint for most interventions. Human lifespans are long, causes of death are heterogeneous, and even if a therapy truly slows biological aging, proving that through all-cause mortality would usually require very large studies running for many years. That is why serious geroscience has gradually moved away from asking whether a treatment simply extends life, and toward a harder but more realistic question: can it delay the cluster of diseases, disabilities, and functional losses that aging makes more likely?

That shift sounds straightforward, but it creates a technical problem at the center of the field. Traditional drug development usually targets a single disease with a relatively clear endpoint — fewer heart attacks, smaller tumors, lower viral load, improved symptoms. Geroscience is different. Its premise is that targeting the biology of aging could reduce risk across many age-related diseases at once, while also preserving function and resilience. That promise is exactly what makes the field so interesting. It is also what makes endpoint selection so difficult.

The problem starts with what aging is — and what it isn’t

Aging is the strongest risk factor for most chronic diseases, frailty, disability, cognitive impairment, and death. But it is not handled in medicine the way diabetes, heart failure, or osteoporosis are handled. In human trials, “aging” itself is not yet a routine regulatory indication, which means researchers cannot simply run a trial on “slowing aging” and expect a standard approval pathway to exist. A 2025 review in The Journal of Clinical Investigation put it plainly: translating geroscience into humans requires new clinical trial paradigms, new biomarker and surrogate-marker development, and novel endpoints that may include diseases, geriatric syndromes, or functional outcomes for which the FDA does not provide a straightforward indication.

That leaves the field stuck between two bad options. If a trial uses mortality as the endpoint, it becomes unwieldy, slow, and expensive. If it uses a narrow disease endpoint, it may no longer be testing the geroscience hypothesis at all; it may simply be testing whether a drug helps one disease. A 2022 GeroScience paper on endpoints laid this out clearly: total mortality has high face validity but is too rare and slow for most trials, while disease-specific endpoints may lead to an FDA-approved indication without necessarily showing that the intervention meaningfully affected aging biology more broadly.
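To see why mortality endpoints become unwieldy, it helps to run the numbers. The sketch below uses Schoenfeld's standard approximation for the number of deaths a two-arm, equally randomized survival trial needs. The inputs are illustrative assumptions, not figures from any trial discussed here: a hazard ratio of 0.85, a 2% annual mortality rate, and five years of follow-up. It also assumes a constant hazard and ignores dropout and staggered enrollment, so it understates what a real trial would need.

```python
from math import ceil, exp, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld approximation: deaths needed in a 1:1 two-arm trial."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


def required_participants(hazard_ratio, annual_mortality, years,
                          alpha=0.05, power=0.80):
    """Participants needed so that expected deaths reach the target."""
    control_hazard = -log(1 - annual_mortality)  # constant-hazard assumption
    p_death_control = 1 - exp(-control_hazard * years)
    p_death_treated = 1 - exp(-control_hazard * hazard_ratio * years)
    p_death_average = (p_death_control + p_death_treated) / 2
    return ceil(required_events(hazard_ratio, alpha, power) / p_death_average)


# Hypothetical scenario: all three inputs are assumptions for illustration.
events = required_events(hazard_ratio=0.85)
participants = required_participants(hazard_ratio=0.85,
                                     annual_mortality=0.02, years=5)
print(f"deaths needed: {events}, participants needed: {participants}")
```

With these inputs the sketch calls for roughly 1,200 deaths and about 13,000 participants followed for five years. Halving the assumed mortality rate, as in a healthier and younger cohort, pushes the participant count far higher.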

Why biomarkers do not solve the problem — at least not yet

This is the point where many longevity discussions reach for biomarkers as if they were an easy escape hatch. If aging takes too long to observe clinically, why not measure biological age directly and use that as the endpoint? The answer is that regulators and trialists need more than an interesting biological signal. They need evidence that the measure predicts a real clinical benefit. The FDA defines a surrogate endpoint as a substitute for a direct measure of how a patient feels, functions, or survives, and stresses that validated surrogates require strong evidence that changing the surrogate predicts a specific clinical benefit. A candidate biomarker that merely tracks with age does not meet that bar.

That matters because biomarkers of aging, however promising, are still mostly in the candidate or exploratory stage. The 2023 Cell consensus framework on biomarkers of aging argued that such measures are critically needed for human longevity trials because lifespan itself is impractical, and because healthspan may improve even without a dramatic impact on lifespan. But the same paper was careful: many current biomarkers were initially built to predict chronological age rather than intervention-responsive aging biology, and their usefulness depends on whether they predict morbidity, functional decline, and mortality better than age alone.
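The "better than age alone" test can be made concrete with a discrimination measure. The sketch below computes a rank-based AUC: the probability that a randomly chosen person who had an event scores higher than one who did not. The six-person cohort, the clock values, and the event labels are invented purely for illustration, and a real validation would use large cohorts and survival models rather than a bare AUC.

```python
def auc(scores, events):
    """Chance that a person with an event outranks one without (ties count half)."""
    cases = [s for s, e in zip(scores, events) if e]
    controls = [s for s, e in zip(scores, events) if not e]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical cohort: every value below is made up for illustration.
chronological_age = [70, 72, 74, 76, 78, 80]
clock_age = [68, 75, 73, 80, 77, 84]  # output of some hypothetical aging clock
had_event = [0, 1, 0, 1, 0, 1]        # e.g. new disability during follow-up

auc_age = auc(chronological_age, had_event)
auc_clock = auc(clock_age, had_event)
print(f"AUC, chronological age: {auc_age:.2f}")
print(f"AUC, clock age:         {auc_clock:.2f}")
```

In this toy cohort the clock ranks 8 of 9 case-control pairs correctly, against 6 of 9 for chronological age. That gap is the kind of added predictive value a candidate biomarker would need to show, repeatedly and in independent populations, before anyone treats it as more than a restatement of birth date.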

A 2025 Delphi-style consensus paper on geroscience trial endpoints sharpened the point even further. Experts agreed that outcome measures should span multiple health dimensions (age-related disease, function, and patient-reported outcomes tailored to the study population), and they explicitly judged that blood-based biomarkers were unlikely to be accepted as primary endpoints of efficacy trials at present. In other words, biomarkers are important, but the field itself does not yet treat them as regulatory-grade replacements for clinically meaningful outcomes.

So what are researchers using instead?

The answer, increasingly, is a menu of imperfect substitutes, each trying to capture a different piece of aging-related decline without waiting decades for mortality data. The field is not converging on one endpoint. It is converging on a toolbox.

1) Composite multimorbidity endpoints

The clearest geroscience answer to the endpoint problem is the composite endpoint: instead of waiting for death or focusing on one disease, a trial counts the occurrence of any one of several major age-related outcomes. This is the logic behind the widely discussed TAME design. In published descriptions of the study, TAME’s primary endpoint is the incidence of any event from a defined list (myocardial infarction, congestive heart failure, stroke, most cancers, mild cognitive impairment or dementia, or death), with additional outcomes focused on physical and cognitive function and common geriatric syndromes. Its architects explicitly wrote that the primary outcome was shaped with FDA input to help create aging-relevant indications that could spur future drug development.

This approach has obvious appeal. It increases event rates compared with any one disease alone and aligns more closely with the geroscience hypothesis that one intervention might delay several conditions in parallel. But it also comes with a built-in weakness: a trial can “win” on a composite outcome because it moved only one of the included components. That means a positive result does not automatically prove a broad anti-aging effect. The field knows this. A 2023 geroscience task-force paper warned that composite outcomes are not without limitations because efficacy on a single item can drive an apparently broader indication.
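Both halves of that trade-off show up in a few lines of arithmetic. The sketch below assumes hypothetical five-year risks for each component and treats the components as independent, which is a simplification: real age-related outcomes are correlated, and death competes with the others. The numbers are not TAME data.

```python
from math import prod


def composite_risk(component_risks):
    """Risk of at least one component event, assuming independent components."""
    return 1 - prod(1 - p for p in component_risks.values())


# Hypothetical 5-year risks in a control arm (illustrative, not trial data).
control = {
    "myocardial_infarction": 0.04,
    "heart_failure": 0.05,
    "stroke": 0.03,
    "cancer": 0.08,
    "mci_or_dementia": 0.10,
    "death": 0.09,
}

# Hypothetical treatment that lowers ONLY cancer risk, by 40 percent.
treated = dict(control, cancer=control["cancer"] * 0.6)

control_composite = composite_risk(control)
treated_composite = composite_risk(treated)
relative_reduction = 1 - treated_composite / control_composite

print(f"largest single-component risk: {max(control.values()):.3f}")
print(f"control composite risk:        {control_composite:.3f}")
print(f"treated composite risk:        {treated_composite:.3f}")
print(f"composite relative reduction:  {relative_reduction:.1%}")
```

Under these assumptions the composite event rate is about 33%, more than three times the rate of the most common single component, which is what makes the trial feasible. Yet the treated arm shows a composite reduction of about 7% even though five of the six components did not move at all.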

2) Functional decline and geriatric syndromes

Because older adults often care as much about independence and capability as about diagnosis counts, many researchers argue that function should sit closer to the center of geroscience endpoints. The 2022 GeroScience review listed disability-free survival, frailty or deficit indices, frailty phenotype, and advancing multimorbidity as plausible clinical endpoints. It also stressed that outcomes should be persuasive not only to researchers, but to regulators, insurers, clinicians, and patients — which is one reason the familiar FDA logic of how a person feels, functions, or survives keeps resurfacing.

This is where measures such as gait speed, physical performance, frailty progression, cognitive decline, and loss of independence come in. The 2025 JCI review noted that heterogeneous older adults at elevated risk of age-related disease, geriatric syndromes, or physical or cognitive decline may be especially useful trial populations because these outcomes occur sooner than mortality and are deeply relevant to healthspan. It also highlighted gait speed as a highly integrative measure that predicts life expectancy and disability risk in observational work.

The attraction of function-based endpoints is that they are more meaningful to real life than an abstract biomarker shift. The problem is that they can be noisy, multidetermined, and sensitive to study design. Frailty measures are not fully standardized across all contexts, disability status can fluctuate, and change over a relatively short trial may be modest. Still, if the ambition of longevity science is to extend healthy years rather than merely biological persistence, it is hard to avoid the conclusion that function has to remain part of the picture.
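A deficit index, one of the frailty measures mentioned above, is simple to compute: it is the fraction of assessed health deficits that are present. The sketch below is a minimal illustration. The item names and scores are hypothetical, real indices typically draw on dozens of items, and skipping unassessed items is only one of several possible conventions.

```python
def deficit_index(deficits):
    """Fraction of assessed deficits that are present (0 = none, 1 = all).

    Each value is 0 (absent), 1 (present), a partial score such as 0.5,
    or None if the item was not assessed at that visit.
    """
    assessed = [v for v in deficits.values() if v is not None]
    if not assessed:
        raise ValueError("no deficits were assessed")
    return sum(assessed) / len(assessed)


# Hypothetical participant at two visits; item names are illustrative only.
baseline = {
    "slow_gait": 0,
    "needs_help_bathing": 0,
    "hypertension": 1,
    "low_grip_strength": 0.5,
    "memory_complaint": 0,
    "recent_fall": None,  # not assessed at baseline
}
follow_up = dict(baseline, slow_gait=1, recent_fall=1)

print(f"baseline index:  {deficit_index(baseline):.2f}")
print(f"follow-up index: {deficit_index(follow_up):.2f}")
```

Part of the apparent change here comes from an item that was missing at baseline and assessed at follow-up. Choices like that, along with which items are counted at all, are one reason frailty scores are hard to compare across studies.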

3) Resilience and response-to-stress endpoints

Another increasingly important approach is to measure not gradual decline but resilience: how well an older person responds to or recovers from an acute stressor. This idea has been present in geroscience for years. A 2016 NIH-backed design paper argued that interventions targeting fundamental aging processes could be tested not only through long-term healthspan outcomes but also through resilience scenarios, such as recovery after myocardial infarction, chemotherapy, surgery, or other acute physiological stress.

One concrete example is immune resilience, especially vaccine response. In a 2014 human trial, low-dose RAD001, an mTOR inhibitor, improved influenza-vaccine response in older adults by about 20% at tolerated doses and reduced the proportion of T cells expressing PD-1, a marker associated with age-related immune dysfunction. Later work in the field has continued to treat vaccine response as a plausible geroscience-style endpoint because it captures a clinically relevant age-linked decline over a short time horizon. The 2025 JCI review explicitly cited vaccine responses as provocative tests that could serve as useful outcomes in larger geroscience trials.

Resilience endpoints are attractive because they are quicker and more mechanistically interpretable than waiting for multimorbidity or death. But they also risk being too narrow unless they are clearly linked back to broader aging biology. A better vaccine response is encouraging; it is not automatically proof of slower aging across the organism.

4) Biomarkers as secondary, exploratory, or proof-of-concept endpoints

Even though biomarkers are not yet widely accepted as primary efficacy endpoints in geroscience trials, researchers still use them heavily, though rarely as the sole basis of success. Instead, they are used to enrich populations, prioritize interventions, monitor response, and generate validation data in parallel with more clinically legible outcomes. That is increasingly the field’s working compromise. A 2025 Nature Aging recommendations paper argued that biomarkers of aging could transform geroscience trials through participant stratification, intervention prioritization, and response monitoring, while calling for standardized data-collection practices so these markers can be benchmarked and validated across studies.

The proof-of-concept literature shows what this looks like in practice. In the MILES study, metformin-treated older adults underwent tissue biopsies and transcriptomic analysis rather than being evaluated through mortality or multimorbidity endpoints. The investigators reported that metformin influenced metabolic and nonmetabolic pathways linked to aging, including mTORC1, MYC, TNF, TGFβ1, mitochondrial fatty-acid oxidation, and DNA repair, and concluded that the data could inform development of biomarkers for metformin and potentially other drugs acting on aging pathways. This is not a registration-style efficacy trial. It is a mechanistic signal-generating trial.

The DO-HEALTH biological-age analysis offers another version of the same compromise. In a post hoc analysis of 777 older adults over three years, omega-3 treatment alone slowed several DNA-methylation clocks, and the combination of omega-3, vitamin D, and exercise showed additive benefit on PhenoAge, with effect sizes corresponding to roughly 2.9 to 3.8 months of slower clock-measured aging over three years. That is interesting and potentially important, but it is still best interpreted as a signal, especially since the study authors themselves framed it as a protective effect across clocks rather than a definitive proof that aging had been clinically slowed in a regulatory sense.
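One rough way to gauge the size of that effect is to express it as a share of the follow-up period. This framing is an assumption made here for illustration, not a statistic reported by the study, and it says nothing about whether the clock change translates into clinical benefit.

```python
TRIAL_MONTHS = 36  # three years of follow-up, as described above


def share_of_elapsed_time(months_slowed, trial_months=TRIAL_MONTHS):
    """Clock-measured slowing expressed as a fraction of calendar time elapsed."""
    return months_slowed / trial_months


for months_slowed in (2.9, 3.8):
    share = share_of_elapsed_time(months_slowed)
    print(f"{months_slowed} months over {TRIAL_MONTHS} months = {share:.1%}")
```

Read this way, the reported range works out to roughly 8% to 11% of the elapsed three years: large enough to notice, small enough to warrant caution.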

What this means in practice

The practical lesson is that longevity trials do not fail because researchers are unimaginative. They struggle because the field is trying to study something medicine has historically sliced into separate diseases, while regulators still require endpoints that correspond to meaningful benefit. That tension is not going away soon. A sensible trial today often ends up using several layers at once: a clinically meaningful endpoint, some measure of function or frailty, and a biomarker package that may help explain mechanism and support future validation.

That also explains why there is so much diversity in current trial design. Some studies aim at multimorbidity. Some target frailty, mobility, or cognitive decline. Some use disease-specific indications with a geroscience rationale. Some lean on vaccine response or recovery from acute stress as resilience measures. And many smaller proof-of-concept studies use tissue, blood, or methylation markers to show that an intervention touched aging-linked pathways even if it was never designed to prove a clean clinical aging effect on its own.

The real answer

So why do longevity trials struggle with endpoints? Because the cleanest endpoint — longer life — is usually unusable, while the most convenient alternatives — biomarkers — are not yet validated enough to carry the full evidentiary burden. The field is filling that gap with composite disease outcomes, frailty and function measures, resilience tests, and biomarker-heavy proof-of-concept work. None of these is perfect. But together they are beginning to look less like a workaround and more like the early architecture of how aging may eventually be tested in humans.

And that may be the most important point. The endpoint problem is not a side issue in longevity science. It is the field’s central translation problem. Solve that, and the rest of geroscience starts to look much more like medicine.