The bust portion of a gilded 15-foot statue of President Donald Trump commissioned by cryptocurrency investors to curry his favor. Eli Hiller/AFP/Getty
This story was originally published by Undark and is reproduced here as part of the Climate Desk collaboration.
Federal agencies have been branding some of their research and policy work as “gold standard science,” a trend that gained new force after an executive order on the term was issued in May 2025. The phrase now appears in speeches and guidance documents from agencies such as the National Science Foundation and the National Institutes of Health. It shows up in social media posts intended to signal credibility, rigor, and authority. The message is clear: This is science you can trust.
The intention may be to reassure the public, but the framing is misleading. The executive order outlines principles that are broadly consistent with good scientific practice, such as transparency, reproducibility, and peer review. These are not controversial. The problem arises in how those principles are translated into a simplified label that suggests a single hierarchy of evidence.
Science does not work in the way that an easy phrase like “gold standard” suggests. From my experience applying scientific findings in community-based settings, I have seen the risk of turning a methodological metaphor into a brand, and how doing so can confuse the public about how evidence is actually produced, evaluated, and used.
In scientific practice, “gold standard” has never meant universally best. It has always been conditional. Researchers have used the phrase to describe the most appropriate method for answering a very specific type of question, under particular assumptions and constraints. Outside of that narrow context, the phrase loses its meaning.
One of the most common examples comes from medicine. Randomized controlled trials are often described as the gold standard for determining whether a drug or clinical intervention causes a particular outcome. The reason is straightforward. Randomization helps isolate cause and effect by reducing bias and confounding. When the question is whether treatment A is superior to treatment B under controlled conditions, randomized trials can be extraordinarily powerful.
But even in medicine, randomized trials are not always possible, ethical, or sufficient. They may exclude populations who most need treatment. They may fail to capture long-term effects. They may tell us whether something can work in limited settings, but not whether it will work in real-world applications.
That is why medicine relies on many forms of evidence, including observational studies, post-market surveillance, qualitative research, and case reports. None of these are inherently inferior. They answer different questions.
The executive order itself does not mandate a single methodological approach. However, its implementation in agency language risks being interpreted as privileging certain methods over others, regardless of context. The problem arises because the logic of “gold standard” is now being stretched beyond its original purpose. Presenting “gold standard science” as a general category, rather than a context-dependent judgment, implies that some kinds of science are categorically better than others. That implication does not hold up under even modest scrutiny.
Science begins with questions. What are we trying to understand? What decisions need to be informed? What constraints exist: ethical, practical, or temporal? Only after those questions are clearly defined can methods be responsibly selected.
Different questions demand different approaches. If the question is whether a new medication lowers blood pressure under controlled conditions, a randomized trial may be appropriate. If the question is how a public health policy affects different communities over time, randomized trials may be impossible or misleading. In that case, natural experiments, administrative data analysis, community-based research, or qualitative methods may provide more useful insight. If the question is how an intervention is implemented in practice, mixed methods (those that use multiple research tools like surveys, interviews, and observations) may be essential.
None of these approaches is automatically better or worse than the others. Their value depends on whether they are suited to the question at hand.
This distinction matters because different questions yield different kinds of answers. Some answers estimate causal effects. Others describe patterns, contexts, or mechanisms. Some inform immediate decisions. Others shape long-term understanding. Treating these outputs as if they were competing on a single quality scale misunderstands their purpose.
When agencies promote a single “gold standard” label, they flatten this diversity. They encourage the view that evidence can be classified as approved or unapproved, rather than evaluated on the basis of relevance, limitations, and uncertainty. That may simplify communication, but it does so at the cost of accuracy.
Branding science in this way also risks undermining scientific literacy. The public already struggles with the idea that evidence can be strong without being definitive, useful without being conclusive. When scientific authority is wrapped in logos and slogans, it reinforces the false expectation that good science produces clear, final answers. When those answers later evolve, as science always does, trust erodes.
Ironically, the language of “gold standard science” can make it harder to communicate uncertainty honestly. If something has been labeled as the gold standard, acknowledging limits or gaps can sound like backtracking rather than transparency. Scientists know that uncertainty is a feature of good research, not a bug.
There is also a policy risk that should not be ignored. Once a single standard is named and institutionalized, it can be used to exclude evidence that does not conform to it, even when that evidence is appropriate to the question at hand. Research can be dismissed not because it is unsound, but because it does not fit a preferred methodological mold. Over time, this narrows the range of questions considered legitimate in the first place.
None of this is an argument against rigor, transparency, or accountability. Those values are central to scientific practice and public trust. But rigor is not a checklist, and credibility is not a logo. They emerge from careful alignment between questions, methods, and interpretation.
If we want science to inform policy responsibly, we need to be precise in how we talk about it. That means explaining why certain methods are appropriate in certain contexts, being honest about what different kinds of evidence can and cannot tell us, and resisting language that suggests a one-size-fits-all hierarchy of truth.
There is no such thing as gold standard science.
There is only science that is well matched to its questions, conducted transparently, and interpreted with care. Anything else may look authoritative, but it ultimately obscures how knowledge is actually made and how it should be used. They are selling pyrite.
