Monday, March 23

The reality of AI in drug discovery – Part 2


Can automation and AI finally make science run at the speed of thought? Eric Ma shares how disciplined systems, not new models, will drive the next wave of discovery.

A close-up of blue pharmaceutical capsules surrounded by digital hexagon icons representing medical and laboratory technologies, symbolising modern drug discovery and data-driven research.


In the first part of our conversation, Eric Ma exposed the hard truths behind AI’s struggles in drug discovery, from flawed historical data to systemic gaps in statistical discipline. Here in Part 2, we shift from problems to progress and explore how automation, LLMs and better digital foundations can unlock the pace and precision modern science demands.

Automation: the unifying trend

When asked about current trends, Ma distils everything down to a single word: automation. Be it robotic assay execution, pooled screening approaches or AI-assisted literature searches, automation's removal of routine manual steps is making research organisations more nimble.

In five years, I would expect lab scientists to be designing their assays with robots or as pooled screens.

“In five years, I would expect lab scientists to be designing their assays with robots or as pooled screens,” Ma predicts. For literature review, AI agents will help scientists dig deep into publications. But Ma is careful to add a crucial caveat: he is hopeful that “in a few years, enough people understand how AI can generate slop and can steer AI to not generate slop.”

This isn’t blind optimism – it’s pragmatic recognition that tools are only as good as the users wielding them.

LLMs in scientific workflows: practical applications

For Ma personally, AI-assisted coding has significantly impacted his daily work. Large language models (LLMs) help him tackle challenging problems more easily by lowering the barrier to entry. “If I can get an LLM to help me translate equations into plain language first and also translate into PyMC code, that helps me understand the underlying math behind the scenes,” he explains.

But the real power of LLMs in scientific workflows extends beyond code generation. Ma describes using them to synthesise context from multiple sources – PowerPoints archived in obscure SharePoint locations, Confluence pages, interview transcripts, etc. He has become known for recording calls – “not to be creepy, but just so I can remember stuff.” These transcripts become source material for understanding project goals and establishing guardrails.

“Without guardrails, what rabbit hole is the system going to drill into?” Ma asks rhetorically. Context isn’t just about truth – it’s about constraining the solution space to what is truly relevant and acceptable.

Vector databases, knowledge graphs and the traceability question

When it comes to data management for LLM applications, Ma takes a refreshingly pragmatic stance. While some practitioners advocate strongly for vector databases or elaborate knowledge graph schemas, Ma sees these as tools that may or may not be appropriate, depending on the specific application.

Vector databases just give you a fast, convenient way to retrieve things without writing your own grep tool or find tool.

“Vector databases just give you a fast, convenient way to retrieve things without writing your own grep tool or find tool,” he notes. Though he’s quick to add: “You’re always going to need a bit of experimentation if you’re really building a serious application.”
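The "grep tool" alternative Ma alludes to can be surprisingly small. As a rough sketch (the function name and ranking heuristic here are illustrative, not anything Ma describes), a few lines of standard-library Python can rank documents by keyword overlap with a query, with no embeddings or external database involved:

```python
from pathlib import Path


def keyword_search(root: str, query: str, top_k: int = 3) -> list[str]:
    """Rank markdown files under `root` by naive keyword overlap with `query`.

    A stand-in for 'writing your own grep tool': each file is scored by
    how often the query terms appear in it. No embeddings, no index.
    """
    terms = set(query.lower().split())
    scored = []
    for path in Path(root).rglob("*.md"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        score = sum(text.count(term) for term in terms)
        if score:  # skip files with no matching terms at all
            scored.append((score, str(path)))
    # Highest-scoring files first
    return [p for _, p in sorted(scored, reverse=True)[:top_k]]
```

A vector database buys you semantic matching and speed at scale over something like this; whether that trade is worth the operational overhead is exactly the experimentation Ma is pointing at.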

Regarding knowledge graphs specifically, Ma is even more cautious. While acknowledging proven use cases, he points out that “it takes a lot of effort to think through and design the schema of a knowledge graph to fit the narrow application it’s intended for.” Inevitably, these graphs get repurposed for uses where the schema wasn’t fit for purpose. Any schema represents a reduction in semantic possibilities compared to plain text with references – a “poor man’s knowledge graph” of markdown files referencing other markdown files can often suffice.
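The "poor man's knowledge graph" Ma describes needs no schema design at all: the links between markdown files already form a graph. As a minimal sketch (the function name and link pattern are assumptions for illustration), one can recover that graph by scanning each file for standard markdown links to other `.md` files:

```python
import re
from pathlib import Path

# Matches standard markdown links whose target is another .md file,
# e.g. "[assay notes](assays.md)" captures "assays.md".
MD_LINK = re.compile(r"\[[^\]]*\]\(([^)]+\.md)\)")


def reference_graph(root: str) -> dict[str, list[str]]:
    """Map each markdown file under `root` to the .md files it links to.

    The resulting adjacency dict is a schema-free knowledge graph: edges
    are whatever the authors chose to cross-reference, nothing more.
    """
    graph: dict[str, list[str]] = {}
    for path in Path(root).rglob("*.md"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        graph[path.name] = MD_LINK.findall(text)
    return graph
```

Because the nodes are plain text, nothing is lost to a schema: any semantics the prose carries stay available, and the graph can be rebuilt at any time from the files themselves.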

The cost of creating useful knowledge graphs in life sciences is prohibitively high for most organisations. “It’s really impractical to have systems that depend on truth, because you’re not going to get to it,” Ma states bluntly.

Advice for small biotechs: start with vision

When asked how small biotechs without Moderna’s resources should approach ML and AI, Ma’s advice is structural rather than technical: “Make sure one of your co-founders has a digital technical vision.”

Make sure one of your co-founders has a digital technical vision.

If founders don’t understand the importance of disciplined statistical and digital systems from day one, it won’t get baked in. They need not build these systems themselves, but they must know enough to bring in the right consultants and set the right cultural tone.

“Either find a co-founder or find your first few hires who are going to bring in a team of consultants who can help shape that culture,” Ma advises.

This cuts to the heart of a broader misconception in the industry: the artificial distinction between ‘tech bio’ and ‘biotech’ misses the point entirely. As Ma notes, these categories are “another attempt at constructing a knowledge graph of the industry” – an oversimplification that obscures the reality that technology must be integrated from day one, at the systems level.

Conclusion: honest progress over hype

Throughout our conversation, Ma consistently returns to reality checks on AI’s capabilities in drug discovery. The economics don’t always work out. Historical data is messier than we’d like to admit; knowledge graphs are expensive to build and maintain; LLMs can generate “slop” if not properly directed.

Yet despite – or perhaps because of – this unflinching honesty, Ma remains optimistic about automation’s trajectory. The key is matching the right tools to the right problems, maintaining statistical discipline at the systems level and recognising that there are no shortcuts to building proper data infrastructure.

Making science run at the speed of thought is not about deploying the latest AI model, it’s about building systems with integrity from the ground up – systems that trace every decision, document every parameter and maintain coherence as they scale. Only then can machine learning, automation and AI assistants deliver on their promise to accelerate discovery.

The revolution, when it comes, will be built on boring things like metadata management and workflow orchestration. That’s the honest truth.

If you missed it, read Part 1 to learn about the economic and data challenges that shape the reality of AI in drug discovery.

Meet the expert 

Eric Ma – Senior Principal Data Scientist, Moderna

Eric Ma
As Senior Principal Data Scientist at Moderna, Eric leads the Data Science and Artificial Intelligence (Research) team to accelerate science to the speed of thought. Before joining Moderna, he worked at the Novartis Institutes for Biomedical Research, conducting biomedical data science research with a focus on applying Bayesian statistical methods to support the discovery of new medicines for patients. Earlier in his career, he was an Insight Health Data Fellow in the summer of 2017 and completed his doctoral thesis in the Department of Biological Engineering at the Massachusetts Institute of Technology (MIT) in the spring of the same year.

Eric is also an open-source software developer and has led the development of pyjanitor, a clean API for data cleaning in Python, and nxviz, a visualisation package for NetworkX. He is a core developer for both NetworkX and PyMC. In addition, he contributes to the wider data science community through coding, blogging, teaching and writing.

