TL;DR
Benchmark results indicate that AI systems are approaching complete automation of the engineering tasks in AI research. Whether AI research itself can be automated remains uncertain, since some aspects may still require human creativity and insight. This shift could reshape how AI development progresses.
Recent empirical evidence shows that AI systems are now capable of automating most core engineering tasks involved in AI research, with some benchmarks reaching near saturation. However, the automation of AI research itself—particularly the creative and hypothesis-driven aspects—remains less certain, leaving a residual role for human researchers. This development signals a potential shift in AI R&D practices and the future of technological innovation.
Multiple benchmarks tracking AI capabilities in core scientific and engineering skills have shown rapid progress over the past 15 to 16 months. For example, CORE-Bench, which tests research reproduction, saw performance improve from 21.5% in September 2024 to 95.5% in December 2025, and the benchmark’s author has stated it is ‘solved.’ Similarly, MLE-Bench, which assesses performance on Kaggle competitions, advanced from 16.9% to 64.4% in roughly 16 months, with AI systems reaching the level of mid-tier human practitioners.
These benchmarks indicate that AI can now reliably reproduce research experiments and perform competitive machine learning tasks, reducing the need for human intervention in these engineering phases. The progress across these independent metrics suggests that the bottleneck in AI development is shifting from engineering to the more elusive domain of research—hypothesis generation, creativity, and strategic insight—where AI remains less capable.
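As a rough check on the figures quoted above, the sketch below computes the average gain in percentage points per month for each benchmark. The start and end scores come from the text. The 15-month span for CORE-Bench is counted from September 2024 to December 2025, and the 16-month span for MLE-Bench takes "roughly 16 months" at face value. The function name is illustrative, and a linear average is a simplifying assumption, since benchmark progress is rarely linear.

```python
def avg_gain_per_month(start_pct: float, end_pct: float, months: int) -> float:
    """Average gain in percentage points per month (linear simplification)."""
    if months <= 0:
        raise ValueError("months must be positive")
    return (end_pct - start_pct) / months


# Scores as quoted in the article; month counts are approximations.
benchmarks = {
    "CORE-Bench": (21.5, 95.5, 15),  # Sep 2024 -> Dec 2025
    "MLE-Bench": (16.9, 64.4, 16),   # "roughly 16 months"
}

rates = {name: avg_gain_per_month(*vals) for name, vals in benchmarks.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} percentage points per month")
```

Under these assumptions the averages come out to about 4.9 points per month for CORE-Bench and about 3.0 for MLE-Bench. The rates are not directly comparable, because the two benchmarks measure different skills on different scales.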
Thorsten Meyer, analyzing these trends, notes that while engineering tasks are nearing full automation, the residual challenge is understanding whether AI can independently conduct research at the same level, given the creative and strategic nature of research activities. The institutional response has been to recognize that the pace of automation may accelerate, possibly rendering some traditional research roles obsolete or fundamentally changing their nature.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
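The optimistic reading can be made concrete with a simple independence model. Suppose each research attempt has a small, fixed probability p of producing a discovery, independent of the others. The chance of at least one discovery in n attempts is then 1 - (1 - p)^n, and the expected number of discoveries is n * p. The value of p and the attempt counts below are hypothetical, chosen only for illustration, and are not figures from Clark or from this article.

```python
def p_at_least_one(p: float, n: int) -> float:
    """Probability of at least one success in n independent attempts."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


# Hypothetical per-attempt discovery probability (an assumption, not data).
P_HYPOTHETICAL = 0.001

for n in (10, 1_000, 100_000):
    expected = n * P_HYPOTHETICAL
    chance = p_at_least_one(P_HYPOTHETICAL, n)
    print(f"n={n:>7}: expected discoveries={expected:.2f}, "
          f"P(at least one)={chance:.3f}")
```

In this model, volume alone turns rare moments into routine ones. The pessimistic reading corresponds to p being effectively zero for some class of insights, in which case more attempts change nothing.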

Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response that the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Figure residue: comparison of the two readings; surviving labels are "Productivity multiplier years" and "Recursive loop operational".]

Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
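The asymmetric cost-of-being-wrong argument can be sketched as a small decision table. The two actions, the two states, and every cost value below are hypothetical and in arbitrary units. Only the structure reflects the argument: building capacity costs something in either state, while waiting is free if the moat holds and expensive if it closes.

```python
# Hypothetical costs in arbitrary units (assumptions, not estimates).
COSTS = {
    "build_now": {"moat_holds": 1.0, "moat_closes": 1.0},
    "wait": {"moat_holds": 0.0, "moat_closes": 10.0},
}


def worst_case(action: str) -> float:
    """Largest cost the action can incur across both states."""
    return max(COSTS[action].values())


def expected_cost(action: str, p_closes: float) -> float:
    """Expected cost given probability p_closes that the moat closes."""
    if not 0.0 <= p_closes <= 1.0:
        raise ValueError("p_closes must be in [0, 1]")
    c = COSTS[action]
    return (1.0 - p_closes) * c["moat_holds"] + p_closes * c["moat_closes"]


# Minimax choice: the action whose worst case is least bad.
best = min(COSTS, key=worst_case)
print("minimax choice:", best)

for p in (0.05, 0.1, 0.5):
    print(f"p_closes={p}: build={expected_cost('build_now', p):.2f}, "
          f"wait={expected_cost('wait', p):.2f}")
```

With these illustrative numbers, waiting is cheaper in expectation only when the probability that the moat closes is below 10 percent. Different cost assumptions move that threshold, but the asymmetry keeps it low whenever the cost of being unprepared is much larger than the cost of building early.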
Implications of Near-Complete Engineering Automation
This trend suggests that AI could soon handle most of the engineering and implementation work involved in AI development, drastically reducing costs and accelerating innovation cycles. If research processes remain less automatable, human researchers might focus more on hypothesis formulation and strategic oversight, but the boundary between engineering and research is blurring. This could lead to a paradigm shift in how AI research is conducted, with implications for research institutions, industry, and the pace of technological progress.
Progress in AI Capabilities and Benchmark Saturation
Over roughly the past 16 months, multiple independent benchmarks, including CORE-Bench and MLE-Bench, have shown rapid improvement in AI’s ability to perform core scientific and engineering tasks. CORE-Bench, which measures research reproduction, has moved from 21.5% to near saturation at 95.5%, with the benchmark’s author declaring it ‘solved.’ MLE-Bench, which assesses performance on Kaggle competitions, rose from 16.9% to 64.4%, with AI systems reaching the level of mid-tier human practitioners. These advances are part of a broader pattern of AI capabilities approaching measurement limits across R&D skills.
Deep research into kernel design and infrastructure optimization further illustrates that AI is moving from experimental to production-grade capabilities, with models generating optimized GPU kernels and automating complex infrastructure tasks. This progress underscores a structural shift: engineering tasks are becoming fully automatable, while research remains a more complex, less predictable domain.
“The pattern across multiple benchmarks indicates that engineering tasks in AI research are nearing full automation, but research itself remains less automatable, at least for now.”
— Thorsten Meyer
Uncertainties Around AI Research Automation Potential
It remains unclear how much of AI research can be fully automated, especially the creative, hypothesis-driven aspects that require strategic insight. While engineering tasks are nearing automation, the residual role of human researchers in generating novel ideas and theories persists. The pace at which AI might overcome these challenges is still uncertain, and some experts question whether true autonomous research is achievable in the near future.
Next Steps in Monitoring AI R&D Automation Progress
Researchers and industry leaders will continue to track benchmark developments and explore new metrics for AI research capabilities. Expect further advancements in automation tools for research synthesis, hypothesis generation, and experimental design. Additionally, institutional and policy responses will likely focus on defining the evolving role of human researchers and preparing for potential shifts in the research ecosystem over the next 24-36 months.
Key Questions
What does near-complete automation of engineering tasks mean for AI development?
It suggests that most of the technical work involved in designing, testing, and deploying AI models can be handled by AI systems, potentially reducing costs and speeding up innovation cycles.
Why is research still considered residual despite engineering automation?
Because research involves creative, hypothesis-driven activities that are less structured and more complex for AI to perform independently, especially in generating novel ideas and strategic insights.
Are there risks in fully automating AI engineering tasks?
Yes, including over-reliance on AI systems, potential loss of human expertise, and challenges in ensuring AI-generated research maintains quality and safety standards.
What industries might be most affected by these developments?
AI research labs, tech companies, and academic institutions will experience shifts in workflows, with possible impacts on employment, research funding, and innovation timelines.
Source: ThorstenMeyerAI.com