People testing out AI tools for professional, rigorous research tasks, as well as designers and engineers thinking about responsible and transparent AI interface design.
Deep Research is having a bit of a moment. “Deep Research” seems to be the new consensus name for mildly agentic, multi-step workflows that use web search and retrieval-augmented generation (RAG) to create long-form research reports.
Google Gemini was the first to ship a Deep Research feature, back in December 2024. They presented it as an agentic workflow that “explores complex topics on your behalf and provides you with findings in a comprehensive, easy-to-read report”.
The base flow is:
[ Pics of gemini deep research ]
I’m calling this “mildly agentic” because “agentic” implies a model making its own decisions about what task to do next based on the current context. Here, the choice of actions is pretty limited and the workflow is ultimately linear and predictable.
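To make “mildly agentic” concrete, here’s a rough sketch of what this kind of pipeline looks like under the hood. This is my own illustration, not Gemini’s actual implementation: the llm() and web_search() helpers are hypothetical stand-ins for real model and search APIs. The thing to notice is that the control flow is a fixed, linear loop; the model fills in content, but it never decides what kind of step to take next.

```python
# A hypothetical sketch of a "deep research" pipeline, not any vendor's
# actual implementation. llm() and web_search() are stand-in stubs.

def llm(prompt: str) -> str:
    """Stand-in for a real language model API call."""
    return f"[model output for: {prompt[:48]}...]"

def web_search(query: str, max_results: int = 5) -> list[str]:
    """Stand-in for a real web search API returning page snippets."""
    return [f"[snippet {i} for: {query}]" for i in range(max_results)]

def deep_research(topic: str) -> str:
    # 1. Draft a research plan and let the human edit it before running
    #    (the human-in-the-loop step worth keeping).
    plan = llm(f"Break '{topic}' into 3-6 sub-questions, one per line.")
    print(f"Proposed plan:\n{plan}")
    edited = input("Edit the plan, or press Enter to accept: ")
    steps = (edited or plan).strip().splitlines()

    # 2. For each sub-question: search, then summarise the retrieved
    #    snippets. There's no branching and no self-directed tool choice,
    #    just the same two actions, in the same order, every time.
    findings = []
    for step in steps:
        snippets = web_search(step)
        findings.append(llm(
            f"Summarise what these sources say about '{step}':\n"
            + "\n".join(snippets)
        ))

    # 3. Stitch the per-step findings into one long-form report.
    return llm("Combine these findings into a report:\n" + "\n\n".join(findings))
```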
This release didn’t get much fanfare, possibly because we all chronically underestimate Google Gemini. It wasn’t until the hot, popular boy OpenAI released their own Deep Research feature on February 2nd, 2025 that people began to pay attention.
[ pics of OpenAI deep research ]
OpenAI’s release was quickly followed by Perplexity’s Deep Research feature twelve days later on February 14th.
[ pics of OpenAI and perplexity deep research ]
I’ve personally been thrilled to see all these developments, since we began exploring exactly this kind of workflow at Elicit back in late 2023. For context, I was the sole designer on Elicit from 2022 to 2024. While I’m no longer with the team, I’m still a huge fan and care a lot about the problem space and mission. Elicit was already focused on using language models to help researchers do systematic literature reviews: a very rigorous type of research report that involves painstakingly reviewing hundreds or thousands of previous studies on a topic and summarising the state of the current literature. A single review can traditionally take months or even years.
Naturally, we’d often speculate on when models might be able to do a fully automated systematic review, and what the stepping stones were between our current product, where humans did the majority of the fact-checking and sense-making work, and a faster version where models took on more reasoning tasks without compromising quality and accuracy.
And not to be left out, the Elicit team shipped the final version of that work on February 14th, though they’ve called it “research reports.” Competitor SciSpace released a feature called “Deep Review” a few days later.
Deep Notes on the Deep Researchers
Google Gemini
While Gemini was first out of the gate, their implementation doesn’t strike me as an attempt to serve serious researchers. The implicit promise of something called “deep research” is that it’s doing deep and rigorous research for you. But we hopefully all know we’re still at the stage where language models aren’t capable of doing deep, rigorous research autonomously; their accuracy and reliability rates simply aren’t high enough. So it’s slightly daft, and almost irresponsible, to design this as if they are. Instead, we still need to design structures around these models that enable lots of human collaboration and sense-checking.
I do like that they tell you the research plan ahead of time and ask you if you’d like to edit it. Good human-in-the-loop sense-checking and collaboration.
[ pic of Gemini research plan ]
But that high-level plan is the only piece of methodology we see. The result is presented as a complete, final document, ready to be shared or exported to Google Docs. There’s no visible reasoning in the output, so it’s difficult to understand how Gemini “wrote” the report, or for me as a user to check its work.
Gemini also doesn’t surface the sources by default, which makes it inconvenient to check which claim aligns with which source. You have to open a hidden dropdown section, and even then the reference numbers aren’t interactive elements, so you can’t quickly jump to the supposed source. This implicitly suggests they don’t think the reader cares that much about checking sources. Or at least that they’re not trying to encourage or enable it.
[ pics of checking sources in Gemini ]
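For contrast, here’s a hedged sketch (my own illustration, not Gemini’s actual markup) of what interactive references could look like: each in-text reference number is a link that jumps to the matching entry in the source list, and each entry links out to the source itself, so checking a claim is one click instead of a scavenger hunt. The titles and URLs are made up.

```python
# A hypothetical illustration of citation markup where reference numbers
# are real links rather than inert text. Sources here are placeholders.

sources = {
    1: ("Example study on topic X", "https://example.com/study-x"),
    2: ("Example review of topic Y", "https://example.com/review-y"),
}

def cite(n: int) -> str:
    """In-text marker that jumps to the matching reference entry."""
    return f'<a href="#ref-{n}"><sup>[{n}]</sup></a>'

def reference_list(srcs: dict[int, tuple[str, str]]) -> str:
    """Each entry is a jump target and links out to the source."""
    items = "\n".join(
        f'  <li id="ref-{n}"><a href="{url}">{title}</a></li>'
        for n, (title, url) in srcs.items()
    )
    return f"<ol>\n{items}\n</ol>"

report = (
    f"<p>A claim about X{cite(1)} and a related claim about Y{cite(2)}.</p>\n"
    + reference_list(sources)
)
print(report)
```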
These flaws might just be first-mover disadvantage. Or it might be that the product managers who scoped this feature intended it for laypeople doing low-stakes research. In that case I don’t know that “deep research” is the right way to frame the feature, but perhaps the marketing department made that call.
You have to sign up for Gemini Advanced to access it, but they offer a one-month free trial if you just want to poke around.
OpenAI
Since I’m not about to shell out $200 just to test-drive a feature, I haven’t used OpenAI’s version.
[ But here are YouTube video demos? ]
Perplexity
Perplexity’s implementation has some lovely details to it.
Elicit
Research Reports
SciSpace
Deep Review