This piece is for designers and engineers building AI products who are interested in making models more transparent, practising responsible interface design, and enhancing our critical thinking skills.
We’re currently being told that AI is going to do a lot of work for us. It’ll write blog posts, reports, and essays for us, research for us, draw for us, code for us, develop drugs for us, draft legal documents for us, and think for us. Consulting firms keep releasing robot-laden PDF reports on which white-collar, knowledge work jobs will be automated first. To spoil the surprise, credit authorisers, insurance underwriters, customer support staff, and office administrators are first on the chopping block.
The implication here is that automating away a large chunk of humanity’s hard cognitive labour is both inevitable and mostly net good. Why expend precious human energy thinking when we don’t have to? Why not have the machines do most of the research, analysis, decision-making, and debugging for us while we take long coffee breaks and occasionally check it hasn’t gone off the rails? I say this as someone who spends a lot of their “working day” sipping tea and waiting for Cursor to complete requests and then evaluating its responses.
Let’s briefly put aside the fact we don’t fully understand how models ‘think’ and ‘reason’, and that whatever they’re doing is fundamentally different to what humans do when they ‘think’ and ‘reason’, even if the final output looks surprisingly similar and achieves the intended goal.
Even if we assume their outputs are just as valuable, valid, and useful as human-made outputs (I’m not claiming this is broadly true, and debating whether they are is frankly a bit pointless outside of extremely specific tasks, contexts, and model architectures), offloading all our thinking work comes with the heavy price of dependence, skill degradation, and vulnerability. We’re seeing this problem actively play out in the developer community as junior roles disappear and computer science students use ChatGPT to do their homework instead of solving problems and debugging their code from first principles. Young developers are already less skilled and less employable thanks to everyone’s increasing dependence on AI coding assistants.
I’m especially interested in how our dependence on AI tools affects critical thinking. Delegating your cognition to an opaque model that always replies in an authoritative, professional-sounding tone is becoming effortless. We’re all at risk of mindlessly accepting its outputs while forgetting what thinking hard about a problem truly feels like. Scepticism, careful analysis, lateral thinking, and metacognition are muscles that atrophy without regular exercise.
Two recent studies found higher usage of generative AI tools and higher confidence in their outputs correlated with less critical thinking:
- A Microsoft Research team (Lee et al., 2025, “The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers”) surveyed 319 knowledge workers on their use of generative AI and found “higher confidence in GenAI is associated with less critical thinking, while higher self-confidence is associated with more critical thinking.” Qualitatively, they found GenAI shifts the nature of critical thinking toward information verification, response integration, and task stewardship.
- Michael Gerlich (Gerlich, 2025, “AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking”) surveyed and interviewed 666 participants across diverse age groups and educational backgrounds, and found “a significant negative correlation between frequent AI tool usage and critical thinking abilities, mediated by increased cognitive offloading. Younger participants exhibited higher dependence on AI tools and lower critical thinking scores compared to older participants.” Higher educational attainment was associated with better critical thinking skills, regardless of AI usage.
It doesn’t feel inevitable to me that machine learning models will reduce our critical thinking capacities and take on the bulk of our essential cognitive work. I think the nuance lies in how we design these models, how we decide to use them, and the affordances of the interfaces we wrap around them.
Complementary vs. Competitive Cognitive Artefacts
It’s useful to consider this problem through the frame of complementary vs. competitive cognitive artefacts. David Krakauer first proposed these terms in his prescient 2016 Nautilus article outlining his concerns about the trajectory of AI. Building off Don Norman’s theory of cognitive artefacts, Krakauer looks at historical artefacts and tools that augment and amplify our cognition by giving us “a store of memory, a mechanism for search, and an instrument of calculation.” What some of us might call Tools for Thought (see “Tools for Thought as Cultural Practices, not Computational Objects”, on seeing tools for thought through a historical and anthropological lens).
Complementary cognitive artefacts act as teachers and coaches that expand our existing skills. When we use them, we learn new mental models and ways of thinking. Tools like abacuses, astrolabes, maps, compasses, flashcards, and memory palaces are all good examples. Even when we don’t have these tools to hand, we can visualise them and continue to use their principles to solve problems. We are more capable after we’ve used these tools because they complement our natural abilities.
[Illustration of astrolabe, map, compass, abacus, language flashcards]
Competitive cognitive artefacts compete with our skills. They act as serfs rather than teachers, doing the cognitive work for us without revealing their inner workings or showing us how they’re solving the problem. They enhance our cognition, but in a way that makes us dependent on them. When we don’t have them to hand, we are no more capable than when we started. Tools like digital calculators, GPS navigation systems, translation apps, spellcheckers, and search algorithms fall into this category.
[Illustration of digital calculator, GPS / google maps, chatgpt, google translate, spell check]
You might read that description and instantly recognise that our current machine learning models, and the ways we interact with them, make them the competitive type. For the most part, we cannot see how they work. While most major models have recently added visible reasoning to their web interfaces, these “reasoning” outputs only give us shallow insight into how the model is solving the problem. They are not very good at explaining their internal reasoning or process. Without them, you are no more capable than when you started. You may even be less skilled as you slowly forget how to do tasks without AI assistance.
We’re becoming reliant on them in a way that then makes us beholden to the companies that own and control them. Not to mention the larger AI safety implications of outsourcing essential tasks and decision making to inscrutable, black-box systems that may have misaligned incentives that lead them to deceive or mislead us. AI safety researchers are quick to point out this is exactly how we’ll blindly walk down the path of civilizational collapse and self-destruction.
Just as machine learning models don’t inherently threaten critical thinking, I don’t believe they are inherently competitive as cognitive artefacts. The line between a tool being complementary or competitive lies in how it’s designed, and it’s not a strict either/or. Every cognitive artefact sits on a spectrum between the two types. We should intentionally design them to be as complementary as possible, while accepting there’s always going to be some degree of competitiveness. We can still choose to craft models and interfaces that reveal their inner workings, teach us new ways of thinking, and make us more capable in their absence.
The Lodestone Project
With that long introductory setup out of the way, let’s get to the point. I have some questions I’m trying to answer, so I’ve started a little research project to contain them called Lodestone, named after the type of natural magnet used to make the first compasses.

At the moment this project is primarily a set of questions and a bunch of prototypes intended to address those questions. I started by asking: Can a language model help me to think more, not less?
It’s a good question, but it’s not very specific.
When I ask that, I’m selfishly thinking about contexts where I wish I were a better thinker. For me, that’s almost always in writing essays like this one. Non-fiction, heavily researched, expository essays that aim to inform, persuade, and entertain.
Writing these pieces requires a lot of critical thinking meta-skills like paying attention to industry trends, asking interesting questions, crafting clear arguments, finding evidence to support claims, evaluating the quality of that evidence, questioning my own assumptions, and pre-emptively addressing counter-arguments. In short, trying really hard to be a smart person.
A more specific version of the question might be: Can a model improve my expository writing and critical thinking skills?
Better, but what kind of writing and critical thinking skills am I referencing here?
So let’s just say the thing in detail: Can a model teach me to ask deeper questions, make stronger claims, defend them with good evidence, evaluate the quality of evidence, question my own assumptions, consider and address counter-arguments, and draw more nuanced conclusions?
That’s a lot of jobs to be done, but I intend to build lots of different prototypes to explore each of those jobs. Maybe I can achieve them all within one Mega App, but I’m more inclined to explore what the ideal interface for each looks like on its own terms, before trying to slickly unify them.
Lodestone is primarily meant to feed my own curiosity, but also to act as a guide for my future work. As a product designer and builder in this space, it’s my responsibility to understand what makes a tool complementary rather than competitive on a very pragmatic level: what kinds of system architectures and design patterns encourage transparency, legibility, discoverability, and user skill-building?
Version One: The Claim Maker
For my first pass at a prototype, I focused on clearly identifying claims from sets of rough notes, and evaluating the evidence I had for each claim. The idea was to give myself a narrow path to walk: consider each claim one-by-one, check whether it’s valid by searching for evidence, and then adapt it, ditch it, or strengthen it.
[ Pics of sketches of the initial idea ]
[ Screenshot of first prototype ]
If checking every single claim and piece of evidence one-by-one sounds laborious, you’re right. It’s the kind of thing I know I should do when I write, but I am just as lazy as the next person, and as a new parent I’m operating at 15% brain capacity. I need some serious cognitive help over here.
The idea here was to try and make this process easier. Having an interface that prompts me to clearly identify and work through each claim is meant to take on some of that cognitive load. It’s a scaffolding step that reminds me to slow down and interrogate each claim before moving on.
And to go even further than that, I do want to use models to do some of this labour for me, provided I’m prompted to check it and question it.
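To make that concrete, here’s a simplified sketch (in TypeScript) of the kind of data the Claim Maker works with. The names and fields are illustrative rather than the exact implementation; the important part is the shape: one claim, the evidence gathered for it, and a decision to adapt, ditch, or strengthen it.

```ts
// Illustrative sketch of the Claim Maker's data, not the exact implementation.

type ClaimStatus = "unreviewed" | "strengthened" | "adapted" | "ditched";

interface Evidence {
  summary: string;     // what the source says
  sourceUrl?: string;  // where it came from
  supports: boolean;   // does it back the claim or undercut it?
}

interface Claim {
  id: string;
  text: string;         // the claim as pulled out of rough notes
  evidence: Evidence[]; // evidence gathered so far
  status: ClaimStatus;
}

// The narrow path: the interface only ever surfaces the next unreviewed claim,
// and asks the writer to adapt, ditch, or strengthen it before moving on.
function nextClaimToReview(claims: Claim[]): Claim | undefined {
  return claims.find((claim) => claim.status === "unreviewed");
}

// Example: two claims pulled from the rough notes for this essay.
const claims: Claim[] = [
  {
    id: "c1",
    text: "Frequent AI tool usage correlates with lower critical thinking scores.",
    evidence: [
      { summary: "Gerlich (2025) found a significant negative correlation.", supports: true },
    ],
    status: "unreviewed",
  },
  {
    id: "c2",
    text: "Junior developer roles are disappearing because of AI coding assistants.",
    evidence: [], // no evidence yet: adapt it, ditch it, or go find some
    status: "unreviewed",
  },
];

console.log(nextClaimToReview(claims)?.text);
```

The division of labour I’m after: the model proposes claims and candidate evidence, and I stay responsible for the adapt / ditch / strengthen call.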
I didn’t make it very far into this prototype before realising it didn’t feel expandable. There is so much more to crafting a piece of writing than just outlining your claims and evidence. The interface also felt a bit clunky and heavy-handed for what it did. So I quickly abandoned it and started building the next version.
Version Two: Question Nudging and Classification Labelling
There were many more aspects of a writing draft I wanted to analyse beyond just claims and evidence. There are also questions, assumptions, costs, benefits, causes, consequences, and counter-arguments to consider.
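To make the labelling idea less abstract, here’s a hypothetical sketch of what that taxonomy could look like as structured data, with a nudge attached to each label. The label names come straight from the list above; the structure and the nudge wording are illustrative, not the prototype’s actual schema.

```ts
// Hypothetical label taxonomy for the classification-labelling view.

type SpanLabel =
  | "claim"
  | "question"
  | "assumption"
  | "cost"
  | "benefit"
  | "cause"
  | "consequence"
  | "counter-argument";

interface LabelledSpan {
  label: SpanLabel;
  text: string;  // the highlighted passage from the draft
  start: number; // character offsets into the draft
  end: number;
}

// One nudge per label type: the interface prompts the writer to think,
// rather than doing the thinking for them.
const nudges: Record<SpanLabel, string> = {
  claim: "What evidence do you have for this?",
  question: "Is this the most interesting version of the question?",
  assumption: "What changes if this turns out to be false?",
  cost: "Who bears this cost, and how large is it?",
  benefit: "Is this benefit measurable or speculative?",
  cause: "Could something else explain this effect?",
  consequence: "How likely is this, and on what timescale?",
  "counter-argument": "What's the strongest version of this objection?",
};
```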
Version Three: Argument Maps
Once I had the labelling view complete, I wanted to try using the same structured data to display different views. Some kind of argument map felt like a clear next experiment.
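As a rough illustration of what “same structured data, different view” could mean: the labelled spans from the previous sketch can be reprojected into nodes and edges. The edge kinds and the naive linking heuristic below are illustrative guesses, not the real prototype.

```ts
// Sketch: reprojecting labelled spans into an argument map.
// (Abbreviated versions of the types from the previous sketch.)

type SpanLabel = "claim" | "assumption" | "counter-argument";

interface LabelledSpan {
  label: SpanLabel;
  text: string;
}

interface ArgumentNode { id: string; label: SpanLabel; text: string; }
interface ArgumentEdge { from: string; to: string; kind: "supports" | "opposes"; }

function toArgumentMap(spans: LabelledSpan[]): { nodes: ArgumentNode[]; edges: ArgumentEdge[] } {
  const nodes: ArgumentNode[] = spans.map((span, i) => ({ id: `n${i}`, ...span }));
  const edges: ArgumentEdge[] = [];
  // Naive placeholder: attach each counter-argument to the nearest preceding claim.
  nodes.forEach((node, i) => {
    if (node.label === "counter-argument") {
      const claim = nodes.slice(0, i).reverse().find((n) => n.label === "claim");
      if (claim) edges.push({ from: node.id, to: claim.id, kind: "opposes" });
    }
  });
  return { nodes, edges };
}
```

The same underlying labels drive both views: inline highlights in the draft, and a map where claims become nodes and counter-arguments or evidence point into them.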