
Smidgeons Stream

A stream of interesting links, papers, and tiny thoughts. Roughly what I'm reading and thinking about at the moment.

If you’re not distressingly embedded in the torrent of AI news on Twixxer like I reluctantly am, you might not know what DeepSeek is yet. Bless you.

From what I’ve gathered:

  • On January 20th, a Chinese company named DeepSeek released a new reasoning model called R1.
  • A reasoning model is a large language model told to “think step-by-step” before it gives a final answer. This “chain of thought” technique dramatically improves the quality of its answers. These models are also fine-tuned to perform well on complex reasoning tasks.
  • R1 reaches equal or better performance on a number of major benchmarks compared to OpenAI’s o1 (our current state-of-the-art reasoning model) and Anthropic’s Claude 3.5 Sonnet, but is significantly cheaper to use.
  • DeepSeek R1 is open-source, meaning you can download it and run it on your own machine.
  • They offer API access at a much lower cost than OpenAI or Anthropic (a sketch of calling it follows this list). But given this is a Chinese model, and the current political climate is “complicated,” and they’re almost certainly training on input data, don’t put any sensitive or personal data through it.
  • You can use R1 online through the DeepSeek chat interface. You can turn on both reasoning and web search to inform its answers. Reasoning mode shows you the model “thinking out loud” before returning the final answer.
DeepSeek R1 showing its thinking
  • You can use Ollama to run R1 on your own machine, but standard personal laptops won’t be able to handle the larger, more capable versions of the model (32B+). You’ll have to run the smaller 8B or 14B version, which will be slightly less capable. I have the 14B version running just fine on a MacBook Pro with an Apple M1 chip (see the local-run sketch after this list). Here’s a Reddit guide on getting it running locally.
  • DeepSeek claims it only cost $5.5 million to train the model, compared to an estimated $41–78 million for GPT-4. If true, building state-of-the-art models is no longer just a billionaires’ game.
  • The thoughtbois of Twixxer are winding themselves into knots trying to theorise what this means for the U.S.–China AI arms race. A few people have referred to this as a “Sputnik moment.”
  • From my initial, unscientific, unsystematic explorations with it, it’s really good. Using it as my default LM going forward (for tasks that don’t involve sensitive data). Quirks include being way too verbose in its reasoning explanations and using lots of Chinese-language sources when it searches the web, which makes it challenging to validate whether claims match the source texts.
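
On the API point above: DeepSeek’s endpoint is OpenAI-compatible, so the standard openai Python client works with a swapped base URL. A minimal sketch, assuming the documented https://api.deepseek.com base URL and the deepseek-reasoner model name for R1 (both worth verifying against DeepSeek’s docs before relying on them):

```python
# pip install openai
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible: only the base URL and key change.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder: create one in their platform dashboard
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # R1's model name per DeepSeek's docs at the time of writing
    messages=[{"role": "user", "content": "How many primes are there between 100 and 150?"}],
)

message = response.choices[0].message
# DeepSeek returns the chain of thought in a separate reasoning_content field;
# getattr guards against the field changing, since it's a vendor extension.
print("Thinking:", getattr(message, "reasoning_content", None))
print("Answer:", message.content)
```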
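
And on running it locally: once you’ve pulled one of the distilled R1 tags, the official ollama Python package is the quickest way to poke at it from code. A minimal sketch, assuming the deepseek-r1:14b tag that Ollama lists at the time of writing:

```python
# pip install ollama  (and pull the weights first: `ollama pull deepseek-r1:14b`)
import ollama

# The 14B distill runs fine on an M1 MacBook Pro; drop to deepseek-r1:8b on smaller machines.
response = ollama.chat(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Think step by step: what is 17 * 24?"}],
)

# The distilled R1 models wrap their reasoning in <think>...</think> tags
# before the final answer, so the "thinking out loud" shows up inline.
print(response["message"]["content"])
```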

Here’s the announcement Tweet:

TLDR: high-quality reasoning models are getting significantly cheaper and more openly available. This means companies like Google, OpenAI, and Anthropic won’t be able to maintain a monopoly on access to fast, cheap, good-quality reasoning. This is a net good for everyone.

Common Misconceptions About the Complexity in Robotics vs AI

A roboticist breaks down common misconceptions about what’s hard and easy in robotics. A response to everyone asking “can’t we just stick a large language model into its brain to make it more capable?”

Contrary to many people’s assumptions, making robots perceive and move through the world the way humans can turns out to be an extraordinarily hard problem, while seemingly “hard” problems like scoring well on intelligence tests, winning at chess, and acing the GMAT turn out to be much easier.

Everyone thought it would be extremely hard and computationally expensive to teach computers language, and easy to teach them to identify objects visually. The opposite turned out to be true. This is known as Moravec’s Paradox.

I especially liked the ending, where Dan explores why people are so resistant to the idea that picking up a cup is more complex than solving logic puzzles. Partly anthropocentrism: humans are special because we can do higher-order thinking, while any lowly animal can sense the world and move through it. Partly social class bias: people who work manual labour jobs using their bodies are valued less than people who sit still using their intellect to solve problems.

A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study

Researchers submitted entirely AI-generated exam answers to the undergraduate psychology department of a “reputable” UK university. The vast majority went undetected and the AI answers achieved higher scores than real students.

“We report a rigorous, blind study in which we injected 100% AI written submissions into the examinations system in five undergraduate modules, across all years of study, for a BSc degree in Psychology at a reputable UK university. We found that 94% of our AI submissions were undetected. The grades awarded to our AI submissions were on average half a grade boundary higher than that achieved by real students. Across modules there was an 83.4% chance that the AI submissions on a module would outperform a random selection of the same number of real student submissions.”

I have to assume educators are swiftly moving to hand-written exams under supervised conditions and oral exams. Anything else seems to negate the point of exams.

Unbaited

A browser extension that filters out engagement bait from your feed on Twixxer. Uses Llama 3.3 under the hood to analyse Tweets in real time and then blurs out sensationalist political content. Or whatever else you prompt it to blur – the system prompt is editable:

System settings and a customisable prompt for the Unbaited app

This is certainly a way to try and manage Twixxer’s slow descent into right-wing extremist content. Though I’m taking this more as a thought experiment and interesting prototype than a sincere suggestion that we should spend precious energy burning GPUs on clickbait filtering. Integrating LLMs into the browsing experience and using them to selectively curate content for you is the more interesting move here.
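
As a rough illustration of the mechanic (a hypothetical sketch, not Unbaited’s actual code): the whole trick is an editable system prompt plus a cheap per-post classification call. Here it’s approximated with a local Llama 3.3 via Ollama, rather than whatever hosted inference the extension actually uses:

```python
# pip install ollama  (assumes you've pulled a model, e.g. `ollama pull llama3.3`)
import ollama

# The editable "system prompt" is the whole product surface: change this
# string and you change what gets blurred.
SYSTEM_PROMPT = (
    "You are a feed filter. Reply with exactly one word: "
    "'blur' if the post is engagement bait or sensationalist political content, "
    "otherwise 'keep'."
)

def should_blur(tweet_text: str) -> bool:
    response = ollama.chat(
        model="llama3.3",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": tweet_text},
        ],
    )
    return "blur" in response["message"]["content"].lower()

# The real extension would run something like this over each post in the DOM
# and apply a CSS blur to the flagged ones; here we just print the verdict.
print(should_blur("You won't BELIEVE what Congress just did. RT if you're furious!"))
```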

Welcome to the smidgeon stream. This is a new kind of content on the Garden. One that was overdue. They’re called smidgeons. Teeny, tiny entries. The kinds of things I used to put in Tweets, before Twitter died a terrible death.

Most are only a few sentences long. They’re mainly links to notable things – good articles, papers, and ideas. I’ve been meaning to do this for a while, but a recent migration to Astro suddenly made it much easier.