How DeepSeek Thinks

OpenAI vs DeepSeek War

You have already heard a lot of noise about AI agents. Today’s newsletter breaks down how DeepSeek works.

I added some content, news, and resources about AI agents, Elon Musk, and Google.

In today’s edition:

  • AI Roundup—OpenAI vs DeepSeek

  • Dive Deep Drill—how DeepSeek thinks

  • Build Together—here’s how I can help you

AI Leadership Academy—Join leaders, PMs, VPs, CEOs, consultants, and professionals from various domains, and get do-it-yourself and done-with-you sessions.

Self-Paced with Community

AI Roundup

Here are the news, content, and resources I found.

— [news] Apple CEO says DeepSeek shows ‘innovation that drives efficiency’
— [content] DeepSeek stole our tech... says OpenAI
— [resource] I built a DeepSeek R1-powered VS Code extension
— [news] India to develop own generative AI model
— [content] Building agentic systems, by the Claude team
— [resource] White paper by Google on Agents
— [news] Andrew Ng on DeepSeek

Seeking impartial news? Meet 1440.

Every day, 3.5 million readers turn to 1440 for their factual news. We sift through 100+ sources to bring you a complete summary of politics, global events, business, and culture, all in a brief 5-minute email. Enjoy an impartial news experience.

Dive Deep Drill

How Does DeepSeek Work?

You ask DeepSeek a tricky math problem, expecting an answer within seconds.

But how exactly does it work?

Let’s explore how it learns, improves, and even distills its intelligence into smaller models.

It follows three major steps:

  1. DeepSeek-R1-Zero (pure reinforcement learning)

  2. DeepSeek-R1 (reinforcement learning with cold start)

  3. Knowledge distillation

Steps as per the research paper

1 — DeepSeek-R1-Zero (pure reinforcement learning)

When you submit a question, DeepSeek doesn’t just rely on memorized knowledge.

It has been trained using reinforcement learning, in which the model continuously improves its reasoning by receiving rewards for correct answers.

The training begins with DeepSeek-R1-Zero, a base model trained only using RL (without prior supervision, only receiving feedback, and not copying examples).

The model learns through trial and error, evolving its reasoning skills over time, similar to how humans learn from experience.

What’s Special About DeepSeek-R1-Zero?

  • DeepSeek-R1-Zero learns purely through self-evolution, with no pre-labeled data.

  • It uses Group Relative Policy Optimization (GRPO), an RL algorithm that refines the model’s approach by sampling a group of candidate answers for each question and favoring those that score best relative to the group (see the sketch below).
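
Here is a minimal sketch of GRPO’s core step, the group-relative advantage. The rewards are simplified scalars and the function name is illustrative; this is not DeepSeek’s actual training code.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    """GRPO's trick: no learned critic model. For a single question, the
    model samples a group of answers; each answer's advantage is its
    reward standardized against the group's mean and standard deviation."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: 4 sampled answers to one question, rewarded 1.0 if correct.
rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # ≈ [ 1. -1. -1.  1.]
# Correct answers get positive advantages (reinforced); wrong ones, negative.
```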

The model gets two types of rewards (both sketched in code below):

  • accuracy rewards check whether the final answer is correct.

  • format rewards ensure the reasoning process is enclosed within special tags like <think> ... </think>.
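
And a hedged sketch of what those two signals could look like. DeepSeek’s actual rule-based checkers are not public, so accuracy_reward and format_reward below are illustrative stand-ins.

```python
import re

def format_reward(response: str) -> float:
    # Reward responses that wrap their reasoning in <think> ... </think>.
    return 1.0 if re.search(r"<think>.+?</think>", response, re.DOTALL) else 0.0

def accuracy_reward(response: str, reference: str) -> float:
    # Reward 1.0 only if the text after the reasoning block contains the
    # reference answer (a stand-in for a rule-based verifier).
    final_answer = response.split("</think>")[-1]
    return 1.0 if reference in final_answer else 0.0

response = "<think>17 has no divisors besides 1 and itself ...</think> The answer is 17."
print(format_reward(response), accuracy_reward(response, "17"))  # 1.0 1.0
```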

The ‘Aha’ Moments

The model gradually improves by allocating more thinking time to complex problems and spontaneously developing strategies such as revisiting and re-evaluating its earlier steps (the paper’s “aha moment”).

Issues DeepSeek-R1-Zero faces

  • poor readability: its outputs are often messy and hard to follow.

  • language mixing: it sometimes blends different languages within a single response.

2 — DeepSeek-R1 (reinforcement learning with cold start)

To overcome these issues, the team introduced DeepSeek-R1, which follows a multi-stage training process.

This model starts with cold-start data (a set of high-quality, human-readable reasoning examples).

How DeepSeek Learns Better
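
To make the multi-stage recipe concrete, here is a rough trace of the four stages the paper describes. The helper functions are placeholders that only log each stage; none of this is DeepSeek’s real training code.

```python
def supervised_finetune(model: str, data: str) -> str:
    print(f"SFT: {model} on {data}")
    return model + "+sft"

def reinforce(model: str, rewards: list[str]) -> str:
    print(f"RL:  {model} with rewards {rewards}")
    return model + "+rl"

model = "DeepSeek-V3-Base"
# Stage 1: cold-start SFT on a small set of curated, readable
# long chain-of-thought examples (fixes R1-Zero's readability).
model = supervised_finetune(model, "cold-start CoT examples")
# Stage 2: reasoning-focused RL (GRPO), with a language-consistency
# reward added to curb language mixing.
model = reinforce(model, ["accuracy", "format", "language consistency"])
# Stage 3: rejection-sample strong answers from stage 2, mix in
# general-purpose data, and fine-tune again.
model = supervised_finetune(model, "rejection-sampled + general data")
# Stage 4: a final RL pass across all scenarios, for helpfulness and
# harmlessness as well as reasoning.
model = reinforce(model, ["reasoning", "human preference"])
```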

3 — Knowledge Distillation

DeepSeek’s intelligence isn’t just limited to large models.

Its reasoning capabilities are distilled into smaller models, similar to how an experienced teacher trains students.

How Does This Work?

  • DeepSeek-R1’s best responses are used to train smaller models like Qwen and Llama (a sketch of the idea follows this list).

  • These distilled models do not go through RL but still achieve impressive reasoning performance.

  • The 14B distilled model even surpasses previous open-source AI models, proving that distillation is an effective way to scale AI intelligence.
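
A small sketch of that idea: sample several answers from the teacher, keep the best one per prompt, and use the pairs as ordinary supervised fine-tuning data for the student. DummyTeacher and score below are stand-ins for DeepSeek-R1 and a correctness check; the paper reports curating roughly 800k samples this way.

```python
import random

class DummyTeacher:
    """Stand-in for DeepSeek-R1 served behind any text-generation API."""
    def generate(self, prompt: str) -> str:
        return f"<think>working through {prompt!r} ...</think> answer: {random.randint(0, 9)}"

def score(answer: str) -> float:
    # Stand-in verifier; in practice, a correctness check on the final answer.
    return random.random()

def build_distillation_set(teacher, prompts, n_samples=4):
    dataset = []
    for prompt in prompts:
        # Sample several candidates and keep only the best-scoring one
        # (rejection sampling), so the student learns from strong traces.
        candidates = [teacher.generate(prompt) for _ in range(n_samples)]
        dataset.append({"prompt": prompt, "completion": max(candidates, key=score)})
    return dataset

# The pairs are then used for plain supervised fine-tuning of a smaller
# model (e.g., a Qwen or Llama checkpoint). No RL step is required.
data = build_distillation_set(DummyTeacher(), ["Is 91 prime?", "Sum 1 to 100"])
print(data[0]["prompt"], "->", data[0]["completion"])
```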

5 Things That Make DeepSeek Unique

  1. self-evolution enables reasoning improvements without initial fine-tuning.

  2. carefully selected examples enhance readability and reasoning quality.

  3. smaller models inherit intelligence from larger ones—more accessibility.

  4. breaking problems into logical steps improves complex problem-solving.

  5. rejection sampling helps filter out weak responses and retain high-quality outputs.

DeepSeek-R1 vs OpenAI-o1

Source - DeepSeek Research

Next time you interact with an AI model, remember:

it’s not just retrieving information—it’s reasoning, learning, and evolving with every step.

If you want to build your first AI Agent, here is my video on [YouTube]:

Build Together

Want to work together? Here’s how I can help you:

I use BeeHiiv to send this newsletter.

Paper Unfold

A breakdown of complex research papers into easy-to-understand pointers.
If you missed the previous ones:

How satisfied are you with today's Newsletter?

This will help me serve you better.


PS: Reply to this email if you want me to write on the topic you are interested in.

—Him
