Science

A look under the hood of DeepSeek’s AI models doesn’t provide all the answers

December 14, 2025

It’s been almost a year since DeepSeek made a major AI splash.

In January, the Chinese company reported that one of its large language models rivaled an OpenAI counterpart on math and coding benchmarks designed to evaluate multi-step problem solving capabilities, or what the AI field calls “reasoning.” DeepSeek’s buzziest claim was that it achieved this performance while keeping costs low. The implication: AI model improvements didn’t always need massive computing infrastructure or the very best computer chips but might be achieved by efficient use of cheaper hardware. A slew of research followed that headline-grabbing announcement, all trying to better understand DeepSeek models’ reasoning methods, improve them and even outperform them.


What makes the DeepSeek models intriguing is not only their price — free to use — but how they are trained. Instead of training the models to solve tough problems using thousands of human-labeled data points, DeepSeek’s R1-Zero and R1 models were trained exclusively or largely through trial and error, without being explicitly told how to reach a solution, much like a human working through a puzzle. When an answer was correct, the model received a reward for its actions, which is why computer scientists call this method reinforcement learning.

To researchers looking to improve the reasoning abilities of large language models, or LLMs, DeepSeek’s results were inspiring, especially if its models could perform as well as OpenAI’s while reportedly being trained at a fraction of the cost. And there was another encouraging development: DeepSeek offered its models up for interrogation by noncompany scientists, to check whether the results held up, as part of publication in Nature — a rarity for an AI company. Perhaps what excited researchers most was the possibility that this model’s training and outputs could give us a look inside the “black box” of AI models.

In subjecting its models to the peer review process, “DeepSeek basically showed its hand,” so that others can verify and improve the algorithms, says Subbarao Kambhampati, a computer scientist at Arizona State University in Tempe who peer reviewed DeepSeek’s September 17 Nature paper. Although he says it’s premature to make conclusions about what’s going on under any DeepSeek model’s hood, “that’s how science is supposed to work.”

Why training with reinforcement learning costs less

The more computing power training takes, the more it costs. And teaching LLMs to break down and solve multistep tasks like problem sets from math competitions has proven expensive, with varying degrees of success. During training, scientists commonly would tell the model what a correct answer is and the steps it needs to take to reach that answer. That’s a lot of human-annotated data and a lot of computing power.

You don’t need that for reinforcement learning. Rather than supervise the LLM’s every move, researchers instead only tell the LLM how well it did, says reinforcement learning researcher Emma Jordan of the University of Pittsburgh.

How reinforcement learning shaped DeepSeek’s model

Researchers have already used reinforcement learning to train LLMs to generate helpful chatbot text and avoid toxic responses, where the reward is based on how well the output aligns with the preferred behavior. But aligning with human reading preferences is an imperfect use case for reward-based training because of its subjective nature, Jordan says. In contrast, reinforcement learning shines when applied to math and code problems, which have verifiable answers.

September’s Nature publication details what made it possible for reinforcement learning to work for DeepSeek’s models. During training, the models try different approaches to solve math and code problems, receiving a reward of 1 if correct or a zero otherwise. The hope is that, through the trial-and-reward process, the model will learn the intermediate steps, and therefore the reasoning patterns, required to solve the problem.

In the training phase, the DeepSeek model does not actually solve the problem to completion, Kambhampati says. Instead, the model makes, say, 15 guesses. “And if any of the 15 are correct, then basically for the ones that are correct, [the model] gets rewarded,” Kambhampati says. “And the ones that are not correct, it won’t get any reward.”


But this reward structure doesn’t guarantee that a problem will be solved. “If all 15 guesses are wrong, then you are basically getting zero reward. There is no learning signal whatsoever,” Kambhampati says.

For the reward structure to bear fruit, DeepSeek had to have a decent guesser as a starting point. Fortunately, DeepSeek’s foundation model, V3 Base, was already more accurate than older LLMs such as OpenAI’s GPT-4o on the reasoning problems. In effect, that made the models better at guessing. If the base model is already good enough that the correct answer appears among the top 15 most probable answers it comes up with for a problem, the learning process improves its performance until the correct answer becomes its single most probable guess, Kambhampati says.
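The outcome-based reward Kambhampati describes can be sketched in a few lines. This is a simplified illustration, not DeepSeek’s implementation; the arithmetic problem and the list of sampled guesses are invented for the example:

```python
def outcome_rewards(guesses, correct_answer):
    """Assign a binary outcome reward to each sampled guess:
    1 for a correct final answer, 0 for anything else."""
    return [1 if guess == correct_answer else 0 for guess in guesses]

# Hypothetical example: 15 candidate answers sampled for "17 * 24" (= 408).
guesses = [398, 408, 418, 408, 400, 399, 408, 410,
           405, 408, 398, 418, 408, 409, 408]
rewards = outcome_rewards(guesses, correct_answer=408)

# Only the correct guesses earn a reward. If every guess were wrong, the
# total reward would be zero and there would be no learning signal at all.
print(rewards)
print(sum(rewards))
```

This is why the base model must already be a decent guesser: a model that never lands on the correct answer collects zero reward on every attempt and never improves.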

There is a caveat: V3 Base might have been good at guessing because DeepSeek researchers scraped publicly available data from the internet to train it. The researchers write in the Nature paper that some of that training data could have included outputs from OpenAI’s or others’ models, however unintentionally. They also trained V3 Base in the traditional supervised manner, so some component of that supervised feedback, and not solely reinforcement learning, could carry into any model built from V3 Base. DeepSeek did not respond to SN‘s requests for comment.

When training V3 Base to produce DeepSeek-R1-Zero, researchers used two types of reward — accuracy and format. In the case of math problems, verifying the accuracy of an output is straightforward; the reward algorithm checks the LLM output against the correct answer and gives the appropriate feedback. DeepSeek researchers use test cases from competitions to evaluate code. Format rewards incentivize the model to describe how it arrived at an answer and to label that description before providing the final solution.
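A minimal sketch of how the two reward types might be computed. The exact-match check and the `<think>`/`<answer>` tag names are simplifying assumptions for illustration, not DeepSeek’s actual reward code:

```python
import re

def accuracy_reward(output_answer: str, reference: str) -> int:
    """Accuracy reward: 1 if the model's final answer matches the
    reference answer, 0 otherwise (exact match is a simplification)."""
    return 1 if output_answer.strip() == reference.strip() else 0

def format_reward(output: str) -> int:
    """Format reward: 1 if the model labels its reasoning before giving
    the final answer. Tag names here are illustrative assumptions."""
    pattern = r"<think>.+?</think>\s*<answer>.+?</answer>"
    return 1 if re.search(pattern, output, flags=re.DOTALL) else 0

sample = "<think>17 * 24 = 17 * 25 - 17 = 408</think><answer>408</answer>"
print(accuracy_reward("408", "408"), format_reward(sample))
```

For code problems, the accuracy check would instead run the generated program against competition test cases rather than compare strings.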

On the benchmark math and code problems, DeepSeek-R1-Zero performed better than the humans selected for the benchmark study, but the model still had issues. Being trained on both English and Chinese data, for example, led to outputs that mixed the languages, making the outputs hard to decipher. As a result, DeepSeek researchers went back and implemented an additional reinforcement learning stage in the training pipeline with a reward for language consistency to prevent the mix-up. Out came DeepSeek-R1, a successor to R1-Zero.

Can LLMs reason like humans now?

It might seem like if the reward gets the model to the right answer, it must be making reasoning decisions in response to the rewards. And DeepSeek researchers report that R1-Zero’s outputs suggest that it uses reasoning strategies. But Kambhampati says that we don’t really understand how the models work internally, and that their outputs have been overly anthropomorphized to imply that they are thinking. Meanwhile, interrogating the inner workings of AI model “reasoning” remains an active research problem.

DeepSeek’s format reward incentivizes a specific structure for its model’s responses. Before the model produces the final answer, it generates its “thought process” in a humanlike tone, noting where it might check an intermediate step, which can lead users to think that its responses mirror its internal processing steps.

How an AI model “thinks”

This string of text and equations shows an example of the DeepSeek model’s output format, outlining its “thinking process” before generating the final solution.


The DeepSeek researchers say that the model’s “thought process” output includes terms like “aha moment” and “wait” with increasing frequency as training progresses, indicating the emergence of self-reflective and reasoning behavior. Further, they say that the model generates more “thinking tokens” — characters, words, numbers or symbols produced as the model processes a problem — for complex problems and fewer for easy ones, suggesting that it learns to allocate more thinking time to harder problems.
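The frequency measurement the researchers describe can be approximated with a simple count over model outputs. The term list and the sample outputs below are illustrative assumptions, not the paper’s actual methodology:

```python
def reflective_term_frequency(outputs, terms=("wait", "aha")):
    """Count occurrences of reflective terms in each output, a rough
    proxy for the self-reflection signal the researchers describe."""
    counts = []
    for text in outputs:
        lower = text.lower()
        counts.append(sum(lower.count(term) for term in terms))
    return counts

# Invented outputs standing in for early vs. late stages of training.
early = ["The answer is 12."]
late = ["Wait, let me recheck that step. Aha, 12 works. Wait, yes."]
print(reflective_term_frequency(early + late))
```

A rising count across training checkpoints is what the researchers interpret as emerging self-reflection; as the article goes on to note, whether that text reflects the model’s actual computation is disputed.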

But Kambhampati wonders whether the “thinking tokens,” even when they clearly help the model, give the end user any actual insight into its processing steps. He doesn’t think the tokens correspond to a step-by-step solution of the problem. In DeepSeek-R1-Zero’s training process, every token that contributed to a correct answer gets rewarded, even if some intermediate steps the model took along the way were tangents or dead ends. This outcome-based reward model isn’t set up to reward only the productive portion of the model’s reasoning so that it happens more often, he says. “So, it is strange to train the system only on the outcome reward model and delude yourself that it learned something about the process.”

Moreover, the performance of AI models measured on benchmarks, like a prestigious math competition’s dataset of problems, is known to be an inadequate indicator of how good a model is at problem-solving. “In general, telling whether a system is actually doing reasoning to solve the reasoning problem or using memory to solve the reasoning problem is impossible,” Kambhampati says. So a static benchmark, with a fixed set of problems, can’t accurately convey a model’s reasoning ability, since the model could have memorized the correct answers during its training on scraped internet data, he says.

AI researchers seem to understand that when they say LLMs are reasoning, they mean that they’re doing well on the reasoning benchmarks, Kambhampati says. But laypeople might assume that “if the models got the correct answer, then they must be following the right process,” he says. “Doing well on a benchmark versus using the process that humans might be using to do well in that benchmark are two very different things.” A lack of understanding of AI’s “reasoning” and an overreliance on such AI models could be risky, leading humans to accept AI decisions without critically thinking about the answers.

Some researchers are trying to get insights into how these models work and what training procedures are actually instilling information into the model, Jordan says, with a goal to reduce risk. But, as of now, the inner workings of how these AI models solve problems remains an open question.
