DeepSeek’s new AI model can generate 200K pages of training data daily on a single GPU


Chinese AI startup DeepSeek has released a new multimodal AI model, which it said is capable of processing large and complex documents using significantly fewer tokens.

The Hangzhou-based company said that DeepSeek-OCR uses visual perception as a medium to compress text for large language models (LLMs) more efficiently. Both the source code and weights of the model are publicly available via online developer platforms Hugging Face and GitHub. In its research, DeepSeek found that using “vision encoders” to compress text for LLMs would enable them to process massive amounts of text at lower computing costs.

“Through DeepSeek-OCR, we demonstrate that vision-text compression can achieve significant token reduction (7-20×) for different historical context stages, offering a promising direction for addressing long-context challenges in large language models,” the company said in a technical paper accompanying the model’s release.

The launch of DeepSeek-OCR reflects the company’s continued focus on improving the efficiency of LLMs while driving down the costs of building and using them. The company is said to have taken a similar approach in developing its breakthrough open-weight models V3 and R1, which made waves across the tech industry for achieving performance comparable to cutting-edge models like OpenAI’s o1 at only a fraction of the cost.

Technical specs

With DeepSeek-OCR, the company aims to tackle a key limitation of LLMs: handling long contexts without running into memory limits. Its core hypothesis is that processing text as images can be more computationally efficient than processing raw digital text. The new OCR model serves as a proof-of-concept for this idea.

The model comprises two parts: a 380 million-parameter DeepEncoder, which analyses each image and produces a compressed representation of it; and a text generator with 570 million active parameters, built on a three billion-parameter mixture of experts (MoE) language model.

DeepSeek’s researchers said that they trained the OCR model with 30 million PDF pages in roughly 100 languages, including 25 million in Chinese and English, along with 10 million synthetic diagrams, five million chemical formulae, and one million geometric figures.

Performance on benchmarks

The OCR model is capable of compressing text by up to a factor of ten while retaining 97 per cent of the original information, as per the technical paper. It can process a wide range of document types, including plain text, diagrams, chemical formulae, and geometric figures. It can also preserve the original formatting, output plain text, and provide general image descriptions. However, the number of ‘vision tokens’ required is likely to vary with document size and image resolution.
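
The compression figure is a ratio of text tokens to the vision tokens that stand in for them. The sketch below shows that arithmetic only; the function name and the 1,000-token page are illustrative assumptions, not values or code from DeepSeek’s paper.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Return how many text tokens each vision token stands in for."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Hypothetical page (an assumption for illustration): 1,000 text tokens
# rendered as an image and encoded into 100 vision tokens.
ratio = compression_ratio(text_tokens=1_000, vision_tokens=100)
print(f"compression: {ratio:.1f}x")
```

Under these assumed numbers the page sits at the tenfold compression level the paper associates with 97 per cent retention.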

In sum, DeepSeek-OCR can generate training data for LLMs and vision language models (VLMs) at a scale of more than 200,000 pages per day while running on a single Nvidia A100 GPU.
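
To put the daily figure in per-page terms, the rough calculation below converts 200,000 pages per day into a sustained rate. It assumes the GPU runs continuously for 24 hours, which the article does not state; the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pages_per_second(pages_per_day: int) -> float:
    """Average pages processed each second, assuming round-the-clock operation."""
    return pages_per_day / SECONDS_PER_DAY


def seconds_per_page(pages_per_day: int) -> float:
    """Average time budget per page under the same assumption."""
    return SECONDS_PER_DAY / pages_per_day


rate = pages_per_second(200_000)
budget = seconds_per_page(200_000)
print(f"{rate:.2f} pages/s, {budget:.3f} s per page")
```

On that assumption, the claim works out to a little over two pages per second, or under half a second per page.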

The OCR model was evaluated on two benchmarks: OmniDocBench, which tests a model’s document parsing capabilities, and Fox, which tests how well vision language models focus on dense PDF documents.

“On OmniDocBench, it surpasses GOT-OCR2.0 (256 tokens/page) using only 100 vision tokens, and outperforms MinerU2.0 (6000+ tokens per page on average) while utilising fewer than 800 vision tokens,” the paper read.
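
The per-page token counts in that quote can be turned into reduction factors. The sketch below uses only the quoted figures; because the paper gives "6000+" and "fewer than 800", the second result is a lower bound rather than an exact value, and the function name is an assumption.

```python
def token_reduction(baseline_tokens: int, model_tokens: int) -> float:
    """Return how many times fewer tokens the model uses than the baseline."""
    if model_tokens <= 0:
        raise ValueError("model_tokens must be positive")
    return baseline_tokens / model_tokens


# GOT-OCR2.0: 256 tokens per page versus 100 vision tokens.
vs_got_ocr = token_reduction(256, 100)

# MinerU2.0: at least 6,000 tokens per page versus at most 800 vision tokens,
# so this is a floor on the real reduction.
vs_mineru_floor = token_reduction(6_000, 800)

print(f"vs GOT-OCR2.0: {vs_got_ocr:.2f}x fewer tokens")
print(f"vs MinerU2.0: at least {vs_mineru_floor:.1f}x fewer tokens")
```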



