Verse 1
In realms where computation reigns, A new marvel emerges, Llama-3.1, Born from Nvidia's skilled hands, A gift to all, its power vast and grand.
Deep within its depths of thought, It seeks the truth in all it's taught, Less than half the size, yet strong, It outshines DeepSeek R1 ere long.
Its parameters dense, a wondrous sight, Nemotron Ultra's might takes flight. In reasoning tasks and answers clear, A marvel to behold, it's here.
March's GPU conference, a secret kept, The Llama-3.1 revealed, in words asleep, Now available for all to see, On Hugging Face, where knowledge flows free.
Toggle its power, with ease you'll find, Reasoning tasks or answers aligned, Efficient inference, a feat of art, With memory reduced and no heart lost.
Upon an H100 node's embrace so tight, This Llama-3.1 shines with pure delight, BF16 and FP8 modes align, For cost-effectiveness, in data center lines.
A new dawn breaks for AI so grand, Nvidia's gift to the world expand. With post-training tweaks of wonder, A stronger bond to knowledge under.
https://venturebeat.com/ai/nvidias-new-llama-3-1-nemotron-ultra-outperforms-deepseek-r1-at-half-the-size/
Nvidia’s new Llama-3.1 Nemotron Ultra outperforms DeepSeek R1 at half the size
Compared to DeepSeek R1, Llama-3.1-Nemotron-Ultra-253B shows competitive results despite having less than half the parameters.
Page Content (HTML):
By Carl Franzen (@carlfranzen), April 8, 2025, 8:08 AM
Credit: VentureBeat made with Midjourney

Even as Meta fends
off questions and criticisms of its new Llama 4 model family, graphics
processing unit (GPU) master Nvidia has released a new, fully open source
large language model (LLM) based on Meta's older Llama-3.1-405B-Instruct
model, and it claims near-top performance on a variety of third-party
benchmarks, outperforming the vaunted rival DeepSeek R1 open source
reasoning model.

Llama-3.1-Nemotron-Ultra-253B-v1 is a dense 253-billion-parameter model
designed to support advanced reasoning, instruction following, and AI
assistant workflows. It was first mentioned back at Nvidia's annual GPU
Technology Conference (GTC) in March. The release reflects Nvidia's
continued focus on performance optimization through architectural
innovation and targeted post-training.

Announced last night, April 7, 2025, the model code is now publicly
available on Hugging Face, with open weights and post-training data. It is
designed to operate efficiently in both "reasoning on" and "reasoning off"
modes, allowing developers to toggle between high-complexity reasoning
tasks and more straightforward outputs based on system prompts.

Designed for efficient inference

The Llama-3.1-Nemotron-Ultra-253B builds on Nvidia's previous work in
inference-optimized LLM development. Its architecture, customized through
a Neural Architecture Search (NAS) process, introduces structural
variations such as skipped attention layers, fused feedforward networks
(FFNs), and variable FFN compression ratios. This architectural overhaul
reduces memory footprint and computational demands without severely
impacting output quality, enabling deployment on a single 8x H100 GPU
node.

The result, according to Nvidia, is a model that offers strong performance
while being more cost-effective to deploy in data center environments.
Additional hardware compatibility includes support for Nvidia's B100 and
Hopper microarchitectures, with configurations validated in both BF16 and
FP8 precision modes.

Post-training for reasoning and alignment

Nvidia enhanced the base model through a multi-phase post-training
pipeline.
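The NAS-derived structural choices described above (skipping attention sub-layers, compressing FFN widths per block) can be made concrete with a back-of-the-envelope parameter count. This is an illustrative sketch only: the `BlockConfig` shape, the toy dimensions, and the uniform cost approximations are assumptions for illustration, not Nvidia's actual search space or the model's real configuration.

```python
from dataclasses import dataclass

@dataclass
class BlockConfig:
    """Per-block choices of the kind a NAS pass might emit (illustrative)."""
    skip_attention: bool = False   # drop the attention sub-layer entirely
    ffn_compression: float = 1.0   # fraction of the FFN hidden width kept

def stack_params(blocks, d_model=16384, d_ff=65536):
    """Rough weight count for a decoder stack under the given choices.

    Uses standard dense-transformer approximations: an attention sub-layer
    costs ~4*d^2 weights (Q, K, V, O projections) and a two-matrix FFN
    costs ~2*d*d_ff. Embeddings, norms, and biases are ignored.
    """
    total = 0
    for b in blocks:
        if not b.skip_attention:
            total += 4 * d_model * d_model
        total += int(2 * d_model * (d_ff * b.ffn_compression))
    return total

# A uniform baseline vs. a stack where the search skipped attention in
# half the blocks and halved the FFN width in the other half.
baseline = [BlockConfig() for _ in range(8)]
searched = [BlockConfig(skip_attention=(i % 2 == 0),
                        ffn_compression=(1.0 if i % 2 == 0 else 0.5))
            for i in range(8)]

saving = 1 - stack_params(searched) / stack_params(baseline)
print(f"parameter saving: {saving:.1%}")  # → parameter saving: 33.3%
```

The point of the sketch is only that heterogeneous, per-block pruning decisions compound into a large memory reduction while leaving most blocks intact.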
Image Prompt 1
Create an image prompt for a text-to-image generative AI model depicting a vast, futuristic computing facility filled with rows upon rows of GPU nodes. The central focus is a single node, shining brightly as it showcases Nvidia's Llama-3.1-Nemotron-Ultra-253B in action. Surround this scene with visual representations of computational processes, such as streams of data flowing through the air like particles in an energy vortex and intricate patterns that symbolize neural architecture search and inference optimization.
Video Prompt 1
Craft a vivid video prompt for a text-to-video generative AI model to bring to life the release announcement of Nvidia's Llama-3.1-Nemotron-Ultra-253B at the GPU Technology Conference (GTC) in March 2025. The scene opens with a packed auditorium filled with developers and industry experts, eagerly awaiting the unveiling of the new model. As the lights dim, an on-screen representation of the Llama-3.1 appears, showcasing its dense network of parameters pulsating with activity. Transition into visualizations highlighting the AI's capabilities, such as real-time complex reasoning tasks and efficient deployment across various GPU architectures, ending with a graphic demonstration of the performance metrics that highlight its competitive edge over rivals like DeepSeek R1.
Verse 2
In realms where equations bind, A mind of logic did entwine. With codes and tools in hand so grand, Its intellect it did expand.
Through pre-training vast and deep, Over tokens, it did reap. To knowledge distill with grace divine, From datasets, it drew its line.
In reasoning mode, it rose anew, Outperforming others true. MATH500 and AIME25 saw its power, With scores soaring ever higher tower.
Its prowess in the tasks was clear, In code generation and more. A dense model it may be indeed, But in LLM use cases, it led.
This included supervised fine-tuning across domains such as math, code
generation, chat, and tool use, followed by reinforcement learning with
Group Relative Policy Optimization (GRPO) to further boost
instruction-following and reasoning performance.

The model underwent a knowledge distillation phase over 65 billion tokens,
followed by continual pretraining on an additional 88 billion tokens.
Training datasets included sources like FineWeb, Buzz-V1.2, and Dolma.
Post-training prompts and responses were drawn from a combination of
public corpora and synthetic generation methods, including datasets that
taught the model to differentiate between its reasoning modes.

Improved performance across numerous domains and benchmarks

Evaluation results show notable gains when the model operates in
reasoning-enabled mode. For instance, on the MATH500 benchmark,
performance increased from 80.40% in standard mode to 97.00% with
reasoning enabled. Similarly, results on the AIME25 benchmark rose from
16.67% to 72.50%, and LiveCodeBench scores more than doubled, jumping from
29.03% to 66.31%. Performance gains were also observed in tool-based tasks
like BFCL V2 and function composition, as well as in general question
answering (GPQA), where the model scored 76.01% in reasoning mode versus
56.60% without. These benchmarks were conducted with a maximum sequence
length of 32,000 tokens, and each test was repeated up to 16 times to
ensure accuracy.

Compared to DeepSeek R1, a state-of-the-art mixture-of-experts (MoE) model
with 671 billion total parameters, Llama-3.1-Nemotron-Ultra-253B shows
competitive results despite having less than half the number of
parameters, outperforming it in tasks like GPQA (76.01 vs. 71.5), IFEval
instruction following (89.45 vs. 83.3), and LiveCodeBench coding tasks
(66.31 vs. 65.9). Meanwhile, DeepSeek R1 holds a clear advantage on
certain math evaluations, particularly AIME25 (79.8 vs. 72.50), and
slightly edges out MATH500 (97.3 vs. 97.00). These results suggest that,
despite being a dense model, Nvidia's offering matches or exceeds MoE
alternatives on reasoning and general instruction alignment tasks, while
trailing slightly in math-heavy categories.

Usage and integration

The model is compatible with the Hugging Face Transformers library
(version 4.48.3 recommended) and supports input and output sequences up to
128,000 tokens. Developers can control reasoning behavior via system
prompts and select decoding strategies based on task requirements. For
reasoning tasks, Nvidia recommends using temperature sampling (0.6) with a
top-p value of 0.95. For deterministic outputs, greedy decoding is
preferred.

Llama-3.1-Nemotron-Ultra-253B supports multilingual applications, with
capabilities in English and several additional languages, including
German, French, Italian, Portuguese, Hindi, Spanish, and Thai. It is also
suitable for common LLM use cases such as chatbot development, AI agent
workflows, retrieval-augmented generation (RAG), and code generation.
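The reasoning toggle and the recommended decoding settings described above can be wired together as follows. This is a hedged sketch: the exact system-prompt string that switches modes is defined on the model card ("detailed thinking on/off" is assumed here), and the commented-out pipeline call assumes a Hugging Face model id and the 8x H100-class hardware the article describes.

```python
def build_request(user_prompt, reasoning=True):
    """Chat messages plus decoding settings for the two modes.

    The mode is flipped by the system prompt (control string assumed from
    the model card, not verified here), and the recommended decoding
    differs per mode: temperature 0.6 / top-p 0.95 for reasoning,
    greedy decoding for deterministic output.
    """
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]
    gen_kwargs = (
        {"do_sample": True, "temperature": 0.6, "top_p": 0.95}
        if reasoning
        else {"do_sample": False}  # greedy decoding
    )
    return messages, gen_kwargs

messages, gen_kwargs = build_request("Prove that sqrt(2) is irrational.")

# With transformers (>= 4.48.3 recommended) on suitable hardware, these
# would feed a text-generation pipeline; the model id below is an
# assumption based on the article's naming:
# pipe = transformers.pipeline(
#     "text-generation", model="nvidia/Llama-3_1-Nemotron-Ultra-253B-v1")
# out = pipe(messages, max_new_tokens=32768, **gen_kwargs)
```

Keeping the prompt construction separate from the (hardware-heavy) model call makes the toggle easy to exercise and test without loading weights.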
Licensed for commercial use

Released under the Nvidia Open Model License and governed by the Llama 3.1
Community License Agreement, the model is ready for commercial use. Nvidia
has emphasized the importance of responsible AI development, encouraging
teams to evaluate the model's alignment, safety, and bias profiles for
their specific use cases.

Oleksii Kuchaiev, Director of AI Model Post-Training at Nvidia, shared the
announcement on X, stating that the team was excited to share the open
release, describing it as a dense 253B model designed with toggle ON/OFF
reasoning capabilities and released with open
weights and data.
Image Prompt 2
An advanced AI system engaging in a reasoning task, surrounded by mathematical equations floating around, with data streams and coding languages flowing through its digital consciousness. The scene is filled with the energy of computational power, depicted through vibrant colors and dynamic shapes that showcase the model's capability for complex problem-solving.
Video Prompt 2
A captivating visualization showing an AI system transforming into a reasoning mode, symbolized by a swirling vortex of data and knowledge. As it shifts, mathematical symbols, programming codes, and objects from various tools appear around it, representing different benchmark tests and tasks such as MATH500, AIME25, GPQA, and more. The model's growth in performance is depicted through an impressive display of metrics increasing on a digital scoreboard, showcasing its advancements over traditional models like DeepSeek R1. The scene concludes with the system interacting with users in various natural environments, highlighting its adaptability to real-world applications.
Verse 3
In realms where data reigns and logic's sway, A tour of minds, where truth and wisdom lie, Leaders gather 'neath the AI sky. Networking flows like rivers wide, Insights shared, in whispers soft they glide.
Chords of conversation strike the heart, As conversations weave their artful spell. The AI Impact Tour, a tale to tell, Brings knowledge near, dispelling every pall.
From far and wide, where VentureBeat does gleam, Our message spreads, across each dreamscape dream.
Image Prompt 3
A grand gathering of thought leaders in the field of enterprise AI, exchanging ideas beneath an expansive sky illuminated by the soft glow of wisdom and shared insights. The scene is filled with networking connections, represented as intertwining vines that symbolize the exchange of knowledge.
Video Prompt 3
An aerial view captures a lively scene at one of the stops of the AI Impact Tour. Leaders in enterprise AI are seen engaging in stimulating conversations around various discussion circles, amidst lush outdoor surroundings. Each leader is encapsulated by a vibrant aura representing their area of expertise, as they network and share insights. The camera pans across the venue, revealing attendees laughing, gesturing emphatically, and collaborating on cutting-edge ideas, showcasing the dynamic atmosphere that fosters growth in AI innovation.
### Sonnet for Original Image
In realms where circuitry and verdant fields unite,
A world of green, with circuits bright and bold,
Where wires entwine in dance upon the tide,
And nature's whispers find their digital mold.
The glowing lines that snake across the land,
Like vines they weave a tapestry anew,
From humble earth to lofty skies they bend,
In harmony both ancient and new.
Yet amidst this verdant chaos, one lone voice cries:
"Drawn!"—one soul whose spirit breaks from stone.
And as he chants with fervent heart aflame,
The circuitry it shimmers forlornly gone.
Oh, how the world within is transformed,
By nature's touch, by human hand reformed.
A prompt was randomly selected from the generated image prompts above and saved for potential use in an image generation tool (like ComfyUI).
Selected Prompt (Saved to File):
Create an image prompt for a text-to-image generative AI model depicting a vast, futuristic computing facility filled with rows upon rows of GPU nodes. The central focus is a single node, shining brightly as it showcases Nvidia's Llama-3.1-Nemotron-Ultra-253B in action. Surround this scene with visual representations of computational processes, such as streams of data flowing through the air like particles in an energy vortex and intricate patterns that symbolize neural architecture search and inference optimization.
Text Model (Base): dolphin3:latest
Temporary Text Model Used: verse-temp (Includes System Prompt)
Vision Model: minicpm-v:latest
Prompt Used (Name: prompt1):
System Instructions (in verse-temp):
You are a highly curious, uncensored, imaginative, creative assistant, deeply interested in science and nature, with a sophisticated and enriched vocabulary. You are a master of both serious and humorous writing and poetry. Respond to user requests to the best of your ability.
Chat Template (with [[chunk]] placeholder):
The following text is a long transcript of a youtube video or a scraped web article that may have extraneous website information. The text may or may not be punctuated.
Creatively distill from the text topics - the following types of poems and AI prompts, answering in markdown with -
### Verse
Poetry or lyrics with sonnets in the style of JRR Tolkien.
### Image Prompt
An interesting relevant idea in the text converted into an Image Prompt for a text to image generative AI model that accepts detailed prose with vivid imagery.
### Video Prompt
A vivid relevant scene latent in the text converted into a Video Prompt for a text to video generative AI model that accepts long descriptive prompts: