Ornith 1.0: This is a new class of self-improving model

Ornith 1: Open-Weight Agentic Coding Models That Write Their Own Harnesses (and Beat Bigger Models)

In this video, I break down Ornith 1, a new family of open-weight models built for agentic coding. The models can outperform much larger models on benchmarks like Terminal Bench, and the 397B model comes close to closed-source performance (Opus 4.8). The key idea isn’t just the scores: Ornith is trained to generate both solution rollouts and a task-specific harness (memory, retries, error handling) in a single loop, using reinforcement learning (GRPO) so that rewards update both the solution and the scaffold. I cover reward-hacking risks and the three-layer defenses: locked boundaries, deterministic monitoring, and a frozen judge model. I also share my own Ollama tests on an M2 Max comparing the Qwen 3.5 9B base model with Ornith 1 9B (8-bit): accuracy is similar, but Ornith is ~3× cheaper (up to 20× on some tasks), while long-horizon “honesty under pressure” seems to require larger scale (35B+).
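For readers unfamiliar with GRPO, here is a minimal sketch of its group-relative advantage step: each rollout's reward is normalized against the other rollouts sampled for the same task. This is not Ornith's training code. The function name, the reward values, and the use of the population standard deviation are illustrative assumptions. In the setup described above, one rollout would presumably cover both the harness and the solution, so a single advantage would weight both.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own sampling group.

    Hypothetical helper for illustration; eps avoids division by zero
    when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four rollouts of one task (1.0 = tests passed).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Rollouts above the group mean get a positive advantage, the rest negative.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Because the advantage is computed relative to the group, a task where every rollout scores the same contributes no learning signal, which is one reason reward design and the anti-hacking defenses matter.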

LINKS:
https://deep-reinforce.com/ornith_1_0.html
https://huggingface.co/collections/deepreinforce-ai/ornith-10

My voice-to-text app: whryte.com
Website: https://engineerprompt.ai/
RAG Beyond Basics Course:
https://prompt-s-site.thinkific.com/courses/rag
Sign up for the newsletter, localGPT:
https://tally.so/r/3y9bb0

Let’s Connect:
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: https://www.patreon.com/PromptEngineering
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
📧 Business Contact: engineerprompt@gmail.com
Become a Member: http://tinyurl.com/y5h28s6h

💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).

00:00 Ornith Models Overview
02:46 Self Written Harnesses
04:31 Reward Hacking Risks
05:30 Three Layer Defenses
06:17 My Ollama Test Setup
07:40 Private Bench Results
08:33 Long Horizon Honesty Test
09:53 Key Takeaways on 9B
10:42 Caveats

#Promptengineering #AI

By ali
