
Anthropic expands Claude Sonnet 4’s context window to 1M tokens
With this larger context window, Claude can process codebases with 75,000+ lines of code in a single request. This allows it to better understand project architecture and cross-file dependencies, and to make suggestions that fit with the complete system design.
Longer context windows are now in beta on the Anthropic API and Amazon Bedrock, and will soon be available in Google Cloud’s Vertex AI.
For prompts over 200K tokens, pricing will increase to $6 / million tokens (MTok) for input and $22.50 / MTok for output. The pricing for prompts under 200K tokens will be $3 / MTok for input and $15 / MTok for output.
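To make the two pricing tiers concrete, here is a minimal sketch of a cost estimator. The rates and the 200K threshold come from the figures above; the function name `estimate_cost` is hypothetical, and the sketch assumes that once a prompt exceeds the threshold, the higher rates apply to every input and output token in that request. Check Anthropic's pricing documentation before relying on that assumption.

```python
# Rates quoted in the article, in US dollars per million tokens (MTok).
LONG_CONTEXT_THRESHOLD = 200_000  # prompt size above which the higher tier applies
STANDARD_RATES = {"input": 3.00, "output": 15.00}
LONG_CONTEXT_RATES = {"input": 6.00, "output": 22.50}


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single request.

    Assumption (not confirmed by the article): when the prompt exceeds the
    threshold, the higher rates apply to the whole request rather than only
    to the tokens beyond 200K.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    # Pick the tier based on the size of the prompt (input tokens).
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rates = LONG_CONTEXT_RATES
    else:
        rates = STANDARD_RATES

    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Compare a standard-tier request with a long-context request.
    for prompt_size in (100_000, 500_000):
        cost = estimate_cost(prompt_size, 10_000)
        print(f"{prompt_size:>7} input tokens, 10000 output tokens: ${cost:.3f}")
```

Under these assumptions, a prompt that crosses the threshold pays double the input rate and 1.5 times the output rate, so trimming a prompt to fit under 200K tokens can noticeably reduce the cost of a request.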
The company also extended its learning mode designed for students into Claude.ai and Claude Code. Learning mode asks users questions to guide them through concepts instead of providing immediate answers, to promote critical thinking about problems.
OpenAI adds GPT-4o as a legacy model in ChatGPT
With this update, paid users will now be able to select GPT-4o when using ChatGPT, along with other models like o3, GPT-4.1, and GPT-5 Thinking mini.
The model picker for GPT-5 also now includes Auto, Fast, and Thinking modes. Fast prioritizes giving the fastest answers, Thinking prioritizes giving deeper answers that take longer to think through, and Auto chooses between the two.
The company also increased the message limit for Plus and Team users to 3,000 per week on GPT-5 Thinking.
Google releases Gemma 3 270M
This new model is “designed from the ground up for task-specific fine-tuning with strong instruction-following and text structuring capabilities already trained in,” according to Google.
It is best suited to situations where there is a high-volume, well-defined task; speed and cost matter; user privacy needs to be protected; or there is a need for a fleet of specialized task models.
Both pretrained and instruction-tuned versions of the model are available for download from Hugging Face, Ollama, Kaggle, LM Studio, and Docker. Alternatively, the models can be tried out in Vertex AI.
NVIDIA releases latest models in Llama Nemotron family
Llama Nemotron is a family of reasoning models, and the latest updates include a new hybrid model architecture, compact quantized models, and a configurable thinking budget to give developers more control over token generation.
“This combination lets the models reason more deeply and respond faster, without needing more time or computing power. This means better results at a lower cost,” the company wrote in an announcement.
Google’s coding agent Jules gets critique functionality
Google is enhancing its AI coding agent, Jules, with new functionality that reviews and critiques code while Jules is still working on it.
“In a world of rapid iteration, the critic moves the review to earlier in the process and into the act of generation itself. This means the code you review has already been interrogated, refined, and stress-tested … Great developers don’t just write code, they question it. And now, so does Jules,” Google wrote in a blog post.
According to the company, the coding critic is like a peer reviewer who is familiar with code quality principles and is “unafraid to point out when you’ve reinvented a dangerous wheel.”
GitHub to be folded into Microsoft’s CoreAI org
GitHub’s CEO Thomas Dohmke has announced his plans to leave the company at the end of the year.
In a memo to employees, he said that Microsoft doesn’t plan to replace him; rather, GitHub and its leadership team will now operate under Microsoft’s CoreAI organization, a group within the company focused on developing AI-powered tools, including GitHub Copilot.
“Today, GitHub Copilot is the leader of the most successful and thriving market in the age of AI, with over 20 million users and counting,” he wrote. “We did this by innovating ahead of the curve and showing grit and determination when challenged by the disruptors in our space. In just the last year, GitHub Copilot became the first multi-model solution at Microsoft, in partnership with Anthropic, Google, and OpenAI. We enabled Copilot Free for millions and introduced the synchronous agent mode in VS Code as well as the asynchronous coding agent native to GitHub.”
Sentry launches MCP monitoring tool
Application monitoring company Sentry is making it easier to gain visibility into MCP servers with the launch of a new monitoring tool.
With MCP monitoring, developers can understand things like which clients are experiencing errors, which tools are most used, or which tools are running slow. They can also correlate errors with events like traffic spikes or new release deployments, or determine whether errors are only happening on one type of transport.
According to Cody De Arkland, head of developer experience at Sentry, when Sentry launched its own MCP server, it was getting over 30 million requests per month. He said that at that scale, it’s inevitable that errors will occur, and existing monitoring tools were struggling with MCP servers.
bitHuman launches SDK for creating AI avatars
AI company bitHuman has announced a visual SDK for creating avatars for use as chat agents, instructors, virtual coaches, companions, and experts in various fields.
According to the company, the SDK allows avatars to be created on Arm-based and x86 systems without a GPU. The avatars have a small footprint and can be run online or offline on devices like Chromebooks, Mac Minis, and Raspberry Pis.
Because of their small footprint, these characters can be brought to a wide range of environments, including classrooms, kiosks, mobile apps, or edge devices.
Read last week’s updates here: This week in AI dev tools: GPT-5, Claude Opus 4.1, and more (August 8, 2025)