The Better AI Gets, The More It Needs Us
Hunting rare data and teaching models—the human expertise that's more valuable than ever

by Alex Duffy

This past week, a federal judge ruled in favor of Anthropic in a copyright case brought by five authors. At the center of the case was a creative, almost analog act: Anthropic purchased millions of physical books, ripped off their bindings, scanned each page, and used the resulting digital files to train its AI models. In a summary judgment, the court called this act "transformative" and ruled that it was protected under the principle of fair use. While explaining his rationale, Judge William Alsup said, "They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable."

(The ruling didn't greenlight everything Anthropic did. The court took issue with another set of books: pirated files, downloaded en masse and stored in Anthropic's systems even though the company decided not to train on them. That part of the case will go to trial.)

This case underscores that data is the nexus of AI's value. But once the data is in hand, the real work begins—making it useful for LLMs as they take on increasingly complex tasks.

Teaching AI to hunt: New methods of reinforcement learning

One way to do that is reinforcement learning. In simple terms, reinforcement learning (RL) is like training a puppy: The model tries different actions, and you reward it for good ones and not for bad ones. Over time, it figures out which actions earn the most reward and does more of them.
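To make that loop concrete, here is a minimal sketch in Python: a toy "bandit" agent that tries actions, gets rewarded probabilistically, and learns to prefer the best one. The action names and reward probabilities are invented for illustration; real RL for language models is vastly more elaborate.

```python
import random

# Toy "puppy training" loop: the agent tries actions, gets rewarded
# for good ones, and gradually prefers whatever earns the most reward.
# Action names and reward probabilities are invented for illustration.
REWARD_PROB = {"sit": 0.9, "bark": 0.2, "chew_shoe": 0.05}

values = {a: 0.0 for a in REWARD_PROB}  # running estimate of each action's value
counts = {a: 0 for a in REWARD_PROB}
EPSILON = 0.1                           # how often to try a random action

for step in range(10_000):
    if random.random() < EPSILON:
        action = random.choice(list(REWARD_PROB))  # explore
    else:
        action = max(values, key=values.get)       # exploit best-known action
    reward = 1.0 if random.random() < REWARD_PROB[action] else 0.0
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    values[action] += (reward - values[action]) / counts[action]

print(values)  # "sit" ends up with the highest estimated value
```

After enough steps, the estimated value of "sit" converges toward 0.9. The same dynamic, scaled up enormously, is what steers a model toward behaviors that earn reward.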
Machine learning researcher Nathan Lambert has found that OpenAI's reasoning model o3 is incredible for search. In particular, Lambert noted its relentlessness in finding an obscure piece of information, comparing it to a "trained hunting dog on the scent." This is a big deal in RL, where models are known to give up quickly if a tool—in this case, the search engine the model is accessing—isn't immediately helpful. According to Lambert, o3's persistence suggests that OpenAI has figured out how to get AI not to quit prematurely, turning it into a more effective learner.

Meanwhile, at Japanese research lab Sakana AI, a team is entirely rethinking how to train AI through reinforcement learning. Instead of traditional RL methods that reward models for their ability to solve problems, Sakana is training models to teach. The models are given problems—along with the correct solution—and evaluated on their ability to explain the solution in a clear, helpful way. If you can train small, efficient models to teach well, you can use them to educate larger, more capable models much faster and more cheaply than before. And long term, you might even get models that teach themselves.
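Here is a rough sketch of what that kind of reward could look like, assuming (my assumption, not Sakana's published setup) that a teacher is scored by how much its explanation lifts a student model's accuracy. The student here is a trivial stand-in, not a real model:

```python
import random

class ToyStudent:
    """Stand-in for a small model: answers correctly with some probability,
    which a useful hint raises. Purely illustrative, not a real model."""
    def solve(self, problem, hint=None):
        p = 0.8 if hint and problem["answer"] in hint else 0.3
        return problem["answer"] if random.random() < p else "wrong"

def teacher_reward(explanation, student, problem, trials=200):
    # Score the teacher's explanation by how much it lifts student accuracy
    # relative to the student working alone.
    base = sum(student.solve(problem) == problem["answer"]
               for _ in range(trials))
    guided = sum(student.solve(problem, hint=explanation) == problem["answer"]
                 for _ in range(trials))
    return (guided - base) / trials

problem = {"question": "What is 7 * 8?", "answer": "56"}
student = ToyStudent()
print(teacher_reward("Multiply 7 by 8 to get 56.", student, problem))  # ~0.5
```

The reward is relative: an explanation scores well only if the student does better with it than without it, so the teacher is pushed toward clarity rather than mere correctness.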
Why setting the stage is everything

While RL shapes how models learn, another development is augmenting how we use them effectively: context engineering...
