Deep thoughts: AI shows its math chops
Google's Gemini is the first AI system to win gold at the International Mathematical Olympiad
Complex math hasn't always been AI's strongest suit, but the technology showcased its progress at one of the world's premier competitions, said Cade Metz in The New York Times. A Google DeepMind system became the first machine to achieve "gold medal" status at the annual International Mathematical Olympiad (IMO) last month. OpenAI said that its AI "achieved a similar score on this year's questions, though it did not officially enter the competition." The results had experts buzzing. Last year, Google's AI took home a silver medal, but it required several days to complete the test and "could only answer questions after human experts translated them into computer programming language." This time, Google's chatbot, Gemini Deep Think, "read and responded to questions in English" and used techniques that scientists are calling "reasoning" to solve the problems. Google says it took the chatbot the same amount of time to finish the test as the human participants: 4.5 hours.
"They still got beat by the world's brightest teenagers," said Ben Cohen in The Wall Street Journal. Twenty-six human highschool students outscored the computers, which managed to answer five of the six increasingly difficult problems correctly. The one that tripped up the AI was a brainteaser in the "notoriously tricky field of combinatorics" dealing with counting and arranging objects. The solution required "ingenuity, creativity, and intuition," qualities only humans (so far) can muster. "I would actually be a bit scared if the AI models could do stuff on Problem 6," said one of the gold medalists, Qiao Zhang, a 17-year-old on his way to MIT. AI models struggle with math more than you might think, said Andrew Paul in Popular Science. The reason has to do with how they process prompts. AI models "break the words and letters down into 'tokens,' then parse and predict an appropriate response," whereas humans can simply process them as complete thoughts.
That's why Google trained this Gemini version differently, said Ryan Whitwam in Ars Technica. The IMO's scoring is "based on showing your work," so to succeed the AI had to consider and analyze "every step on the way to an answer." Google gave Gemini a set of solutions to math problems, then added more general tips on how to approach them. The company's scientists were especially proud of one Gemini solution that used relatively simple math, working from basic principles, while most human competitors relied on "graduate-level" concepts. This may be just a 45-way tie for 27th place in a youth math contest, but it's rightly being hailed as significant, said Krystal Hu in Reuters. It "directly challenges the long-held skepticism that AI models are just clever mimics." Math problems that require multistep proofs have "become the ultimate test of reasoning, and AI just passed."
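The setup Ars Technica describes, exemplar solutions plus general tips, with the model asked to write out every step, can be pictured with a simplified sketch. The Python below is not Google's pipeline; every name in it (build_proof_prompt, EXAMPLE_SOLUTIONS, GENERAL_HINTS) is hypothetical, and it only assembles a prompt string to show the shape of a "show your work" approach.

```python
# A minimal sketch, assuming nothing about Google's actual training or
# inference code: worked examples and general strategy hints are paired
# with a request to justify every step of the proof.

EXAMPLE_SOLUTIONS = [
    "Problem: Show that the sum of two even integers is even.\n"
    "Solution: Write the integers as 2a and 2b; their sum is 2(a + b), "
    "which is divisible by 2, hence even.",
]

GENERAL_HINTS = [
    "Restate what must be proved before starting.",
    "Try small cases to form a conjecture.",
    "Justify every step; credit depends on the work shown.",
]

def build_proof_prompt(problem: str) -> str:
    """Assemble a 'show your work' prompt from exemplar solutions
    and general strategy hints (all content here is illustrative)."""
    examples = "\n\n".join(EXAMPLE_SOLUTIONS)
    hints = "\n".join(f"- {h}" for h in GENERAL_HINTS)
    return (
        "You will be graded on every step of your argument.\n\n"
        f"Worked examples:\n{examples}\n\n"
        f"General strategies:\n{hints}\n\n"
        f"Problem:\n{problem}\n\n"
        "Write a complete, rigorous proof, justifying each step."
    )

print(build_proof_prompt("Prove that the square root of 2 is irrational."))
```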