
OpenAI announces o1, an AI that "thinks" before it speaks

Google’s new tool creates podcasts from your notes and PDFs.

Welcome to today’s edition of The Tensor: executive insights on the latest in AI and the tech industry, in 5-minute reads, 3x a week.

A lot has happened in the last two days. In today's Tensor:

  • OpenAI’s new o1: the AI that 💬 thinks before it speaks

  • Google’s new tool creates podcasts from your notes and PDFs

  • DataGemma: Google’s attempt to solve hallucinations in LLMs

  • Quick Bytes: Adobe Firefly video, Mistral’s visual language model, OpenAI’s White House meeting, DeepMind’s two new AI robotics systems

Read Time: 5 mins

OpenAI announces o1, an AI that “thinks” before it speaks

The scoop: OpenAI has released o1, a groundbreaking language model that elevates AI reasoning. Unlike GPT-4 or Claude, o1 doesn't just generate responses—it thinks through problems step by step, bringing us closer to machines capable of tackling complex, multi-step tasks with human-like logic. And it's blowing past benchmarks along the way.

The details:

  • Chain of Thought Technique: o1 uses a novel approach to break intricate problems down into smaller, manageable steps, much like how we doodle on a napkin to solve a tough puzzle (a minimal API sketch follows this list).

  • Reinforcement Learning Boost: We’ve had faster AI with the turbo models, but this time OpenAI gave its AI strategy coaching, not just speed work. Instead of simply scaling up compute during training, o1 focuses on thinking time through reinforcement learning, training the model to reason better, not just compute faster. Think of it as coaching an athlete on strategy rather than raw pace.

  • Benchmark Performance: In tests, o1 solved 83% of problems on a qualifying exam for the International Mathematics Olympiad, leaving the previous leader, GPT-4o, and its 13% in the dust. It’s smarter than PhDs on STEM questions, and it costs about as much too.

  • Cost and Time Trade-offs: Just as the cost of top AI models has come down 10x in 18 months, o1 pushes it right back up. Its advanced reasoning comes with roughly 3x the cost and 20-30 seconds of added latency per response. Developers will need to re-architect their systems to find where to integrate o1, as it won’t be a blanket replacement.
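
For the developers reading: here’s a minimal sketch of what calling o1 might look like through OpenAI’s Python SDK. The model name `o1-preview` is an assumption based on the launch naming, and the puzzle is just a toy example; treat this as illustrative, not official sample code.

```python
import time

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

puzzle = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 "
    "more than the ball. How much does the ball cost?"
)

start = time.time()
# o1 reasons internally before answering, so no "think step by step"
# prompting is needed -- but expect noticeably higher latency.
response = client.chat.completions.create(
    model="o1-preview",  # assumption: whichever o1 variant you have access to
    messages=[{"role": "user", "content": puzzle}],
)
print(f"Answered after {time.time() - start:.1f}s of thinking:")
print(response.choices[0].message.content)
```

The trade-off from the list above shows up directly here: the same two lines against a GPT-4-class model typically return in a couple of seconds, while o1 may spend tens of seconds reasoning before it says a word.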

Why it matters: I believe that with o1 we’re officially in Level 2 (“Reasoners”) of OpenAI’s five-level AGI roadmap. This will have two major effects. First, AI apps that already worked well are going to get supercharged and perform 10x better. Second, new applications of AI in healthcare, finance and more will be unlocked. Agentic capabilities will get much, much better.

Bottom line: OpenAI has pushed the frontier again with o1, signaling a shift toward AI that doesn't just process information but reasons through it. Expect the industry to follow suit with the likes of a Google Gemini 2 and a Claude 4. We’re in the Level 2 era, where your AI goes from an intern siloed to a specific task to a senior executive who can reason broadly across disciplines. Check it out in ChatGPT. I’ll be testing it and bringing more insights in our next editions.

Google’s new tool creates podcasts from your notes and PDFs

The scoop: Google Labs’ NotebookLM now lets you turn documents into an engaging audio discussion with its new Audio Overview feature. Upload your docs, news articles, whatever you like, and within minutes it generates two realistic, charismatic AI hosts having a dynamic podcast-style conversation about them.

How it works:

  • Upload sources like Google Docs, PDFs, or web pages to NotebookLM.

  • Gemini analyses the sources and writes a script, adding relevant commentary and insights the way a podcaster would.

  • Google then generates two AI podcast-host voices, complete with emotion, to act out the script.

  • The result is an audio file you can download and listen to. I tried it with the last edition and it created an amazing podcast about The Tensor. I’ll share that soon! (A toy sketch of the pipeline follows this list.)
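
NotebookLM doesn’t expose a public API, but conceptually the pipeline resembles the toy sketch below. Every function here (`summarize_sources`, `write_podcast_script`, `synthesize_voice`) is a hypothetical stand-in invented for illustration, not a real Google endpoint.

```python
# Hypothetical sketch of a NotebookLM-style Audio Overview pipeline.
# None of these functions are real Google APIs; they are toy stand-ins
# for the LLM and text-to-speech stages described in the list above.

def summarize_sources(documents: list[str]) -> str:
    """Stage 1: an LLM (Gemini, in Google's case) digests the uploads."""
    return " ".join(documents)[:500]

def write_podcast_script(summary: str) -> list[tuple[str, str]]:
    """Stage 2: turn the summary into (speaker, line) pairs with
    podcast-style banter, questions and commentary."""
    return [
        ("Host A", f"Today we're digging into: {summary[:80]}"),
        ("Host B", "Great topic! Let's break down why it matters."),
    ]

def synthesize_voice(speaker: str, line: str) -> bytes:
    """Stage 3: expressive text-to-speech, one distinct voice per host.
    (Toy version just encodes text; the real thing returns audio.)"""
    return f"[{speaker}] {line}\n".encode()

def audio_overview(documents: list[str]) -> bytes:
    """Stitch the hosts' lines into one downloadable 'audio' file."""
    script = write_podcast_script(summarize_sources(documents))
    return b"".join(synthesize_voice(who, line) for who, line in script)
```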

Why it matters: We’ve all heard how AI can help with education; this takes the idea of a private AI tutor to the next level, and not just in education. It lowers the barrier to creating engaging audio content so far that podcasts could see the same explosion short-form video saw with TikTok.

Bottom line: While Audio Overview is still an experiment with some limitations, it offers a peek at what’s coming next in education (personal and corporate), podcasting and more: a step towards more dynamic and personalized learning experiences. Who knows, your next audiobook might just be an AI-generated discussion of that 100-page company report you've been putting off.

DataGemma: Google’s attempt to solve hallucinations in LLMs

The scoop: Ever since Google’s embarrassing launch of AI Overviews a few months ago, when it recommended putting glue on pizza and eating it, the company has been focused on grounding its LLMs to avoid mistakes and hallucinations. It has now announced DataGemma, a pair of new open models.

How it works:

  • DataGemma taps into Google’s Data Commons, the world’s largest knowledge graph, packed with over 250 billion data points from organisations like the UN and WHO.

  • Retrieval-Interleaved Generation (RIG): the model proactively queries Data Commons while generating its response.

  • Retrieval-Augmented Generation (RAG): relevant context is retrieved from Data Commons before the response is generated. (A toy sketch contrasting the two follows this list.)
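
To make the RIG/RAG distinction concrete, here’s a toy sketch in Python. Both `llm` and `query_data_commons` are hypothetical stand-ins, not the real DataGemma or Data Commons interfaces; the point is only the ordering of retrieval versus generation.

```python
import re

# Toy sketch of RIG vs. RAG. `llm` and `query_data_commons` are
# hypothetical stand-ins, NOT real DataGemma or Data Commons calls.

def llm(prompt: str) -> str:
    """Toy model call: returns a canned answer containing one inline
    Data Commons query marker, as a RIG-tuned model would emit."""
    return "The US has a population of [DC(population of the United States)]."

def query_data_commons(query: str) -> str:
    """Toy lookup: a real implementation would hit the Data Commons
    knowledge graph and return a verified statistic."""
    return "about 335 million (Data Commons, toy value)"

def rag_answer(question: str) -> str:
    # RAG: fetch relevant facts FIRST, then generate with them in context.
    facts = query_data_commons(question)
    return llm(f"Using only these verified facts: {facts}\n\nQ: {question}")

def rig_answer(question: str) -> str:
    # RIG: generate first; the model interleaves natural-language queries
    # into its draft, which are resolved against Data Commons afterwards.
    draft = llm(question)
    return re.sub(r"\[DC\((.*?)\)\]",
                  lambda m: query_data_commons(m.group(1)),
                  draft)

print(rig_answer("How many people live in the US?"))
# -> "The US has a population of about 335 million (Data Commons, toy value)."
```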

Bottom line: Since the big-bang moment of ChatGPT in November 2022, the biggest barrier to enterprise adoption has been unreliability. By open-sourcing models built for RIG and RAG, Google is trying to establish a future where you can actually trust AI. That’s fundamental to Google’s own search business, which runs on users trusting it to find correct information.

Quick Bytes

  • Adobe announced that it will bring generative AI video capabilities to its Firefly platform later this year, including Text-to-Video and Image-to-Video features in Premiere Pro

  • Mistral AI released Pixtral, an open-source multimodal AI model capable of understanding both text and images, based on its Mistral large language model

  • OpenAI CEO Sam Altman and other tech executives met with White House officials to discuss AI infrastructure needs, including data centers and energy requirements

  • Google DeepMind introduced two new AI robotics systems: ALOHA Unleashed for complex two-armed manipulation tasks, and DemoStart for improving real-world performance of multi-fingered robotic hands