Microsoft's new research into giving noise cancelling to transformers

Oct 9, 2024
RESEARCH

We've all had moments where an AI went off on a random tangent instead of focusing on what we asked for, right? Well, researchers at Microsoft have come up with the DIFF Transformer to address exactly that. Imagine if your AI could tune out all the irrelevant chatter and focus solely on what's important. That's what the DIFF Transformer does. By borrowing a trick from noise-canceling headphones, it helps AI models zero in on crucial information, making them smarter and more efficient.

How it works:

  • Noise-Canceling Attention: Uses a differential attention mechanism that computes two separate softmax attention maps and subtracts one from the other, canceling out the "noise" they share and filtering out distractions.
  • Inspired by Everyday Tech: Similar to how noise-canceling headphones eliminate background sounds so you can enjoy your music.
  • Sharper Focus, Better Results: Leads to sparser, more focused attention patterns, enhancing the model's ability to retrieve key information.
  • Efficiency Gains: Outperforms standard Transformers without needing more data or bigger models.
  • Practical Benefits: Reduces AI "hallucinations" (those pesky irrelevant or incorrect outputs) and improves tasks like long-text comprehension and in-context learning.
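To make the core idea above concrete, here is a minimal NumPy sketch of the subtraction step: two softmax attention maps are computed from separate query/key projections, and one is subtracted (scaled by a weight λ) from the other before mixing the values. This is a simplified single-head illustration with a fixed λ; the actual DIFF Transformer makes λ learnable and adds per-head normalization, which is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(X, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """Sketch of differential attention for one head.

    Two attention maps A1 and A2 are built from independent
    query/key projections; subtracting lam * A2 from A1 cancels
    the attention "noise" common to both maps, sharpening focus.
    (lam is fixed here; in the paper it is a learned parameter.)
    """
    d = Wq1.shape[1]  # head dimension for scaling
    A1 = softmax((X @ Wq1) @ (X @ Wk1).T / np.sqrt(d))
    A2 = softmax((X @ Wq2) @ (X @ Wk2).T / np.sqrt(d))
    A = A1 - lam * A2          # the noise-canceling subtraction
    return A @ (X @ Wv)        # mix values with the denoised map
```

Because each softmax map sums to 1 per row, the subtracted map's rows sum to 1 − λ, and entries where both maps agree (shared noise) shrink toward zero, leaving sparser, more focused attention patterns.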

Why it matters:

If Microsoft successfully implements this in production LLMs, particularly their Phi series of models, DIFF Transformer could unlock AI potential in areas previously hindered by unreliable focus. In healthcare, it might analyze vast patient data more accurately, aiding in precise diagnoses and personalized treatments. Legal professionals could leverage it to sift through extensive documents, swiftly identifying critical information. Financial institutions could enhance fraud detection by pinpointing suspicious activities with greater accuracy. Moreover, customer service bots could become more reliable, delivering accurate responses that build trust with users. By addressing the core issue of attention noise, DIFF Transformer makes AI adoption more feasible in sectors where precision and reliability are paramount.