Spotify’s acquisition of Sonantic in June 2022 gave it a strategic hold on text-to-speech (TTS) technology. After the acquisition, Spotify received three U.S. patents for TTS technology. One of them, US12046226B2, proved especially significant, helping Spotify speed up the launch of AI DJ.
As it turns out, the examiner cited this patent during the prosecution of Spotify’s patent application, GB2612624A.
Acquiring Sonantic’s patent portfolio lowered Spotify’s licensing and legal risks and reduced its dependence on third-party IP. At the same time, it created a new compliance challenge for companies developing AI-driven TTS solutions, such as Amazon (AWS Polly), Google AI, Microsoft, Apple, BlueLabel, Synthesia, and ThirdEye Data Inc.
These companies should analyze their TTS products for overlap with Spotify’s patent claims to avoid potential infringement.
This article explores how the acquisition gave Spotify a legal and competitive advantage, what it implies for the AI voice industry, and what companies must consider when navigating the patent landscape in expressive speech synthesis.
Patent Strength & Threat Analysis
The “Text-to-Speech Synthesis Method and System” patent (US12046226B2) focuses on training a TTS synthesis system to generate expressive speech that conveys emotional information and sounds natural and human-like.
Legal Precedent: The examiner cited Sonantic’s prior patent (US’226) when evaluating Spotify’s GB2612624A, reinforcing US’226’s originality and legal standing.
The features highlighted in the patent enable Spotify to deliver more immersive, emotionally engaging speech synthesis, differentiating its AI-driven voice technology from competitors.
Key features & their legal implications
Feature | Function | Patent Risk & Infringement Threat |
Text Reception | The system receives textual input. | Companies using similar text input processing in speech synthesis may risk infringement if their technology overlaps with these claims. |
Prediction Network Processing | The input text is fed into a neural network to generate speech data. | AI-driven speech synthesis companies employing neural network-based text-to-speech processing could face licensing or litigation risks. |
Expressivity Score Integration | The system assigns an expressivity score to audio samples, reflecting emotional nuances. | Systems that incorporate expressivity scoring into speech synthesis without a license could be vulnerable to infringement assertions. |
Progressive Training with Sub-Datasets | The neural network is trained in phases, with increasingly expressive sub-datasets to refine speech quality. | Competitors using structured, multi-stage training methodologies may need to assess whether a license is required. |
Enhanced Naturalness | The method makes synthesized speech sound more human-like and emotionally rich. | Digital assistants or AI-driven voice solutions that improve speech expressiveness through similar training methods could fall within the scope of this patent. |
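To make the progressive-training idea concrete, here is one plausible reading of it as a minimal Python sketch. Each audio sample carries a precomputed expressivity score. The corpus is split into sub-datasets by ascending score thresholds, and training runs one phase per sub-dataset, from least to most expressive. This is an illustration under stated assumptions, not the patented method. The `Sample` type, the nested-threshold split, the threshold values, and the `train_phase` callback are all hypothetical, and the actual neural network update is stubbed out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    """One HYPOTHETICAL training pair: input text plus an audio clip reference."""
    text: str
    audio_id: str
    expressivity: float  # assumed precomputed score; higher means more expressive


def build_sub_datasets(samples: Sequence[Sample],
                       thresholds: Sequence[float]) -> List[List[Sample]]:
    """Split the corpus into nested sub-datasets, one per threshold.

    Thresholds are applied in ascending order, so each later sub-dataset keeps
    only samples at or above a higher score: it is smaller and, on average,
    more expressive than the one before it.
    """
    sub_datasets = []
    for threshold in sorted(thresholds):
        subset = [s for s in samples if s.expressivity >= threshold]
        if not subset:
            raise ValueError(f"no samples with expressivity >= {threshold}")
        sub_datasets.append(subset)
    return sub_datasets


def progressive_train(samples: Sequence[Sample],
                      thresholds: Sequence[float],
                      train_phase: Callable[[int, List[Sample]], None]) -> List[List[Sample]]:
    """Run one training phase per sub-dataset, least to most expressive.

    `train_phase` is a HYPOTHETICAL stand-in for whatever updates the
    prediction network's weights; here it is a caller-supplied stub.
    """
    phases = build_sub_datasets(samples, thresholds)
    for phase_index, subset in enumerate(phases):
        train_phase(phase_index, subset)
    return phases


if __name__ == "__main__":
    corpus = [
        Sample("Hello.", "a1", 0.1),
        Sample("Really?!", "b1", 0.6),
        Sample("No way!", "c1", 0.9),
        Sample("I can't believe it.", "d1", 0.85),
    ]
    # Prints the size of each phase's sub-dataset: 4, then 3, then 2 samples.
    progressive_train(corpus, [0.0, 0.5, 0.8],
                      lambda i, subset: print(f"phase {i}: {len(subset)} samples"))
```

Splitting the corpus into disjoint score bands instead of nested thresholds would fit the table's description equally well. The claim language determines what the patent actually covers, not this sketch.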
A detailed claim analysis can help you navigate the competitive patent landscape, identify high-risk overlaps, and safeguard your AI innovations. Contact our IP experts for a comprehensive risk assessment and strategic IP insights.
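A claim analysis centers on an element-by-element comparison, which can be sketched in a few lines. Under the all-elements rule in U.S. patent law, a product literally infringes a claim only if it practices every element of that claim. A single missing element therefore defeats literal infringement. The sketch assumes a hypothetical element list paraphrased from the feature table above, not the actual claim language of US12046226B2, plus a hand-built map of which features a product uses. It ignores claim construction and the doctrine of equivalents, so at most it serves as a first-pass screen.

```python
from typing import Dict, List, Mapping, Sequence

# HYPOTHETICAL claim elements paraphrased from the feature table; these are
# NOT the actual claim language of US12046226B2.
CLAIM_ELEMENTS = (
    "receives text input",
    "prediction network generates speech data",
    "audio samples carry an expressivity score",
    "trained on progressively more expressive sub-datasets",
)


def claim_chart(product_features: Mapping[str, bool],
                elements: Sequence[str] = CLAIM_ELEMENTS) -> Dict[str, object]:
    """Compare a product against each claim element.

    Returns the matched and missing elements, plus whether every element is
    present (the literal-infringement screen under the all-elements rule).
    Features absent from `product_features` are treated as not practiced.
    """
    matched: List[str] = [e for e in elements if product_features.get(e, False)]
    missing: List[str] = [e for e in elements if not product_features.get(e, False)]
    return {
        "matched": matched,
        "missing": missing,
        "all_elements_present": not missing,
    }


# HYPOTHETICAL product: uses expressivity scores but trains in a single pass
# on the full corpus, so one element is missing.
example_product = {
    "receives text input": True,
    "prediction network generates speech data": True,
    "audio samples carry an expressivity score": True,
    "trained on progressively more expressive sub-datasets": False,
}
report = claim_chart(example_product)
print(report["missing"])               # the single missing element
print(report["all_elements_present"])  # False
```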

Competitive patent landscape of text-to-speech technology
The table below lists several patents related to expressive TTS technologies, each varying in scope and technical claims.
Patent | Key Functional Overlaps | Distinctive Technical Scope |
US6XXXXXXB1 | Uses machine learning models for emotional modulation. | Focuses on style adaptation rather than progressive training methodologies. |
US2XXXXXXA1 | Incorporates expressivity scoring to refine TTS output. | Lacks a structured multi-phase training approach as detailed in US12046226B2. |
US2XXXXXXA1 | Enhances synthesized speech expressivity. | Focuses on direct audio modulation rather than structured training methods. |
US9XXXXXXB2 | Incorporates multiple speech styles for diverse outputs. | Lacks expressivity scoring and multi-stage dataset training. |
US7XXXXXXB2 | Focuses on expressivity conveyance. | Lacks hierarchical training approaches. |
US2XXXXXXA1 | Modifies synthesized speech based on context. | Lacks structured expressivity scoring. |
US1XXXXXXB2 | Enhances speech through emotional expressiveness. | Does not include expressivity scoring-based training. |
Patent enforcement & licensing risk
Given the specificity of US12046226B2 in training methodology and expressivity scoring, companies leveraging similar TTS enhancements may be at risk of infringement. Below is a competitive analysis of products implementing related TTS technologies:

Find out whether your text-to-speech solutions risk litigation. GreyB’s IP experts specialize in patent acquisition strategies, licensing negotiations, and competitive FTO analysis. Contact us to secure your patent portfolio and mitigate legal risks.

Conclusion
In 2023, the text-to-speech market was valued at approximately USD 3.6 billion. It is projected to reach USD 14.6 billion by 2033.
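These two figures imply a compound annual growth rate (CAGR). The source states only the endpoints. Assuming ten compounding periods between 2023 and 2033, the rate is (14.6 / 3.6)^(1/10) - 1, or roughly 15% per year.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value in `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market size in USD billions; 2023 -> 2033 is assumed to be 10 compounding years.
rate = implied_cagr(3.6, 14.6, 2033 - 2023)
print(f"{rate:.1%}")  # prints 15.0%
```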
This growth comes from the rising adoption of TTS applications across various sectors, including entertainment, education, and customer service. Generating natural, human-like speech enhances user engagement and accessibility, making it a crucial part of the user experience.
Acquiring patents can be a cheaper way to expand into AI voice technology, and it is often a better option than licensing or building everything from scratch.
A well-planned patent acquisition study will help you evaluate your options by identifying:
- Existing patents that could reduce long-term licensing costs.
- Emerging innovations that align with your product roadmap.
- Potential risks from competitors holding key IP.
Want to see what high-value acquisition opportunities are out there?
Contact our experts to find the right patents to strengthen your innovation strategy.
Authored By: Rohit Sood, Kshitij Katoch, and Rahul Mahajan, Patent Infringement Team