Updated Apr 23

Nari Labs Takes on the Giants

Dia: The Open-Source TTS Model Shaking Up the AI Audio Space

Dia, an open-source text-to-speech model from Nari Labs, aims to challenge industry leaders such as ElevenLabs and OpenAI. The 1.6-billion-parameter model focuses on natural dialogue, emotional shifts, and non-verbal cues. Available through GitHub and Hugging Face, Dia could make high-quality TTS far more accessible, though its heavy computational needs and ethical implications continue to fuel debate.

Introduction to Dia: A New Open‑Source TTS Model

A new contender has entered the text-to-speech (TTS) field: Dia, from Nari Labs. As an open-source model, Dia aims to challenge industry giants such as ElevenLabs and OpenAI with features designed to make dialogue generated from text sound more natural. With 1.6 billion parameters, Dia goes beyond converting text to speech: it aims to make speech more expressive by integrating emotional tone, speaker tagging, and non-verbal cues. The project reflects a broader shift toward TTS systems that can simulate human conversation more convincingly. ¹

Dia's arrival marks a notable advance for open-source TTS in a fast-moving, highly competitive field. Nari Labs promises higher voice-synthesis quality than competing systems, which could raise user expectations for clarity and realism. The model lets users shape speech patterns in detail, not only generating dialogue but also refining its emotional depth and rhythm. Its attention to narrative flow puts it in direct competition with proprietary systems, yet it remains freely available, in line with Nari Labs' goal of democratizing AI-driven voice technology.¹

Dia is available for download and integration via both Hugging Face and GitHub, giving developers and researchers an open platform to build on. Its release under the Apache 2.0 license broadens access and encourages community-driven innovation. Developers can integrate Dia into their own projects by following the project's documentation and can contribute improvements back through the open repository. Whether run locally or tried through Hugging Face Spaces, Dia makes a capable TTS tool publicly available. VentureBeat's report has more technical detail.¹

Comparing Dia with ElevenLabs and Other Competitors

Dia, a text-to-speech (TTS) model from Nari Labs, enters a field long dominated by companies such as ElevenLabs and OpenAI. As an open-source release, it sets itself apart by focusing on naturalistic dialogue generated from text prompts. It can incorporate emotional tone and speaker tags and handle non-verbal cues, which makes generated conversations more engaging and believable and positions Dia as a serious competitor to established players. Because the model can be downloaded for free and deployed locally under the Apache 2.0 license from Hugging Face and GitHub, it also challenges existing models that are proprietary or less flexible to deploy [8](https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/).

ElevenLabs is known for high-fidelity voice cloning and nuanced TTS, but Dia's edge lies in extending a speaker's voice style into new lines using audio prompts, which makes it adaptable for dynamic interactions. Nari Labs claims that where ElevenLabs and other models struggle to maintain conversational flow or handle rhythmically complex content, Dia produces smoother transitions and tone shifts. That would give it an advantage in applications that require fluid, coherent dialogue. As TTS users grow more demanding, this flexibility gives developers a tool that, by Nari Labs' account, matches or exceeds established systems in several key areas [0](https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/).
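As we understand the Dia README at release, voice continuation works by pairing a short reference clip with its transcript: the transcript is placed in front of the new lines, and the clip itself is passed to the model as an audio prompt. The sketch below prepares only the text half of that input. The helper name `build_clone_input` and its validation rules are hypothetical, chosen for illustration, and are not part of Dia's API.

```python
def build_clone_input(prompt_transcript: str, new_script: str) -> str:
    """Prepend a reference clip's transcript to the lines that should be generated.

    Hypothetical helper: the model is expected to continue in the voice heard
    in the reference clip, so the transcript of that clip comes first.
    """
    transcript = prompt_transcript.strip()
    script = new_script.strip()
    if not transcript.startswith("[S"):
        # Assumption: Dia scripts mark every turn with a speaker tag like [S1].
        raise ValueError("transcript should begin with a speaker tag such as [S1]")
    if not script:
        raise ValueError("new_script is empty")
    return f"{transcript} {script}"


combined = build_clone_input(
    "[S1] This is the sentence spoken in my reference recording.",
    "[S1] And this is a brand-new line in the same voice.",
)
print(combined)
```

The combined string would then go to the model together with the reference audio. In the release-era examples this took the form of an audio-prompt argument to the generation call; the exact parameter name should be checked against the current repository.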

Dia also stands out for the transparency and community orientation of its development. Supported by the Google TPU Research Cloud and Hugging Face's ZeroGPU grant program, the project aims to democratize access to TTS technology. Its open-source release encourages community-driven innovation and collaboration and lowers the barriers to entry common in the AI sector. Such democratization helps new development ecosystems form and makes technological progress more inclusive [5](https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/).

Dia's introduction is not without challenges. Its high computational requirements mean that users without powerful hardware may struggle to use it to its full potential. Reports of bugs and unusual outputs also suggest that, while promising, the model needs further refinement to become reliable. Nari Labs has responded by inviting community contributions through Discord and GitHub to drive continuous improvement [0](https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/). And amid growing concern over the ethical implications of open-source voice technology, Nari Labs' explicit prohibition of misuse signals its commitment to responsible AI development [8](https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/).
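A back-of-the-envelope calculation shows why hardware is the main barrier. The sketch below counts only the memory needed to hold 1.6 billion weights at common numeric precisions. Activations, the audio codec, and framework overhead come on top, so the result is a lower bound, not a measured figure for Dia.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory for the raw weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


DIA_PARAMS = 1_600_000_000  # parameter count reported for Dia

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gb(DIA_PARAMS, dtype):.1f} GB")
# float32: 6.4 GB
# bfloat16: 3.2 GB
```

Full-precision weights alone take about 6.4 GB, which leaves limited headroom on a 10 GB card once inference buffers are added.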

Accessing and Utilizing Dia for Text‑to‑Speech

Dia, Nari Labs' open-source model, opens up new possibilities for text-to-speech applications. It is designed for users who want naturalistic dialogue from text prompts. The 1.6-billion-parameter model integrates emotional tone adjustments, speaker tagging, and non-verbal cue processing, producing audio that closely resembles real conversation. According to VentureBeat,¹ Dia aims to surpass even the leading models from ElevenLabs and OpenAI in both quality and flexibility.
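Speaker tagging and non-verbal cues are expressed directly in the input text. As we understand Nari Labs' documentation, turns are marked with tags such as `[S1]` and `[S2]`, and cues are written inline in parentheses, for example `(laughs)`. The hypothetical `build_script` helper below assembles such a script from a list of turns; restricting it to two speakers is a simplifying assumption of this sketch.

```python
from __future__ import annotations


def build_script(turns: list[tuple[int, str]]) -> str:
    """Join (speaker_number, line) pairs into one tagged dialogue string."""
    parts = []
    for speaker, line in turns:
        if speaker not in (1, 2):
            # Assumption for this sketch: only two speakers, S1 and S2.
            raise ValueError("this sketch supports speakers 1 and 2 only")
        parts.append(f"[S{speaker}] {line.strip()}")
    return " ".join(parts)


script = build_script([
    (1, "Dia is an open-weights text-to-dialogue model."),
    (2, "You get control over scripts and voices."),
    (1, "Wow. (laughs) Amazing."),  # a non-verbal cue written inline
])
print(script)
```

Keeping cues inline, rather than in a separate markup layer, means the same plain string can be edited by hand or generated by another program.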

Developers who want to deploy Dia can download it from Hugging Face and GitHub, and its Apache 2.0 license means they can try the model without significant financial investment. According to VentureBeat, both the code and the model weights are freely downloadable for local deployment. A demo on Hugging Face Spaces also lets users experiment before committing to a local setup.
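For local deployment, the release-era README showed a short Python workflow: load the published weights, generate audio from a tagged script, and write the waveform to disk. The sketch below follows that pattern as we recall it. The `dia` package path, `Dia.from_pretrained`, `generate`, and the 44.1 kHz output rate are assumptions to verify against the current repository, and `synthesize` and `duration_seconds` are our own hypothetical helper names.

```python
"""Minimal local-inference sketch for Dia (assumes the `dia` package and `soundfile`)."""

MODEL_ID = "nari-labs/Dia-1.6B"  # Hugging Face repository id at release
SAMPLE_RATE = 44_100             # assumption: output sample rate of the codec


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Length of a mono waveform in seconds."""
    return num_samples / sample_rate


def synthesize(script: str, out_path: str = "dialogue.wav") -> float:
    """Generate speech for a tagged script, save it, and return its duration."""
    # Heavy imports are deferred so this file can be imported without a GPU stack.
    import soundfile as sf
    from dia.model import Dia

    model = Dia.from_pretrained(MODEL_ID)  # downloads weights on first use
    audio = model.generate(script)         # assumption: returns a 1-D NumPy array
    sf.write(out_path, audio, SAMPLE_RATE)
    return duration_seconds(len(audio))
```

On a machine that meets the hardware requirements, calling `synthesize("[S1] Hello from a local GPU. [S2] (laughs) It works.")` would write `dialogue.wav` and return its length in seconds.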

Running Dia efficiently requires PyTorch 2.0 or higher, CUDA 12.6, and a GPU with at least 10GB of VRAM. Meeting these specifications lets the model handle complex audio and dialogue generation smoothly. As these prerequisites, reported by VentureBeat,¹ make clear, optimal performance depends on powerful hardware.
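Before downloading the weights, it can help to check a machine against the stated requirements. The sketch below compares version strings and VRAM against the figures above. Treating CUDA 12.6 as a minimum rather than an exact version is our assumption, and `check_requirements` is a hypothetical helper, not part of any Dia tooling.

```python
from __future__ import annotations

import re

MIN_TORCH = (2, 0)   # "PyTorch 2.0 or higher"
MIN_CUDA = (12, 6)   # assumption: CUDA 12.6 treated as a minimum
MIN_VRAM_GB = 10.0   # "at least 10GB of VRAM"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.6.0+cu126' into (2, 6, 0), ignoring local build suffixes."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_requirements(torch_version: str, cuda_version: str | None, vram_gb: float) -> list[str]:
    """Return a list of unmet requirements; an empty list means all checks pass."""
    problems = []
    if parse_version(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} is older than 2.0")
    if cuda_version is None:
        problems.append("no CUDA runtime detected (CPU-only build?)")
    elif parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 12.6")
    if vram_gb < MIN_VRAM_GB:
        problems.append(f"{vram_gb:.1f} GB of VRAM is below the 10 GB guideline")
    return problems
```

With PyTorch installed, the inputs can come from `torch.__version__`, `torch.version.cuda` (which is `None` on CPU-only builds), and `torch.cuda.get_device_properties(0).total_memory / 1e9`.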

Dia's capabilities extend beyond traditional text-to-speech. Nari Labs has emphasized giving users more control over dialogue, particularly through features that improve conversational flow by interpreting non-verbal cues. This is especially relevant to industries built around interactive media and customer engagement, as VentureBeat's coverage notes.¹ Dia can add a high degree of nuance to its audio output, making it a versatile tool for dynamic applications.
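An application built around this kind of dialogue control might need to split a tagged script into turns and list the non-verbal cues in each one, for example to show them as editable controls. The parser below is a hypothetical sketch that assumes speaker tags like `[S1]` and parenthesized cues like `(laughs)`. It does not reproduce Dia's own tokenization.

```python
from __future__ import annotations

import re

# Assumed conventions: "[S<n>]" opens a turn, "(word)" marks a non-verbal cue.
TURN_RE = re.compile(r"\[S(\d+)\]\s*(.*?)(?=\s*\[S\d+\]|\Z)", re.DOTALL)
CUE_RE = re.compile(r"\(([^()]+)\)")


def parse_script(script: str) -> list[dict]:
    """Split a tagged script into turns with their spoken text and cues."""
    turns = []
    for speaker, body in TURN_RE.findall(script):
        cues = CUE_RE.findall(body)
        spoken = " ".join(CUE_RE.sub(" ", body).split())  # drop cues, tidy spaces
        turns.append({"speaker": int(speaker), "text": spoken, "cues": cues})
    return turns


for turn in parse_script("[S1] Hi there. (laughs) [S2] Hello! (sighs) Long day."):
    print(turn)
```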

Licensing and Usage Restrictions of Dia

Dia's licensing and usage terms are a significant part of its position in the text-to-speech (TTS) landscape. Released under the Apache 2.0 license, Dia permits commercial use, giving businesses the flexibility to integrate it into a variety of applications. Alongside this permissive license, however, Nari Labs sets out conditions meant to prevent misuse: it explicitly prohibits using Dia for impersonation, misinformation, or any illegal activity, reflecting a firm stance on ethical use. ¹

Releasing Dia under the Apache 2.0 license is a strategic choice that balances open innovation with responsible deployment. The permissive license lowers barriers for developers seeking advanced TTS solutions and encourages wider adoption. At the same time, by stating explicit usage restrictions, Nari Labs seeks to mitigate risks associated with open-source voice technology, such as unauthorized voice cloning and fraud. These measures reflect the company's aim of fostering innovation while guarding against ethical and legal pitfalls. ¹

Nari Labs’ proactive approach to licensing Dia reflects an awareness of both the opportunities and challenges presented by advanced TTS models. By allowing commercial use under the Apache 2.0 license, Dia is poised to democratize access to sophisticated voice synthesis technology, encouraging developers and businesses to leverage its capabilities in novel ways. However, the implementation of usage restrictions underscores the necessity of balancing innovation with ethical responsibility, particularly in preventing nefarious uses such as identity theft and the spread of disinformation. ¹

The choice to make Dia available as an open‑source model under a permissive license represents Nari Labs' dedication to community engagement and collaborative development. However, the accompanying usage restrictions highlight the company's awareness of the risks involved in voice technology's rapid proliferation. By restricting unethical applications, such as impersonation and deceptive practices, Nari Labs aims to create a safeguard that not only protects the integrity of the technology but also aligns with broader societal and legal standards. These efforts are pivotal in ensuring that as TTS technology advances, it does so responsibly and ethically. ¹

Future Developments: A Consumer‑Friendly Version of Dia

A consumer-friendly version of Dia is an exciting prospect for enthusiasts and everyday users alike. As Nari Labs explores ways to make its TTS model more accessible, it could change how people interact with voice technology. A simplified version could let users easily remix and share audio, opening new avenues for creative expression. The consumer version is reportedly aimed at casual users, signaling an intent to lower the barriers to sophisticated TTS. A user-friendly portal would let Nari Labs reach a broader audience and make advanced TTS available for education, entertainment, and personal use. Interested users can join a waitlist for early access. ¹

A consumer-friendly version of Dia would address a real market need for accessible voice synthesis. Today, running state-of-the-art TTS models locally typically requires substantial computational power, which restricts them to a smaller, tech-savvy audience. A consumer-oriented version of Dia could make advanced speech synthesis available to people without high-end hardware, in keeping with the project's open-source ethos. Such a move fits a broader trend toward user-centric AI that emphasizes utility and personalization. At a time of rapid progress in natural language processing and AI, Nari Labs' initiative could spur further innovation and adoption in TTS and sharpen competition in the digital voice market. ¹

A consumer version of Dia could also have educational and social implications. In education, it could give students and teachers better tools for learning and teaching languages. Clear, expressive generated speech could support language learners and people with learning disabilities by providing engaging, accessible audio material. Socially, bringing advanced TTS into everyday use might foster a more inclusive environment, particularly for people who rely on voice generation because of physical disabilities. As interaction with AI becomes increasingly voice-driven, a consumer version of Dia could influence society as well as technology. Applications ranging from reading news articles aloud to assisting visually impaired users and crafting custom voice experiences show how much broader access to this technology could matter. ¹

Behind the Scenes: Who is Nari Labs?

Nari Labs, the startup behind the open-source text-to-speech model Dia, is a two-person company focused on pushing the boundaries of text-to-speech technology [0](https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/). Despite its size, Nari Labs has won support from the Google TPU Research Cloud and Hugging Face's ZeroGPU grant program, a notable endorsement of its work in the AI field [0](https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/).

To compete with larger rivals, Nari Labs builds on established work such as SoundStorm, Parakeet, and the Descript Audio Codec, layering its own innovations on top of proven frameworks [0](https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/). This approach strengthens the model's capabilities without reinventing well-tested components. Nari Labs also actively encourages community contributions through Discord and GitHub, inviting both experienced developers and newcomers to help refine and advance Dia [0](https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/).
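Building on the Descript Audio Codec means the model works with discrete audio tokens rather than raw samples, and the token count grows quickly with clip length. The arithmetic below assumes the codec's published 44.1 kHz configuration, with one frame per 512 samples (about 86 frames per second) and 9 codebooks per frame. These figures describe that codec configuration and are not confirmed here as Dia's exact settings.

```python
from __future__ import annotations

import math

SAMPLE_RATE = 44_100  # assumption: DAC's 44.1 kHz configuration
HOP_LENGTH = 512      # assumption: audio samples per codec frame (~86 frames/s)
NUM_CODEBOOKS = 9     # assumption: residual codebooks, i.e. tokens per frame


def codec_tokens(seconds: float) -> tuple[int, int]:
    """Return (frames, total discrete tokens) needed to represent a clip."""
    frames = math.ceil(seconds * SAMPLE_RATE / HOP_LENGTH)
    return frames, frames * NUM_CODEBOOKS


print(codec_tokens(10.0))  # (862, 7758)
```

Under these assumptions, ten seconds of dialogue corresponds to roughly 7,758 tokens, which helps explain why long conversations are demanding to generate.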

Nari Labs' work reflects a commitment to democratizing technology. Releasing Dia under the Apache 2.0 license makes it widely accessible and paves the way for rapid community-driven improvements. Dia aims to outperform industry leaders such as ElevenLabs and OpenAI by focusing on the subtleties of natural dialogue, including emotional tone and non-verbal cues [0](https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/). At the same time, Nari Labs emphasizes responsible use, explicitly prohibiting exploitative applications such as misinformation and impersonation and aligning its ethical guidelines with its technical progress [0](https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/).

The Impact of Dia in the Open‑Source TTS Landscape

Dia, the latest open-source text-to-speech (TTS) model from Nari Labs, is poised to make a significant impact on the open-source TTS landscape. With 1.6 billion parameters, it seeks to challenge leading models from ElevenLabs and OpenAI.¹ It is built to produce highly naturalistic dialogue, which matters for applications that demand emotional expressiveness and nuanced verbal interaction. Its handling of emotional tone, speaker tagging, and non-verbal cues makes it a strong candidate for a wide range of real-world applications.

Open-source models like Dia are pivotal in democratizing access to cutting-edge technology. Because Dia is freely available on Hugging Face and GitHub under the Apache 2.0 license, it not only challenges proprietary models but also invites community engagement and development. This accessibility supports broader adoption across fields, letting smaller developers and enterprises integrate advanced TTS without prohibitive costs.¹ By promising a consumer-friendly version, Nari Labs also aims to reach a wider audience and extend TTS beyond traditional industrial applications.

The AI voice sector is growing more competitive, with the global market projected to reach $5.4 billion in 2024. Dia contributes to this dynamic by raising technical standards while also emphasizing ethical considerations in development.¹ As the sector expands, models like Dia highlight the need for responsible use, including guidelines against misinformation, impersonation, and other unethical uses. Nari Labs' clear stance on these issues, including banning such uses of Dia, shows the importance of aligning technological progress with ethical frameworks.

Experts and the public alike are responding to Dia with a mix of enthusiasm and caution. The promise of superior audio quality and easier access to high‑performance TTS technology is balanced by concerns over the model's high computational demands and potential for misuse. Users appreciate its open‑source availability, which not only allows for widespread innovation but also opens discussions on the ethical implications of voice cloning capabilities.¹ These discussions are critical as voice synthesis technology becomes more pervasive, requiring ongoing dialogue and adaptation of legal and social frameworks to harness its benefits while mitigating risks.

Competitive Market Analysis: The Growing Voice AI Sector

The voice AI sector is evolving rapidly, driven by new technologies and models such as Nari Labs' Dia. As a 1.6-billion-parameter open-source text-to-speech model, Dia directly challenges established players like ElevenLabs and OpenAI by prioritizing naturalistic dialogue, emotional tone, and speaker tagging. Its emphasis on conversational flow reflects the growing importance of lifelike interaction in voice technology, a trend that shows no sign of slowing. Dia's availability on Hugging Face and GitHub under the Apache 2.0 license also points to a broader industry shift toward open-source solutions, with community-driven development adding to the competitive landscape. VentureBeat's report covers Dia's introduction and impact in more detail.¹

Ethical Concerns in Voice Cloning Technology

The rapid advancement in voice cloning technology has ushered in a new era of potential and peril, reshaping industries from entertainment to customer service. However, it also poses substantial ethical dilemmas that demand careful scrutiny and action. As technology like Nari Labs' Dia grows more sophisticated, the lines between genuine and artificial voices blur, raising critical issues of consent and authenticity. The ability to replicate someone’s voice without their explicit permission can lead to privacy violations and even reputational damage, as these cloned voices could be misused for deceitful purposes such as fraud and identity theft. Thus, safeguarding personal data and implementing stringent consent protocols become paramount to preventing misuse. The open‑source nature of Dia compounds these challenges by making cutting‑edge voice synthesis tools widely available, necessitating a collective responsibility towards ethical usage.

With Dia's release under the Apache 2.0 license, Nari Labs has explicitly prohibited using the model for misinformation and impersonation. However, the ethical concerns associated with voice cloning extend beyond the prohibition of specific actions. Developers and users must adopt a framework that includes ethical guidelines addressing potential misuse. For instance, incorporating watermarks or detectable digital fingerprints could help verify the authenticity of audio files. Furthermore, AI detection tools might serve as frontline defenses against malicious uses of synthetic voice technology. As the technology evolves, so too must the legal frameworks that govern such innovations, ensuring that they are equipped to tackle new forms of voice impersonation and digital deception. It is this intersection of technology, law, and ethics that will guide the responsible evolution of voice cloning.
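Digital fingerprints can be illustrated with a keyed hash. The sketch below uses Python's standard `hmac` module to let a publisher record a tag for each generated file and later confirm that the file is byte-for-byte unchanged. This is a swapped-in technique for illustration only: unlike a true audio watermark, the tag lives outside the signal and breaks on any re-encoding, trimming, or format conversion, and nothing here reflects how Nari Labs handles provenance. The key and file contents are hypothetical placeholders.

```python
import hashlib
import hmac


def fingerprint(audio_bytes: bytes, key: bytes) -> str:
    """Keyed SHA-256 tag for an audio file's exact bytes."""
    return hmac.new(key, audio_bytes, hashlib.sha256).hexdigest()


def verify(audio_bytes: bytes, key: bytes, expected_tag: str) -> bool:
    """Constant-time comparison against a previously recorded tag."""
    return hmac.compare_digest(fingerprint(audio_bytes, key), expected_tag)


secret = b"publisher-signing-key"     # hypothetical key held by the publisher
original = b"RIFF....fake wav bytes"  # stand-in for a generated file's contents
tag = fingerprint(original, secret)

print(verify(original, secret, tag))            # True
print(verify(original + b"\x00", secret, tag))  # False: any change breaks the tag
```

A perceptual watermark embedded in the audio itself would survive more transformations, but it requires signal-processing machinery well beyond this sketch.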

Despite the complex ethical landscape, the advancements represented by models like Dia also provide immense potential benefits. These include advancing accessibility, such as aiding individuals with disabilities, and enhancing user experiences in digital communications. By democratizing access to high‑fidelity voice synthesis, Dia opens new avenues for creativity and innovation, allowing both consumers and creators to reimagine digital storytelling and interactive media. However, these benefits must be balanced with robust safeguards to protect against unintended consequences. The challenge lies in fostering an environment that encourages innovation while maintaining a vigilant stance against ethical breaches. This dual approach will be essential in ensuring that voice cloning technologies can be both transformative and responsibly managed in society.

Expert Opinions: Dia's Innovations and Challenges

Dia, developed by Nari Labs, stands out in the crowded field of text-to-speech models for its capabilities, yet it faces several challenges. Experts have praised its handling of non-verbal cues and emotional transitions, which they see as setting a new standard for natural-sounding dialogue. Its computational demands, however, are a significant barrier for users without high-performance hardware. Reports of bugs and unexpected outputs also indicate that further refinement is needed to improve reliability and usability.

Dia's open-source release could also mark a pivotal shift in the text-to-speech industry by lowering the cost of high-quality voice synthesis and spurring further innovation. While this democratizes access to advanced TTS, it also brings ethical issues to the forefront, particularly the possibility of misuse to generate misleading or inappropriate speech.¹ Experts suggest that safeguards and tighter regulation are crucial to mitigating these risks and balancing innovation with ethical responsibility.¹ However pioneering Dia's technical approach, ensuring its responsible use remains a priority for developers and users alike.

Public Reactions and Feedback on Dia

Dia's release has drawn varied reactions from the public. Many enthusiasts on Hacker News and Twitter have praised its capabilities, such as generating naturalistic dialogue and handling non-verbal cues effectively, often noting that it performs tasks previously dominated by companies like ElevenLabs and Google. Its open-source availability under the Apache 2.0 license adds to its appeal, especially among developers who value the freedom to innovate without significant financial constraints (¹).

However, public feedback is not entirely positive. Users have identified several issues that need addressing, particularly concerning performance reliability and the high computational requirements needed to run the model efficiently. The model's demand for robust hardware can be prohibitive, potentially limiting its accessibility for many users. Moreover, some unexpected outputs and bugs have been reported, pointing towards areas where refinement is necessary.

Concerns regarding ethical implications also surface regularly in discussions surrounding Dia. As an open-source tool, it is susceptible to misuse, including impersonation and the generation of misleading or harmful speech content. This potential for misuse raises ethical questions that must be addressed through regulatory measures and the implementation of safeguard mechanisms. Additionally, the community has expressed a need for more transparency regarding the model's training data to alleviate ethical and safety concerns.

Despite these concerns, Dia's potential is undeniable. Nari Labs' plan to develop a consumer-friendly version of the model signals a commitment to bringing this technology to a wider audience. By targeting casual users who want to remix and share generated conversations, the company aims to democratize voice technology further. This vision cuts both ways: wider access also heightens the need for responsible use and effective moderation (¹).

Overall, Dia represents a significant advancement in TTS technology. It stands as a testament to the potential of open‑source models to challenge industry giants while fostering innovation and competition. Yet, it simultaneously emphasizes the pressing need to address ethical challenges and ensure that such powerful technologies are harnessed for positive and constructive purposes.

Economic, Social, and Political Implications of Dia

The release of Dia, an open-source text-to-speech (TTS) model by Nari Labs, holds significant economic potential within the rapidly growing AI voice sector. With its availability under the Apache 2.0 license, Dia is poised to disrupt the market by lowering costs and promoting innovation in voice synthesis. This open-source model challenges the dominance of existing players by providing sophisticated capabilities in handling non-verbal cues and emotional transitions. Such democratization of technology may lead to the emergence of new business models and increase competition among developers aiming to leverage Dia's advanced features [5](https://opentools.ai/news/nari-labs-launches-dia-the-new-open-source-tts-model-challenging-giants-in-ai). However, the open-source nature also carries risks of misuse, potentially impacting businesses adversely by facilitating fraudulent activities [6](https://opentools.ai/news/meet-dia-the-open-source-ai-revolutionizing-speech).

On a social level, Dia's democratization of advanced TTS technology presents both opportunities and challenges. By making powerful speech synthesis tools accessible, Dia can greatly assist those with visual impairments or literacy challenges, improving their interactions with technology and the world [2](https://zilliz.com/ai-faq/what-are-the-social-implications-of-widespread-tts-adoption). However, the ease of voice cloning brings forth ethical concerns. The technology could be leveraged for nefarious purposes such as spreading misinformation or conducting impersonation scams, posing a risk to social trust and integrity [2](https://zilliz.com/ai-faq/what-are-the-social-implications-of-widespread-tts-adoption). As this technology spreads, ensuring responsible use through education and regulation becomes imperative.

Politically, Dia's capabilities could impact the landscape significantly by enabling the creation of highly realistic and personalized audio content. Such capabilities are double-edged; they can be used to craft tailored messages for various audiences or, conversely, manipulated for disinformation and propaganda efforts [6](https://opentools.ai/news/meet-dia-the-open-source-ai-revolutionizing-speech). The political implications necessitate urgent development of robust regulatory frameworks and ethical guidelines to prevent misuse in political arenas. Governments and regulatory bodies will need to collaborate closely with technology developers to create safeguards that protect against potential threats to democratic processes [6](https://opentools.ai/news/meet-dia-the-open-source-ai-revolutionizing-speech).

Sources

1. VentureBeat (venturebeat.com)
