OpenAI unveils GPT-Realtime API with GPT-5-class reasoning

OpenAI released a Realtime API with three voice models for live reasoning, translation across 70+ languages and streaming transcription.

OpenAI on Wednesday released a Realtime API that adds three voice models, GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper, to let applications listen, reason and act during live conversations. The models connect via WebRTC, WebSocket and SIP.

OpenAI described GPT-Realtime-2 as offering GPT-5-class reasoning. The company reported the model scored 15.2% higher on Big Bench Audio and 13.8% higher on Audio MultiChallenge than GPT-Realtime-1.5. GPT-Realtime-2 supports a 128,000-token context window, up from 32,000, offers five selectable reasoning levels from minimal to xhigh, and can call multiple external tools at once. It includes spoken error recovery and short bridging phrases such as “let me check that” while processing requests.

GPT-Realtime-Translate translates live speech in near real time, accepting more than 70 input languages and producing output in 13. GPT-Realtime-Whisper provides streaming speech-to-text, transcribing words as they are spoken rather than waiting for a completed utterance, which reduces lag and is meant to make voice applications more responsive.

Pricing differs by model: Translate and Whisper are billed by the minute, while GPT-Realtime-2 is metered by token consumption. OpenAI also built content-moderation triggers into the models. The triggers can stop a conversation that is detected as violating harmful-content policies and are intended to block spam, fraud and other abuse.

Several companies received early access. Zillow used GPT-Realtime-2 to build a voice assistant for complex real estate queries and reported that its call success rate on an adversarial benchmark rose 26 percentage points after prompt tuning, from 69% to 95%. Deutsche Telekom is piloting real-time translation for customer support so callers can speak in their preferred language. Priceline is testing a voice assistant that handles flight searches, hotel changes and on-the-ground translation in a single session.
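
For readers comparing the Zillow figures, the gap between a percentage-point gain and a relative gain can be worked out directly. The short Python sketch below uses only the two rates reported above (69% and 95%). The function name and the returned fields are illustrative choices, not anything published by OpenAI or Zillow.

```python
def summarize_improvement(before_pct: float, after_pct: float) -> dict:
    """Compare two success rates, both given in percent (illustrative helper)."""
    # Absolute change, in percentage points.
    point_gain = after_pct - before_pct
    # Change relative to the starting success rate.
    relative_gain_pct = point_gain / before_pct * 100
    # The same change seen from the failure side: how much of the
    # original failure rate was eliminated.
    failure_before = 100 - before_pct
    failure_after = 100 - after_pct
    failure_reduction_pct = (failure_before - failure_after) / failure_before * 100
    return {
        "point_gain": point_gain,
        "relative_gain_pct": round(relative_gain_pct, 1),
        "failure_reduction_pct": round(failure_reduction_pct, 1),
    }


# Rates reported for the adversarial benchmark in the article.
zillow = summarize_improvement(69, 95)
print(zillow)
```

On these numbers, the 26-point gain corresponds to a relative increase of about 38% in successful calls, and the failure rate falls from 31% to 5%.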

OpenAI positioned the models for customer service automation and said they may be used in education, media, live events and creator tools where low-latency, multi-turn voice interaction and translation are required. Developers can adjust reasoning effort levels and integrate external tools to balance processing depth, latency and cost.

