US20250203164
2025-06-19
Electricity
H04N21/472
The patent application describes a system for enhancing video streaming using an artificial intelligence virtual assistant. This assistant is represented by a synthetic human host embedded within an interface. The system is designed to interact with users through audio inputs related to products for sale, allowing for a more engaging and interactive user experience.
Users speak into the embedded interface, and a natural language processing engine captures the audio. The engine converts the audio into a data segment, which can be divided into subsegments. A large language model (LLM) processes each subsegment independently and generates responses based on stored product information.
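The segmentation step can be illustrated with a minimal sketch. It assumes the audio has already been transcribed to text, splits the resulting data segment at sentence boundaries, and handles each subsegment independently in a thread pool. The catalog, the splitting rule, and the function names (`split_into_subsegments`, `answer_subsegment`, `respond`) are hypothetical and are not taken from the application. A keyword lookup stands in for the LLM call.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical product catalog standing in for the stored product information.
CATALOG = {
    "blender": "The blender has a 900 W motor.",
    "kettle": "The kettle holds 1.7 litres.",
}


def split_into_subsegments(segment: str) -> list[str]:
    """Split a transcribed data segment into sentence-level subsegments.

    Assumption: sentence boundaries are a reasonable subsegment unit.
    """
    parts = re.split(r"(?<=[.?!])\s+", segment.strip())
    return [p for p in parts if p]


def answer_subsegment(subsegment: str) -> str:
    """Stand-in for the LLM call: return info for a product named in the text."""
    lowered = subsegment.lower()
    for name, info in CATALOG.items():
        if name in lowered:
            return info
    return "Sorry, I have no information on that."


def respond(segment: str) -> list[str]:
    """Process every subsegment independently and return answers in order."""
    subsegments = split_into_subsegments(segment)
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order even though calls run concurrently.
        return list(pool.map(answer_subsegment, subsegments))


if __name__ == "__main__":
    for line in respond("How powerful is the blender? What about the kettle?"):
        print(line)
```

Because no subsegment depends on another, the calls can run concurrently, and the answers are collected in the original order so the dialogue stays coherent.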
The LLM generates responses that are then converted from text to speech using a text-to-speech (TTS) converter. This conversion allows the system to produce audio responses that mimic human interaction. The TTS-generated audio is integral to creating a seamless dialogue experience between the user and the synthetic human host.
The audio responses are used to create video segments featuring the synthetic host. These segments simulate a human-like conversation, enhancing the user's engagement with the virtual assistant. Because user audio is processed continuously, the interaction remains dynamic and responsive.
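The path from text response to host video segment might be sketched as follows. This is an illustration only: the TTS converter and the video renderer are replaced by stubs that compute a clip duration and a frame count, and the speaking rate, frame rate, and all names (`AudioClip`, `VideoSegment`, `text_to_speech`, `render_host_segment`, `build_segments`) are assumptions rather than details from the application.

```python
from dataclasses import dataclass

WORDS_PER_SECOND = 2.5  # assumed speaking rate of the synthetic host
FPS = 30                # assumed video frame rate


@dataclass
class AudioClip:
    text: str
    duration_s: float


@dataclass
class VideoSegment:
    audio: AudioClip
    frame_count: int


def text_to_speech(text: str) -> AudioClip:
    """Stub TTS converter: estimate clip length from the word count."""
    words = len(text.split())
    return AudioClip(text=text, duration_s=words / WORDS_PER_SECOND)


def render_host_segment(audio: AudioClip) -> VideoSegment:
    """Stub renderer: size the host video segment to match the audio clip."""
    return VideoSegment(audio=audio, frame_count=round(audio.duration_s * FPS))


def build_segments(responses: list[str]) -> list[VideoSegment]:
    """Turn each text response into audio, then into a matching video segment."""
    return [render_host_segment(text_to_speech(r)) for r in responses]


if __name__ == "__main__":
    for seg in build_segments(["The kettle holds 1.7 litres."]):
        print(seg.audio.duration_s, seg.frame_count)
```

In this sketch each video segment takes its length from its audio clip, which is one simple way to keep the host's speech and video aligned.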
By maintaining a flow of dialogue between the user and the synthetic host, the system provides an interactive experience similar to conversing with a real human. This setup not only aids in product information dissemination but also improves user satisfaction by offering personalized and immediate responses.