Speech Processing Method and Apparatus, Device, Storage Medium and Program
| Content Provider | The Lens |
|---|---|
| Abstract | The present disclosure provides a speech processing method and apparatus, a device, a storage medium and a program, and relates to the fields of speech technologies and natural language processing technologies in artificial intelligence. A specific implementation solution is: a terminal device sends at least one speech intention to a server in a process of receiving first speech information, where each speech intention is a speech intention corresponding to a part of speech information in the first speech information; the server acquires response information corresponding to the at least one speech intention; the terminal device sends the first speech information to the server in response to completion of receiving the first speech information; the server acquires a second speech intention corresponding to the first speech information, and sends the response information corresponding to the first speech intention to the terminal device, where the first speech intention is the speech intention, among the at least one speech intention, that is the same as the second speech intention; and the terminal device outputs the response information. By the above process, the speech interaction delay is reduced. |
| Related Links | https://www.lens.org/images/patent/EP/4075425/A2/EP_4075425_A2.pdf |
| Language | English |
| Publisher Date | 2022-10-19 |
| Access Restriction | Open |
| Alternative Title | Verfahren Und Gerät Zur Sprachverarbeitung, Vorrichtung, Speichermedium Und Programm / Procédé Et Appareil De Traitement De La Parole, Dispositif, Support De Stockage Et Programme |
| Content Type | Text |
| Resource Type | Patent |
| Date Applied | 2022-08-10 |
| Agent | J A Kemp Llp |
| Applicant | Apollo Intelligent Connectivity Beijing Technology Co Ltd |
| Application No. | 22189665 |
| Claim | 1. A speech processing method, comprising: sending (S301) at least one speech intention to a server in a process of receiving first speech information, wherein each speech intention is a speech intention corresponding to a part of speech information in the first speech information; sending (S303) the first speech information to the server in response to completion of receiving the first speech information; receiving response information corresponding to a first speech intention from the server, wherein the response information is determined by the server after receiving the first speech intention, the first speech intention is the same as a second speech intention corresponding to the first speech information, and the at least one speech intention comprises the first speech intention; and outputting (S306) the response information. 2. The method according to claim 1, wherein the sending (S301) the at least one speech intention to the server comprises: determining (S401) an i-th speech intention corresponding to an i-th part of speech information after receiving the i-th part of speech information, and sending (S402) the i-th speech intention to the server, wherein i takes 1, 2,..., N in sequence, an (i+1)-th part of speech information comprises the i-th part of speech information, and N is an integer greater than or equal to 1; wherein a difference between a speech duration corresponding to the first speech information and a speech duration corresponding to an N-th part of speech information is less than or equal to a first threshold, or a difference between a number of a syllable corresponding to the first speech information and a number of a syllable corresponding to an N-th part of speech information is less than or equal to a second threshold. 3. The method according to claim 2, wherein the determining (S401) the i-th speech intention corresponding to the i-th part of speech information comprises: inputting the i-th part of speech information into an intention prediction model to acquire probabilities corresponding to a plurality of prediction intentions output by the intention prediction model; and determining the i-th speech intention corresponding to the i-th part of speech information according to the probabilities corresponding to the plurality of prediction intentions. 4. The method according to claim 3, wherein the determining the i-th speech intention corresponding to the i-th part of speech information according to the probabilities corresponding to the plurality of prediction intentions comprises: determining a target prediction intention from the plurality of prediction intentions, wherein the target prediction intention has a highest probability; and determining the target prediction intention as the i-th speech intention corresponding to the i-th speech information. 5. The method according to claim 3 or 4, wherein the intention prediction model is obtained by learning a plurality of groups of training samples, and each group of training samples comprises: sample speech information and a sample intention corresponding to the sample speech information; wherein the sample speech information is a part of the speech information extracted from historical speech information. 6. The method according to any one of claims 2-5, wherein when i is an integer greater than 1, the sending the i-th speech intention to the server comprises: sending the i-th speech intention to the server when the i-th speech intention is different from a previous i-1 speech intention. 7. A speech processing method, comprising: receiving at least one speech intention sent by a terminal device in a process of receiving first speech information, and acquiring (S302) response information corresponding to the at least one speech intention, wherein each speech intention is a speech intention corresponding to a part of speech information in the first speech information; receiving the first speech information sent by the terminal device and acquiring (S304) a second speech intention corresponding to the first speech information; and sending (S305) response information corresponding to the first speech intention to the terminal device, wherein the first speech intention is a same speech intention as the second speech intention in the at least one speech intention. 8. The method according to claim 7, wherein the receiving the at least one speech intention sent by the terminal device in the process of receiving the first speech information, and acquiring (S302) the response information corresponding to the at least one speech intention comprises: receiving an i-th speech intention sent by the terminal device and acquiring (S403) response information corresponding to the i-th speech intention; wherein the i-th speech intention is determined by the terminal device after receiving an i-th part of speech information, and an (i+1)-th part of speech information comprises the i-th part of speech information, i takes 1, 2,..., N in sequence, and N is an integer greater than or equal to 1; wherein a difference between a speech duration corresponding to the first speech information and a speech duration corresponding to an N-th part of speech information is less than or equal to a first threshold, or a difference between a number of a syllable corresponding to the first speech information and a number of a syllable corresponding to an N-th part of speech information is less than or equal to a second threshold. 9. The method according to claim 8, wherein the acquiring (S403) the response information corresponding to the i-th speech intention comprises: determining a target resource server according to the i-th speech intention, wherein the target resource server is configured to store the response information corresponding to the i-th speech intention; sending a request message to the target resource server, wherein the request message comprises the i-th speech intention; receiving the response information from the target resource server. 10. The method according to any one of claims 7-9, wherein after the acquiring (S302) the response information corresponding to the at least one speech intention, further comprising: storing each speech intention and respective response information corresponding to the speech intention in a cache; the sending (S305) the response information corresponding to the first speech intention to the terminal device comprises: determining, according to the second speech intention, the first speech intention from the at least one speech intention stored in the cache; acquiring the response information corresponding to the first speech intention from the cache; and sending the response information corresponding to the first speech intention to the terminal device. 11. A speech processing apparatus (800), comprising: a sending module (801), a receiving module (802) and an outputting module (803); the sending module (801) is configured to send at least one speech intention to a server in a process of receiving first speech information, wherein each speech intention is a speech intention corresponding to a part of speech information in the first speech information; the sending module (801) is further configured to send the first speech information to the server in response to completion of receiving the first speech information; the receiving module (802) is configured to receive response information corresponding to a first speech intention from the server, wherein the response information is determined by the server after receiving the first speech intention, the first speech intention is the same as a second speech intention corresponding to the first speech information, and the at least one speech intention comprises the first speech intention; and the outputting module (803) is configured to output the response information. 12. The apparatus (800) according to claim 11, wherein the sending module (801) comprises: a determining unit, configured to determine an i-th speech intention corresponding to an i-th part of speech information after receiving the i-th part of speech information; and a sending unit, configured to send the i-th speech intention to the server, wherein i takes 1, 2,..., N in sequence, an (i+1)-th part of speech information comprises the i-th part of speech information, and N is an integer greater than or equal to 1; wherein a difference between a speech duration corresponding to the first speech information and a speech duration corresponding to an N-th part of speech information is less than or equal to a first threshold, or a difference between a number of a syllable corresponding to the first speech information and a number of a syllable corresponding to an N-th part of speech information is less than or equal to a second threshold. 13. A speech processing apparatus (900), comprising: a receiving module (901), an acquiring module (902) and a sending module (903); the receiving module (901) is configured to receive at least one speech intention sent by a terminal device in a process of receiving first speech information; the acquiring module (902) is configured to acquire response information corresponding to the at least one speech intention, wherein each speech intention is a speech intention corresponding to a part of speech information in the first speech information; the receiving module (901) is further configured to receive the first speech information sent by the terminal device; the acquiring module (902) is further configured to acquire a second speech intention corresponding to the first speech information; the sending module (903) is configured to send response information corresponding to the first speech intention to the terminal device, wherein the first speech intention is a same speech intention as the second speech intention in the at least one speech intention. 14. A non-transitory computer-readable storage medium stored with computer instructions, wherein the computer instructions are configured to enable a computer to execute the method according to any one of claims 1-6, or the method according to any one of claims 7-10. 15. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-6, or the method according to any one of claims 7-10. |
| CPC Classification | SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING / ELECTRIC DIGITAL DATA PROCESSING / TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION |
| Extended Family | 158-289-627-067-55X 089-356-426-453-67X 117-468-285-734-119 109-691-854-663-472 025-151-249-528-229 124-779-953-700-144 |
| Patent ID | 4075425 |
| Inventor/Author | Miao Shiqian |
| IPC | G10L15/18 G06F3/16 G10L15/30 |
| Status | Discontinued |
| Simple Family | 025-151-249-528-229 089-356-426-453-67X 117-468-285-734-119 158-289-627-067-55X 109-691-854-663-472 124-779-953-700-144 |
| CPC (with Group) | G10L15/1815 G10L15/22 G10L15/26 G10L15/30 G10L17/22 G10L15/1822 G06F3/167 G06F40/30 G06F40/216 G06F40/35 G10L25/93 H04L67/10 |
| Issuing Authority | European Patent Office (EPO) |
| Kind | Patent Application Publication (Republication) |
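
The interaction described in the abstract and claims can be illustrated with a minimal Python sketch. It is not the applicant's implementation. Every name in it (`Server`, `run_terminal`, `pick_intention`, and the `toy_*` helpers) is hypothetical, text strings stand in for audio, and the toy functions stand in for the intention prediction model, the speech recognizer and the resource server. Two points are assumptions not stated in the record: the "previous i-1 speech intention" is read as the most recently sent intention, and a cache miss falls back to an ordinary fetch.

```python
from typing import Callable, Dict, List, Optional


def pick_intention(probabilities: Dict[str, float]) -> str:
    """Return the predicted intention with the highest probability."""
    return max(probabilities, key=lambda name: probabilities[name])


class Server:
    """Prefetches responses for early intentions, then serves the matching one."""

    def __init__(self, fetch: Callable[[str], str],
                 recognize: Callable[[str], str]) -> None:
        self.fetch = fetch          # stand-in for querying a resource server
        self.recognize = recognize  # stand-in for full-utterance understanding
        self.cache: Dict[str, str] = {}

    def on_partial_intention(self, intention: str) -> None:
        # Acquire the response early and keep it in the cache.
        if intention not in self.cache:
            self.cache[intention] = self.fetch(intention)

    def on_full_speech(self, speech: str) -> str:
        # Determine the intention of the complete utterance.
        final_intention = self.recognize(speech)
        if final_intention in self.cache:
            return self.cache[final_intention]
        # Assumption: on a cache miss, fetch the response the ordinary way.
        return self.fetch(final_intention)


def run_terminal(parts: List[str], full_speech: str,
                 predict: Callable[[str], Dict[str, float]],
                 server: Server) -> str:
    """Send an intention per growing prefix, then send the whole utterance."""
    last_sent: Optional[str] = None
    for part in parts:  # each part contains the previous one
        intention = pick_intention(predict(part))
        if intention != last_sent:  # skip resending an unchanged intention
            server.on_partial_intention(intention)
            last_sent = intention
    return server.on_full_speech(full_speech)


# Toy stand-ins, purely for illustration.
fetch_log: List[str] = []


def toy_fetch(intention: str) -> str:
    fetch_log.append(intention)
    return f"response for {intention}"


def toy_predict(partial_text: str) -> Dict[str, float]:
    if "weather" in partial_text:
        return {"weather": 0.9, "music": 0.1}
    return {"weather": 0.4, "music": 0.6}


def toy_recognize(full_text: str) -> str:
    return "weather" if "weather" in full_text else "music"


server = Server(toy_fetch, toy_recognize)
parts = ["what is", "what is the weather", "what is the weather in"]
answer = run_terminal(parts, "what is the weather in Beijing", toy_predict, server)
```

In this run the early guess for the first prefix is wrong and is simply left unused in the cache. The response for the intention that matches the full utterance has already been fetched by the time the utterance ends, which is the source of the reduced delay the abstract describes.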