YobiYoba web service API - Basic Methods
| Item | Value | Details |
|---|---|---|
| URI | https://member.yobiyoba.com:8095/api | |
| HTTP method | POST [1] | |
| Query arguments | method= | bs_trans, cts_trans, xs_trans, ... |
| | model= | eng, fre, ... |
| | audiofile= | audio-filename |
| Request body | an audio file [2] or a MIME multi-part message [3] | |
| Response | 200 OK | XML document [4] |
| | HTTP status | XML error message [4] |

[2] Supported audio formats: AAC/M4A, ADTS-AAC, AC3, AIFC, AIFF, AMR, ASF/WMA, FLAC, MPEG2/3, MPEG-TS, MS-GSM, MS-PCM8/16/24/32-MPEG3, NIST/shortpack/shorten, Ogg/Vorbis/Opus, Sun AU, WEBM.
[3] Including an audio file and optionally some text files and some query options.
[4] The YobiYoba DTD is available as schema.dtd.
Notes:
- The server closes the connection after each request.
- We recommend connecting to the YobiYoba server directly rather than through proxies or firewalls.
The YobiYoba STT service currently offers 3 submission modes: the file mode, the streaming mode, and the real-time mode. This documentation describes the file mode. In this mode the client submits a request that includes an audio file, and the service answers with the XML result once the file has been entirely processed (one HTTP request per file). The streaming and real-time modes are described separately.
In the default blocking mode the client application sends a request to the server and waits for the server's response. The client must keep waiting (indefinitely) until the full response is received. Multithreaded clients can submit multiple requests at the same time. A non-blocking mode is also supported, but the default blocking mode offers several advantages: it is easier to implement on the client side, it minimizes the operation latency, it minimizes the server load, and it allows the client to abort the operation by closing the connection, so the server is not loaded with operations whose results are no longer needed.
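As a rough illustration of submitting several blocking requests at the same time from a shell client, the sketch below runs a worker function on a list of files in batches of background jobs. The names `yy_parallel` and `yy_submit` are hypothetical helpers, not part of the service, and the batch size is an assumption: this page does not state how many simultaneous requests the server accepts.

```shell
# yy_parallel MAX_JOBS WORKER FILE...
# Runs "WORKER FILE" for every file, at most MAX_JOBS at a time.
# Batches are simple: the next batch starts when the current one is done.
# The body is a subshell so the helper variables do not leak.
yy_parallel() (
    max=$1
    worker=$2
    shift 2
    running=0
    for f in "$@"; do
        "$worker" "$f" &
        running=$((running + 1))
        if [ "$running" -ge "$max" ]; then
            wait            # let the current batch finish
            running=0
        fi
    done
    wait                    # wait for the last (possibly partial) batch
)

# Hypothetical worker: one blocking request per file, result in FILE.xml.
# Assumes $api_key is set; method and model are example values.
yy_submit() {
    curl -ksS -H "api-key: $api_key" https://member.yobiyoba.com:8095/api \
        -F method=bs_trans -F model=eng -F "audiofile=@$1" > "$1.xml"
}

# Example call (not executed here):
#   yy_parallel 4 yy_submit *.mp3
```

Each worker keeps its own connection open until the XML result arrives, which matches the blocking behaviour described above.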
For more information please contact us at yobiyoba@yobinext.com.
User authentication and data encryption
Authentication uses an API key: add an HTTP header named "api-key" whose value is a valid API key. The HTTPS protocol is used for all communications with the YobiYoba service, so all exchanges (including user authentication) are encrypted.
Query arguments
The query arguments can be URI encoded or can be passed to the server in a MIME multi-part message. The table below gives the most important arguments.
| Argument | Meaning | Notes |
|---|---|---|
| method | service method | The method argument is required. Valid values are: hello, status, upload, bs_part, bs_lid, bs_trans, bs_align, cts_part, cts_lid, cts_trans, xs_part, xs_lid, xs_trans. The bs methods are for wide-band audio such as radio and TV broadcasts with sampling rates of 16 kHz or higher. The cts methods are for telephone data with sampling rates of 8 kHz or higher. The xs methods can be used for any type of data; they are the preferred methods for the supported languages. Using a bit rate lower than 32 kbps is strongly discouraged. |
| audiofile | audio file name [1] | This argument is required for all methods except hello, upload and status. The audio data corresponding to the audio file (maximum size 300 MB) must be included in the HTTP message. Supported audio formats: AIFF, ASF/WMA, FLAC, MS-Wave, MPEG, Ogg/Vorbis, NIST Sphere, Sun AU (all single track). |
| model | model name | A set of models is available for each method. For the transcription methods (e.g. xs_trans) the model name specifies the language (e.g. eng), and optionally the dialect, the application and the model version. Some models can be specifically designed for a particular user application. The available models depend on the method. |
| textfile | text file name [1,2] | This argument is required for the bs_align method. It is optional for the cts_trans, bs_trans, xs_trans and upload methods. The text file name and the text file must be included in a MIME multipart message along with the audio file name and the audio file. Supported text formats: plain text file using ASCII, ISO-8859 or UTF-8 character sets with language-dependent restrictions. The text should include one sentence or clause per line, ending with an optional punctuation mark and a line-feed character to mark the end of the line (i.e. Unix style). |
| vocfile | vocabulary file name [1,2] | Optional argument for the bs_align, cts_trans, bs_trans, xs_trans and upload methods. The vocabulary file name and the vocabulary file must be included in a MIME multipart message along with the audio file name and the audio file. Supported format: plain text file (Unix style) with one line per word with optional pronunciations. |
| llfile | language list file | Optional argument for the bs_lid, cts_lid, xs_lid, bs_trans, cts_trans, xs_trans and upload methods. The file name and the file content must be included in a MIME multipart message along with the audio file name and the audio file. Supported format: plain text file (Unix style) with one line per language with optional prior probability. |
| slfile | speaker list file | Optional argument for the bs_trans and upload methods. It can be used to specify a subset of the speaker list (if any) associated to the model. The file name and file content must be included in a MIME multipart message along with the audio file name and the audio file. Supported format: plain text file (Unix style) with one line per speaker. |
| async | synchronisation method | By default (async=0) the connection is kept open until the processing is done and the XML document is sent to the client. If async=1 the server returns a session ID after receiving the audio data. The status method must then be used to follow the progress of the operation and to get the XML result. Requests with async=1 have very low priority [3], i.e. this mode should only be used for testing purposes. |
| session | session-ID | The session ID is required for the status method. |
| verbose | get session log | This option is effective only if async=0. Use verbose=1 to add a log header to the server response, or verbose=2 to add a progress header to the server response. |
| dlopt | speech duration (s) | Maximum speech duration for language identification. Applies to bs_trans, cts_trans, xs_trans, bs_lid, cts_lid and xs_lid. By default dlopt=30. The entire audio file is used to identify the language if dlopt=0. With the bs_trans, cts_trans and xs_trans methods, you have to use this option if the model is not specified. Alternatively model=und instructs the service that the language must be identified automatically. |
| kopt | max #speakers | The default maximum number of speakers is 1 for cts methods (i.e. kopt=1) and infinity for bs and xs methods |
| qlopt | LID decision threshold | Set the language identification threshold [0.0-1.0]. By default qlopt=0.5. |
| qopt | decoding option | The option value is a string of letters (for example qopt=df): 'd' for dual track decoding (cts), 'n' for no postprocessing (punctuation and numbers), 'p' for no audio partitioning, 'm' for multiple word hypotheses, 'f' for no confidence score filtering. |
| qsopt | speaker segmentation | Use qsopt=1 to add speaker segmentation for the bs_align method. By default qsopt=0. |
| ropt | LID version | This option is used to select a specific LID version (for example ropt=6.0). By default the service uses the most recent version. |
| uopt | prior language model weight | Optional argument for the bs_trans, cts_trans and xs_trans methods if used with textfile and/or vocfile. Set the weight of the default language model [0.5-1.0]. By default this weight is estimated automatically. |
| priority | set priority | The default priority is 0. Use 10 for a low priority request [3]. |

[2] The textfile and vocfile arguments are not supported for all languages.
[3] Low priority requests are more likely to be denied with the error code 307.
Request examples
Example 1: URI encoded request

    PUT /api?method=bs_trans&model=eng&audiofile=doc1.mp3 HTTP/1.1
    api-key: your api key
    User-Agent: ClientProgram/1.0
    Host: member.yobiyoba.com:8095
    Content-Type: audio/mpeg
    Content-Length: 1194336
    [blank line]
    [audio file content]

Notes:
- Content-Length is equal to the audio file size.
- The YobiYoba server keeps the connection open until the audio file has been entirely processed and the XML transcript sent back to the client.
- Chunked transfer encoding can be used for streaming upload.
- If the client closes the connection while the data is being processed the operation is aborted.
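The request line of Example 1 carries the arguments in the query string, so a file name containing spaces or other reserved characters has to be percent-encoded first. The sketch below shows one way to assemble such a URL in plain shell. `yy_urlencode` and `yy_build_url` are hypothetical helper names, and the encoder is only meant for ASCII file names; non-ASCII names are not covered.

```shell
YY_API_URL="https://member.yobiyoba.com:8095/api"

# yy_urlencode STRING
# Percent-encodes every character outside the RFC 3986 unreserved set.
# Sketch for ASCII input only. The subshell body keeps variables local.
yy_urlencode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}             # everything after the first character
        c=${s%"$rest"}          # the first character itself
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# yy_build_url METHOD MODEL AUDIO_PATH
# Only the base name of AUDIO_PATH is sent as the audiofile argument.
yy_build_url() {
    printf '%s?method=%s&model=%s&audiofile=%s\n' \
        "$YY_API_URL" "$1" "$2" "$(yy_urlencode "${3##*/}")"
}

# Example call (not executed here), assuming $api_key is set:
#   curl -ksS -H "api-key: $api_key" \
#       "$(yy_build_url bs_trans eng "my recording.mp3")" \
#       -T "my recording.mp3" > result.xml
```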
Example 2: MIME multi-part body request

    POST /api HTTP/1.1
    api-key: your api key
    User-Agent: ClientProgram/1.0
    Host: member.yobiyoba.com:8095
    Content-Length: 1194846
    Content-Type: multipart/form-data; boundary=--------------------------0a8606733948
    [blank line]
    [MIME multi-part message]

Notes:
- This is the standard method used by browsers to upload a file via an HTML form.
- Forming a MIME multi-part message is beyond the scope of this page, but it can easily be done using curl (cf. the second curl example below). As with an HTML POST form, each argument must be in a separate MIME part and the audio part must include the file name, as follows:

        ----------------------------0a8606733948
        Content-Disposition: form-data; name="method"
        [blank line]
        bs_trans
        ----------------------------0a8606733948
        Content-Disposition: form-data; name="model"
        [blank line]
        eng
        ----------------------------0a8606733948
        Content-Disposition: form-data; name="audiofile"; filename="doc1.mp3"
        Content-Type: application/octet-stream
        [blank line]
        [audio file content]
        ----------------------------0a8606733948--
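For readers who want to see how the parts of the message above fit together, here is a minimal sketch that writes such a body by hand. It is for illustration only, since curl's -F option does this automatically. `yy_field` and `yy_multipart` are hypothetical helper names. Each delimiter line is the boundary string prefixed with two hyphens, the closing delimiter adds two more hyphens at the end, and lines end with CR LF.

```shell
# yy_field BOUNDARY NAME VALUE
# Writes one plain form field as a MIME part.
yy_field() {
    printf '%s%s\r\n' '--' "$1"
    printf 'Content-Disposition: form-data; name="%s"\r\n\r\n%s\r\n' "$2" "$3"
}

# yy_multipart BOUNDARY METHOD MODEL AUDIO_PATH
# Writes the complete multi-part body to stdout.
yy_multipart() {
    yy_field "$1" method "$2"
    yy_field "$1" model "$3"
    printf '%s%s\r\n' '--' "$1"
    printf 'Content-Disposition: form-data; name="audiofile"; filename="%s"\r\n' "${4##*/}"
    printf 'Content-Type: application/octet-stream\r\n\r\n'
    cat < "$4"                               # raw audio bytes
    printf '\r\n%s%s%s\r\n' '--' "$1" '--'   # closing delimiter
}

# Example (not executed here), assuming $api_key is set:
#   b=0a8606733948
#   yy_multipart "$b" bs_trans eng doc1.mp3 > body.bin
#   curl -ksS -H "api-key: $api_key" \
#       -H "Content-Type: multipart/form-data; boundary=$b" \
#       --data-binary @body.bin https://member.yobiyoba.com:8095/api
```

The Content-Length of the request is then simply the size of the generated body, for example `wc -c < body.bin`.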
Example 3: MIME multi-part body with a text file

    POST /api HTTP/1.1
    api-key: your api key
    User-Agent: ClientProgram/1.0
    Host: member.yobiyoba.com:8095
    Content-Length: 1194846
    Content-Type: multipart/form-data; boundary=--------------------------0a8606733948
    [blank line]
    ----------------------------0a8606733948
    Content-Disposition: form-data; name="method"
    [blank line]
    bs_align
    ----------------------------0a8606733948
    Content-Disposition: form-data; name="model"
    [blank line]
    eng
    ----------------------------0a8606733948
    Content-Disposition: form-data; name="audiofile"; filename="doc1.mp3"
    Content-Type: application/octet-stream
    [blank line]
    [audio file content]
    ----------------------------0a8606733948
    Content-Disposition: form-data; name="textfile"; filename="doc1.txt"
    Content-Type: application/octet-stream
    [blank line]
    It's friday september thirteenth.
    I'm David David Brancaccio and here's some ...
    ----------------------------0a8606733948--

CURL examples
CURL is commonly available on many UNIX/Linux platforms. If you don't have the tool installed, visit the download page on the curl website.

1. Here is a first curl example with the arguments encoded in the URL:

        curl -ksS -H "api-key: $api_key" "https://member.yobiyoba.com:8095/api?method=bs_trans&model=eng&audiofile=sample.mp3" \
            -T sample.mp3 > sample.xml

2. Here is a second example where the arguments and the audio data are assembled in a MIME multipart message, as for an HTML POST form. The audio file name is prefixed with an @ sign to ask curl to send the audio file content and not only the file name.

        curl -ksS -H "api-key: $api_key" https://member.yobiyoba.com:8095/api -F method=bs_trans -F model=eng -F audiofile=@sample.mp3 \
            > sample.xml

3. In this example a text file is provided along with the audio file in order to align the text with the audio:

        curl -ksS -H "api-key: $api_key" https://member.yobiyoba.com:8095/api -F method=bs_align -F model=eng -F audiofile=@sample.mp3 \
            -F textfile=@sample.txt > sample.xml

Notes:
- After checking the curl exit code you also need to check the presence of the <Error> tag in the XML output, since by default the HTTP error codes (400 and above) are not relayed to the curl exit code.
- With the -f flag curl returns exit code 22 for HTTP status codes of 400 or above. However, this flag is not recommended because it prevents curl from outputting the XML error message.
- The curl option --connect-timeout may be used to detect unsuccessful connections.
- If there is a proxy or a firewall between the curl client and the YobiYoba server, you may want to use the --keepalive-time option to prevent disconnections due to network inactivity. The option value should be less than the proxy or firewall timeout duration.
- You should be very careful if you use the --max-time option as the processing time depends on the audio file size, the audio quality, and the server load. This option is not recommended.
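The checks described in these notes can be combined in a small wrapper. The sketch below is one possible arrangement, not an official client: `yy_check` and `yy_transcribe` are hypothetical names, the 30-second connection timeout is an arbitrary assumption, and only synchronous requests (status 200) are treated as success. It uses curl's -o option to keep the XML output and -w '%{http_code}' to capture the HTTP status separately, so the XML error message is still available when the status is 400 or above.

```shell
# yy_check CURL_EXIT_CODE HTTP_STATUS XML_FILE
# Succeeds only if curl itself succeeded, the HTTP status is 200,
# and the XML output contains no <Error> tag.
yy_check() {
    if [ "$1" -ne 0 ]; then
        echo "curl failed with exit code $1" >&2
        return 1
    fi
    if [ "$2" != 200 ]; then
        echo "HTTP status $2, see $3 for the XML error message" >&2
        return 1
    fi
    if grep -q '<Error' "$3"; then
        echo "XML error message found in $3" >&2
        return 1
    fi
    return 0
}

# yy_transcribe AUDIO_FILE XML_FILE
# Assumes $api_key is set. Method and model are example values.
yy_transcribe() {
    status=$(curl -ksS --connect-timeout 30 -H "api-key: $api_key" \
        -o "$2" -w '%{http_code}' \
        https://member.yobiyoba.com:8095/api \
        -F method=bs_trans -F model=eng -F "audiofile=@$1")
    yy_check "$?" "$status" "$2"
}

# Example call (not executed here):
#   yy_transcribe sample.mp3 sample.xml && echo "transcript ready"
```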
PHP/CURL examples
1. Here is a PHP/CURL script equivalent to the first curl example (PUT method):

        <?php
        $audiofile = "sample.mp3";
        $lang = "eng";
        $server = "member.yobiyoba.com";
        $your_api_key = "your api key";   // placeholder: replace with a valid key

        $curl = curl_init();
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("api-key: $your_api_key"));
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);   // like curl -k: no certificate check
        $audio = fopen($audiofile, "rb");
        curl_setopt($curl, CURLOPT_URL, "https://$server:8095/api?method=bs_trans&model=$lang&audiofile=$audiofile");
        curl_setopt($curl, CURLOPT_PUT, 1);
        curl_setopt($curl, CURLOPT_INFILE, $audio);
        curl_setopt($curl, CURLOPT_INFILESIZE, filesize($audiofile));   // sets Content-Length
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        $xml = curl_exec($curl);
        if (curl_error($curl))
            echo 'Curl error: '.curl_error($curl).PHP_EOL;
        else {
            $http_status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
            if ($http_status != 200)
                echo 'HTTP status:'.$http_status.PHP_EOL;
            print $xml;
        }
        curl_close($curl);
        fclose($audio);
        ?>

2. Here is a PHP/CURL script equivalent to the second curl example (POST method with a MIME multipart message):

        <?php
        $audiofile = "sample.mp3";
        $lang = "eng";
        $server = "member.yobiyoba.com";
        $your_api_key = "your api key";   // placeholder: replace with a valid key

        $curl = curl_init();
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("api-key: $your_api_key"));
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);   // like curl -k: no certificate check
        curl_setopt($curl, CURLOPT_URL, "https://$server:8095/api");
        // CURLFile replaces the old "@filename" syntax, which PHP 7 no longer supports
        curl_setopt($curl, CURLOPT_POSTFIELDS, array(
            "method" => "cts_trans",
            "model" => "$lang",
            "audiofile" => new CURLFile($audiofile)));
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        $xml = curl_exec($curl);
        if (curl_error($curl))
            echo 'Curl error: '.curl_error($curl).PHP_EOL;
        else {
            $http_status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
            if ($http_status != 200)
                echo 'HTTP status:'.$http_status.PHP_EOL;
            print $xml;
        }
        curl_close($curl);
        ?>

Messages and HTTP status codes
The YobiYoba server returns an XML message and a 200 HTTP status code for a successful synchronous request, whereas it returns a 202 status code for an asynchronous request. For an unsuccessful request the server returns an XML error message with an error code (between 1 and 331) and a short explanatory text. The HTTP status code is set as specified in the following table.

| Error Code | HTTP status code | Message |
|---|---|---|
| -- | OK (200) | xml document |
| -- | Accepted (202) | session-ID |
| 1-34 | OK (200) | standard system call errors |
| 200-246 | OK (200) | library error messages |
| 200 | OK (200) | unspecified error |
| 201 | OK (200) | not a regular file |
| 202 | OK (200) | no read permission |
| 203 | OK (200) | zero size |
| 204 | OK (200) | cannot execute |
| 205 | OK (200) | cannot create file |
| 206 | OK (200) | not enough disk space |
| 207 | OK (200) | some environment variables not defined |
| 208 | OK (200) | not a stereo audio file |
| 209 | OK (200) | working directory not found |
| 210 | OK (200) | invalid audio file |
| 211 | OK (200) | a stereo audio file |
| 212 | OK (200) | problem encountered in reading audio file |
| 213 | OK (200) | missing some models |
| 214 | OK (200) | problem encountered during signal analysis |
| 215 | OK (200) | error in Viterbi decoding |
| 216 | OK (200) | error in lattice processing |
| 217 | OK (200) | error in AM estimation |
| 218 | OK (200) | error in clustering process |
| 219 | OK (200) | incorrect input XML file |
| 220 | OK (200) | error in AM adaptation process |
| 221 | OK (200) | execution time too long |
| 222 | OK (200) | missing audio file path name |
| 223 | OK (200) | not enough memory |
| 224 | OK (200) | invalid langset |
| 225 | OK (200) | unknown language |
| 226 | OK (200) | not enough data to build model |
| 227 | OK (200) | invalid LID model |
| 228 | OK (200) | invalid license |
| 229 | OK (200) | invalid channel number |
| 230 | OK (200) | file too large |
| 231 | OK (200) | invalid phone |
| 232 | OK (200) | invalid speaker set |
| 233 | OK (200) | language mismatch |
| 234 | OK (200) | invalid text or word list file |
| 235 | OK (200) | too many audio tracks |
| 236 | OK (200) | stream timeout |
| 237 | OK (200) | unsupported architecture |
| 238 | OK (200) | invalid vocabulary file |
| 239 | OK (200) | invalid kar file |
| 240 | OK (200) | stream long pause |
| 241 | OK (200) | RT model not found for specified language |
| 242 | OK (200) | missing words in pronunciation vocabulary |
| 243 | OK (200) | not enough processing resources |
| 244 | OK (200) | recording latency too high |
| 245 | OK (200) | no adaptable system found |
| 246 | OK (200) | invalid channel number |
| 301 | Bad Request (400) | invalid method |
| 302 | Bad Request (400) | missing method |
| 303 | Bad Request (400) | missing audio file |
| 304 | Bad Request (400) | missing model |
| 305 | Internal Server Error (500) | server error, can't execute request: 'xxx' |
| 306 | Bad Request (400) | invalid URI |
| 307 | Service Unavailable (503) | resource not available, retry later [1] |
| 308 | Bad Request (400) | bad HTTP request |
| 309 | Request Entity Too Large (413) | request too big, check audio data |
| 310 | Bad Request (400) | missing session ID |
| 311 | Bad Request (400) | invalid session ID |
| 312 | Internal Server Error (500) | cannot create session file |
| 313 | Not Found (404) | session unknown |
| 314 | Bad Request (400) | invalid argument: xxx |
| 315 | Internal Server Error (500) | session aborted |
| 316 | Authorization Required (401) | authorization required |
| 317 | Internal Server Error (500) | internal server error |
| 318 | Bad Request (400) | missing audio file name |
| 319 | Not Found (404) | not found |
| 320 | OK (200) | in progress |
| 321 | Not Found (404) | session expired |
| 322 | Bad Request (400) | missing text data |
| 323 | Bad Request (400) | empty file |
| 324 | Internal Server Error (500) | cannot create command file |
| 325 | Bad Request (400) | missing URL argument |
| 326 | Bad Request (400) | missing HTTP message |
| 327 | Bad Request (400) | missing text data |
| 328 | Forbidden (403) | operation denied |
| 329 | OK (200) | in queue |
| 330 | Bad Request (400) | missing data |
| 331 | Bad Request (400) | missing keyword list file |
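One way a shell client might act on the HTTP status codes in this table is sketched below. The mapping follows the table: 503 corresponds to error 307 ("retry later"), while a 200 response can still carry an XML error for codes 1 to 246 and therefore needs its body checked. The helper names `yy_action` and `yy_retry`, the action labels, the doubling delay and the `YY_RETRY_DELAY` variable are all assumptions made for this illustration. The service does not prescribe a retry schedule on this page.

```shell
# yy_action HTTP_STATUS
# Prints what the client should do next for a given HTTP status.
yy_action() {
    case "$1" in
        200) echo check-xml ;;      # result, or an XML error (codes 1-246)
        202) echo accepted ;;       # asynchronous request: session ID returned
        503) echo retry ;;          # error 307: resource not available
        4??) echo client-error ;;   # fix the request before sending it again
        5??) echo server-error ;;
        *)   echo unknown ;;
    esac
}

# yy_retry MAX_ATTEMPTS COMMAND [ARGS...]
# COMMAND must print the HTTP status of one request on stdout.
# Repeats COMMAND while the status is 503, doubling the delay each time,
# and prints the final action. The subshell body keeps variables local.
yy_retry() (
    max=$1
    shift
    attempt=1
    delay=${YY_RETRY_DELAY:-5}      # initial delay in seconds (assumed value)
    while :; do
        status=$("$@")
        action=$(yy_action "$status")
        if [ "$action" != retry ] || [ "$attempt" -ge "$max" ]; then
            break
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
    echo "$action"
)
```

A command passed to `yy_retry` could, for example, wrap a curl call that writes the XML to a file with -o and prints the status with -w '%{http_code}'.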