YobiYoba web service API - Basic Methods

The YobiYoba web service API allows authorized clients to upload an audio file and receive an XML document in return. The following table lists the elements of the HTTP request and response for a basic speech-to-text conversion.

URI              https://member.yobiyoba.com:8095/api
HTTP method      POST [1]
Query arguments  method= bs_trans, cts_trans, xs_trans, ...
                 model= eng, fre, ...
                 audiofile= audio-filename
Request body     an audio file [2] or a MIME multi-part message [3]
Response         200 OK       XML document [4]
                 HTTP status  XML error message [4]

[1] GET and PUT HTTP methods are also accepted.
[2] Supported audio formats: AAC/M4A, ADTS-AAC, AC3, AIFC, AIFF, AMR, ASF/WMA, FLAC, MPEG2/3, MPEG-TS, MS-GSM, MS-PCM8/16/24/32-MPEG3, NIST/shortpack/shorten, Ogg/Vorbis/Opus, Sun AU, WEBM.
[3] Including an audio file and optionally some text files and some query options.
[4] The YobiYoba DTD can be found in schema.dtd.

 Notes:

  • The server closes the connection after each request.
  • It is recommended not to connect to the YobiYoba server through proxies or firewalls.

The YobiYoba STT service currently offers three submission modes: file mode, streaming mode, and real-time mode. This documentation describes the file mode, in which the client submits a request that includes an audio file and the service answers with the XML result once the file has been entirely processed (one HTTP request per file). The streaming and real-time modes are described in separate documentation.

In the default blocking mode, the client application sends a request to the server and waits for the server's response. The client must keep waiting (indefinitely) until the full response is received. Multithreaded clients can submit multiple requests at the same time. A non-blocking mode is also supported, but the default blocking mode offers several advantages: it is easier to implement on the client side, it minimizes the operation latency, it minimizes the server load, and it allows the client to abort the operation by closing the connection, which avoids loading the server with operations whose result is no longer needed.
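Since each blocking request simply waits for its own response, a multithreaded client can be sketched with a thread pool. The example below is a minimal sketch under assumptions: transcribe_file is a hypothetical placeholder for whatever function performs one blocking request, and the pool size of 4 is an arbitrary choice, not a documented limit.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_file(path):
    """Hypothetical placeholder for one blocking request.

    A real client would upload the audio file here and wait for the XML.
    """
    return "<placeholder XML result for %s>" % path


def transcribe_all(paths, worker=transcribe_file, max_workers=4):
    """Run one blocking request per file in parallel threads.

    Returns a dict mapping each path to the worker's result.
    max_workers=4 is an arbitrary value chosen for this sketch.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the results in the same order as the inputs.
        results = list(pool.map(worker, paths))
    return dict(zip(paths, results))
```

Each worker thread blocks on its own connection, so closing one connection aborts only that operation.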

For more information please contact us at yobiyoba@yobinext.com.

User authentication and data encryption

Authentication uses an API key: each request must include an HTTP header named "api-key" whose value is a valid API key.

The HTTPS protocol is used for all communications with the YobiYoba service, meaning that all exchanges (including the user authentication) are encrypted.
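As an illustration, the header can be attached with Python's standard urllib module. This is a minimal sketch that only builds the request without sending it; the function name authenticated_request is hypothetical.

```python
import urllib.request

API_URL = "https://member.yobiyoba.com:8095/api"


def authenticated_request(api_key, query=""):
    """Build (but do not send) an HTTPS request carrying the api-key header."""
    url = API_URL + ("?" + query if query else "")
    request = urllib.request.Request(url)
    # The service expects the key in an HTTP header named "api-key".
    # urllib stores the name as "Api-key"; HTTP header names are
    # case-insensitive, so this is the same header.
    request.add_header("api-key", api_key)
    return request
```

For example, authenticated_request("your api key", "method=hello") prepares a request for the hello method listed among the valid method values.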

Query arguments

The query arguments can be URI-encoded or passed to the server in a MIME multi-part message. The table below gives the most important arguments.

Argument   Meaning                      Notes
method     service method               The method argument is required. Valid values are: hello, status, upload, bs_part, bs_lid, bs_trans, bs_align, cts_part, cts_lid, cts_trans, xs_part, xs_lid, xs_trans. The bs methods are for wide band audio such as radio and TV broadcast with sampling rates of 16 kHz or higher. The cts methods are for telephone data with sampling rates of 8 kHz or higher. The xs methods can be used for any type of data; they are the preferred methods for the supported languages. It is highly recommended not to use a bit rate lower than 32 kbps.
                                        • Audio partitioning: bs_part, cts_part, xs_part
                                        • Language identification: bs_lid, cts_lid, xs_lid
                                        • Transcription: bs_trans, cts_trans, xs_trans
                                        • Audio-text alignment: bs_align
audiofile  audio file name [1]          This argument is required for all methods except hello, upload and status. The audio data corresponding to the audio file (max. size of 300M bytes) must be included in the HTTP message. Supported audio formats: AIFF, ASF/WMA, FLAC, MS-Wave, MPEG, Ogg/Vorbis, Nist Sphere, Sun AU (all single track).
model      model name                   A set of models is available for each method. For the transcription methods (e.g. xs_trans) the model name specifies the language (e.g. eng), and optionally the dialect, the application and the model version. Some models can be specifically designed for a particular user application.
                                        Available models per method:
                                        • bs_align: ara, chi, dut, eng, fin, fre, ger, gre, heb, hin, hun, ita, kor, lav, lit, per, pol, por, rum, rus, spa, swa, swe, tur
                                        • cts_align: ara, chi, dut, eng, fre, ita, pus, rus, spa, swa
                                        • xs_align: ara, chi, cze, dut, eng, fin, fre, ger, gre, heb, hin, hun, ita, per, pol, por, pus, rum, rus, spa, swa, swe, tur, ukr, urd
                                        • cts_trans: ara, chi, dut, eng, fre, ger, heb, ita, pus, rus, spa, swa
                                        • bs_trans: ara, chi, cze, dut, eng, fin, fre, fre-daily, ger, gre, heb, hin, hun, ita, kor, lav, lit, per, pol, por, rum, rus, spa, swa, swe, tur, ukr, urd
                                        • xs_trans: ara, bam, chi, cze, dut, eng, fin, fre, fre-daily, ger, gre, heb, hin, hun, ita, per, pol, por, pus, rum, rus, spa, swa, swe, tur, ukr, urd
textfile   text file name [1,2]         This argument is required for the bs_align method. It is optional for the cts_trans, bs_trans, xs_trans and upload methods. The text file name and the text file must be included in a MIME multipart message along with the audio file name and the audio file. Supported text formats: plain text file using ASCII, ISO-8859 or UTF8 character sets with language dependent restrictions. The text should include one sentence or clause per line, ending with an optional punctuation mark and a line-feed character to mark the end of the line (i.e. Unix style).
vocfile    vocabulary file name [1,2]   Optional argument for the bs_align, cts_trans, bs_trans, xs_trans and upload methods. The vocabulary file name and the vocabulary file must be included in a MIME multipart message along with the audio file name and the audio file. Supported format: plain text file (Unix style) with one line per word with optional pronunciations.
llfile     language list file           Optional argument for the bs_lid, cts_lid, xs_lid, bs_trans, cts_trans, xs_trans and upload methods. The file name and the file content must be included in a MIME multipart message along with the audio file name and the audio file. Supported format: plain text file (Unix style) with one line per language with optional prior probability.
slfile     speaker list file            Optional argument for the bs_trans and upload methods. It can be used to specify a subset of the speaker list (if any) associated with the model. The file name and file content must be included in a MIME multipart message along with the audio file name and the audio file. Supported format: plain text file (Unix style) with one line per speaker.
async      synchronisation method       By default (async=0) the connection is kept open until the processing is done and the XML document is sent to the client. If async=1 the server returns a session ID after receiving the audio data. The status method must then be used to follow the operation progress and to get the XML result. Requests with async=1 have very low priority [3], i.e. this mode should only be used for testing purposes.
session    session-ID                   The session ID is required for the status method.
verbose    get session log              This option is effective only if async=0.
                                        verbose=1 to add a log header to the server response.
                                        verbose=2 to add a progress header to the server response.
dlopt      speech duration (s)          Maximum speech duration for language identification. Applies to bs_trans, cts_trans, xs_trans, bs_lid, cts_lid and xs_lid. By default dlopt=30. The entire audio file is used to identify the language if dlopt=0. With the bs_trans, cts_trans and xs_trans methods, you have to use this option if the model is not specified. Alternatively model=und instructs the service that the language must be identified automatically.
kopt       max #speakers                The default maximum number of speakers is 1 for cts methods (i.e. kopt=1) and infinity for bs and xs methods.
qlopt      LID decision threshold       Set the language identification threshold [0.0-1.0]. By default qlopt=0.5.
qopt       decoding option              The option value is a string of letters (for example qopt=df): 'd' for dual track decoding (cts), 'n' for no postprocessing (punctuation and numbers), 'p' for no audio partitioning, 'm' for multiple word hypotheses, 'f' for no confidence score filtering.
qsopt      speaker segmentation         Use qsopt=1 to add speaker segmentation for the bs_align method. By default qsopt=0.
ropt       LID version                  This option is used to select a specific LID version (for example ropt=6.0). By default the service uses the most recent version.
uopt       prior language model weight  Optional argument for the bs_trans, cts_trans and xs_trans methods if used with textfile and/or vocfile. Set the weight of the default language model [0.5-1.0]. By default this weight is estimated automatically.
priority   set priority                 The default priority is 0. Use 10 for a low priority request [3].

[1] The following 8 characters are removed from file names: space, tab, &, <, >, %, double-quote, and single-quote.
[2] The textfile and vocfile arguments are not supported for all languages.
[3] Low priority requests are more likely to be denied with the error code 307.
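The arguments above can be assembled into a URI-encoded query string with a few lines of Python. The sketch below is illustrative only: sanitize_filename and build_query are hypothetical helper names, and removing the eight characters on the client side is merely a convenience assumed here so that the name the client sends matches the one the server keeps.

```python
from urllib.parse import urlencode

# The eight characters that the file-name footnote lists as removed:
# space, tab, &, <, >, %, double-quote and single-quote.
REMOVED_CHARS = " \t&<>%\"'"


def sanitize_filename(name):
    """Drop the eight characters that the server removes from file names."""
    return "".join(ch for ch in name if ch not in REMOVED_CHARS)


def build_query(method, audiofile=None, model=None, **options):
    """Return a URI-encoded query string for the given arguments.

    Extra keyword arguments (e.g. dlopt=0, qopt="df") are appended as-is.
    """
    args = {"method": method}
    if model is not None:
        args["model"] = model
    if audiofile is not None:
        args["audiofile"] = sanitize_filename(audiofile)
    args.update(options)
    return urlencode(args)
```

For instance, build_query("bs_trans", audiofile="doc 1.mp3", model="eng") gives method=bs_trans&model=eng&audiofile=doc1.mp3.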

Request examples

Example 1: URI encoded request

PUT /api?method=bs_trans&model=eng&audiofile=doc1.mp3 HTTP/1.1
api-key: your api key
User-Agent: ClientProgram/1.0
Host: member.yobiyoba.com:8095
Content-Type: audio/mpeg
Content-Length: 1194336
[blank line]
[audio file content]
 Notes:
  • Content-Length is equal to the audio file size.
  • The YobiYoba server keeps the connection open until the audio file has been entirely processed and the XML transcript sent back to the client.
  • Chunked transfer encoding can be used for streaming upload.
  • If the client closes the connection while the data is being processed the operation is aborted.
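The request of Example 1 can be reproduced with Python's standard library. This is a sketch under assumptions rather than a reference client: build_put_request and send are hypothetical names, and the default content type audio/mpeg is simply copied from the example above.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

API_URL = "https://member.yobiyoba.com:8095/api"


def build_put_request(api_key, audio_name, audio_bytes,
                      method="bs_trans", model="eng",
                      content_type="audio/mpeg"):
    """Build a PUT request whose body is the raw audio data."""
    query = urlencode({"method": method, "model": model,
                       "audiofile": audio_name})
    request = urllib.request.Request(API_URL + "?" + query,
                                     data=audio_bytes, method="PUT")
    request.add_header("api-key", api_key)
    request.add_header("Content-Type", content_type)
    # Content-Length is equal to the audio file size.
    request.add_header("Content-Length", str(len(audio_bytes)))
    return request


def send(request):
    """Send the request and return (HTTP status, response body as bytes).

    No timeout is passed, so the call blocks until the full response is
    received. Unlike curl -k, urlopen verifies the server certificate.
    """
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # Responses with status 400 and above still carry an XML error body.
        return err.code, err.read()
```

A caller would read the audio file in binary mode, pass its bytes to build_put_request, and hand the result to send.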

Example 2: MIME multi-part body request

POST /api HTTP/1.1
api-key: your api key
User-Agent: ClientProgram/1.0
Host: member.yobiyoba.com:8095
Content-Length: 1194846
Content-Type: multipart/form-data; boundary=--------------------------0a8606733948
[blank line]
[MIME multi-part message]
 Notes:
  • This is the standard method used by browsers to upload a file via an HTML form.
  • Forming a MIME multi-part message is outside the scope of this help page. It can easily be done using curl (cf. the second curl example below). As with an HTML POST form, each argument must be in a separate MIME part and the audio part must include the file name as follows:
    ----------------------------0a8606733948
    Content-Disposition: form-data; name="method"
    [blank line]
    bs_trans
    ----------------------------0a8606733948
    Content-Disposition: form-data; name="model"
    [blank line]
    eng
    ----------------------------0a8606733948
    Content-Disposition: form-data; name="audiofile"; filename="doc1.mp3"
    Content-Type: application/octet-stream
    [blank line]
    [audio file content]
    ----------------------------0a8606733948--
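For clients that cannot rely on curl, the layout shown above can be produced by hand. The following is a minimal sketch, not a complete MIME implementation: encode_multipart is a hypothetical helper, the boundary is a random hex string, and every file part is labelled application/octet-stream as in the example.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Encode form fields and files as a multipart/form-data body.

    fields: dict of argument name -> text value (e.g. method, model)
    files:  dict of argument name -> (file name, file content as bytes)
    Returns (Content-Type header value, body as bytes).
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(('Content-Disposition: form-data; name="%s"'
                      % name).encode("utf-8"))
        lines.append(b"")  # blank line between part headers and content
        lines.append(str(value).encode("utf-8"))
    for name, (filename, content) in files.items():
        lines.append(delimiter)
        lines.append(('Content-Disposition: form-data; name="%s"; '
                      'filename="%s"' % (name, filename)).encode("utf-8"))
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(lines)
```

The returned header value goes into Content-Type, and the length of the returned body into Content-Length.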
    

Example 3: MIME multi-part body with a text file

POST /api HTTP/1.1
api-key: your api key
User-Agent: ClientProgram/1.0
Host: member.yobiyoba.com:8095
Content-Length: 1194846
Content-Type: multipart/form-data; boundary=--------------------------0a8606733948
[blank line]
----------------------------0a8606733948
Content-Disposition: form-data; name="method"
[blank line]
bs_align
----------------------------0a8606733948
Content-Disposition: form-data; name="model"
[blank line]
eng
----------------------------0a8606733948
Content-Disposition: form-data; name="audiofile"; filename="doc1.mp3"
Content-Type: application/octet-stream
[blank line]
[audio file content]
----------------------------0a8606733948
Content-Disposition: form-data; name="textfile"; filename="doc1.txt"
Content-Type: application/octet-stream
[blank line]
It's friday september thirteenth. I'm David David Brancaccio and here's some ...
----------------------------0a8606733948--

CURL examples

curl is available on most UNIX/Linux platforms. If the tool is not installed, visit the download page on the curl website.

1. The first curl example passes the arguments in the URL:


    curl -ksS -H "api-key: $api_key" "https://member.yobiyoba.com:8095/api?method=bs_trans&model=eng&audiofile=sample.mp3" \
        -T sample.mp3 > sample.xml

2. In the second example the arguments and the audio data are assembled in a MIME multipart message, as for an HTML POST form. The audio file name is prefixed with an @ sign to ask curl to send the audio file content and not only the file name.


    curl -ksS -H "api-key: $api_key" https://member.yobiyoba.com:8095/api -F method=bs_trans -F model=eng -F audiofile=@sample.mp3 \
        > sample.xml

3. In this example a text file is provided along with the audio file in order to align the text with the audio:


    curl -ksS -H "api-key: $api_key" https://member.yobiyoba.com:8095/api -F method=bs_align -F model=eng -F audiofile=@sample.mp3 \
        -F textfile=@sample.txt > sample.xml

 Notes:
  • After checking the curl exit code you also need to check for the presence of the <Error> tag in the XML output, since by default HTTP error codes (400 and above) are not reflected in the curl exit code.
  • With the -f flag curl returns exit code 22 for HTTP status codes of 400 or above; however, this flag is not recommended because it prevents curl from outputting the XML error message.
  • The curl option --connect-timeout may be used to detect unsuccessful connections.
  • If there is a proxy or a firewall between the curl client and the YobiYoba server, you may want to use the --keepalive-time option to prevent disconnections due to network inactivity. The option value should be less than the proxy or firewall timeout duration.
  • Be very careful with the --max-time option, because the processing time depends on the audio file size, the audio quality, and the server load. This option is not recommended.
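The check for the <Error> tag can be automated once the XML output has been saved. The sketch below uses Python's standard XML parser; the function name find_error is hypothetical, and because the attributes of the <Error> element are not described on this page, the function simply returns whatever attributes and text it finds.

```python
import xml.etree.ElementTree as ET


def find_error(xml_text):
    """Return a dict describing the first <Error> element, or None.

    The layout of <Error> is not assumed: its attributes and text are
    returned unchanged. Unparsable output is reported as an error too.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return {"attributes": {}, "text": "unparsable XML: %s" % exc}
    # find() only searches descendants, so test the root element first.
    node = root if root.tag == "Error" else root.find(".//Error")
    if node is None:
        return None
    return {"attributes": dict(node.attrib),
            "text": (node.text or "").strip()}
```

A script would call find_error on the contents of sample.xml and treat any result other than None as a failed request.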

PHP/CURL examples

1. Here is a PHP/CURL script equivalent to the first curl example (PUT method):

<?php
  $your_api_key = "your api key";  // replace with a valid API key
  $audiofile = "sample.mp3";
  $lang = "eng";
  $server = "member.yobiyoba.com";
  $curl = curl_init();
  curl_setopt($curl,CURLOPT_HTTPHEADER,array("api-key: $your_api_key"));
  // Like curl -k, this disables verification of the server certificate.
  curl_setopt($curl,CURLOPT_SSL_VERIFYPEER,0);
  $audio = fopen($audiofile,"rb");
  curl_setopt($curl,CURLOPT_URL,"https://$server:8095/api?method=bs_trans&model=$lang&audiofile=".rawurlencode($audiofile));
  curl_setopt($curl,CURLOPT_PUT,1);
  curl_setopt($curl,CURLOPT_INFILE,$audio);
  // The upload size lets curl send a Content-Length equal to the file size.
  curl_setopt($curl,CURLOPT_INFILESIZE,filesize($audiofile));
  curl_setopt($curl,CURLOPT_RETURNTRANSFER,1);
  $xml = curl_exec($curl);
  if (curl_error($curl)) echo 'Curl error: '.curl_error($curl).PHP_EOL;
  else {
    $http_status = curl_getinfo($curl,CURLINFO_HTTP_CODE);
    if ($http_status != 200) echo 'HTTP status:'.$http_status.PHP_EOL;
    print $xml;
  }
  curl_close($curl);
  fclose($audio);
?>

2. Here is a PHP/CURL script equivalent to the second curl example (POST method with a MIME multipart message):

<?php
  $your_api_key = "your api key";  // replace with a valid API key
  $audiofile = "sample.mp3";
  $lang = "eng";
  $server = "member.yobiyoba.com";
  $curl = curl_init();
  curl_setopt($curl,CURLOPT_HTTPHEADER,array("api-key: $your_api_key"));
  // Like curl -k, this disables verification of the server certificate.
  curl_setopt($curl,CURLOPT_SSL_VERIFYPEER,0);
  curl_setopt($curl,CURLOPT_URL,"https://$server:8095/api");
  // CURLFile (PHP 5.5+) replaces the "@file" prefix, which PHP 7 no longer supports.
  curl_setopt($curl,CURLOPT_POSTFIELDS,array("method" => "bs_trans",
                     "model" => $lang, "audiofile" => new CURLFile($audiofile)));
  curl_setopt($curl,CURLOPT_RETURNTRANSFER,1);
  $xml = curl_exec($curl);
  if (curl_error($curl)) echo 'Curl error: '.curl_error($curl).PHP_EOL;
  else {
    $http_status = curl_getinfo($curl,CURLINFO_HTTP_CODE);
    if ($http_status != 200) echo 'HTTP status:'.$http_status.PHP_EOL;
    print $xml;
  }
  curl_close($curl);
?>

Messages and HTTP status codes

The YobiYoba server returns an XML message and a 200 HTTP status code for a successful synchronous request, whereas it returns a 202 status code for an asynchronous request. For an unsuccessful request the server returns an XML error message with an error code (between 1 and 331) and a short explanatory text. The HTTP status code is set as specified in the following table.

Error code  HTTP status code                Message
--          OK (200)                        xml document
--          Accepted (202)                  session-ID
1-34        OK (200)                        standard system call errors
200-246     OK (200)                        library error messages
200         OK (200)                        unspecified error
201         OK (200)                        not a regular file
202         OK (200)                        no read permission
203         OK (200)                        zero size
204         OK (200)                        cannot execute
205         OK (200)                        cannot create file
206         OK (200)                        not enough disk space
207         OK (200)                        some environment variables not defined
208         OK (200)                        not a stereo audio file
209         OK (200)                        working directory not found
210         OK (200)                        invalid audio file
211         OK (200)                        a stereo audio file
212         OK (200)                        problem encountered in reading audio file
213         OK (200)                        missing some models
214         OK (200)                        problem encountered during signal analysis
215         OK (200)                        error in Viterbi decoding
216         OK (200)                        error in lattice processing
217         OK (200)                        error in AM estimation
218         OK (200)                        error in clustering process
219         OK (200)                        incorrect input XML file
220         OK (200)                        error in AM adaptation process
221         OK (200)                        execution time too long
222         OK (200)                        missing audio file path name
223         OK (200)                        not enough memory
224         OK (200)                        invalid langset
225         OK (200)                        unknown language
226         OK (200)                        not enough data to build model
227         OK (200)                        invalid LID model
228         OK (200)                        invalid license
229         OK (200)                        invalid channel number
230         OK (200)                        file too large
231         OK (200)                        invalid phone
232         OK (200)                        invalid speaker set
233         OK (200)                        language mismatch
234         OK (200)                        invalid text or word list file
235         OK (200)                        too many audio tracks
236         OK (200)                        stream timeout
237         OK (200)                        unsupported architecture
238         OK (200)                        invalid vocabulary file
239         OK (200)                        invalid kar file
240         OK (200)                        stream long pause
241         OK (200)                        RT model not found for specified language
242         OK (200)                        missing words in pronunciation vocabulary
243         OK (200)                        not enough processing resources
244         OK (200)                        recording latency too high
245         OK (200)                        no adaptable system found
246         OK (200)                        invalid channel number
301         Bad Request (400)               invalid method
302         Bad Request (400)               missing method
303         Bad Request (400)               missing audio file
304         Bad Request (400)               missing model
305         Internal Server Error (500)     server error, can't execute request: 'xxx'
306         Bad Request (400)               invalid URI
307         Service Unavailable (503)       resource not available, retry later [1]
308         Bad Request (400)               bad HTTP request
309         Request Entity Too Large (413)  request too big, check audio data
310         Bad Request (400)               missing session ID
311         Bad Request (400)               invalid session ID
312         Internal Server Error (500)     cannot create session file
313         Not Found (404)                 session unknown
314         Bad Request (400)               invalid argument: xxx
315         Internal Server Error (500)     session aborted
316         Authorization Required (401)    authorization required
317         Internal Server Error (500)     internal server error
318         Bad Request (400)               missing audio file name
319         Not Found (404)                 not found
320         OK (200)                        in progress
321         Not Found (404)                 session expired
322         Bad Request (400)               missing text data
323         Bad Request (400)               empty file
324         Internal Server Error (500)     cannot create command file
325         Bad Request (400)               missing URL argument
326         Bad Request (400)               missing HTTP message
327         Bad Request (400)               missing text data
328         Forbidden (403)                 operation denied
329         OK (200)                        in queue
330         Bad Request (400)               missing data
331         Bad Request (400)               missing keyword list file

[1] The HTTP response header field Retry-After specifies how long the client process should wait before resubmitting the request (usually 60 s).
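A client can honor Retry-After along the following lines. This is a sketch under assumptions: retry_delay and submit_with_retry are hypothetical names, submit stands for any callable that performs one request and returns (status, headers, body), the limit of three attempts is arbitrary, and the 60-second fallback is taken from the "usually 60s" remark above.

```python
import time


def retry_delay(http_status, headers, default_seconds=60):
    """Return the seconds to wait before resubmitting, or None.

    Only a 503 (error code 307) response calls for a retry. A missing or
    non-numeric Retry-After value falls back to default_seconds.
    """
    if http_status != 503:
        return None
    value = None
    for name, val in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            value = val
    try:
        seconds = int(str(value).strip())
    except ValueError:
        return default_seconds
    return seconds if seconds >= 0 else default_seconds


def submit_with_retry(submit, max_attempts=3, sleep=time.sleep):
    """Call submit() until the answer is not a 503 or attempts run out.

    Returns the (status, body) pair of the last attempt.
    """
    for attempt in range(max_attempts):
        status, headers, body = submit()
        delay = retry_delay(status, headers)
        if delay is None or attempt == max_attempts - 1:
            return status, body
        sleep(delay)
```

Passing sleep as a parameter keeps the waiting policy replaceable, for example to log each delay or to test the loop without actually waiting.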