YobiYoba

REST API

YobiYoba web service API - Basic Methods

The YobiYoba web service API allows authorized clients to upload an audio file and receive an XML document in return. The following table lists the elements of the HTTP request and response for a basic speech-to-text conversion.

URI	`https://member.yobiyoba.com:8095/api`
HTTP method	POST¹
Query arguments	`method=`	bs_trans, cts_trans, xs_trans, ...
	`model=`	eng, fre, ...
	`audiofile=`	audio-filename
Request body	an audio file² or a MIME multi-part message³
Response	200 OK	XML document⁴
Response	HTTP status	XML error message⁴

¹GET and PUT HTTP methods are also accepted.
²Supported audio formats: AAC/M4A, ADTS-AAC, AC3, AIFC, AIFF, AMR, ASF/WMA, FLAC, MPEG2/3, MPEG-TS, MS-GSM, MS-PCM8/16/24/32-MPEG3, NIST/shortpack/shorten, Ogg/Vorbis/Opus, Sun AU, WEBM.
³Including an audio file and optionally some text files and some query options.
⁴The YobiYoba DTD is available in schema.dtd.

Notes:

The server closes the connection after each request.
Connecting to the YobiYoba server through proxies or firewalls is not recommended.

The YobiYoba STT service currently offers three submission modes: file, streaming, and real-time. This documentation describes the file mode, in which the client submits a request that includes an audio file and the service answers with the XML result once the file has been fully processed (one HTTP request per file). The streaming and real-time modes are documented separately.

In the default blocking mode, the client application sends a request to the server and waits for the response; it must keep waiting, with no timeout, until the full response has been received. Multithreaded clients can submit several requests at the same time. A non-blocking mode is also supported, but the default blocking mode offers several advantages: it is easier to implement on the client side, it minimizes latency, it minimizes server load, and it lets the client abort an operation by closing the connection, so the server does not spend resources on results that are no longer needed.
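The multithreaded pattern mentioned above can be sketched as follows. This is only an illustration: `transcribe_all` and the `submit` callable are hypothetical names, and `submit` stands for whatever function sends one audio file and blocks until the XML response arrives.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def transcribe_all(
    paths: Iterable[str],
    submit: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Submit several files concurrently, one blocking request per thread.

    `submit` is a hypothetical callable: it sends one audio file and
    blocks until the full XML response has been received.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        results = list(pool.map(submit, paths))
    return dict(zip(paths, results))
```

The value of `max_workers` is an arbitrary choice here; a suitable number of simultaneous requests depends on the account and on the server load.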

For more information please contact us at yobiyoba@yobinext.com.

User authentication and data encryption

Authentication uses an API key: add an HTTP header named "api-key" whose value is a valid API key.

The HTTPS protocol is used for all communications with the YobiYoba service, meaning that all exchanges (including user authentication) are encrypted.

Query arguments

The query arguments can be URI-encoded or passed to the server in a MIME multi-part message. The table below gives the most important arguments.

Argument	Meaning	Notes
`method`	service method	The method argument is required. Valid values are: `hello`, `status`, `upload`, `bs_part`, `bs_lid`, `bs_trans`, `bs_align`, `cts_part`, `cts_lid`, `cts_trans`, `xs_part`, `xs_lid`, `xs_trans`. The `bs` methods are for wide-band audio such as radio and TV broadcasts with sampling rates of 16 kHz or higher. The `cts` methods are for telephone data with sampling rates of 8 kHz or higher. The `xs` methods can be used for any type of data; they are the preferred methods for the supported languages. Using a bit rate lower than 32 kbps is strongly discouraged. Audio partitioning: `bs_part`, `cts_part`, `xs_part`. Language identification: `bs_lid`, `cts_lid`, `xs_lid`. Transcription: `bs_trans`, `cts_trans`, `xs_trans`. Audio-text alignment: `bs_align`.
`audiofile`	audio file name¹	This argument is required for all methods except `hello`, `upload` and `status`. The audio data corresponding to the audio file (maximum size 300 MB) must be included in the HTTP message. Supported audio formats: AIFF, ASF/WMA, FLAC, MS-Wave, MPEG, Ogg/Vorbis, NIST Sphere, Sun AU (all single track).
`model`	model name	A set of models is available for each `method`. For the transcription methods (e.g. `xs_trans`) the model name specifies the language (e.g. `eng`), and optionally the dialect, the application and the model version. Some models can be specifically designed for a particular user application. Available models per method: `bs_align`: ara, chi, dut, eng, fin, fre, ger, gre, heb, hin, hun, ita, kor, lav, lit, per, pol, por, rum, rus, spa, swa, swe, tur; `cts_align`: ara, chi, dut, eng, fre, ita, pus, rus, spa, swa; `xs_align`: ara, chi, cze, dut, eng, fin, fre, ger, gre, heb, hin, hun, ita, per, pol, por, pus, rum, rus, spa, swa, swe, tur, ukr, urd; `cts_trans`: ara, chi, dut, eng, fre, ger, heb, ita, pus, rus, spa, swa; `bs_trans`: ara, chi, cze, dut, eng, fin, fre, fre-daily, ger, gre, heb, hin, hun, ita, kor, lav, lit, per, pol, por, rum, rus, spa, swa, swe, tur, ukr, urd; `xs_trans`: ara, bam, chi, cze, dut, eng, fin, fre, fre-daily, ger, gre, heb, hin, hun, ita, per, pol, por, pus, rum, rus, spa, swa, swe, tur, ukr, urd.
`textfile`	text file name¹,²	This argument is required for the `bs_align` method. It is optional for the `cts_trans`, `bs_trans`, `xs_trans` and `upload` methods. The text file name and the text file must be included in a MIME multipart message along with the audio file name and the audio file. Supported text formats: plain text file using the ASCII, ISO-8859 or UTF-8 character sets, with language-dependent restrictions. The text should contain one sentence or clause per line, each ending with an optional punctuation mark and a line-feed character to mark the end of the line (i.e. Unix style).
`vocfile`	vocabulary file name¹,²	Optional argument for the `bs_align`, `cts_trans`, `bs_trans`, `xs_trans` and `upload` methods. The vocabulary file name and the vocabulary file must be included in a MIME multipart message along with the audio file name and the audio file. Supported format: plain text file (Unix style) with one word per line and optional pronunciations.
`llfile`	language list file	Optional argument for the `bs_lid`, `cts_lid`, `xs_lid`, `bs_trans`, `cts_trans`, `xs_trans` and `upload` methods. The file name and the file content must be included in a MIME multipart message along with the audio file name and the audio file. Supported format: plain text file (Unix style) with one line per language with optional prior probability.
`slfile`	speaker list file	Optional argument for the `bs_trans` and `upload` methods. It can be used to specify a subset of the speaker list (if any) associated to the model. The file name and file content must be included in a MIME multipart message along with the audio file name and the audio file. Supported format: plain text file (Unix style) with one line per speaker.
`async`	synchronisation method	By default (`async=0`) the connection is kept open until processing is done and the XML document has been sent to the client. If `async=1`, the server returns a session ID after receiving the audio data; the `status` method must then be used to follow the progress of the operation and to retrieve the XML result. Requests with `async=1` have very low priority³, so this mode should only be used for testing purposes.
`session`	session-ID	The session ID is required for the `status` method.
`verbose`	get session log	This option is effective only if `async=0`. Use `verbose=1` to add a log header to the server response, or `verbose=2` to add a progress header.
`dlopt`	speech duration (s)	Maximum speech duration for language identification. Applies to bs_trans, cts_trans, xs_trans, bs_lid, cts_lid and xs_lid. By default `dlopt=30`. The entire audio file is used to identify the language if `dlopt=0`. With the bs_trans, cts_trans and xs_trans methods, you have to use this option if the model is not specified. Alternatively `model=und` instructs the service that the language must be identified automatically.
`kopt`	max #speakers	The default maximum number of speakers is 1 for `cts` methods (i.e. `kopt=1`) and unlimited for `bs` and `xs` methods.
`qlopt`	LID decision threshold	Set the language identification threshold [0.0-1.0]. By default `qlopt=0.5`.
`qopt`	decoding option	The option value is a string of letters (for example `qopt=df`): `'d'` for dual track decoding (cts), `'n'` for no postprocessing (punctuation and numbers), `'p'` for no audio partitioning, `'m'` for multiple word hypotheses, `'f'` for no confidence score filtering.
`qsopt`	speaker segmentation	Use `qsopt=1` to add speaker segmentation for the `bs_align` method. By default `qsopt=0`.
`ropt`	LID version	This option selects a specific LID version (for example `ropt=6.0`). By default the service uses the most recent version.
`uopt`	prior language model weight	Optional argument for the `bs_trans`, `cts_trans` and `xs_trans` methods if used with `textfile` and/or `vocfile`. Set the weight of the default language model [0.5-1.0]. By default this weight is estimated automatically.
`priority`	set priority	The default priority is 0. Use 10 for a low priority request³.

¹The following 8 characters are removed from file names: space, tab, &, <, >, %, double-quote, and single-quote.
²The textfile and vocfile arguments are not supported for all languages.
³Low priority requests are more likely to be denied with the error code 307.
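As an illustration of the URI-encoded form, the sketch below assembles a request URL from a method name and keyword arguments. The helper names `clean_filename` and `build_url` are hypothetical. Removing the eight characters of footnote 1 on the client side is an assumption made here so that local names match what the server keeps; the table does not require clients to do it.

```python
from urllib.parse import urlencode

API_URL = "https://member.yobiyoba.com:8095/api"

# Footnote 1: space, tab, &, <, >, %, double-quote and single-quote
# are removed from file names.
_STRIPPED = str.maketrans("", "", " \t&<>%\"'")


def clean_filename(name: str) -> str:
    """Drop the eight characters that footnote 1 says are removed."""
    return name.translate(_STRIPPED)


def build_url(method: str, **args) -> str:
    """Return the request URL with URI-encoded query arguments."""
    params = {"method": method}
    for key, value in args.items():
        if value is None:
            continue  # skip arguments that were left unset
        if key.endswith("file"):
            # audiofile, textfile, vocfile, llfile, slfile carry file names
            value = clean_filename(str(value))
        params[key] = str(value)
    return API_URL + "?" + urlencode(params)
```

For example, `build_url("bs_trans", model="eng", audiofile="doc1.mp3")` produces the query string used in the basic request, and `build_url("status", session=session_id)` produces the URL for polling a request submitted with `async=1`.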

Request examples

Example 1: URI encoded request

PUT /api?method=bs_trans&model=eng&audiofile=doc1.mp3 HTTP/1.1
api-key: your api key
User-Agent: ClientProgram/1.0
Host: member.yobiyoba.com:8095
Content-Type: audio/mpeg
Content-Length: 1194336
[blank line]
[audio file content]

Notes:

Content-Length is equal to the audio file size.
The YobiYoba server keeps the connection open until the audio file has been entirely processed and the XML transcript sent back to the client.
Chunked transfer encoding can be used for streaming upload.
If the client closes the connection while the data is being processed the operation is aborted.
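A request like Example 1 could be prepared in Python roughly as follows. This is a sketch using only the standard library; `build_put_request` is a hypothetical helper, and the request is built but not sent.

```python
import urllib.request

API_URL = "https://member.yobiyoba.com:8095/api"


def build_put_request(api_key: str, audio: bytes, query: str,
                      content_type: str = "audio/mpeg") -> urllib.request.Request:
    """Build (but do not send) a PUT request shaped like Example 1."""
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=audio,
        method="PUT",
        headers={
            "api-key": api_key,
            "Content-Type": content_type,
            # Content-Length is equal to the audio file size.
            "Content-Length": str(len(audio)),
        },
    )
```

Passing the result to `urllib.request.urlopen(request)` without a timeout blocks until the server has sent the XML response. Note that `urlopen` raises `urllib.error.HTTPError` for statuses of 400 and above; the XML error message can still be read from the exception with `err.read()`.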

Example 2: MIME multi-part body request

POST /api HTTP/1.1
api-key: your api key
User-Agent: ClientProgram/1.0
Host: member.yobiyoba.com:8095
Content-Length: 1194846
Content-Type: multipart/form-data; boundary=--------------------------0a8606733948
[blank line]
[MIME multi-part message]

Notes:

This is the standard method used by browsers to upload a file via an HTML form.

Forming a MIME multi-part message is outside the scope of this help page. It can easily be done using curl (cf. the second curl example below, which uses the -F option). As with an HTML POST form, each argument must be in a separate MIME part, and the audio part must include the file name as follows:

----------------------------0a8606733948
Content-Disposition: form-data; name="method"
[blank line]
bs_trans
----------------------------0a8606733948
Content-Disposition: form-data; name="model"
[blank line]
eng
----------------------------0a8606733948
Content-Disposition: form-data; name="audiofile"; filename="doc1.mp3"
Content-Type: application/octet-stream
[blank line]
[audio file content]
----------------------------0a8606733948--
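For clients that cannot rely on curl, the layout above can be reproduced by hand. The following sketch builds such a body with the standard library only; `encode_multipart` is a hypothetical helper, and it assumes that field names and file names contain no double quotes or line breaks.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str],
                     files: Dict[str, Tuple[str, bytes]],
                     boundary: str = "") -> Tuple[str, bytes]:
    """Return (Content-Type header value, body) for multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    # One MIME part per plain argument (method, model, ...).
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    # File parts carry the file name in the Content-Disposition header.
    for name, (filename, content) in files.items():
        lines += [
            f"--{boundary}".encode(),
            (f'Content-Disposition: form-data; name="{name}"; '
             f'filename="{filename}"').encode(),
            b"Content-Type: application/octet-stream",
            b"",
            content,
        ]
    # The closing delimiter is the boundary followed by two dashes.
    lines += [f"--{boundary}--".encode(), b""]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)
```

The first returned value goes into the Content-Type request header, and the length of the second gives the Content-Length.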

Example 3: MIME multi-part body with a text file

POST /api HTTP/1.1
api-key: your api key
User-Agent: ClientProgram/1.0
Host: member.yobiyoba.com:8095
Content-Length: 1194846
Content-Type: multipart/form-data; boundary=--------------------------0a8606733948
[blank line]
----------------------------0a8606733948
Content-Disposition: form-data; name="method"
[blank line]
bs_align
----------------------------0a8606733948
Content-Disposition: form-data; name="model"
[blank line]
eng
----------------------------0a8606733948
Content-Disposition: form-data; name="audiofile"; filename="doc1.mp3"
Content-Type: application/octet-stream
[blank line]
[audio file content]
----------------------------0a8606733948
Content-Disposition: form-data; name="textfile"; filename="doc1.txt"
Content-Type: application/octet-stream
[blank line]
It's friday september thirteenth. I'm David David Brancaccio and here's some ...
----------------------------0a8606733948--

CURL examples

CURL is commonly available on many UNIX/Linux platforms. If you don't have the tool installed, visit the download page on the curl website.

1. Here is a first curl example with the arguments encoded in the URL:

    curl -ksS -H "api-key: $api_key" "https://member.yobiyoba.com:8095/api?method=bs_trans&model=eng&audiofile=sample.mp3" \
        -T sample.mp3 > sample.xml

2. Here is a second example in which the arguments and the audio data are assembled in a MIME multipart message, as for an HTML POST form. The audio file name is prefixed with an @ sign to ask curl to send the audio file content and not only the file name.

    curl -ksS -H "api-key: $api_key" https://member.yobiyoba.com:8095/api -F method=bs_trans -F model=eng -F audiofile=@sample.mp3 \
        > sample.xml

3. In this example a text file is provided along with the audio file in order to align the text with the audio:


    curl -ksS -H "api-key: $api_key" https://member.yobiyoba.com:8095/api -F method=bs_align -F model=eng -F audiofile=@sample.mp3 \
        -F textfile=@sample.txt > sample.xml

Notes:

After checking the curl exit code, also check for an <Error> tag in the XML output, because by default HTTP error statuses (400 and above) are not reflected in the curl exit code.
With the -f flag curl returns exit code 22 for HTTP statuses of 400 or above; however, this flag is not recommended because it prevents curl from outputting the XML error message.
The curl option --connect-timeout may be used to detect unsuccessful connections.
If there is a proxy or a firewall between the curl client and the YobiYoba server, you may want to use the --keepalive-time option to prevent disconnections due to network inactivity. The option value should be less than the proxy or firewall timeout duration.
Be very careful with the --max-time option, because the processing time depends on the audio file size, the audio quality, and the server load. This option is not recommended.
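The check for an <Error> tag can be automated once the output has been saved. The sketch below only assumes that an element named `Error` may appear somewhere in the document; its exact position and attributes are defined by the YobiYoba DTD and are not assumed here. `find_error` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_error(xml_text: str) -> Optional[ET.Element]:
    """Return the first <Error> element of the response, or None.

    iter() walks the whole tree, root included, so the position of
    the element in the document does not matter.
    """
    root = ET.fromstring(xml_text)
    return next(root.iter("Error"), None)
```

A caller would compare the result with `None` (not rely on its truth value) and report `error.text` when an element is found. Output that is not well-formed XML makes `ET.fromstring` raise `xml.etree.ElementTree.ParseError`, which should be treated as a failure as well.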

PHP/CURL examples

1. Here is a PHP/CURL script equivalent to the first curl example (PUT method):

    <?php
    $your_api_key = "your api key";   // replace with a valid API key
    $audiofile = "sample.mp3";
    $lang = "eng";
    $server = "member.yobiyoba.com";

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("api-key: $your_api_key"));
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
    // Send the audio file as the body of a PUT request.
    $audio = fopen($audiofile, "rb");
    curl_setopt($curl, CURLOPT_URL, "https://$server:8095/api?method=bs_trans&model=$lang&audiofile=$audiofile");
    curl_setopt($curl, CURLOPT_PUT, 1);
    curl_setopt($curl, CURLOPT_INFILE, $audio);
    curl_setopt($curl, CURLOPT_INFILESIZE, filesize($audiofile));
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $xml = curl_exec($curl);
    if (curl_error($curl))
        echo 'Curl error: '.curl_error($curl).PHP_EOL;
    else {
        $http_status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        if ($http_status != 200)
            echo 'HTTP status:'.$http_status.PHP_EOL;
        print $xml;
    }
    curl_close($curl);
    fclose($audio);
    ?>

2. Here is a PHP/CURL script equivalent to the second curl example (POST method with a MIME multipart message):

    <?php
    $your_api_key = "your api key";   // replace with a valid API key
    $audiofile = "sample.mp3";
    $lang = "eng";
    $server = "member.yobiyoba.com";

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("api-key: $your_api_key"));
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($curl, CURLOPT_URL, "https://$server:8095/api");
    // CURLFile (PHP 5.5+) replaces the older "@file" upload syntax,
    // which recent PHP versions no longer support.
    curl_setopt($curl, CURLOPT_POSTFIELDS, array(
        "method" => "bs_trans",
        "model" => "$lang",
        "audiofile" => new CURLFile($audiofile)));
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $xml = curl_exec($curl);
    if (curl_error($curl))
        echo 'Curl error: '.curl_error($curl).PHP_EOL;
    else {
        $http_status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        if ($http_status != 200)
            echo 'HTTP status:'.$http_status.PHP_EOL;
        print $xml;
    }
    curl_close($curl);
    ?>

Messages and HTTP status codes

The YobiYoba server returns an XML message and a 200 HTTP status code for a successful synchronous request, whereas it returns a 202 status code for an asynchronous request. For an unsuccessful request the server returns an XML error message with an error code (between 1 and 331) and a short explanatory text. The HTTP status code is set as specified in the following table.

Error Code	HTTP status code	Message
--	OK (200)	xml document
--	Accepted (202)	session-ID
1-34	OK (200)	standard system call errors
200-232	OK (200)	library error messages
200	OK (200)	unspecified error
201	OK (200)	not a regular file
202	OK (200)	no read permission
203	OK (200)	zero size
204	OK (200)	cannot execute
205	OK (200)	cannot create file
206	OK (200)	not enough disk space
207	OK (200)	some environment variables not defined
208	OK (200)	not a stereo audio file
209	OK (200)	working directory not found
210	OK (200)	invalid audio file
211	OK (200)	a stereo audio file
212	OK (200)	problem encountered in reading audio file
213	OK (200)	missing some models
214	OK (200)	problem encountered during signal analysis
215	OK (200)	error in Viterbi decoding
216	OK (200)	error in lattice processing
217	OK (200)	error in AM estimation
218	OK (200)	error in clustering process
219	OK (200)	incorrect input XML file
220	OK (200)	error in AM adaptation process
221	OK (200)	execution time too long
222	OK (200)	missing audio file path name
223	OK (200)	not enough memory
224	OK (200)	invalid langset
225	OK (200)	unknown language
226	OK (200)	not enough data to build model
227	OK (200)	invalid LID model
228	OK (200)	invalid license
229	OK (200)	invalid channel number
230	OK (200)	file too large
231	OK (200)	invalid phone
232	OK (200)	invalid speaker set
233	OK (200)	language mismatch
234	OK (200)	invalid text or word list file
235	OK (200)	too many audio tracks
236	OK (200)	stream timeout
237	OK (200)	unsupported architecture
238	OK (200)	invalid vocabulary file
239	OK (200)	invalid kar file
240	OK (200)	stream long pause
241	OK (200)	RT model not found for specified language
242	OK (200)	missing words in pronunciation vocabulary
243	OK (200)	not enough processing resources
244	OK (200)	recording latency too high
245	OK (200)	no adaptable system found
246	OK (200)	invalid channel number

301	Bad Request (400)	invalid method
302	Bad Request (400)	missing method
303	Bad Request (400)	missing audio file
304	Bad Request (400)	missing model
305	Internal Server Error (500)	server error, can't execute request: 'xxx'
306	Bad Request (400)	invalid URI
307	Service Unavailable (503)	resource not available, retry later¹
308	Bad Request (400)	bad HTTP request
309	Request Entity Too Large (413)	request too big, check audio data
310	Bad Request (400)	missing session ID
311	Bad Request (400)	invalid session ID
312	Internal Server Error (500)	cannot create session file
313	Not Found (404)	session unknown
314	Bad Request (400)	invalid argument: xxx
315	Internal Server Error (500)	session aborted
316	Authorization Required (401)	authorization required
317	Internal Server Error (500)	internal server error
318	Bad Request (400)	missing audio file name
319	Not Found (404)	not found
320	OK (200)	in progress
321	Not Found (404)	session expired
322	Bad Request (400)	missing text data
323	Bad Request (400)	empty file
324	Internal Server Error (500)	cannot create command file
325	Bad Request (400)	missing URL argument
326	Bad Request (400)	missing HTTP message
327	Bad Request (400)	missing text data
328	Forbidden (403)	operation denied
329	OK (200)	in queue
330	Bad Request (400)	missing data
331	Bad Request (400)	missing keyword list file

¹The HTTP response header field Retry-After specifies how long the client process should wait before resubmitting the request (usually 60 s).
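A client can use the Retry-After field to decide when to resubmit a request refused with error 307 (HTTP 503). The sketch below is one possible way to compute the delay; `retry_delay` and `DEFAULT_WAIT_S` are hypothetical names, and falling back to 60 seconds when the field is missing or is not a plain number of seconds is an assumption based on the footnote.

```python
from typing import Mapping, Optional

DEFAULT_WAIT_S = 60.0  # the footnote says the delay is usually 60 s


def retry_delay(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return the seconds to wait before resubmitting, or None.

    Only 503 (error 307, resource not available) is treated as
    retryable in this sketch.
    """
    if status != 503:
        return None
    value = headers.get("Retry-After", "")
    try:
        # Only the delta-seconds form is handled; an HTTP-date value
        # falls back to the default delay.
        return max(0.0, float(int(value)))
    except ValueError:
        return DEFAULT_WAIT_S
```

With a plain `dict` the header lookup is case-sensitive; the header objects returned by `http.client` and `urllib` are case-insensitive and can be passed in directly.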