Internal documentation

Start scripts

This section lists the start scripts used to run the three CALL-SLT server processes.

  • Compilation: $CALLSLT/EngInt/scripts/run_server_flat_compilation_server.bat
  • Staging: $CALLSLT/EngInt/scripts/run_server_flat_staging_server.bat
  • Production: $CALLSLT/EngInt/scripts/run_server_flat.bat

Environment variables

This section lists the environment variables used to identify system directories.

  • $LITECONTENT points to a directory that contains Lite data and metadata.
  • $LITEWEBCONTENT points to the root directory for copied webserver content.
  • $LITEFMSCONTENT points to the root directory for copied FMS content.

FTP directories and user permissions

To be able to upload content, users need

  • an FTP directory
  • appropriate permissions

The relevant declarations are placed in the file $CALLSLT/GUI/Prolog/login_info.pl. A typical record looks like this:

login_info(mannyrayner,
           callslt,
           [administrator=yes,
            birth_year=1958,
            gender=male,
            native_language=english,
            native_language=swedish,
            permission=edit_transcription(any),
            permission=update_namespace(bologna),
            permission=update_namespace(hello),
            permission=update_namespace(english_course),
            permission=update_namespace(pronunciation),
            permission=update_namespace(toy),
            ftp_directory='Z:/manny'
           ]).

Metadata

Metadata is placed in the file $LITECONTENT/callslt_multilingual_metadata.cfg. Typical records look like this:

regulus_config(lite_directory(visual,pokemon,english), lite_content('visual/pokemon')).
regulus_config(lite_file(call,visual,english,[english]), lite_content('visual/pokemon/grammars/pokemon.txt')).

Multimedia files

The root directory for copied webserver content is $LITEWEBCONTENT, and the root directory for copied FMS content is $LITEFMSCONTENT. The structure of these directories mirrors that for the LiteContent source directory, as follows:

  • 1st (top): deployment level, one of {compilation,staging,production}
  • 2nd: namespace
  • 3rd: course
  • 4th: one of {multimedia,doc}

So for example the staging version of the doc file pronunciation_h_help.html from Aline’s course will be

$LITEWEBCONTENT/staging/pronunciation/pronunciation/doc/pronunciation_h_help.html

the production version of the picture of Audrey Hepburn in the visual3 course is

$LITEWEBCONTENT/production/visual/visual3/multimedia/audrey_hepburn.jpg

and the compilation version of the video resto_catherine_1.flv from Claudia’s course is

$LITEFMSCONTENT/compilation/english_course/english_course/multimedia/resto_catherine_1.flv

The copying code is set up to create any new directories it needs under the root directories $LITEWEBCONTENT and $LITEFMSCONTENT.
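
As an illustration of the directory convention above, here is a minimal Python sketch of such a copy step (copy_to_root is a hypothetical helper for illustration, not the actual copying code):

```python
import os
import shutil

def copy_to_root(root, level, namespace, course, kind, src_file):
    """Copy src_file into root/level/namespace/course/kind/, creating any
    missing directories under root, and return the target path."""
    assert level in ("compilation", "staging", "production")
    assert kind in ("multimedia", "doc")
    target_dir = os.path.join(root, level, namespace, course, kind)
    os.makedirs(target_dir, exist_ok=True)  # create new directories as needed
    target = os.path.join(target_dir, os.path.basename(src_file))
    shutil.copy2(src_file, target)
    return target
```

For example, copy_to_root(os.environ['LITEWEBCONTENT'], 'production', 'visual', 'visual3', 'multimedia', 'audrey_hepburn.jpg') would produce the production path shown above.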

Compile-time operations

Select

Operations for the Select step

Perform the search_ftp_register_copy_and_get_namespaces_domains_and_l1s operation on the compilation server.

This looks for new courses in the FTP directory, registers them in the metadata if necessary, copies the namespace directories from the FTP directory to the compilation server, and returns a pair: a list of error/warning messages about bad course directories or files in the FTP directory (or null), and a list of namespace-course-L1 triples (or null).

First set the client (this is necessary to determine which courses will be usable):

{ "*action_for_session":[ "0.5796765610575676", {
                          "*process_non_speech_input":[
                                  { "*set_client":[ "multimedia_client" ] } ] } ] }

then perform the main operation:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    "search_ftp_register_copy_and_get_namespaces_domains_and_l1s" ] } ] }

Typical good output (null errors):

{ "action":{ "*errors_plus_current_namespaces_domains_and_l1s":[
                  null,
                  [ [ "visual",
                      "visual1_course",
                      "english" ] ] ] } }

Typical bad output (non-null errors):

{ "action":{ "*errors_plus_current_namespaces_domains_and_l1s":[
                 [ { "namespace":"visual",
                     "course_dir":"visual1",
                     "error":"File index.rst with extension \".rst\" should not be in grammars directory (namespace \"visual\", course \"visual1\")" } ],
                 null ] } }
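
A client can distinguish the two cases by checking which element of the returned pair is null. A sketch in Python (the shape of the "action" envelope is taken from the two examples above):

```python
import json

def split_select_response(response_text):
    """Split the compilation server's reply into (errors, triples); either
    element of the returned pair may be null in the raw JSON."""
    reply = json.loads(response_text)
    errors, triples = reply["action"]["*errors_plus_current_namespaces_domains_and_l1s"]
    return errors or [], triples or []
```

On the good output above this returns ([], [["visual", "visual1_course", "english"]]).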

Compile

Operations for the Compile step

On compilation server, set the namespace to the selected value:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    { "*set_namespace":[ "visual" ] } ] } ] }

then perform a compile operation:

{ "*action_for_session":[ "0.5796765610575676","recompile_system" ] }

Test and Release

Operations for the Test and Release steps

On the staging or production server (processing is the same in both cases), do the following:

Copy over Lite content:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    { "*update_lite_content_for_namespace":[ "visual" ] } ] } ] }

Set the namespace to the selected value:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    { "*set_namespace":[ "visual" ] } ] } ] }

Invoke the compile operation: this includes copying the compiled grammars from the level below, and copying the multimedia content for the selected namespace to the webserver and FMS directories:

{ "*action_for_session":[ "0.5796765610575676","recompile_system" ] }

Delete a course

The client needs to send three messages, one to each server.

First a delete_course message to the compilation server:

{ "*action_for_session":[ "0.7478729719296098",
                          { "*process_non_speech_input":[
                                    { "*delete_course":[ "visual", "arithmetic_french" ] } ] } ] }

A correct response will be something like

{ "action":{ "*course_deleted":[ "visual",
                                 "arithmetic_french",
                                 "ok",
                                 "Removed: namespace = visual, course = arithmetic_french (1 directories, 1 files)" ] } }

An incorrect response will have "error" instead of "ok".

Next, a refresh_course_information_from_declarations message to the staging server:

{ "*action_for_session":[ 587612866351,
                          { "*process_non_speech_input":[
                                    "refresh_course_information_from_declarations" ] } ] }

The response should be [action=ok] or [action=error].

Next, a refresh_course_information_from_declarations message to the production server:

{ "*action_for_session":[ 587612866351,
                          { "*process_non_speech_input":[
                                    "refresh_course_information_from_declarations" ] } ] }

The response should be [action=ok] or [action=error].

Runtime operations

The basic runtime sequence is as follows:

  • Login. You need to be logged in to do anything.
  • Get current courses. Return a list of available courses.
  • Set current course. Choose one of the courses to be the currently active course.
  • Get current lessons. Return a list of available lessons for the currently active course.
  • Set current lesson. Choose one of the lessons to be the currently active lesson.
  • Go forward. Get the next prompt.
  • Get audio/text help. Optionally get help for the current prompt.
  • Process speech recognition output or perform recognition and matching. Match output from some kind of speech recogniser against the current prompt.
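
All of these steps use the same message envelope: a session id plus a payload, with most payloads additionally wrapped in "*process_non_speech_input". A hypothetical Python helper that builds such messages (the session id is taken from the examples below):

```python
import json

def action_for_session(session_id, payload, non_speech=True):
    """Wrap a payload in the "*action_for_session" envelope; most runtime
    payloads also go inside "*process_non_speech_input"."""
    if non_speech:
        payload = {"*process_non_speech_input": [payload]}
    return json.dumps({"*action_for_session": [session_id, payload]})

# The "Set current lesson" message, for example:
msg = action_for_session("0.5796765610575676", {"*set_lesson": ["animals"]})
```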

Login

Use this to log in to an existing account.

Typical message:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    { "*login_user_with_password":[ "babeldr-demo", "babeldr", "" ] } ] } ] }

Typical response:

{"action":"not_administrator"}

Get current courses

Use this to find out what courses are available. The response is a list of namespace-domain-L1 triples, which can then be used as input to the "*set_namespace_l1_and_domain" message.

Typical message:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    "get_available_namespaces_domains_and_l1s" ] } ] }

Typical response:

{ "action":{ "*current_namespaces_domains_and_l1s":[
     [ [ "Level1", "vocEn", "french" ],
       [ "lego", "lego_course", "english" ],
       [ "swinburne1", "italian1", "english" ],
       [ "visual", "visual3_course", "english" ],
       [ "visual_australian", "words", "english" ],
       [ "visual_french", "arithmetic_french", "french" ],
       [ "visual_french", "pokemon_french", "french" ],
       [ "visual_german", "numbers", "english" ],
       [ "visual_japanese", "numbers", "english" ],
       [ "visual_slovenian", "slovenian_101", "english" ]
      ] ] } }

Set current course

Use this to set the current namespace, domain and L1. You can find possible values using the "*current_namespaces_domains_and_l1s" message.

Most messages won’t work if you haven’t made one of these calls first.

Typical message:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    { "*set_namespace_l1_and_domain":[ "visual",
                                                                       "english",
                                                                       "visual1_course" ] } ] } ] }

Get current lessons

Use this to find out which lessons are loaded. You need to have set the current course first using "*set_namespace_l1_and_domain".

Typical message:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    "get_current_lessons" ] } ] }

Typical response:

{ "action":{ "*current_lesson_ids":[
     [ "lesson_1",
       "lesson_2",
       "lesson_3",
       "lesson_4"
     ] ] } }

Set current lesson

Set the current lesson. You can find possible values using "get_current_lessons". The response gives the number of examples available for the selected lesson.

Typical message:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    { "*set_lesson":[ "animals" ] } ] } ] }

Typical response:

{ "action":{ "*number_of_examples_matching_current_strategy":[ 2 ] } }

Get recognition grammar

Get the current recognition grammar. This is only useful when using Nuance-based recognition.

Typical message:

{ "*action_for_session":[ "0.5796765610575676", { "*process_non_speech_input":[
        "get_recognition_grammar_id" ] } ] }

Typical response:

{ "action":{ "*recognition_grammar":[
     "callslt_english_production_dot_MAIN__visual__visual1_course@unige.ch" ] } }

Go forward

Move to the next prompt. Returns data for that prompt.

Typical message:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    "next" ] } ] }

Typical response:

{ "action":[ { "*display_prompt":[
     { "*multimedia":[ [ "production/visual/visual1/multimedia/cat.png" ], "What is it?" ] } ] },
     { "*display_graphical_prompt":[ null ] } ] }

Go back

Move to the preceding prompt. Returns data for that prompt.

Typical message:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    "back" ] } ] }

Typical response:

{ "action":[ { "*display_prompt":[
     { "*multimedia":[ [ "production/visual/visual1/multimedia/cat.png" ], "What is it?" ] } ] },
     { "*display_graphical_prompt":[ null ] } ] }

Get audio/text help

Get help information for the current prompt. The fields which are most likely to be useful are "*speech_help" and "*text_help".

Typical message:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_non_speech_input":[
                                    "help" ] } ] }

Typical response:

{ "action":{ "*provide_help":[ [
   { "*display_prompt":[ { "*multimedia":[ [ "production/visual_slovenian/numbers/multimedia/two.jpg" ],
                                            "Say the number in Slovenian" ] } ] },
   { "*display_graphical_prompt":[ null ] },
   { "*speech_help":[ [ { "*wavfile_and_info":[ { "wavfile":"production/visual_slovenian/numbers/multimedia/help/n2.flv", "transcription":"dva" } ] } ] ] },
   { "*text_help":[ [ "Dva" ] ] }, { "*preference":[ "speech_help" ] } ] ] }
}

Process speech recognition output

Process a string (normally produced using a speech recogniser) and match it against the current prompt.

Typical message:

{ "*action_for_session":[ "0.5796765610575676",
                          { "*process_rec_string":[ "a cat" ] }
                        ] }

Typical response:

{ "selected":"a cat",
  "action":{ "*display_matching_info":[ { "recognised":"a cat",
                                          "gloss":null,
                                          "matching_info":"correct",
                                          "level":"undefined",
                                          "score":{ "good":1, "bad":0, "streak":1 },
                                          "get_next":"true" } ] } }

Perform recognition and matching

Process output from the Nuance speech recogniser and match it against the current prompt.

Typical message:

{ "*action_for_session":[ "0.5796765610575676", { "*action_sequence":[
            { "*process_xml_message":[
                "<?xml version='1.0'?>
                <result>
                  <interpretation grammar=\"session:callslt_english_production_dot_MAIN__visual__visual1_course@unige.ch\" confidence=\"0.85\">
                  <input mode=\"speech\">a cat</input>
                  <instance>
                     <SWI_literal>a cat</SWI_literal>
                     <value>
                       <r0>
                         <r0 confidence=\"0.85\">null</r0>
                         <r1 confidence=\"0.85\">a_cat</r1>
                       </r0>
                     </value>
                    <SWI_grammarName>session:callslt_english_production_dot_MAIN__visual__visual1_course@unige.ch</SWI_grammarName>
                    <SWI_meaning>{value:{0:{0:null 1:a_cat}}}</SWI_meaning>
                  </instance>
                 </interpretation>
                </result>" ] },
            { "*process_non_speech_input":[ { "*store_most_recent_recorded_wavfile":[ "D:\Projects\Paideia\PaideiaServer\voices\foobar1_271.ulaw" ] } ] } ] } ] }

Typical response:

{ "*action_sequence":[ { "selected":"a cat",
                         "action":{ "*display_matching_info":[ { "recognised":"a cat",
                                                                 "gloss":null,
                                                                 "matching_info":"correct",
                                                                 "level":"undefined",
                                                                 "score":{ "good":1, "bad":0, "streak":1 },
                                                                 "get_next":"true" } ] } },
                       { "action":"ok" } ] }

Using the Mediator to send messages

The Mediator server receives requests from the user and routes them to the back-end. These include requests for speech recognition using an audio file (RECOGNIZE), for semantic interpretation of a text string (INTERPRET), and for processing a text string (MESSAGE). The server has two listening ports: one for external access (user-initiated requests) and one for internal access (platform-initiated responses). Incoming requests and outgoing responses are linked through a hash table, so the module is effectively stateless and therefore fast. The routing information is stored as a JSON-formatted string in an external configuration file; each entry defines an application id, the target Regulus Server and one or more target MRCP/SIP Servers.

The messages exchanged between the client and the server are sent over HTTP. The client creates POST requests containing the parameters listed below; the result is a JSON-formatted string. The lightweight JSON notation keeps processing time to a minimum. The following parameters can be set in a POST request:

  • appId The id of the application. Example: babeldr_trainslate_french_production_staging
  • grammarId The id of the grammar. Example: callslt_english_production_dot_MAIN__visual__visualCaps@unige.ch
  • sessionId The session id is a dummy parameter reserved for future use. Always use the value 0.
  • ipAddress The IP address of the client. Example: localhost
  • voicefile The actual audio file as part of a multipart request. The file must be formatted as “audio/wav” or “audio/x-wav”. Example: c:/waveform_04.wav
  • interpretText The text that needs to be interpreted. Example: hello
  • messageText The message text that needs to be forwarded to the Regulus server. Example: { "*action_for_session":[ "0.028991163708269596", { "*process_non_speech_input":[ { "*login_user_with_password":[ "babeldr-demo", "babeldr", "" ] } ] } ] }

A recognition request should include the following parameters: appId, grammarId, sessionId, ipAddress and voicefile.

An interpretation request should include the following parameters: appId, grammarId, sessionId, ipAddress and interpretText.

A message request should include the following parameters: appId, grammarId, sessionId, ipAddress and messageText.
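
A sketch of a MESSAGE request in Python using only the standard library. The Mediator URL and path are placeholders (use your installation's address); the parameter names are the ones listed above. Note that RECOGNIZE requests additionally need the voicefile sent as part of a multipart request, which the simple form encoding below does not cover.

```python
import json
import urllib.parse
import urllib.request

def build_message_params(app_id, grammar_id, message_text, ip_address="localhost"):
    """Parameters for a MESSAGE request; sessionId is a dummy, always 0."""
    return {
        "appId": app_id,
        "grammarId": grammar_id,
        "sessionId": "0",
        "ipAddress": ip_address,
        "messageText": message_text,
    }

def send_message(mediator_url, app_id, grammar_id, message_text):
    """POST the request and decode the JSON reply (status is OK or ERROR)."""
    data = urllib.parse.urlencode(
        build_message_params(app_id, grammar_id, message_text)).encode("utf-8")
    with urllib.request.urlopen(urllib.request.Request(mediator_url, data=data)) as reply:
        return json.loads(reply.read().decode("utf-8"))
```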

As previously noted, the result is a JSON-formatted string of the form:

{
   "appId": <app id>,
   "sessionId": <session id>,
   "result": <result>,
   "status": <status OK|ERROR>
}

For example, the result of a login message request looks like this:

{
   "appId": "babeldr_trainslate_french_production_staging",
   "sessionId": "1",
   "result": {"action":"not_administrator"},
   "status": "OK"
}

Logfiles

There are three kinds of logfiles on the server. For historical reasons, they are kept in rather non-obvious locations:

CALL-SLT session logfiles

A Prolog-formatted logfile is created for each session in the directory $CALLSLT/GUI/session_logfiles. Session logfile names are tagged with username and time, so a typical file is called something like $CALLSLT/GUI/session_logfiles/mannyrayner_2016-11-30_07-37-56.pl.

The file contains a record for each logged CALL-SLT event. A typical segment looks like this:

event('2016-11-30_07-40-04',set_namespace(test_multimodal)).

event('2016-11-30_07-41-36',set_client(multimedia_client)).

event('2016-11-30_07-41-39',get_l2_for_namespace(test_multimodal)).

event('2016-11-30_07-41-42',set_lesson(colours)).

event('2016-11-30_07-41-45',new_prompt(multimedia(['production/test_multimodal/colours/multimedia/black.jpg'],'Say the name of the colour'))).

event('2016-11-30_07-42-22',
      provide_help([display_prompt(multimedia(['production/test_multimodal/colours/multimedia/black.jpg'],'Say the name of the colour')),
                    display_graphical_prompt(null),
                    speech_help([wavfile_and_info([wavfile='level_independent/ldt/10804_161118052001.flv',transcription=black])]),
                    text_help(['Black']), preference(speech_help)])).

event('2016-11-30_07-42-37',
      recognition_and_match([(wavfile =
                              'c:/speechtranslation/callslt-code/trunk/call-slt/eng/recorded_wavfiles/mannyrayner/2016-11-30_07-42-37/utt_001.wav'),
                             (recognised=black), (surface_prompt='multimedia:black.jpg Say the name of the colour'),
                             (surface_interlingua='Say the name of the colour'), (match=correct), (score=[good=1,bad=0])])).

Note that the recognition_and_match records point to the stored wavfiles.
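
Since the logfiles are Prolog-formatted, the natural way to read them is with a Prolog reader, but a quick summary can also be extracted with a regular expression over the event/2 records. A Python sketch (this pulls out only the timestamp and the event type, not the full term):

```python
import re

# Matches the head of an event/2 record: event('TIMESTAMP', functor...
EVENT_RE = re.compile(r"event\('([^']+)',\s*([a-z_]+)")

def event_summary(logfile_text):
    """Return (timestamp, event_type) pairs for the records in a session logfile."""
    return EVENT_RE.findall(logfile_text)
```

On the segment above this yields pairs like ('2016-11-30_07-40-04', 'set_namespace').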

Regulus session logfiles

The Regulus server creates logfiles showing message traffic to and from clients. One file is created for each server session. Timestamped files are kept in $REGULUS/CALL-SLT/Eng/logfiles. A typical segment looks like this:

dialogue_event('2016-11-23_23-44-31',
               response([(action =
                          provide_help([display_prompt(multimedia(['production/visual_mandarin/colours2/multimedia/black.jpg'],
                                                                  'What is the colour of the square?')),
                                        display_graphical_prompt(null),
                                        speech_help([wavfile_and_info([(wavfile='level_independent/ldt/10822_161121013104.flv'),
                                                                       (transcription='hēisède')])]),
                                        text_help(['黑色的']), preference(speech_help)]))])).
%     TIMING, structure_logging (Timestamp: 2016-11-23_23-44-31)
% "0.02"
% AV. TIMING, structure_logging (Timestamp: 2016-11-23_23-44-31)
% "0.01"
%     TIMING, prolog2json (Timestamp: 2016-11-23_23-44-31)
% "0.01"
% AV. TIMING, prolog2json (Timestamp: 2016-11-23_23-44-31)
% "0.00"
% SENT JSON, json_message (Timestamp: 2016-11-23_23-44-31)
% "{ "action":{ "*provide_help":[ [ { "*display_prompt":[ { "*multimedia":[ [ "production/visual_mandarin/colours2/multimedia/black.jpg" ], "What is the colour of the square?" ] } ] }, { "*display_graphical_prompt":[ null ] }, { "*speech_help":[ [ { "*wavfile_and_info":[ { "wavfile":"level_independent/ldt/10822_161121013104.flv", "transcription":"\u0068\u0113\u0069\u0073\u00e8\u0064\u0065" } ] } ] ] }, { "*text_help":[ [ "\u9ed1\u8272\u7684" ] ] }, { "*preference":[ "speech_help" ] } ] ] } }"
%     TIMING, string_logging (Timestamp: 2016-11-23_23-44-31)
% "0.01"
% AV. TIMING, string_logging (Timestamp: 2016-11-23_23-44-31)
% "0.01"
%     TIMING, full processing of item (Timestamp: 2016-11-23_23-44-31)
% "0.34"
% AV. TIMING, full processing of item (Timestamp: 2016-11-23_23-44-31)
% "0.29"
%     TIMING, json2prolog (Timestamp: 2016-11-23_23-45-41)
% "0.00"
% AV. TIMING, json2prolog (Timestamp: 2016-11-23_23-45-41)

Server trace files

Material sent by the servers to stdout is kept in timestamped directories under $REGULUS/tmp. There is one directory for each instance of invoking the start script. A typical trace file is $REGULUS/tmp/starttracefiles_2016-11-29_11-44-53/callslt_multilingual_production.txt, which contains the trace output for the CALL-SLT production server in the session starting at 2016-11-29_11-44-53. A typical segment looks like this:

Server received (07:43:47):
{ "*action_for_session":[ "0.9094235552474856", { "*action_sequence":[ { "*process_xml_message":[ "<?xml version='1.0'?><result><interpretation grammar=\"session:callslt_english_production_dot_MAIN__test_multimodal__test_colours@unige.ch\" confidence=\"0.68\"><input mode=\"speech\">green</input><instance><SWI_literal>green</SWI_literal><value><r0><r0 confidence=\"0.68\">null</r0><r1 confidence=\"0.68\">incorrect_version_of_red</r1></r0></value><SWI_grammarName>session:callslt_english_production_dot_MAIN__test_multimodal__test_colours@unige.ch</SWI_grammarName><SWI_meaning>{value:{0:{0:null 1:incorrect_version_of_red}}}</SWI_meaning></instance></interpretation></result>" ] }, { "*process_non_speech_input":[ { "*store_most_recent_recorded_wavfile":[ "D:\Projects\Paideia\PaideiaServer\voices\foobar1_589.ulaw" ] } ] } ] } ] }

interpreted as:

action_for_session(0.9094235552474856,action_sequence(process_xml_message(<?xml version='1.0'?><result><interpretation grammar="session:callslt_english_production_dot_MAIN__test_multimodal__test_colours@unige.ch" confidence="0.68"><input mode="speech">green</input><instance><SWI_literal>green</SWI_literal><value><r0><r0 confidence="0.68">null</r0><r1 confidence="0.68">incorrect_version_of_red</r1></r0></value><SWI_grammarName>session:callslt_english_production_dot_MAIN__test_multimodal__test_colours@unige.ch</SWI_grammarName><SWI_meaning>{value:{0:{0:null 1:incorrect_version_of_red}}}</SWI_meaning></instance></interpretation></result>),process_non_speech_input(store_most_recent_recorded_wavfile(D:\Projects\Paideia\PaideiaServer\voices\foobar1_589.ulaw))))

Server response (07:43:48): action_sequence([selected=green,action=display_matching_info([recognised=green,gloss=[],matching_info=different_interlingua([correct=multimedia:red.jpg Say the name of the colour,correct_graphical=null,what_you_said=&lt;FONT COLOR="#ff0033"&gt;green&lt;/FONT&gt;,what_you_said_graphical=null,difference=[sub(Incorrect,multimedia:red.jpg),ins(version),ins(of:),same(Say),same(the),same(name),same(of),same(the),same(colour)]]),level=undefined,score=[good=4,bad=3,streak= -1],get_next=false])],[action=error])

encoded as
{ "*action_sequence":[ { "selected":"green", "action":{ "*display_matching_info":[ { "recognised":"green", "gloss":null, "matching_info":{ "*different_interlingua":[ { "correct":"multimedia:red.jpg Say the name of the colour", "correct_graphical":null, "what_you_said":"&lt;FONT COLOR=\"#ff0033\"&gt;green&lt;/FONT&gt;", "what_you_said_graphical":null, "difference":[ { "*sub":[ "Incorrect", "multimedia:red.jpg" ] }, { "*ins":[ "version" ] }, { "*ins":[ "of:" ] }, { "*same":[ "Say" ] }, { "*same":[ "the" ] }, { "*same":[ "name" ] }, { "*same":[ "of" ] }, { "*same":[ "the" ] }, { "*same":[ "colour" ] } ] } ] }, "level":"undefined", "score":{ "good":4, "bad":3, "streak":-1 }, "get_next":"false" } ] } }, { "action":"error" } ] }
---     Time for full processing of item: 3.19 secs
--- Av. time for full processing of item: 0.91 secs