Offline testing

There are some simple tools for carrying out offline regression testing of the Python functionality. The idea is straightforward: you provide a JSON file containing a list of config files, possibly together with some other information, and make a command-line call to carry out a given type of processing on the content referenced by each config file. At the end, the script writes out a timestamped JSON file summarising the test, comparing the result of processing each config file with the result recorded in the most recent relevant logfile. Logfiles are placed in the directory $LARA/logfiles.
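The naming and lookup conventions for logfiles can be sketched as follows. The helper names here are illustrative, not the actual lara_run.py API; the only assumptions are the $LARA/logfiles directory and the timestamp format visible in the examples below.

```python
# Sketch of the logfile conventions: timestamped names under $LARA/logfiles.
# Helper names are hypothetical; the real code lives in lara_run.py.
import os
from datetime import datetime

def logfile_dir():
    # Logfiles live in $LARA/logfiles
    return os.path.join(os.environ['LARA'], 'logfiles')

def new_logfile_name():
    # Timestamped name, e.g. 2020-04-24_14-35-06.json
    stamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    return os.path.join(logfile_dir(), stamp + '.json')

def most_recent_logfile():
    # The timestamp format sorts lexicographically, so the latest file sorts last
    files = sorted(f for f in os.listdir(logfile_dir()) if f.endswith('.json'))
    return os.path.join(logfile_dir(), files[-1]) if files else None
```

Because the timestamp fields run from largest unit to smallest, plain string sorting of the filenames gives chronological order.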

A call to the offline testing functionality looks like this:

python3 $LARA/Code/Python/lara_run.py batch <FileWithConfigFiles> <TestingOperation>

The following testing operations are currently supported:

  • resources_and_word_pages, which performs the resources step followed by the word_pages step.

  • export_import, which first exports the content to a zipfile and then imports it again, before performing the resources step followed by the word_pages step.

  • tagging, which performs the treetagger step followed by internalising the tagged file.

  • distributed, which performs the compile_reading_history step followed by the compile_next_page_in_history step.
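One natural way to organise the four operations is as a table mapping each operation name to its sequence of steps. This is a hypothetical sketch, with the step names taken from the descriptions above; it is not the actual dispatch code in lara_run.py.

```python
# Hypothetical dispatch table for the testing operations listed above.
# Step names follow the text; the real lara_run.py code may differ.
OPERATIONS = {
    'resources_and_word_pages': ['resources', 'word_pages'],
    'export_import': ['export', 'import', 'resources', 'word_pages'],
    'tagging': ['treetagger', 'internalise_tagged_file'],
    'distributed': ['compile_reading_history', 'compile_next_page_in_history'],
}

def steps_for_operation(operation):
    # Fail early on an unknown operation name from the command line
    if operation not in OPERATIONS:
        raise ValueError(f'Unknown testing operation: {operation}')
    return OPERATIONS[operation]
```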

resources_and_word_pages

A simple call for resources_and_word_pages looks like this:

python3 $LARA/Code/Python/lara_run.py batch config_files_small.json resources_and_word_pages

Here, the contents of config_files_small.json are:

[{ "config_file": "$LARA/Content/lorem_ipsum/corpus/local_config.json" },
 { "config_file": "$LARA/Content/mary_had_a_little_lamb/corpus/mary_had_a_little_lamb.json" },
 { "config_file": "$LARA/Content/peter_rabbit/corpus/local_config.json" }
 ]
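Reading a batch file of this form can be sketched as below. The function name is illustrative; the one real detail is that the paths use the $LARA environment variable, which standard os.path.expandvars handles.

```python
# Sketch of reading a batch file like the one above. Paths use $LARA,
# which os.path.expandvars expands; the function name is illustrative.
import json
import os

def read_batch_file(path):
    with open(os.path.expandvars(path)) as f:
        items = json.load(f)
    # Each item is a dict with at least a "config_file" entry
    return [os.path.expandvars(item['config_file']) for item in items]
```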

This produces trace output on stdout and also writes the logfile $LARA/logfiles/2020-04-24_14-35-06.json, the contents of which look like this:

{
  "lorem_ipsum": {
      "audio_and_translation_files": {
          "segments": {
              "not_recorded": 0,
              "not_translated": 0,
              "recorded": 11,
              "translated": 11
          },
          "words": {
              "not_recorded": 62,
              "not_translated": 62,
              "recorded": 0,
              "translated": 0
          }
      },
      "result": "okay",
      "time_taken": 4.03,
      "word_pages": 73
  },
  "mary_had_a_little_lamb": {
      "audio_and_translation_files": {
          "segments": {
              "not_recorded": 0,
              "not_translated": 1,
              "recorded": 1,
              "translated": 0
          },
          "words": {
              "not_recorded": 18,
              "not_translated": 17,
              "recorded": 0,
              "translated": 0
          }
      },
      "result": "okay",
      "time_taken": 2.41,
      "word_pages": 25
  },
  "peter_rabbit": {
      "audio_and_translation_files": {
          "segments": {
              "not_recorded": 2,
              "not_translated": 2,
              "recorded": 39,
              "translated": 39
          },
          "words": {
              "not_recorded": 0,
              "not_translated": 0,
              "recorded": 384,
              "translated": 356
          }
      },
      "result": "okay",
      "time_taken": 34.36,
      "word_pages": 389
  }
}
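The regression check, comparing a fresh summary like this against the most recent logfile, might look like the following sketch. The function and set names are hypothetical; the one assumption beyond the logfile format shown above is that per-run fields such as time_taken should be ignored when deciding whether anything changed.

```python
# Hypothetical sketch of the regression comparison. For each corpus,
# the fresh entry is compared with the one in the most recent logfile,
# ignoring fields that legitimately vary between runs.
VOLATILE_KEYS = {'time_taken', 'last_good_result'}

def stable_part(entry):
    return {k: v for k, v in entry.items() if k not in VOLATILE_KEYS}

def compare_with_previous(new_summary, old_summary):
    # Report corpora whose stable fields (counts, result, word_pages) changed
    regressions = {}
    for corpus, new_entry in new_summary.items():
        old_entry = old_summary.get(corpus)
        if old_entry is not None and stable_part(new_entry) != stable_part(old_entry):
            regressions[corpus] = {'previous': old_entry, 'current': new_entry}
    return regressions
```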

If I now add some incorrect tagging to lorem_ipsum to make the corpus file syntactically incorrect and rerun the test, I get the new logfile $LARA/logfiles/2020-04-24_14-38-42.json, which looks like this:

{
  "lorem_ipsum": {
      "last_good_result": "2020-04-24_14-35-06",
      "result": "bad_word_pages_step",
      "time_taken": 1.76
  },
  "mary_had_a_little_lamb": {
      "audio_and_translation_files": {
          "segments": {
              "not_recorded": 0,
              "not_translated": 1,
              "recorded": 1,
              "translated": 0
          },
          "words": {
              "not_recorded": 18,
              "not_translated": 17,
              "recorded": 0,
              "translated": 0
          }
      },
      "result": "okay",
      "time_taken": 1.72,
      "word_pages": 25
  },
  "peter_rabbit": {
      "audio_and_translation_files": {
          "segments": {
              "not_recorded": 2,
              "not_translated": 2,
              "recorded": 39,
              "translated": 39
          },
          "words": {
              "not_recorded": 0,
              "not_translated": 0,
              "recorded": 384,
              "translated": 356
          }
      },
      "result": "okay",
      "time_taken": 12.74,
      "word_pages": 389
  }
}
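The "last_good_result" field in the failing entry above could be filled in by scanning earlier logfiles, newest first, for the last run in which this corpus was okay. This is a sketch with a hypothetical function name, relying only on the logfile format and naming convention already described.

```python
# Sketch (hypothetical helper name) of recovering last_good_result:
# walk the timestamped logfiles newest first and return the timestamp
# of the last one in which this corpus had result "okay".
import json
import os

def last_good_result(corpus_id, logfile_dir):
    # Timestamped names sort chronologically, so reverse order is newest first
    for name in sorted(os.listdir(logfile_dir), reverse=True):
        if not name.endswith('.json'):
            continue
        with open(os.path.join(logfile_dir, name)) as f:
            summary = json.load(f)
        entry = summary.get(corpus_id)
        if entry and entry.get('result') == 'okay':
            # Strip the .json extension to get the bare timestamp
            return name[:-len('.json')]
    return None
```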

export_import

The export_import operation is exactly the same as the resources_and_word_pages operation described above, except that the content is first written out as an export zipfile and then imported again into a temporary directory.

tagging

The tagging operation is similar to the resources_and_word_pages operation described above. The formats of the input and output files are the same; the only difference is the operation performed.

distributed

The distributed operation is a little more complicated. Here, a distributed config file needs to be used, containing a reading history. Each item in the offline test file specifies the config file plus the names of the corpus resource and associated language resource from which the next page will be generated. The test consists of creating pages for the full reading history, followed by creating an extra page from the specified corpus resource.

A minimal example illustrates this. The batch file, containing one item, is as follows:

[ { "config_file": "$LARA/Content/reader1_english/distributed_config.json",
    "corpus_id": "alice_in_wonderland",
    "language_resource_id": "english_geneva"
}
]

where the contents of the config file, $LARA/Content/reader1_english/distributed_config.json, are:

{
  "id": "reader1_english",
  "l1": "french",
  "resource_file": "$LARA/Content/all_resources.json",
  "reading_history": [[ "peter_rabbit", "english_geneva", [ 1, 26 ]],
                      [ "alice_in_wonderland", "english_geneva", [ 1, 18 ]],
                      [ "ogden_nash", "english_geneva", [ 1, 1 ]],
                      [ "four_little_children", "english_geneva", [ 1, 5 ]],
                      [ "edward_lear", "english_geneva", [ 1, 1 ]],
                      [ "mary_had_a_little_lamb2", "english_geneva", [ 1, 1 ]],
                      [ "gettysburg", "english_geneva", [ 1, 1 ]]
                     ],
  "preferred_voice": "cathy",
  "audio_mouseover": "yes",
  "translation_mouseover": "yes",
  "segment_translation_mouseover": "yes",
  "allow_table_of_contents": "yes",
  "max_examples_per_word_page": 10

}

The output produced looks like this:

{
  "reader1_english": {
      "main_text_files_after_full_distributed": 53,
      "main_text_files_after_next": 54,
      "result": "okay",
      "time_taken": 186.09,
      "time_taken_for_full_distributed_step": 182.93,
      "time_taken_for_next_page_step": 3.15
  }
}

This shows that one extra main text page was produced after the next_page step (i.e. it succeeded), and that the next page step took 3.15 seconds.
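The success check implied here can be written down directly from the logfile fields shown above. The function name is hypothetical; the field names are exactly those in the output.

```python
# Sketch of the success check for the distributed test: the next_page
# step should add exactly one main text file to those produced by the
# full distributed compilation. Field names are from the logfile above.
def next_page_step_succeeded(entry):
    return (entry['main_text_files_after_next']
            == entry['main_text_files_after_full_distributed'] + 1)
```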