Note
This documentation is under active development. Criticism is very welcome.
Overview
LARA is a reading-listening tool that lets you mark up text to help students to improve their reading ability in a new language. Features include:
concordance pages for words, showing each place a word has previously occurred in the student’s own reading history, together with the context in which it occurred
colour-codes for how often a word has been seen
audio recordings of both segments and individual words
an easy tool for recording the audio
mouseovers for translations
automatically generated flashcards for self-testing
links to grammar resources etc
images and embedded audio files
HTML formatting
support for sign language
LARA pages are intended to be placed on the web and read in a web browser. All information can be accessed by clicking or hovering with the mouse.
The easiest way to understand what LARA does is to look at an example:

On the left, we have a page of text from Peter Rabbit. The reader has just clicked on the word “ran”, and on the right we get a list of all the places where a form of “run” has turned up in the student’s reading history, which so far consists only of this one book. Note that the list contains examples with both “run” and “ran”. Clicking any word, on the left or on the right of the screen, produces a similar list. The reader can hover the mouse over any word to hear a spoken recording and also get a translation. If they hover the mouse over a loudspeaker icon, they get a translation of the whole preceding sentence, and if they click it they hear an audio recording of that sentence.
The colours show how often words have occurred to date: red means once, green means two or three times, blue means four or five times, and black means more than five times. When you start, everything is red. As you read, more and more of the words turn black.
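The colour scheme just described can be illustrated with a minimal Python sketch. This is not taken from the LARA code base: the function name `colour_for_count`, the toy `history` data, and the choice to count occurrences per lemma (so that “run” and “ran” are counted together, as in the concordance list) are all assumptions made for illustration.

```python
from collections import Counter


def colour_for_count(count: int) -> str:
    """Map the number of times a word has been seen to a display colour.

    Hypothetical helper: the thresholds follow the scheme described in the
    text (once: red, 2-3: green, 4-5: blue, more than five: black).
    """
    if count <= 1:
        return "red"
    if count <= 3:
        return "green"
    if count <= 5:
        return "blue"
    return "black"


# Toy reading history as (surface form, lemma) pairs in reading order.
# Counting by lemma is an assumption made for this sketch.
history = [
    ("ran", "run"),
    ("run", "run"),
    ("rabbit", "rabbit"),
    ("ran", "run"),
    ("runs", "run"),
]

# Count how often each lemma has occurred so far.
lemma_counts = Counter(lemma for _, lemma in history)

# Assign a colour to each lemma based on its count.
colours = {lemma: colour_for_count(n) for lemma, n in lemma_counts.items()}
```

In this toy history the lemma “run” has occurred four times and “rabbit” once, so the sketch assigns them blue and red respectively.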
This document explains
how to access content to read it: reader portal
how to create content using LARA tools: constructor portal
You can also download the underlying tools and run them on your laptop, though you’ll probably need some software skills to do that.
In the next section, we’ll start by showing you how to log in to the Portal and interact with some actual LARA content.
Who did what
The original LARA concept was suggested by Cathy Chua.
The first version of the LARA core engine was implemented by Manny Rayner in a mixture of SICStus Prolog and Python 3. The second version, which is described in this document, was implemented by Manny Rayner and Matt Butterweck in pure Python 3. The flashcards module was originally implemented by Bjartur Örn Jónsson and improved by Hana Steríková.
The LARA portal and LARA social network were implemented by Hanieh Habibi in PHP. A substantial part of the design is based on suggestions from Branislav Bédi.
The LARA GUI was implemented by Matt Butterweck in Python 3.
Icelandic morphological processing is performed by ABLtagger, developed by Steinþór Steingrímsson and Örvar Kárason. Lemmatizing is performed by Nefnir, developed by Jón Friðrik Daðason.
Polish morphological processing is performed by Morfeusz2, developed by Marcin Woliński.
Turkish LARA SaaS servicing, provided through the ITU Turkish NLP pipeline, has been developed by Gülşen Eryiğit.
English LARA content has been developed by many people, including Kirsten Anker, Cathy Chua, Gwyn Glasser, Robert Gasser, Marta Mykhats, Chadi Raheb, Manny Rayner and Rosa Ritchie.
French LARA content has been developed by many people, including Cathy Chua, Gwyn Glasser, Marta Mykhats, Chadi Raheb and Manny Rayner.
Irish Gaelic LARA content has been developed by Harald Berthelsen and Neasa Ní Chiaráin.
Icelandic LARA content has been developed by Branislav Bédi.
Old Norse LARA content has been developed by Ingibjörg Iða Auðunardóttir, Branislav Bédi, Brynjarr Eyjólfsson, Birgitta Björg Guðmarsdóttir and Ingibjörg Þórisdóttir.
ÍTM (Icelandic Sign Language) LARA content has been developed by Sigurður Vigfússon.
Farsi LARA content has been developed by Elham Akhlaghi and Hanieh Habibi.
Japanese LARA content has been developed by Junta Ikeda and Hakeem Beedar.
German and Middle High German LARA content has been developed by Matt Butterweck.
Italian LARA content has been developed by Sabina Sestigiani, Catia Cucciarini and Ivana Horváthová.
Israeli Hebrew LARA content has been developed by Ghil’ad Zuckermann.
Barngarla LARA content has been developed by Ghil’ad Zuckermann.
Swedish LARA content has been developed by Harald Berthelsen and Manny Rayner.
Turkish LARA content has been developed by Fatih Bektaş.
Dutch LARA content has been developed by Helmer Strik and Catia Cucciarini.
Danish LARA content has been developed by Pernille Hvalsøe and Manny Rayner.
Polish LARA content has been developed by Anna Bączkowska and students.
Mandarin LARA content has been developed by Yao Chunlin.
Spanish LARA content has been developed by Rebeca López, Roy Lotz and Manny Rayner.
Most of this documentation has been written by Manny Rayner, with contributions from Matt Butterweck and Hanieh Habibi. It has been edited and substantially rewritten by Cathy Chua.
Grateful thanks to Johanna Gerlach for help with LiteDevTools, Philippe Baudrion for organising the CALLector webspace, and Lionel Nicolas and Verena Lyding for flexibility in supporting contacts between the various people involved in developing LARA.
Acknowledgements to funders
The greater part of the LARA core software has been developed under funding from the Swiss National Science Foundation.
The greater part of the Old Norse and Icelandic Sign Language content has been developed under funding from the Icelandic Centre for Research.
Table of contents
- The reader portal
- The constructor portal
- Annotated images
- Advanced portal functionality
- Unpublishing a resource
- The “Advanced options” tab
- Linguistics papers and similar texts
- Texts with video annotations
- Multi word expressions
- CSS stylesheets
- Adding embedded audio with the <audio> tag
- Special issues for Chinese
- Exporting and importing zip files
- Importing a project developed outside the portal
- Exporting a portal project to use it from the command-line
- Crowdsourcing
- The LARA repository
- Using the Python code: prerequisites
- Using the PHP code: prerequisites and installation
- Local content
- Directory structure
- Writing a local config file
- Config file parameters
- Format of tagged LARA text
- Adding HTML formatting to LARA text
- Special characters
- Heteronyms
- Including non-L2 text
- Adding <img> and <video> tags
- Adding <audio> tags
- Using colours to mark parts of speech (POS)
- Special support for plays
- Picturebook mode
- Phonetic mode
- Parallel LARA texts
- First invocation of LARA compiler (“resources”)
- Recording LARA audio using LiteDevTools
- Creating LARA audio using a TTS engine
- Creating segment audio by cutting up MP3s
- Creating segmented text by aligning against cut-up audio
- Extracting word token audio from sentence audio
- Filling in LARA translation spreadsheets
- Adding notes to words
- Adding images to words
- Second invocation of LARA compiler (“word_pages”)
- Editing a file from the content directory
- Opening the compiled HTML file in the browser
- Creating a new content from a template
- Making your LARA pages accessible
- Summary
- Tagging and segmentation
- Phonetic texts
- GUI Window
- Offline testing
- Distributed content
- Internal documentation: overview of architecture
- Internal documentation: top-level calls to Python
- Segmenting text
- Invoking TreeTagger
- Performing multi word expression annotation
- Performing the “resources” step
- Creating word token translation files from surface word translation files
- Performing the “word pages” step
- Processing LDT output
- Creating LARA audio using a TTS engine
- Adding metadata for distributed LARA
- Merging two language resource directories
- Merging two translation spreadsheets
- Finding deleted lines in a pair of translation spreadsheets
- Getting voices and L1s for a resource
- Getting voices and L1s for a resource file
- Getting the audio and translation files for a corpus resource
- Downloading a resource
- Exporting a corpus resource as a zipfile
- Importing a corpus resource from a zipfile
- Checking well-formedness of a config file
- Checking validity of a LARA ID
- Creating flashcards
- Getting possible flashcard types
- Structured diff on tagged corpus
- Unzipping a file
- Streamed download of binary files
- Converting a CSV file into a JSON file
- Converting a word token CSV file into a JSON file
- Converting a JSON file into a CSV file
- Converting a word token JSON file into a CSV file
- Converting a type or lemma JSON file into a CSV file
- Checking the status of the Concraft server for Polish
- Crowdsourcing a project
- Downloading Picturebook information from the Selector Tool
- Uploading Picturebook information to the Selector Tool
- Creating CSV files for human vs TTS questionnaires
- Formatting data from a human vs TTS questionnaire
- Compiling a reading history
- Incremental compilation of a reading history
- Getting a list of pages for a resource
- Cleaning the reading history cache
- Internal documentation: abstract HTML
- Internal documentation for portal