Advanced portal functionality

In this section, we will cover some more things you can do with the LARA portal that we haven’t mentioned so far.

Unpublishing a resource

Go back to the main menu on the left, and select Available texts and then All published texts. You should see something like this:

_images/PortalAllPublishedTexts1.jpg

We’re logged in as testlara, so almost all the texts belong to other people, but if we look down the list we find “The Little Prince ch 1”, which we made earlier:

_images/PortalAllPublishedTexts2.jpg

There is a cross icon in the Unpublish column opposite the name of our text. If we decide we don’t want it to be generally available any more, we can click on the cross and remove it. This means that “The Little Prince ch 1” will no longer be available through the Reading texts > Read tab.

Note that we have not deleted the original project! If we change our minds, we can go back to Creating texts > My LARA texts, select Edit project for “The Little Prince ch 1”, move to the Create pages tab, and hit the Publish button to republish it. Now the text will be generally available again.

The “Advanced options” tab

So far, we haven’t looked at the Advanced options tab in the Create resources screen. Open it, and you’ll see something like this:

_images/PortalAdvancedOptions.jpg

Some of these controls are obvious:

  • Table of contents adds a table of contents built out of your <h1> and <h2> tags. The title for the whole text needs to be marked with <h1>; section titles (e.g. chapters) need to be marked with <h2>. There has to be an end-of-segment mark (||) before the closing tag. So for example, a typical title for the whole text would be <h1>The Little Prince||</h1>, and a typical title for a chapter would be <h2>Chapter 1||</h2>.

  • Coloured words lets you switch off the colours marking word frequencies if you don’t want them.

  • Audio words in colour gives you a different way to use colour: anything with an associated audio file is marked in red.

  • Max words per word page lets you change the number of examples on the right hand side.

  • Font changes the font.

  • Font size changes the default font size.

  • Frequency lists in… lets you display the frequency lists on the other side of the page.

For example, if you set Coloured words off and Font to serif, our example will look like this:

_images/PortalLPAdvancedOptions.jpg

The remaining options are explained below.
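
The heading rule for Table of contents (an end-of-segment mark before the closing tag) is easy to get wrong, so it can be worth checking a source file before uploading it. The following is a minimal Python sketch, not part of LARA: the function name headings_missing_segment_mark is hypothetical, and the check only looks at plain <h1> and <h2> tags without attributes.

```python
import re

# Matches <h1>...</h1> and <h2>...</h2>, capturing the level and the content.
HEADING = re.compile(r"<h([12])>(.*?)</h\1>", re.DOTALL)


def headings_missing_segment_mark(source: str) -> list[str]:
    """Return the h1/h2 elements whose content does not end with '||'."""
    bad = []
    for match in HEADING.finditer(source):
        if not match.group(2).rstrip().endswith("||"):
            bad.append(match.group(0))
    return bad


sample = "<h1>The Little Prince||</h1>\n<h2>Chapter 1</h2>||\n"
print(headings_missing_segment_mark(sample))  # ['<h2>Chapter 1</h2>']
```

An empty result means every heading found has its || in the position the Table of contents option expects.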

Linguistics papers and similar texts

It is often useful to be able to add a couple of lines to a LARA document that will be treated as plain text and not marked up by adding audio or translations. We can do this in a normal document by using “comment brackets”, /**/. For example, we could put some information at the beginning of “The Little Prince ch 1” to say what kind of text it is:

<page>
<h1>The Little Prince||</h1>
Antoine de Saint-Exupéry||
/*Simple LARA example illustrating basic functionality, used in online documentation.
Text and illustrations taken from https://archive.org/details/TheLittlePrince-English
LARA markup added by Manny Rayner.*/
<page>

When we upload and remake, the first page will now look like this:

_images/PortalLPPlainText.jpg

In most cases, this will be the right way to add plain text. Nearly all of our text is in the L2, and we use the comment brackets to mark the parts that are in the L1. However, there are some kinds of documents, most obviously linguistics papers and language textbooks, where it’s the opposite: most of the text is in the L1, and the text in the L2 which you want to annotate using LARA is the exception. With this kind of material, using comment brackets gives a very ugly and cluttered layout to the source. You can do better if you set some of the advanced features appropriately.

We illustrate with an example, a short fact sheet about Swedish grammar, written in English. The marked-up source text looks like this:

<h1>Basic Swedish grammar||</h1>

Swedish is an easy language for people who know English.|| There are
just a few tricky things you need to watch out for.||

<h2>Gender||</h2>

Swedish nouns have gender.|| It's not quite the same as
gender in French: instead of masculine and feminine, you have
<i>common</i> and <i>neuter</i>.|| This changes the form of articles
and adjectives that belong to nouns.|| So for example <i>{{bil}}</i>
("car") is a common noun, and "a red car" is <i>{{en röd bil}}</i>.||
But <i>{{tåg}}</i> ("train") is a neuter noun, and "a red train" is
<i>{{ett rött#röd# tåg}}</i>.|| As you see, you use different
indefinite articles for common and neuter, and neuter adjectives
have a different ending.||

<h2>Definiteness||</h2>

Swedish nouns also have an ending that means "the" ("definiteness"),
and the form of this ending depends on the gender.|| So "the car" is
<i>{{bilen#bil#}}</i> and "the train" is <i>{{tåget#tåg#}}</i>.||
If you use an adjective with a definite noun, it needs to be in a
special form, and you also need to add a definite article.|| So
"the red car" is <i>{{den röda#röd# bilen#bil#}}</i> and "the red train"
is <i>{{det röda#röd# tåget#tåg#}}</i>.|| As you see, the definite
form of the adjective is the same for common and neuter.||

<h2>Word order||</h2>
Word order is often pretty much the same as in English, except that
Swedish does questions in a simpler way.|| It's sort of the way English
used to do it a few hundred years ago.|| So "You see the car" is
<i>{{Du ser#se# bilen#bil#}}</i>, but "Do you see the car?" is
<i>{{Ser#se# du bilen#bil#?}}</i>|| This is more or less like
Shakespearian English "Seest thou the car?"||

The main text is in the L1 (English), and is not meant to be marked up; the material inside double curly brackets, {{}}, is the L2 text which LARA is going to process. To make this work correctly, we set the buttons in the “Advanced options” tab as follows:

_images/PortalLinguisticsExampleOptions.jpg

The settings have the following meanings:

  • Comments by default: We are using brackets to mark L2 text rather than L1 text.

  • Keep comments: When we show examples on the right, we want to include the L1 text. The default is to omit it.

  • Audio words in colour: We mark the Swedish words with audio in red, so that we can find them easily.

The resulting LARA text looks like this:

_images/PortalLinguisticsExample.jpg
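
When writing a source file of this kind, it can help to list exactly which words sit inside the double curly brackets, together with their lemma tags. The sketch below is an illustration in Python, not LARA's own code: the function l2_words is hypothetical, and pairing an untagged word with itself as its lemma is a simplifying assumption.

```python
import re

L2_SPAN = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)  # text inside {{ }}
WORD = re.compile(r"(\w+)(?:#([^#]*)#)?")          # word with optional #lemma#


def l2_words(source: str) -> list[tuple[str, str]]:
    """Return (surface form, lemma) pairs for every word inside {{ }}."""
    pairs = []
    for span in L2_SPAN.finditer(source):
        for word, lemma in WORD.findall(span.group(1)):
            # Assumption: a word without a #lemma# tag is its own lemma.
            pairs.append((word, lemma or word))
    return pairs


sample = 'is <i>{{ett rött#röd# tåg}}</i>.|| So "the car" is <i>{{bilen#bil#}}</i>'
print(l2_words(sample))
# [('ett', 'ett'), ('rött', 'röd'), ('tåg', 'tåg'), ('bilen', 'bil')]
```

Everything outside the brackets, including the English glosses in quotes, is left out of the result.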

Texts with video annotations

Usually you’ll add audio annotations to the words and segments of a text. It is possible to use video annotations instead. This may sound frivolous, but if you’re designing your text to be read by Deaf people it’s necessary. Deaf learners will have no use for audio, but they may well appreciate video annotations in sign language.

The process of creating video annotations is almost the same as that of creating audio annotations. LARA compiles recording scripts during the “Make resources” phase and uploads them to LiteDevTool; then the person responsible logs in and does the recording. (Obviously, you need a webcam-equipped laptop.) Video recording mode will automatically have been switched on.

To make this happen, all you need to do is go into the Advanced options subtab of the Make resources tab and set the Video annotation control to Yes:

_images/VideoRecording1.jpg

If you’re creating sign language videos, it may sometimes be the case that the person doing the signing doesn’t know the language your text is written in and won’t be able to sign from it. This means you’ll need to translate the text into another language which the signer does know. You can do this as follows:

  • Also set the control Video annotation from translation to Yes.

  • Set Translation language to the language the signer knows.

  • In the Fill out resources tab, enter translation information for words and segments in the usual way.

  • When you’ve entered the translation information, do Make resources again. The video recording task will now show the translated text.

Multi-word expressions

One of the most time-consuming tasks in tagging is dealing with multi-word expressions (MWEs). For example, each of the following English sentences contains an expression that can reasonably be considered an MWE:

  • He didn’t like it at all.

  • We went round and round.

  • You seem to be looking forward to it.

  • I think they have given in.

  • He shook his head.

  • They decided to blow it up.

  • She threw the whole thing away.

Note that MWEs can include inflected forms: for example, looking forward to is an inflected form of look forward to, and given in is an inflected form of give in. Note also that the MWEs do not have to be continuous. The first five examples (at all, round and round, look forward to, give in and shake one’s head) are continuous, but blow up and throw away have words in the middle that do not belong to the MWE.

When an MWE is continuous, you can tag it using the @ ... @ construct, for example:

He didn't like it @at all@.
I think they have @given in@#give in#.

But this doesn’t work for discontinuous constituents, and the last two sentences have to be tagged roughly as follows:

They decided#decide# to blow#blow up# it up#blow up#.
She threw#throw away# the whole thing away#throw away#.
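
The pattern in these two sentences is mechanical once you know which words belong to the MWE: each member word gets the whole expression as its lemma. As a minimal sketch in Python (tag_mwe is a hypothetical helper written for this illustration, not a LARA function):

```python
def tag_mwe(words: list[str], positions: list[int], lemma: str) -> str:
    """Append #lemma# to the words at the given positions and join with spaces."""
    chosen = set(positions)
    return " ".join(
        f"{word}#{lemma}#" if index in chosen else word
        for index, word in enumerate(words)
    )


words = ["She", "threw", "the", "whole", "thing", "away"]
print(tag_mwe(words, [1, 5], "throw away") + ".")
# She threw#throw away# the whole thing away#throw away#.
```

The words between the two tagged positions are left untouched, which is what makes the expression discontinuous.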

LARA provides functionality to make tagging of MWEs more systematic. There are four steps:

  • Add any new MWEs to the file of MWE definitions, downloading it first and then uploading the revised file.

  • Run the Make resources step to find candidate MWE matches in your text.

  • Go into the Fill out resources tab to mark which candidate MWE matches are correct.

  • Run the Make resources step again to insert the MWEs you have marked.

The details are as follows.

Define the MWEs

MWEs are put in a plain text file. Go first to Available texts > Multi words resources (advanced) to see if a file already exists. You should see a page that looks like this:

_images/MWELexicaPage.jpg

If there is a line there for the text language you’re using, click on the Download lexicon icon to get the file.

There is one definition per line. Empty lines and lines starting with a hash (#) are ignored.

The simplest kind of MWE definition is a fixed phrase. Here, all the words are written in lowercase, for example:

at all
round and round

If one or more of the words can be inflected, then these words must be marked as such. You can do this either by writing the words that can be inflected in uppercase, for example

LOOK forward to
GIVE in
BLOW up
THROW away

or by placing asterisks at the beginnings and ends of these words, i.e.:

*look* forward to
*give* in
*blow* up
*throw* away

So the MWE definition LOOK forward to or *look* forward to will match “I look forward to”, “she is looking forward to”, etc.

An MWE may contain other words that can vary. In English, a common case is a possessive pronoun, e.g. “shake one’s head” (“I shook my head”, “He shook his head”, etc). You can handle MWEs of this kind by adding a line to define a class of words, and then using the name of the class in the MWE definition, for example:

# Class definitions
class: one's my your his her its our their
class: oneself myself yourself himself herself itself ourselves themselves

SHAKE one's head
TAKE one's time
ENJOY oneself
BRACE oneself

In some languages, the words in an MWE may occur in more than one order. A common case in French is reflexive verbs, where the reflexive pronoun usually comes before the verb (“Je me repose”) but comes after it in an imperative clause (“Reposez-vous”). To be able to handle this systematically, it is also possible to add transform lines to a file of MWE definitions. A transform line starts with transform: and maps a left hand side to a right hand side. “Variables”, words which are the same on both sides, are indicated by enclosing them in asterisks. So the French example is handled as follows:

transform: se *verb* -> *verb* toi
class: toi toi vous

This means that for any MWE entry matching the left hand side, e.g. se REPOSER, a second entry is automatically added, here REPOSER toi.
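
To make the file format concrete, here is a minimal Python sketch that reads a definitions file following the rules described above (comment and empty lines ignored, class: lines, transform: lines, one MWE per remaining line) and expands transforms. It is an illustration under stated assumptions, not LARA's implementation: all function names are hypothetical, and a transform variable is assumed to match any single word.

```python
def is_starred(token: str) -> bool:
    """True for tokens written between asterisks, such as *look* or *verb*."""
    return len(token) > 2 and token.startswith("*") and token.endswith("*")


def is_inflectable(word: str) -> bool:
    """Uppercase or starred words in an MWE entry may be inflected."""
    return word.isupper() or is_starred(word)


def parse_mwe_definitions(text: str):
    """Split a definitions file into classes, transforms and MWE entries."""
    classes, transforms, entries = {}, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and comment lines are ignored
        if line.startswith("class:"):
            name, *members = line[len("class:"):].split()
            classes[name] = members
        elif line.startswith("transform:"):
            lhs, rhs = line[len("transform:"):].split("->")
            transforms.append((lhs.split(), rhs.split()))
        else:
            entries.append(line.split())
    return classes, transforms, entries


def apply_transform(lhs, rhs, entry):
    """Return the transformed entry if it matches lhs, otherwise None."""
    if len(lhs) != len(entry):
        return None
    bindings = {}
    for pattern, word in zip(lhs, entry):
        if is_starred(pattern):
            bindings[pattern] = word  # variable: assumed to match any word
        elif pattern != word:
            return None
    return [bindings.get(token, token) for token in rhs]


sample = """\
# Class definitions
class: toi toi vous
transform: se *verb* -> *verb* toi

se REPOSER
"""
classes, transforms, entries = parse_mwe_definitions(sample)
for entry in list(entries):
    for lhs, rhs in transforms:
        derived = apply_transform(lhs, rhs, entry)
        if derived:
            entries.append(derived)
print(entries)  # [['se', 'REPOSER'], ['REPOSER', 'toi']]
```

Matching the resulting entries against running text additionally needs a way to relate inflected forms to their base forms, which this sketch does not attempt.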

When you have finished editing the MWE definitions file, go to Creating texts > import external materials. You should see a screen that looks like this:

_images/UploadMWELexicon.jpg

Select Upload MWE definitions, choose the text language from the menu, and then click on Upload zip file to select your file.

Find possible MWE matches

Once you’ve updated your file of MWE definitions, do Make resources to apply them to your text and get a file of possible matches. Now go to Fill out resources > Multi word expressions > Multi words annotation. You should see a screen that looks like this:

_images/CheckMWEMatches.jpg

In each line, you’ll see the candidate MWE highlighted in red, and you use the radio buttons to mark whether it really is an MWE. Here, “in fact” and “shut up” are definitely MWEs, so you should mark them as Yes. “Do in” is definitely not an MWE (“in” is part of “in the medical line”), so you should mark it as No. “Run to” is a borderline case: it’s not completely clear whether the “to” is part of “run to” or “to specialists”. I would personally say it’s not an MWE here. So you could mark things like this:

_images/CheckMWEMatchesDone.jpg

When you’ve finished, do Save and exit. Then go back to Create resources and do Make resources again. While you’re working with the MWEs, you may also want to set Coloured words to No and MWE words in colour to Yes in Advanced options:

_images/MWEWordsInColour.jpg

When you now go to the Create pages tab and do Create pages followed by Preview pages, the relevant part of your LARA text will look like this:

_images/MWEWordsInColourExample.jpg

CSS stylesheets

If you know how to write CSS, you can add CSS stylesheets to a LARA document. There are two ways to do this. If you have a single stylesheet that you want to apply to the whole document, you can upload it by marking the checkbox Import CSS for content at the bottom of Advanced options.

In many cases, though, you’ll want to apply a CSS style sheet only to certain pages, and you may want to use more than one sheet. You can do this by adding the name of the CSS sheet to the relevant <page> tags; if you’ve included references to CSS style sheets in this way, the portal will ask you to upload a zipfile of them at the end of the Create resources tab, in the same way that it asks for embedded images.

Here’s a simple example, continuing “The Little Prince ch 1”. We add a style sheet, title_page.css, to the first page to say that we want larger fonts for the heading and the main text:

h1 {
    font-family: sans-serif;
    font-size: 3em;
}

p {
    font-family: sans-serif;
    font-size: 2em;
}

and we add a reference to the marked-up LARA text:

<page css_file="title_page.css">

<h1>The Little Prince||</h1>
Antoine de Saint-Exupéry||
/*Simple LARA example illustrating basic functionality, used in online documentation.
Text and illustrations taken from https://archive.org/details/TheLittlePrince-English
LARA markup added by Manny Rayner.*/
<page>

When we upload the new version of the marked-up text, we get this notification:

_images/PortalUploadPageCSS.jpg

We zip up the CSS file and upload it as requested. When we create the pages at the end, the first page looks like this:

_images/PortalLPPage1WithCSS.jpg
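
If a document refers to several style sheets, the zipfile can be assembled from the references themselves. The following Python sketch is one way to do it, offered as an illustration only: the helper names are hypothetical, and placing the files at the top level of the zipfile is an assumption about what the portal expects.

```python
import re
import zipfile
from pathlib import Path

# Finds css_file="..." attributes on <page> tags.
CSS_REF = re.compile(r'<page\b[^>]*\bcss_file="([^"]+)"')


def css_files_in(source: str) -> list[str]:
    """Return the distinct css_file names referenced by <page> tags, in order."""
    return list(dict.fromkeys(CSS_REF.findall(source)))


def zip_css(source: str, css_dir: Path, zip_path: Path) -> list[str]:
    """Zip every referenced style sheet found in css_dir; return missing names."""
    missing = []
    with zipfile.ZipFile(zip_path, "w") as archive:
        for name in css_files_in(source):
            path = css_dir / name
            if path.is_file():
                # Assumption: the portal expects the files at the zip's top level.
                archive.write(path, arcname=name)
            else:
                missing.append(name)
    return missing


text = '<page css_file="title_page.css">\n<h1>The Little Prince||</h1>\n<page>'
print(css_files_in(text))  # ['title_page.css']
```

Calling zip_css(text, Path("css"), Path("css.zip")) would then write the archive and report any style sheet that is referenced but not present in the css directory.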

Adding embedded audio with the <audio> tag

It is possible to insert <audio> tags, to include links to independent pieces of audio content. Most texts will not require this functionality. It is however useful for poems, where you may want to be able to hear larger parts of the poem than a single segment read aloud. The following example shows how to do it:

<h1>Giuseppe Ungaretti (1888-1970)||</h1>

<h2>Soldati||</h2>
A Selection of Modern Italian Poetry in Translation. Translation. Soldiers. Roberta L. Payne, McGill-Queen’s University Press, 2004, pp. 114-115||

<audio src="Soldati.mp3"/>

Si sta come||
d'autunno||
sugli alberi||
le foglie.||

Here, the element <audio src="Soldati.mp3"/> says to insert an audio control to play the file Soldati.mp3, which contains a reading of the entire poem.

At the moment, you need to record your embedded audio on your own machine (a good tool is Audacity). Add the <audio> tags, and when you upload the tagged file you will be prompted for a zipfile of embedded audio, in the same way as with embedded images.

Adding a combined audio file for a whole page

If the src field of an embedded audio tag has the special value this page, the audio used is created by concatenating all the mp3 files in the page where the tag appears. The concatenation is performed automatically as part of the second stage of compilation.

Note that the concatenation will only work if all the mp3s have the same sampling rate. This will almost always be true. If for some reason your mp3s have a mixture of sampling rates, you can create a copy of the audio directory containing them using a command of the form:

python3 lara_run_for_portal.py copy_audio_dir_with_uniform_sampling_rate <Dir> <Dir1> <ConfigFile>

where <Dir> is the original audio directory, <Dir1> is the new directory with uniform sampling rate, and <ConfigFile> is the config file for the project in question.
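
Returning to the <audio> tags themselves: before building the zipfile of embedded audio, it is useful to know which files the tagged text actually refers to. The Python sketch below lists them. It is an illustration, not LARA code: the function name is hypothetical, and it assumes the special value is written literally as src="this page", in which case no file needs to be supplied because the audio is generated during compilation.

```python
import re

# Finds src="..." attributes on <audio> tags.
AUDIO_REF = re.compile(r'<audio\b[^>]*\bsrc="([^"]+)"')


def embedded_audio_files(source: str) -> list[str]:
    """Return the distinct audio files that have to be supplied by the author."""
    names = dict.fromkeys(AUDIO_REF.findall(source))
    # Assumption: "this page" is the literal special value, so it is not a file.
    names.pop("this page", None)
    return list(names)


text = '<audio src="Soldati.mp3"/>\nSi sta come||\n<audio src="this page"/>'
print(embedded_audio_files(text))  # ['Soldati.mp3']
```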

Special issues for Chinese

The differences between Chinese and European orthography mean that some special issues need to be addressed when creating a LARA document for a Chinese language. At the moment, Mandarin, Cantonese and Taiwanese are marked as Chinese languages.

Correcting the tagged document

Chinese words are not inflected, so a tagger will have nothing to do. In contrast, segmentation in Chinese is highly non-trivial. No spaces are included between written words, and the segmenter’s task is to insert inter-word boundaries, marking them with a vertical bar (|). This marking will hardly ever be completely correct. In most cases, it is consequently necessary to download the tagged file, readjust the segmentation, and then upload it again. The controls for downloading and uploading are on the first screen:

_images/DownloadUpload.jpg

Highlighting on hovering

The lack of spaces in Chinese means that in general it is not clear which word the mouse is pointing to. You can change LARA’s default behaviour to highlight the current word by uploading a CSS file with the following content:

a:hover {
    color: red;
}

The control to upload the CSS file is under Advanced options on the first screen:

_images/AdvancedControlsChinese.jpg

Adding pinyin

It is possible to add pinyin (roman transliterations) to the popup word translations by uploading a pinyin file. The control used to do this is shown in the screenshot immediately above, and is found under Advanced options on the first screen.

The pinyin file is created from the tagged file by adding pinyin in parentheses after each hanzi (Chinese character). Thus if the original line from the tagged file is

熊|和|兔子|的|故事||

a possible pinyin corpus line would be

熊(xióng) |和(hé) |兔(tù) 子(zǐ) |的(dí) |故(gù) 事(shì) ||

A pinyin file can be created automatically using the tool at https://www.chineseconverter.com/en/convert/chinese-to-pinyin. Select the option 我(wǒ), paste the contents of the tagged file into the buffer, and then copy the output into the pinyin file.
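
Because the pinyin file is derived from the tagged file, a simple consistency check is to strip the pinyin back out and compare the result with the original line. This Python sketch shows the idea; strip_pinyin is a hypothetical helper, and it assumes each reading is written in ASCII parentheses followed by a single space, as in the example line above.

```python
import re

# A parenthesised reading such as "(xióng)" plus the space that follows it.
PINYIN = re.compile(r"\([^)]*\) ?")


def strip_pinyin(line: str) -> str:
    """Remove every parenthesised reading, with the space that follows it."""
    return PINYIN.sub("", line)


tagged = "熊|和|兔子|的|故事||"
pinyin = "熊(xióng) |和(hé) |兔(tù) 子(zǐ) |的(dí) |故(gù) 事(shì) ||"
print(strip_pinyin(pinyin) == tagged)  # True
```

A line for which the comparison fails has probably lost or gained a character or a word boundary during conversion.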

Exporting and importing zip files

You can download a zipfile of a project by going to Creating texts > My LARA texts and clicking on the Export zip file icon. Here’s how we do it for “The Little Prince ch 1”:

_images/PortalExportZipfile.jpg

You’ll get a zipfile which you can save on your own machine.

You can upload a zipfile of this kind by going to the Import zip file tab:

_images/PortalImportZipfile.jpg

Click on the upload button and select the zip file you downloaded. When you’ve uploaded it, you’ll be able to see your new project by going back to Creating texts > My LARA texts:

_images/PortalShowImportedProject.jpg

This could for example be a useful thing to do if I wanted to make a copy of my project and change the translation language. I click on Edit project for the new imported project and get this screen:

_images/PortalShowImportedProject2.jpg

I can now edit it to change the translation language to Swedish (I also change the name):

_images/PortalShowImportedProject3.jpg

When I go to the word translation page, I can fill in Swedish translations:

_images/PortalShowImportedProject4.jpg

As we hoped, mousing over the words in the compiled pages shows Swedish.

Importing a project developed outside the portal

If you have developed a LARA project using the command-line tools described in the later sections of this documentation and you have organised your directories in the recommended way, you can fairly easily import it to the portal. Corpus and segment related information should be in a project directory, and word related information should be in a language directory. Let’s say that the project directory is called $LARA/Content/my-project, with subdirectories corpus, audio and translations, the config file for the project directory is $LARA/Content/my-project/corpus/local_config.json and the language directory is called $LARA/Content/my-language, with subdirectories audio and translations. You can then import the project as follows:

  • Zip up $LARA/Content/my-language to the zip file $LARA/tmp/my-language.zip.

  • Create an export zipfile for the project using the command-line call

    python3 $LARA/Code/Python/lara_run_for_portal.py make_export_zipfile $LARA/Content/my-project/corpus/local_config.json $LARA/tmp/my-project.zip
    
  • Go to the portal and import $LARA/tmp/my-project.zip to create a new project my-project.

  • On the portal, open my-project. Under Make resources > Advanced options, tick Import external resources and upload $LARA/tmp/my-language.zip. This will merge the translation and audio resources from $LARA/Content/my-language with any existing resources for that language.

Exporting a portal project to use it from the command-line

In the opposite direction, you may want to take a project originally developed inside the portal and export it so that it can be run from the command-line. A common reason is that you want to use a piece of command-line functionality which hasn’t yet been wrapped for inclusion in the portal.

Reversing the example from the previous section, let’s suppose that your project in the portal is called my-project, it is for the language my-language, you want to import it to a command-line project that will be in the directory $LARA/Content/my-project, and there is already a directory for my-language called $LARA/Content/my-language. (If you don’t have a directory for my-language, you can create an empty one.) You do the following:

  • Export my-project from the portal to create a zipfile, which we save as $LARA/tmp/my-project_exported.zip.

  • Import your project for use from the command-line using a command-line call of the form

    python3 $LARA/Code/Python/lara_run_for_portal.py import_zipfile $LARA/tmp/my-project_exported.zip $LARA/Content/my-project $LARA/Content <ConfigFile>
    

    Here, $LARA/Content is the directory above the language directory and <ConfigFile> is a config file which has appropriate declarations for tmp directories. If the default tmp directories are being used, any syntactically valid config file will do.
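
If you move projects in and out of the portal regularly, the two command-line calls from this section and the previous one can be built from the project name. The Python sketch below only constructs the argument lists, using the operations and argument order shown above. The helper names and the example root directory /home/me/LARA are hypothetical, and the directory layout is assumed to be the recommended one.

```python
def lara_command(lara_root: str, operation: str, *args: str) -> list[str]:
    """Build the argument list for one lara_run_for_portal.py call."""
    script = f"{lara_root}/Code/Python/lara_run_for_portal.py"
    return ["python3", script, operation, *args]


def export_command(lara_root: str, project: str) -> list[str]:
    """Command that creates an export zipfile for a command-line project."""
    config = f"{lara_root}/Content/{project}/corpus/local_config.json"
    return lara_command(
        lara_root, "make_export_zipfile", config, f"{lara_root}/tmp/{project}.zip"
    )


def import_command(lara_root: str, project: str, zip_path: str, config: str) -> list[str]:
    """Command that unpacks a portal export for use from the command line."""
    return lara_command(
        lara_root,
        "import_zipfile",
        zip_path,
        f"{lara_root}/Content/{project}",
        f"{lara_root}/Content",
        config,
    )


# Hypothetical root directory, standing in for the value of $LARA.
print(" ".join(export_command("/home/me/LARA", "my-project")))
```

Each list can be passed to subprocess.run(command, check=True) from the standard library to execute the call.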