
Testing on the UBS corpora

UBS publishes an economic bulletin roughly once a month, tracing economic and financial trends for the Swiss and international markets. It contains section titles and figures, percentages, dates, names of the different stock markets and indices, etc. In general, the language used is not specialised jargon and the style used is of a fairly high standard.

The main advantage in using this kind of text is that the evaluator can see what kinds of real-life errors can sneak in and, conversely, where real-life false flagging is likely to happen. The texts we had available are the ones given to the printer for final printing; we can believe that they have gone through revision and approval by the appropriate services. However, on several occasions the checkers' flagging indicated a real error, which may be seen as proof of the usefulness of grammar checkers in general.

One of the options that the checkers offer is that the text can be corrected in batch mode and the corrected file subsequently stored on the hard disk. This is very handy in general, and especially for the kind of testing intended, so that the final result from the grammar checker can be evaluated as a whole and not just as unconnected output from the screen.

False flagging is probably the most serious problem with grammar checkers. It suggests that a sentence is incorrect or inappropriate, or that the style is too familiar or too pedantic. In the long run, users can get so put off by a checker's false flagging that they will decide to do without it. Clearly, this aspect is as important as spotting the error, if not more.

As previously said, there are not many error examples in the UBS texts used, given that they were texts intended for final printing. This should be kept in mind in order to keep things in perspective with regard to the amount of false flagging the checker does. In a sense, if we wanted to use the number of false flaggings versus the number of correct flaggings as a measure, these texts could not be used as testing material, since the result would be unfairly biased.

It should also be noted that not all flagging messages are presented with the same level of explicitness. Indeed, there are different types of message relating to errors. Quite often, the checkers would draw the user's attention to some possible error with a vague message like ``Verify this sentence: there is probably a syntactic error''; it is then up to the user to figure out what the problem might be. When the program finds more precise indications that something is not grammatical, the error message is something like ``Verify if there should be agreement between noun and adjective: noun is singular and adjective is plural''. On subject-verb agreement, the typical error message is ``If noun is the subject of verb, then there is wrong agreement''. Especially regarding the French checker, it is only on very short and simple sentences that the error is presented as a statement, and not just as a possibility.

False friends and homonyms are also a great source of overflagging: since many words can be mistakenly used for one another, the checker prompts the user every time it comes across a potential lexical misuse and points out that such a word can be mistaken for another. Usually, it also gives very brief descriptions of the meaning of the two words in question. This feature is not a source of false flagging per se: it draws attention to a potential word misuse without control over the flagging. However, there is a menu option to deactivate flagging for both false friends and homonyms, which puts the user in control of the flagging.

Grammar checker for English

The grammar checker for English, E1, can be run in both interactive and batch mode; in the latter case the text is marked with the errors detected. In addition, there is also the option of marking one problem at a time during interactive proofreading. These two options, however, present some weaknesses. First, the information provided by marking a problem is usually less rich than that provided in interactive mode. In fact, whereas the former only advises that there is an error, the latter also provides suggestions for replacement. This is especially the case (but not the only one) when spelling errors occur. Secondly, in marking a particular problem during an interactive session, the program inserts a piece of text between -( and )- before the error occurrence. Surprisingly, after inserting such a text, the program keeps on proofreading the document and treats the inserted text as part of the sentence to be tested, thus detecting errors such as subject-verb agreement or incomplete sentences.

Well spotted errors

As previously noted, the texts used had already been proofread, so they do not present many actual errors. However, the texts contain some errors, e.g. in punctuation (such as extra space before commas), capitalisation (a sentence starting with an uncapitalised word or presence of a comma after e.g. or i.e.) and subject-verb agreement (even in cases that are not straightforward), and most of these are well caught by the checker. In addition, errors in expressions involving restrictions on uncountable or countable nouns are usually detected.

False flagging

False flaggings found by a grammar checker may be classified according to the level at which the flagging occurs: paragraph, sentence, phrase or word level.

Paragraph level:
Errors at this level are more concerned with the style than with grammar, e.g. a paragraph consisting of one sentence only. This flagging occurs in more or less the same way as for the French version. More details are to be found in section Grammar checker for French.
Sentence level:
In addition to false flagging related to style such as use of long sentences, use of passive voice or use of too many prepositional phrases, quite a lot of false errors are detected because of the ambiguity of many English words that can be identified both as a noun and as a verb. In many of these cases the checker detects errors like subject-verb agreement, or incomplete sentence, or gives warnings about some verbs that do not require objects. Since E1 can view the parts of speech assigned to each word (only in case of error detection) of a sentence, it has been possible to understand where it failed. An example of such false flagging is the following sentence:

-(THIS DOESN'T SEEM TO BE A COMPLETE SENTENCE.)- A comparison of the ECU market rate with the reference or theoretical interest rate shows that the actual three-month ECU deposit rate averaged 0.24 of a percentage point above the theoretical ECU interest rate between September 1984 and June 1989.

In this sentence the word shows is considered a plural noun instead of a verb. However, sometimes, even if no word is recognised as a verb by the grammar checker, no error is flagged.

Another noteworthy false error detected by E1 concerns parentheses. The grammar checker does not consider the possibility that the text included between parentheses can be an independent sentence and therefore looks for agreement. In addition, we can mention the following example to show another relevant drawback of the grammar checker:

-(THIS DOESN'T SEEM TO BE A COMPLETE SENTENCE.)- Accounting for 7.1 of the EC's GNP (1988; in ECU) and over 4.4 in EC internal trade (average of share in exports and imports), the peseta will have a weighting of 5.3 in the ECU currency basket.

E1 treats a sentence as a string ending with one of the characters ., : or ;. The first warning shown relates to the sentence

Accounting for 7.1 of the EC's GNP (1988)

which is incomplete. Furthermore, when it checks the second sentence, it loses the knowledge about the occurrence of the first parenthesis, and therefore warns about a mismatched parenthesis.
Phrase level:
E1 fails to recognise expressions such as up to now (suggesting the use of too instead of to if the meaning is `also' or `excessively'), or in turn (warning that an article or another modifier usually precedes the word turn). In addition, whenever the phrase half of occurs, it suggests the use of only half, even if the whole phrase is the first half of 1989 or second half of the month.
Word level:
As mentioned before, some noise occurred when transferring the file containing the texts and most hyphenated words have been affected (e.g. shorterm vs. short-term, threeonth vs. three-month, 2ear vs. 2-year). This implied a great deal of involuntary testing of spelling errors, most of which have been detected, with the exception of 2ear or analogous mistakes. For the spelling errors due to this noise, the only correct suggestion for replacement provided by the checker was three-month. Two other types of false flaggings usually occurred: (1) -ing forms used as nouns are never accepted in the plural; (2) it is suggested that a hyphen should not be used with prefixes such as non- or inter-, but if the hyphen is removed, a spelling error occurs.
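The segmentation problem described under ``Sentence level'' above can be reproduced with a minimal sketch. This is not E1's actual algorithm, which we do not know; it only assumes that a sentence ends at ., : or ; followed by white space, and the function names naive_split and paren_balance are ours. Splitting the peseta example in this way yields a first fragment with an unclosed parenthesis and a second fragment with an unmatched closing one, which is consistent with the two warnings observed.

```python
import re

# Assumption: a sentence ends at '.', ':' or ';' followed by white space.
# Requiring white space keeps figures such as 7.1 in one piece.
TERMINATOR = re.compile(r"(?<=[.:;])\s+")


def naive_split(text):
    """Split text after every '.', ':' or ';' that is followed by white space."""
    return [part for part in TERMINATOR.split(text.strip()) if part]


def paren_balance(fragment):
    """Return the number of '(' minus the number of ')' in one fragment."""
    return fragment.count("(") - fragment.count(")")


sentence = (
    "Accounting for 7.1 of the EC's GNP (1988; in ECU) and over 4.4 in "
    "EC internal trade (average of share in exports and imports), the "
    "peseta will have a weighting of 5.3 in the ECU currency basket."
)

fragments = naive_split(sentence)
for fragment in fragments:
    # A non-zero balance means the fragment, taken alone, has a
    # mismatched parenthesis.
    print(paren_balance(fragment), repr(fragment))
```

A splitter that tracked parenthesis depth and refused to cut inside an open parenthesis would keep the sentence whole and avoid both warnings.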

Grammar checker for French

The grammar checker for French that we tested will be referred to as F1. Although this checker offers the option of running the program in batch mode and storing the corrected file on hard disk, it has been impossible to save an output file with more than about 2,000 words. This is possibly due to the Windows or the DOS version that our PC runs, but it blocks the proper functioning of F1. This is a major drawback for the user, and we had to divide the original files to enable the system to process and save the corrected output. (As an indication, the two original files used were 6103 and 5831 words long.)
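Dividing the files can be done mechanically. The following is a minimal sketch of one way to do it, not the procedure we actually followed: it assumes that paragraphs are separated by blank lines, takes the limit of about 2,000 words from the observation above, and uses a function name (split_by_word_limit) of our own choosing.

```python
def split_by_word_limit(text, max_words=2000):
    """Group whole paragraphs into chunks of at most max_words words.

    Paragraphs are assumed to be separated by blank lines. A single
    paragraph longer than max_words is kept whole in a chunk of its own.
    """
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        n_words = len(paragraph.split())
        if n_words == 0:
            continue  # skip empty paragraphs
        if current and count + n_words > max_words:
            # The next paragraph would exceed the limit: close this chunk.
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += n_words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Cutting only at paragraph boundaries keeps every sentence intact, so the checker sees the same sentences as it would in the undivided file.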

Well spotted errors

On the texts used, F1 correctly detected punctuation mistakes such as mis-matched parentheses, or an extra space between a word and a comma or full stop. This might be viewed as secondary, but it is one of the errors most overlooked by human revision.

Amongst the traditional grammar errors, F1 concentrates on subject-verb agreement, adjective-noun agreement and spelling errors. On the spelling front, if a wrong spelling is detected in a verbal form, the suggestion list can propose the full conjugation of the verb, along with other suggestions from dictionary look-up. This is an appreciable facility that should be acknowledged.

F1 also behaves fairly well in spotting wrong agreement in verbal phrases with a past participle form. This is indeed one of those rules that most commonly confuse people and trigger errors in the agreement. In the texts used, this did not happen very often, only a couple of times, and in constructions where it is very easy to get confused.

As an example, F1 correctly spotted the following error in subject-verb agreement, where, in fact, we have a cross-reference between two sentences. Note that the checker makes assumptions as to which NP is likely to be the subject: in the first case there is indeed an error of agreement, but not in the second one.

Le renoncement à une majoration des taux en Europe et au Japon l' < SI `RENONCEMENT' EST LE SUJET DE `ONT', IL Y A UNE FAUTE D'ACCORD. > ont [le dollar] < SI LE SYNTAGME COORDONNE EST LE SUJET DE `A', IL Y A UNE FAUTE D'ACCORD. > au contraire stimulé et la nette compression du déficit commercial en juillet lui a donné des ailes.

In this sentence, the subject of the first verb ont (present, 3rd person plural of avoir, `to have') is indeed the singular noun renoncement (`renouncement') and not taux (`rates'); however, the second part of the sentence does not contain an error, and the checker seems to consider the conjunction to be between two NPs (but then which ones?) and not between two sentences. (In the English version the conjunction was rendered as two separate sentences separated by a full stop.)

Amongst style errors, F1 flags sentences starting with a digit and suggests writing out the relevant number in full or re-organising the sentence.

False flagging

As previously done for the description of the English version, the false flaggings found by the checker for French will be grouped according to the level at which the flagging occurs: paragraph, sentence, phrase and word level.

Paragraph level:
Fairly often, F1 would complain of the ``appearance'' of a paragraph, pointing out that ``usually, a paragraph has more than one sentence''. Very often, this type of false flagging appears when the checker encounters the title of a section (especially if it contains something the checker may interpret as an inflected verb form). In such cases, the length of the sentence is no more than one line. The checker would flag any paragraph (regardless of its length) if it contained only one sentence.
Sentence level:
There are two major types of false flagging in this category -- either the sentence is found to be too long (which is a simple word count: the number of words used as a basis may be set by the user) or the verb form is not found, is not of the desired voice (passive forms are always flagged as bad style) or is the wrong mood (where subjunctive or conjunctive moods must be used, as part of a construction with a conjunction or a subordinate clause). False flagging is triggered in those cases where, for instance, both subjunctive and indicative are correct, or when the checker wrongly assumes that a conjunction controls a given verb.
Phrase level:
This is where most of the false flagging appears. F1 links parts of speech that do not in fact form a phrase, requiring agreement where it is wrong to do so. (For example, in ce sont ...avant tout les agents économiques qui ... (``it is ...above all the economic agents who ...''), agreement between tout (`all') -- singular -- and agents -- plural -- is suggested.) Another well-known source of grammatical problems for French is the agreement of past participles. The program fails to recognise some correct constructions involving these. The phrase la décision du le 18 février (``the decision [taken on] February 18th'') was flagged as wrong, on the grounds that 18 was plural, whereas février was singular.
Word level:
F1 has a built-in spelling checker that checks the text before checking for grammatical errors. However, some words are flagged by the grammar checker as inappropriate or bad style (especially some conjunctions). In particular, the checker forces the user to replace Fr. (the usual abbreviation for Swiss franc) with F or FF, meaning French franc (as opposed to Swiss or Belgian, etc.). This type of error is not caught by the spelling checker (where it would be easy to just add the abbreviation to the personalised dictionary), but by the grammar checker, which makes it impossible to add it. The grammar checker also does not allow numbers to be written as figures, even when intended as prices, etc.
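The two purely mechanical style checks mentioned above, the one-sentence paragraph and the over-long sentence, can be imitated with a minimal sketch that shows why section titles are falsely flagged. This is an illustration under our own assumptions, not F1's implementation: sentences are taken to end at ., ! or ? followed by white space, the default limit of 30 words is hypothetical (F1 lets the user set it), and the function name style_flags is ours.

```python
import re


def style_flags(paragraph, max_words=30):
    """Return the style messages raised for one paragraph.

    Assumptions: a sentence ends at '.', '!' or '?' followed by white
    space, and max_words stands for the user-defined length limit.
    """
    flags = []
    sentences = [
        s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s
    ]
    # Any paragraph made of a single sentence is flagged, whatever its
    # length, so a one-line section title is flagged as well.
    if len(sentences) == 1:
        flags.append("usually, a paragraph has more than one sentence")
    # The length check is a plain word count per sentence.
    for sentence in sentences:
        if len(sentence.split()) > max_words:
            flags.append("sentence too long")
    return flags
```

Called on a title such as Testing on the UBS corpora, the sketch raises the one-sentence message, since nothing in the check distinguishes a heading from a paragraph of running text.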
