ISSCO Working Papers and Technical Reports






Working Papers

Please Note: abstracts of ISSCO Working Papers from 1 to 33 are not available



No 1 Causality and Reasoning - R.C. Schank, 1973

No 3 Is there a Semantic Memory? - R.C. Schank, 1974

No 5 He will make you take it back: A Study in the Pragmatics of Language - E. Charniak, 1974

No 6 Understanding Paragraphs - R.C. Schank, 1974

No 7 Selezione di Parole per l'estrazione di unità fonetiche atte alla sintesi della lingua tedesca e italiana - G.B. Debiasi and A.M. Mioni, 1974

No 8 Fonetica e fonologia autonoma della lingua Hindi - R. Galassi, 1974

No 12 Konzepttheorie - Ein praktischer Beitrag zur Textverarbeitung und Textrezeption - W. Samlowski, 1974

No 13 A Partial Taxonomy of Knowledge about Actions - E. Charniak, 1975

No 17 Seven Theses on Artificial Intelligence and Natural Language - Y. Wilks, 1975

No 19 Frames, Planes and Nets: A Synthesis - G. Scragg, 1975

No 20 A Structure for Actions - G. Scragg, 1975

No 21 On understanding German noun clusters - W. Samlowski, 1976

No 23 A process to implement some word-sense disambiguation - P. Hayes, 1976

No 24 On the referential/attributive distinction - E. Charniak, 1976

No 26 Pragmatic aspects of noun cluster understanding in German - W. Samlowski, 1976

No 28 A framed PAINTING: the representation of a common sense knowledge fragment - E. Charniak, 1976

No 29 How to connect frames - M. Wettler, 1976

No 30 Mutatis Mutandis - M. King, 1976

No 31 Ms Malaprop - a language comprehension program - E. Charniak, 1976

No 32 Knowledge structures as guides to learning process - M. Wettler, 1978

No 33 Knowledge representation in database systems - M. King, 1979

No 34 The Naive Physics Manifesto - P. Hayes, 1978
AI needs non-toy worlds to experiment with. This paper is a proposal for the construction of a formalisation of a sizeable proportion of common sense knowledge about the everyday physical world, e.g. objects, shape, space, movement, substances, time, etc. The proposal is discussed from various angles: why it is different from other, superficially similar, proposals; why it needs to be done; that it can be done; and how it can be done. Throughout, the author's commitment to a well-defined theory of meaning is presupposed. This is compared to other views, and justified.

No 35 Ontology for Liquids - P. Hayes, 1978
This is the first essay in Naive Physics which treats the formalisation of common sense knowledge about liquids. The first part of the paper deals with the problem of identifying pieces of liquid, resolved by talking about contained spaces. Formal definitions of surfaces, containers, and portals are given, drawing freely on homology and homotopy theory. The logical consequences of these are used in the second half of the paper to describe "histories" that liquid can partake in. Histories, intuitively, are the temporal extension of defined pieces of space in which "something happens". Relationships between histories mirror relationships between the spaces defined in the first half of the paper. Inferences about the behaviour of liquids are shown to follow from the definitions developed.

No 37 Semantic Long Term Memory and the Understanding of Language - M. Wettler, 1978
This is a study of the formal representation of conceptual knowledge in relation to the understanding and production of utterances. After the initial historical introduction, the final sections deal with the structure of schemata; inference and consistency of schemata; topic shifts in dialogues; the generation and description of surface structure.

No 38 Six Lectures from Recursive Function Theory to Artificial Intelligence - G. Trautteur, 1979
This paper is based on a series of lectures given at ISSCO by the author. The first chapter introduces the theory of effective procedures, leading up to a statement of Church's thesis. The second investigates the relationship between language and metalanguage, bringing in the notions of a decision problem and incompleteness. The last section discusses various attempts to formalise the idea of complexity; this is followed by a presentation of inductive inference, language identification, and dialogue systems from the point of view of recursion theory. Finally, the machinery of the theory of effective procedures is used to define a position for AI on various philosophical problems.

No 39 Some Considerations in Mapping Natural Language Queries onto DB Systems - L. Mazlack, 1979
The aim of this paper is to outline the design of an NL interface for existing database systems. Several examples are discussed where the same query is expressed by different NL expressions. A two-stage design is proposed that incorporates the notion of an intermediate database-independent query language which is then mapped into queries for the particular database under consideration.

No 40 Case in Linguistics and Cognitive Science - M. Rosner and H. Somers, 1980
The meaning of case as applied to sentences, verbs, events, and event types is quite different in each case. Examples of the use of case in each of these four categories are presented and contrasted in detail. An attempt is made to show that comparisons between systems that use case should be avoided when in reality those systems share only trivial surface similarities. It is concluded that there is no one case grammar; in fact the use of case is always relativised to a specific application.

No 41 ISSCO-PTOSYS: Brief Description and User Manual - H. Somers, 1980
This paper is a description of PTOSYS (ptotic system), a computer program that builds case frame representations of input English sentences. A central feature of the system is the incremental construction of a dictionary structure which is built up through an interactive dialogue with the user. The system invokes this process whenever it discovers a word which it does not recognise, using its linguistic knowledge to guess the most pertinent questions.

No 42 The Use of Verb Features in Arriving at a Meaning Representation - H. Somers, 1981
A critical examination of the linguistic theory underlying the notion that, in trying to map English sentences onto a case representation of deep structure, it is possible to use semantic features attached to the main verb to infer the correct type of case frame.

No 43 Conversational Coherency in Technical Conversations - R. Reichmann, 1979
An analysis of technical conversations is presented which uses the framework of context spaces developed earlier for the analysis of social exchanges. Both forms of discourse are shown to display similarities that are brought out by this form of analysis. A number of instances of surface linguistic phenomena, such as deictic "that", the present progressive tense, pronominalisation, and clue words such as "it's like" and "now", are presented and discussed. These phenomena are accounted for in terms of the underlying discourse structure and the resulting state and focus level assignments to discourse constituents.

No 44 A Preliminary Study of the Linguistic Implications of Resource Control in Natural Language Understanding - J.S. Bien, 1980
This paper presents some hypotheses concerning the organisation of language processing by humans and computers, which allow us to view a number of apparently unrelated linguistic phenomena in terms of sophisticated interactions between a few basic components. In particular, the fact that English articles are rendered in Slavonic languages mainly by word order and vice versa, which has until now remained completely mysterious, is explained by assuming different ways of controlling the depth of nominal phrase processing.

No 45 Upward Branching Phrase Markers: the state of the debate - G. Sampson, 1980
A 1975 paper by the author ("The single mother condition") argued that the definition of phrase-marker in linguistics should, for reasons both of empirical adequacy and theoretical elegance, be modified to permit upward as well as downward branching. This paper examines various reactions to this proposal that have been published. Certain criticisms are accepted as valid but not fatal. The rest turn out to be based on a misunderstanding of the original claim.

No 46 Three Strategic Goals in Conversational Openings - M. Rosner, 1981
This paper tries to explain a short transcript of a conversational opening as completely as possible within the framework which takes conversational behaviour as defined by the operation of a sophisticated planning mechanism. It is argued that a critical role is played by the satisfaction, for each participant, of three strategic goals relating to attention, identification, and greeting. Additional tactics for gaining information are also described as necessary to account for this transcript.

No 47 A Poor Man's Flavor System - F. di Primio and T. Christaller, 1983
This paper is the result of an attempt to understand 'flavors', the object-oriented programming system in Lisp Machine Lisp. The authors argue that the basic principles of such systems are not easily accessible to the programming public, because papers on the subject rarely discuss concrete details. Accordingly, the authors' approach is pedagogical, and takes the form of a description of the evolution of their own flavor system. An appendix contains programming examples that are sufficiently detailed to enable an average Lisp programmer to build a flavor system and experiment with the essential concepts of object-oriented programming.

No 48 A Government-Binding Parser for French - E. Wehrli, 1984
This paper describes a parser for French based on an adaptation of Chomsky's Government and Binding theory. Reflecting the modular conception of GB grammars, the parser consists of several modules corresponding to some of the subtheories of the grammar, such as X-bar theory, binding, etc. Making extensive use of lexical information and following strategies which attempt to take advantage of the basic properties of natural languages, this parser is powerful enough to produce all of the grammatical structures of sentences for a fairly substantial subset of French. At the same time, it is restricted enough to avoid a proliferation of alternative analyses, even with highly complex constructions. Particular attention has been paid to the problem of the grammatical interpretation of wh-phrases and of clitic constructions, as well as to the organisation and management of the lexicon.

No 49 AI Approaches to Machine Translation - P. Shann, 1985
This paper examines some experimental AI systems that were specifically developed for machine translation (Wilks' Preference Semantics, the Yale projects, Salat and CONTRA). It concentrates on the different types of meaning representation used, and the nature of the knowledge used for the solution of difficult problems in MT. To explore particular AI approaches, the resolution of several types of ambiguity is discussed from the point of view of different systems.

No 50 Machine Translation: Pre-ALPAC history, Post-ALPAC overview - B. Buchmann and S. Warwick, 1985
This paper gives a historical overview of the field of Machine Translation (MT). The ALPAC report, the now well-known landmark in the history of MT, serves to delimit the two sections of this paper. The first section, Pre-ALPAC history, looks in some detail at the hopeful beginnings, the first euphoric developments, and the onset of disillusionment in MT. The second section, Post-ALPAC overview, describes more recent developments on the basis of current prototype and commercial systems. It also reviews some of the basic theoretical and practical issues in the field.

No 51 Software Engineering for Machine Translation - R. Johnson and M. Rosner, 1985
In this paper we discuss the desirable properties of a software environment for MT development, starting from the position that successful MT depends on a coherent theory of translation. We maintain that such an environment should not just provide for the construction of instances of MT systems within some preconceived (and probably weak) theoretical framework, but should also offer tools for rapid implementation and evaluation of a variety of experimental theories. A discussion of some potentially interesting properties of theories of language and translation is followed by a description of a prototype software system which is designed to facilitate practical experimentation with such theories.

No 52 A Tutorial on Machine Translation (french version of No 53) - M. King, 1987
Initially prepared for a pre-Coling Tutorial in 1986, this paper gives a comprehensive account of Machine Translation from a historical and linguistic point of view. The main problems associated with MT are outlined, and typical system architectures are briefly described. A number of linguistic theories of specific interest to MT are introduced and a brief account is given of some standard software techniques. The tutorial further attempts to show how these ideas are related to work on MT within the Artificial Intelligence paradigm. The paper is aimed at the general reader without specialist knowledge in the area.

No 53 A Tutorial on Machine Translation - M. King, 1987
Initially prepared for a pre-Coling Tutorial in 1986, this paper gives a comprehensive account of Machine Translation from a historical and linguistic point of view. The main problems associated with MT are outlined, and typical system architectures are briefly described. A number of linguistic theories of specific interest to MT are introduced and a brief account is given of some standard software techniques. The tutorial further attempts to show how these ideas are related to work on MT within the Artificial Intelligence paradigm. Also available in French as Working Paper No 52, the paper is aimed at the general reader without specialist knowledge in the area.

No 54 Belief Ascription, Metaphor, and Intensional Identification - A. Ballim, Y. Wilks and J. Barnden, 1988
This paper discusses the extension of "ViewGen", an existing algorithm for belief ascription, to the areas of speech acts, intensional object identification and metaphor. ViewGen represents the beliefs of agents as explicit, partitioned proposition-sets known as environments. Environments are convenient, even essential, for addressing important pragmatic issues of reasoning. The paper concentrates on showing that the transfer of information in metaphorical and non-metaphorical belief ascription can be seen as different manifestations of a single environment-amalgamation process. The paper also briefly discusses the addition of a heuristic relevance-determination procedure to ViewGen, and justifies the partitioning approach to belief ascription.

No 55 Hetis: A Heterogeneous Inheritance System - A. Ballim, S. Candelaria de Ram and D. Fass, 1988
Certain inadequacies of homogeneous inheritance systems have prompted interest in "heterogeneous inheritance systems", which allow strict information to be mixed with defeasible information. However, few well-founded systems have been proposed, and heterogeneous systems are considered not yet well understood. This paper presents a theory and implementation of a heterogeneous inheritance system. The principles of the system are that (i) "rules of composition" allow paths to be considered as single links (called "effective relationships"), and (ii) "rules of comparison" allow selection among those effective relationships. These rules are enumerated and discussed, and an implementation of the theory is then presented. An example of the system's operation is explained in detail. Finally, this work is compared and contrasted with some recent work by Horty and Thomason.

No 56 Form and Contents in Semantics - Y. Wilks, 1989
This paper continues a strain of intellectual complaint against the presumptions of certain kinds of formal semantics (the qualification is important) and their bad effects on those areas of artificial intelligence concerned with machine understanding of human language. The paper begins with a critical examination of Lifschitz' (out of McCarthy) use of epistemological adequacy. The paper then moves, rather more positively, to contrast forms of formal semantics with a possible alternative: commonsense semantics. Finally, as an in-between case of considerable interest, it examines various positions held by McDermott on these issues and concludes, reluctantly, that, although he has reversed himself on the issues, there was no time at which he was right.

No 57 L'analyse morphologique du français et de l'italien avec le lexique ALVEY - P. Bouillon and L. Tovena, 1990
This paper presents the ALVEY morphological analyser and highlights its interest for the morphological analysis of two Romance languages, French and Italian. We show that the ALVEY lexicon, originally designed for the morphological analysis of English, can handle the problem of French and Italian suffixation efficiently. We also point out the limitations of the program and propose various improvements that would avoid them.

No 58 Le rôle de la représentation sémantique dans un système de traduction multilingue - K. Boesefeldt and P. Bouillon, 1991
This work lies in the field of natural language and speech processing. We address various problems that arise in building a machine translation system, more precisely in the automatic translation of avalanche bulletins from German into French. These bulletins deal with a limited world and use a well-defined sublanguage. In implementing the various linguistic phenomena, we drew on the techniques used by humans when producing a translation, in particular by exploiting the restrictions imposed by the sublanguage. The approach is thus oriented towards a concrete application. We argue, however, that the solutions implemented within this well-defined project are equally valid for other machine translation applications. Finally, we attempt to demonstrate the efficiency of the unification formalism for writing grammars. ELU, the language used in the avalanche-bulletin translation project, was developed at ISSCO and is based on this formalism.

No 59 Discussion Paper - TSNLP (LRE Project) - What is a Test Suite? - D. Estival, K. Falkedal and S. Lehmann, 1994

No 60 Discussion Paper - TSNLP (LRE Project) - Virtual and Actual Test Suites - K. Falkedal, 1994

No 61 Rapport Final -- Projet EST (7RUPJO38421) (in English) - Développement d'outils et de données linguistiques pour le traitement du langage naturel - D. Estival, D. Tufis and O. Popescu, 1994
The project 'Développement d'outils et de données linguistiques pour le traitement du langage naturel', which was sponsored by the 'Fonds National Suisse de la Recherche Scientifique' under the EST program, had two partners:
  • ISSCO, at the University of Geneva,
  • the Center for Advanced Research on Machine Learning, Natural Language Processing and Conceptual Modeling, of the Romanian Academy in Bucharest (Romania).

This one-year project started in September 1993 and is now coming to its end. In the proposal for the project, we had outlined two goals and correspondingly this report is divided into two sections:

  1. the porting of the ELU system from Sun workstations to Macintosh computers.
  2. the computational description of the Romanian language.

The initial reason for wanting to port ELU to personal computers was the easier access that Romanian scientists have to such platforms. As we had expected, however, during the course of the project growing interest in the Macintosh version of ELU became manifest in a number of other places, and we have already received a number of requests for Mac ELU since it was announced in the ELSNEWS newsletter (Estival, Tufis and Popescu 1994).

With respect to the second goal of the project, we have to say that the computational description of the Romanian inflectional morphology which was achieved during the course of the project is now the most complete and most accurate description available. Moreover, it is the first one to have been written within a modern unification-based framework.



No 62 Dialogue Acts: One or More Dimensions? - A. Popescu-Belis, 2005
This report surveys the main theories of dialogue and communication that have been used to devise dialogue act tagsets, distinguishing theories that deal with a specific level of communication from theories that integrate several levels. The report proceeds to analyse four dialogue act tagsets that were used to annotate large-scale dialogue corpora, paying particular attention to the difference between the range of possible combinations of tags and the range of combinations that do occur in the hand-labelled data. The problem of the dimensionality of tagsets is then introduced, and discussed in relation to other factors that influence the performance of automatic dialogue act taggers. After a brief discussion of empirical arguments relevant to the dimensionality problem, derived from human and automatic labelling experiments, the report proposes a synthesis of guidelines for the definition of dialogue act tagsets, based on multi-dimensional theoretical inspiration, cross-dimensional constraints, and the notion of a dominant utterance function, in order to reduce the search space for automatic dialogue act taggers. (Version 3, August 2007, 45 pages.)

No 63 Probabilistic models of the diffusion of lexical conventions in a population of agents - A. Popescu-Belis, 2005
This paper attempts to show that the diffusion of naming conventions in a population of software agents results from mathematical properties of multi-agent systems. We provide a number of mathematical proofs of the necessity that the population agree on common signal/meaning mappings. We first define an abstract model of the communicative competencies of the agents, and then study the model within two frameworks. The first is based on the theory of the "gambler's ruin" problem, while the second is inspired by random dynamical systems. The predictions of the theoretical analyses are also compared to the results of computer simulations.

No 64 Towards Automatic Generation of Evaluation Plans for Context-based MT Evaluation - A. Popescu-Belis, P. Estrella, M. King and N. Underwood, 2005
This paper extends the FEMTI guidelines for context-based MT evaluation with new functionalities aimed at evaluators and experts. The proposed interface to FEMTI generates an outline evaluation plan depending on the characteristics of the context in which an MT system will be used, entered by the evaluators. We first summarize the principle of context-based MT evaluation and the initial FEMTI proposal. Then, we introduce a vector-based representation of the context and of the quality characteristics, which underlies the process of evaluation design. We then show how this process is simplified by the proposed interfaces to FEMTI, and how expertise can be input into the system by using more advanced interfaces. A unified account of expert vs. evaluator use of FEMTI is finally proposed.

No 65 Automatic Identification of Discourse Markers in Multiparty Dialogues - A. Popescu-Belis and S. Zufferey, 2006
The lexical items that can serve as discourse markers (DMs) are often multi-functional. 'Like' and 'well', in particular, play numerous other roles apart from DMs: for instance, the first one can also be a verb and the second one an adverb. The goal of the present study is the identification, on transcripts of multi-party dialogues, of the occurrences of 'like' and 'well' that play a discourse or pragmatic role. DM identification is a binary classification task over the set of all occurrences of tokens 'like' and 'well'. The importance of DMs to computational linguistics is first discussed, along with previous experiments in DM identification. Then, the data is briefly described, emphasizing the DM annotation procedure and an inter-annotator agreement study. The proposed method uses lexical, prosodic/positional and sociolinguistic features, together with machine learning algorithms, among which decision trees are preferred. The results obtained using a ten-fold cross-validation procedure are analysed at length, focussing first on overall performance, and then on the relevance of each type of features. Feature analysis using a range of techniques shows that lexical indicators are the most reliable features for DM identification, followed by prosodic/positional features. Sociolinguistic features are slightly correlated with the use of 'like' as DM, while the dialogue act of the utterance containing a DM candidate does not seem relevant to DM identification. A differentiated treatment for each token appears to improve performance in almost all experiments. The methods and features used here improve performance over the past experiments, and suggest that DM identification is a tractable problem provided enough training data is available for each DM type, and that lexical features are used for each type.





Technical Reports

No 1 ELU User Manual - D. Estival, 1990
ELU (Environnement Linguistique d'Unification) is an enhanced PATR-II style environment for linguistic development written and developed at ISSCO (see Johnson and Rosner (1989), Russell et al. (1990), Estival et al. (1990)). As its name indicates, ELU is based on unification and its purpose is the development of computational linguistic applications in general. It provides a declarative environment which allows linguists to write grammars that can be used both for parsing and for generation (the parser and the generator are described in Chapter 5). In addition to these two standard functions of a linguistic development environment, ELU also supports a transfer component (also described in Chapter 5). Together, these three components allow the development of a system which can analyze a text in one language and generate its translation in another language, making ELU particularly suitable for experimenting with machine translation.

The present manual is intended as a user guide to the ELU environment and as a reference for the ELU user language. ELU is a member of the PATR-II family of unification-based systems, and as such its syntax follows rather closely the PATR formalism which has become a standard for unification-based systems. Although a brief explanation is given at the beginning of Chapter 3, we assume some familiarity on the part of the reader with that formalism. For a general introduction to the principles of unification, we refer the reader to Shieber (1986), and for the PATR standard, to the CL-PATR User's Manual from CSLI (Shieber 1988).
The rest of this chapter introduces the notation used throughout this manual, particularly in Chapter 3 which describes the syntax of the language used for writing an ELU user program and Chapter 4 which describes the contents of an ELU user program.

No 2 Bilingual Concordancy Program (BCP) - A. Winarske, 1992

No 3 Conversion of Bilingual Dictionaries to HTML Using XSL - A. Popescu-Belis, 2002
This is a report on the format conversion of six bilingual dictionaries that were made available, through the DicoPro server, to the users of the 'RERO' network of Swiss francophone libraries. These dictionaries were provided to RERO by Collins (a division of HarperCollins Publishers) under the terms of a copyright contract. ISSCO/TIM/ETI was in charge of the server installation and the formatting of the dictionaries. Since they deal with commercial data, Chapters 2 through 7 and Appendices A and B are confidential. However, Chapters 1 and 8 are public, since they describe the conversion process and its perspectives using only a few examples.

The first chapter describes the input and output data of the conversion process, that is, the structured data format provided by the publisher and the data formatting requirements of the DicoPro server, which are based on HTML. The first chapter then goes on to describe the conversion process, with the technical step-by-step procedure and the XML/XSL mechanism that was used. The following six chapters list the modifications that had to be made to the source dictionaries in order to render them completely coherent with the publisher's guidelines, so that they could be automatically converted. The concluding chapter suggests possible future work towards a more concept-based encoding of the dictionary entries, rather than a format-based one. The two appendices contain the full text of the XSL stylesheet used to convert the dictionaries, as well as the scripts used in the global conversion process.
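The kind of structured-XML-to-HTML conversion described above can be sketched in a few lines. The fragment below is an illustrative Python stand-in, not the confidential XSL stylesheet: the element names (`entry`, `hw`, `pos`, `tr`) and the target HTML layout are hypothetical, since the actual Collins schema and DicoPro formatting rules are not public.

```python
# Minimal sketch of converting one structured dictionary entry to HTML.
# NOTE: the schema below (<entry>, <hw>, <pos>, <tr>) is a hypothetical
# stand-in for the publisher's format, which is confidential.
import xml.etree.ElementTree as ET

SOURCE = "<entry><hw>maison</hw><pos>n</pos><tr>house</tr></entry>"

def entry_to_html(xml_text: str) -> str:
    """Render a single entry as the HTML a DicoPro-like server might serve."""
    entry = ET.fromstring(xml_text)
    hw = entry.findtext("hw", "")    # headword
    pos = entry.findtext("pos", "")  # part of speech
    tr = entry.findtext("tr", "")    # translation
    return f"<p><b>{hw}</b> <i>{pos}</i> &mdash; {tr}</p>"

print(entry_to_html(SOURCE))
```

In the real conversion this mapping was expressed declaratively as an XSL stylesheet applied to the whole dictionary, which keeps the source schema and the output format cleanly separated; the imperative version here only illustrates the shape of the transformation.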

No 4 Vers des banques de textes multilingues : le balisage de textes (Programme de la formation RIFAL 2002) - A. Popescu-Belis, 2003
This report provides an introduction to XML, DTDs and XSLT, through exercises on the markup of a simple bibliographic record and then of documents in the XCES (XML Corpus Encoding Standard) format.
