Xtag is a user interface written in Tcl/Tk/expectk. It integrates all the features of the tagger modules.
The Help button lets you read the man pages in your favourite
browser.
COMMAND
In the main window, use the appropriate button to select
the part of the tagger you want to run.
You can follow the operations as they run and inspect
their output.
CONFIG
Before anything else, you must configure the options for the text to be processed in the "config" window, using the usual arguments of the tagger commands.
PREPARATION
In the "Preparation" window, the text is
prepared for the tagger or trainer programs. You can also select the "Simple
conversion" option, which applies the conversions without changing the
format (-H option). See the mpreptxt
man page.
TRAINING
In the "Training" window, you can run
a training session (mtrain program) and also
operate on the matrices (mcreate,
mprint and edit). The matrices
file can be created by mtrain, which
initializes the matrix with equiprobable values based on the tags found
in the corpus. The values can be readjusted to reflect user-defined preferences
as stated in the biases_file. This training phase can be repeated for any
number of iterations, and each iteration may assign different probabilities.
The matrices are used by the tagging program mtag
to calculate the most probable tag for each word in a text.
mtrain readjusts the parameters and returns
the new values in a compiled matrices file.
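To make the equiprobable initialization concrete, here is a minimal Python sketch. It is not mtrain's actual code: the function name `init_equiprobable`, the nested-dictionary layout and the sample tags are assumptions made for illustration. It collects the tags seen in a corpus and gives every tag-to-tag transition the same starting probability, which later iterations or a biases file could then readjust.

```python
def init_equiprobable(tagged_corpus):
    """Build a uniform tag-transition matrix from the tags seen in a corpus.

    tagged_corpus: list of (word, tag) pairs (hypothetical input format).
    Returns a dict of dicts: matrix[prev_tag][next_tag] -> probability.
    """
    tags = sorted({tag for _, tag in tagged_corpus})
    if not tags:
        return {}
    p = 1.0 / len(tags)  # every transition starts out equally likely
    return {prev: {nxt: p for nxt in tags} for prev in tags}


# Hypothetical sample data, not taken from any real corpus.
corpus = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"), ("the", "DET")]
matrix = init_equiprobable(corpus)
print(matrix["DET"]["NOUN"])
```

With three distinct tags, each row of the sketch's matrix holds three equal entries that sum to one.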
TAGGING
In the "Tagging" window, you can tag a text
(mtag program). The "Re-estimate" option
re-estimates the probabilities to improve accuracy. If
the correct tag list and the matrices output file are specified, the tagger
will automatically readjust the values in the matrices according to the
correct solutions and retag the text. This naturally improves the result
for the given text, but can also improve the result for texts with
similar structures.
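As an illustration of readjusting matrix values from correct solutions, the following Python sketch re-estimates transition probabilities from a hand-corrected tag sequence by relative-frequency counting. The source does not describe how mtag performs this step, so the counting scheme, the function name `reestimate_transitions` and the sample tags are all assumptions.

```python
from collections import Counter, defaultdict


def reestimate_transitions(correct_tags):
    """Estimate P(next_tag | prev_tag) from a correct tag sequence.

    Plain relative-frequency counting, used here only as an
    illustrative stand-in for the tagger's own re-estimation.
    """
    counts = defaultdict(Counter)
    # Count each adjacent (previous tag, next tag) pair once.
    for prev, nxt in zip(correct_tags, correct_tags[1:]):
        counts[prev][nxt] += 1
    matrix = {}
    for prev, row in counts.items():
        total = sum(row.values())
        matrix[prev] = {nxt: n / total for nxt, n in row.items()}
    return matrix


# Hypothetical correct tag list for a six-word text.
correct = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
new_matrix = reestimate_transitions(correct)
print(new_matrix["DET"])
```

Because the counts come from the corrected text itself, retagging that text with the new values favours the corrected tags, and texts with similar tag sequences would benefit in the same way.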
RESULTS
In the "Results" window, you can operate on the
matrices with the biases rules (mbiases
program), and print the results with the mdiff,
mdiffb or mcontext
commands.
CONFIGURATION FILE
| Variable | Default | Description |
|---|---|---|
| TAGCNV_F | "states.cnv" | States conversion file |
| WRDCNV_F | "words.cnv" | Word conversion file |
| BIASLST_F | "biases.lst" | Biases file |
| NBRFIELD | 3 | Number of fields preceding the [BOS|EOS] field |
| LEXCOLUMN | 3 | Specifies the column where to find the word |
| PREMSEP | "\\\\" | Specifies the separator within [LEM,ANNOT] pair |
| SECSEP | "\\|" | Specifies the separator between the sets of ambiguous [LEM,ANNOT] pairs assigned to a given word |
| # Defaults for the preparing session | ||
| PR_INPUT_F | "text" | Input text to be prepared |
| PR_OUTPUT_F | "text.tr" | Output text |
| PR_MATRICES_F | "MMinit" | Matrices file |
| # Defaults for the training session | ||
| TR_INPUT_F | "text.tr" | Input text |
| TR_M_INPUT_F | "MMinit" | Input Matrices file |
| TR_M_OUTPUT_F | "MM_01" | Output matrices file |
| TR_M_PRINT_F | "MM_01.clr" | Output file of the print command |
| TR_LOOP | 1 | Loop number |
| # Defaults for the tagging session | ||
| TA_INPUT_F | "text.tr" | Input text |
| TA_OUTPUT_F | "text.tg" | Output text |
| TA_M_INPUT_F | "MM_01" | Matrices file |
| TA_PRECISION | 0 | Precision |
| TA_LOOP | 1 | Loop number |
| TA_M_OUTPUT_F | "" | Output Matrices file |
| TA_TAG_OUTPUT_F | "/tmp/Taglst" | Tag list file |
| # Defaults for the biasing session | ||
| B_M_INPUT_F | "MM_01" | Input matrices file |
| B_M_OUTPUT_F | "MM_01b" | Output matrices file |
| B_BIASES_F | "biases.lst" | Biases file |
| # Defaults for the results session | ||
| D_INPUT1_F | "/tmp/Taglst" | Tag list file |
| D_INPUT2_F | "TAG1" | Correct tag list (mhandtag) |
| D_TAG1 | "" | Tag 1 present in the tag list |
| D_TAG2 | "" | Tag 2 present in the correct tag list |
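
Several defaults in the table above share the same file name across sessions, which suggests that one session's file is reused by the next. The Python sketch below copies a few of the table's values into a dictionary and checks those matches. The dictionary representation, the `CHAIN` pairing and the `check_chain` helper are hypothetical: the pairing is inferred from the matching default values rather than stated by the source, and the on-disk syntax of the configuration file is not shown here.

```python
# Selected defaults copied from the table above.
DEFAULTS = {
    "PR_INPUT_F": "text",
    "PR_OUTPUT_F": "text.tr",
    "PR_MATRICES_F": "MMinit",
    "TR_INPUT_F": "text.tr",
    "TR_M_INPUT_F": "MMinit",
    "TR_M_OUTPUT_F": "MM_01",
    "TA_INPUT_F": "text.tr",
    "TA_M_INPUT_F": "MM_01",
    "TA_TAG_OUTPUT_F": "/tmp/Taglst",
    "B_M_INPUT_F": "MM_01",
    "D_INPUT1_F": "/tmp/Taglst",
}

# Pairs of settings that share a default value in the table, read here
# (as an inference) as one session's file being reused by another.
CHAIN = [
    ("PR_OUTPUT_F", "TR_INPUT_F"),
    ("PR_OUTPUT_F", "TA_INPUT_F"),
    ("PR_MATRICES_F", "TR_M_INPUT_F"),
    ("TR_M_OUTPUT_F", "TA_M_INPUT_F"),
    ("TR_M_OUTPUT_F", "B_M_INPUT_F"),
    ("TA_TAG_OUTPUT_F", "D_INPUT1_F"),
]


def check_chain(config):
    """Return the paired settings whose file names differ."""
    return [(a, b) for a, b in CHAIN if config[a] != config[b]]


print(check_chain(DEFAULTS))  # an empty list means the defaults line up
```

A check of this kind could be useful after editing one file name by hand, since a changed output name that is not mirrored in the following session's input would show up in the returned list.
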
Comments, suggestions, and bug reports are always welcome.