In GATE, gazetteers are the processing resources that use a dictionary (or any other data source) to annotate text. This is the simplest and most intuitive way to use predefined knowledge to annotate documents.
The GATE user interface for gazetteers (which is also used in KIM) is not that easy for beginners (at least it was not for me).
Demo 1
Demo 1 shows how to create and use a gazetteer from the user interface. A "def" file contains a list of "lst" files, and every "lst" file in turn contains items, one per row. Together they form a 3-level hierarchy: the "def" file, its "lst" files, and the items in each list. The demo shows this organization by creating a new "def" file called "MyGazetter.def". Then we create two "lst" files. After creating the first one (people.lst), we add it to the "def" file using the "insert menu" on the left.
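The three levels can also be pictured outside of GATE with a few lines of Python. This is only a sketch under assumptions: the `people.lst:person` lines follow the colon-separated "list file, then major type" convention of GATE definition files, while the entries, the major types and the `load_gazetteer` helper are made up for illustration and are not part of any GATE API.

```python
import tempfile
from pathlib import Path


def load_gazetteer(def_path):
    """Read a "def" file and return {entry: major_type} for all listed "lst" files."""
    def_path = Path(def_path)
    lookup = {}
    for line in def_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        # Level 1 -> 2: each "def" line names one "lst" file, then a major type.
        lst_name, _, major_type = line.partition(":")
        lst_path = def_path.parent / lst_name
        # Level 2 -> 3: each "lst" line holds one entry.
        for entry in lst_path.read_text(encoding="utf-8").splitlines():
            entry = entry.strip()
            if entry:
                lookup[entry] = major_type
    return lookup


# Build the demo layout in a temporary directory (hypothetical entries).
workdir = Path(tempfile.mkdtemp())
(workdir / "people.lst").write_text("John Smith\nMary Jones\n", encoding="utf-8")
(workdir / "animals.lst").write_text("cat\ndog\n", encoding="utf-8")
(workdir / "MyGazetter.def").write_text(
    "people.lst:person\nanimals.lst:animal\n", encoding="utf-8"
)

gazetteer = load_gazetteer(workdir / "MyGazetter.def")
print(gazetteer)
```

If an "lst" file is missing from the "def" file, its entries never reach the lookup table, which is why adding each new list to the definition matters.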
Remarks:
1. Do not forget to save your "lst" and "def" files! I forgot to do exactly that the first time I pressed "Run Application", so I went back, made sure everything was saved, and then reran the application.
2. When first creating the gazetteer, we cannot create a new "def" file by pointing "listsURL" at a non-existent "def" file, as this triggers an error. We create it later by clicking the "New" button in the "Linear Definition" panel.
3. In the gazetteer options you can specify that when several matches overlap, only the longest one is annotated.
4. When we created the "animals.lst" file and started entering values, nothing in the interface indicated which file we were currently working on.
5. Also keep in mind that when we replace the ANNIE gazetteer with our own instance, we have to put the new one at the same position in the pipeline as the old one, which was third place. The gazetteer uses some results from previous steps in the pipeline, and other GATE processing resources expect results from the gazetteer, so position really matters!
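Remark 3 is easier to picture with a small example. The sketch below is plain Python, not GATE code: the entries and the `lookup` function are hypothetical, and the `longest_only` flag only imitates the effect of the longest-match option (as far as I know, the ANNIE gazetteer exposes it as the `longestMatchOnly` parameter). It also ignores word boundaries to stay short.

```python
def lookup(text, entries, longest_only=True):
    """Return (start, end, entry) matches of gazetteer entries in text."""
    matches = []
    pos = 0
    while pos < len(text):
        # All entries that start exactly at this position.
        here = [e for e in entries if text.startswith(e, pos)]
        if not here:
            pos += 1
            continue
        if longest_only:
            best = max(here, key=len)
            matches.append((pos, pos + len(best), best))
            pos += len(best)  # skip past the match, dropping overlapped entries
        else:
            matches.extend((pos, pos + len(e), e) for e in sorted(here, key=len))
            pos += 1
    return matches


entries = ["New York", "York", "New York City"]
text = "I moved to New York City."

print(lookup(text, entries, longest_only=True))
print(lookup(text, entries, longest_only=False))
```

With the flag on, only "New York City" is reported. With it off, "New York" and "York" are reported as well, which is the overlap the option is meant to suppress.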
Demo 2
In this demo:
- we see the use of the Morphological Analyzer to get the root of a word
- and the use of the Flexible Gazetteer to annotate all forms of a word
When we run the standard ANNIE gazetteer, the lookup annotations only match "city" and not "cities". To enable both the Morphological Analyzer and the Flexible Gazetteer, we need to add the CREOLE plug-in directory "Tools". Next we add the Morphological Analyzer, which gives every token an additional feature, "Token.root", containing the root of the word, in our case "city". The Flexible Gazetteer (FG) does not work the usual way: it does not match the document text directly, but works on a selected feature of a selected annotation. Creating an FG requires a gazetteer as a parameter, so we choose the ANNIE gazetteer and have it process the "Token.root" feature. With the FG, the standard ANNIE gazetteer sees "city" instead of "cities". And because the ANNIE gazetteer recognizes "city" as a location by default, "cities" is now recognized as a location too.
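The idea of matching on a feature instead of the surface text can be sketched in plain Python. Everything here is an assumption made for illustration: `toy_root` is a crude stand-in and does not reproduce the rules of GATE's Morphological Analyzer, and the one-entry `GAZETTEER` dictionary stands in for the ANNIE lists.

```python
import re

# Toy gazetteer: entry -> major type (stands in for the ANNIE lists).
GAZETTEER = {"city": "location"}


def toy_root(word):
    """Very rough stand-in for a morphological analyser (NOT GATE's rules)."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def tokenize(text):
    """Create Token-like dicts with offsets, the surface string and a root feature."""
    return [
        {"start": m.start(), "end": m.end(), "string": m.group(), "root": toy_root(m.group())}
        for m in re.finditer(r"[A-Za-z]+", text)
    ]


def lookups(tokens, feature):
    """Match the chosen token feature against the gazetteer, keeping the original offsets."""
    return [
        (tok["start"], tok["end"], tok["string"], GAZETTEER[tok[feature]])
        for tok in tokens
        if tok[feature] in GAZETTEER
    ]


tokens = tokenize("One city is small, but many cities are huge.")
print(lookups(tokens, "string"))  # plain gazetteer: surface form only
print(lookups(tokens, "root"))    # flexible variant: matches on the root
```

The second call also returns the span of "cities", because the lookup runs on the root while the resulting annotation keeps the offsets of the original word.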
Demo 2 shows the result with the ANNIE gazetteer alone and the result of the Morphological Analyzer and the Flexible Gazetteer working together.
You may want to see the result produced by the FG alone. To do that, set a name in the FG's "outputAnnotationSetName" option, which is available when you click on the ANNIE application and then on the FG processing resource on the right.
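Why a separate output set helps can be shown with a tiny model of named annotation sets. This is a sketch, not GATE's data model: the dictionary, the `add_lookup` helper and the set name "FG" are all hypothetical, and the empty string merely stands in for the unnamed default set.

```python
from collections import defaultdict

# Annotation sets keyed by name; "" stands in for the unnamed default set.
annotation_sets = defaultdict(list)


def add_lookup(set_name, start, end, major_type):
    """Store a Lookup-like annotation in the named set."""
    annotation_sets[set_name].append(
        {"type": "Lookup", "start": start, "end": end, "majorType": major_type}
    )


# The plain gazetteer writes to the default set ...
add_lookup("", 4, 8, "location")
# ... while the flexible gazetteer is pointed at its own set ("FG" is a made-up name).
add_lookup("FG", 4, 8, "location")
add_lookup("FG", 28, 34, "location")

fg_only = annotation_sets["FG"]
print(len(annotation_sets[""]), len(fg_only))
```

Because the two resources write to different sets, the FG results can be inspected on their own without being mixed with the lookups of the plain gazetteer.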






Thanks
Your article on this topic helped me understand the use of the gazetteer and the Morphological Analyzer.
May I know how we can retrieve the root word from the annotated text and save it while preserving the document format?
Programmatically through Java you have access to every annotation and feature, so you should have no problem retrieving the "root" feature.
From the UI you can click on a document and then select "Save as XML", which will save all annotations and features including the "root" feature's value.
Will it be possible to write the root annotation rule in JAPE, to annotate the root and show it in the annotation list?
Please provide some more details.
10x