The purpose of this project is to develop a C library where you can manage, efficiently and minimizing memory consumption, large dictionaries of many kinds. For this purpose it uses numbered acyclic deterministic finite-state automata. In this work we assume that dictionary means any structure that would associate its entries (words) to any kind of information. For the development of this library we have followed two maxims:

  1. To minimize memory consumption needed to store the dictionaries.
  2. To minimize the running time for them to be used in systems that require a high performance.

The library consists of two parts: one part is used to build the compiler, which deals with the task of compiling or compressing the words dictionaries, and the other part is responsible for facilitating access to these compiled dictionaries.

The compiler needs a list of words and the information associated with them to generate the compressed dictionaries. From this information, it generates a compiled dictionary (compressed) in binary format, which can be accessed by any independent program throw the second part of the library.

The key features that differentiate this library from other existing proposals are:

  1. You can store any type of information associated with words inside it. Actually, dictionaries store integers and/or floats, but since these integers can be interpreted as indexes to any structure external to the library, they may reference data of any type, including strings.
  2. It enables the simultaneous management of multiple dictionaries at a time and with references between them. Because the integer data can be interpreted in different ways, a particular case is to consider them as indexes to other dictionaries, which allows several dictionaries to be connected.

For words storage the library uses a numbered acyclic deterministic finite-state automaton, which is built by the compiler using the automata building algorithm proposed by Jan Daciuk in his article: Incremental Construction of Minimal Acyclic Finite-State Automata. Therefore, the automaton is built in an incremental and minimal way, and the use of memory and word recognition speed are optimized.

For information storage, however, we have used the ideas presented by Jorge Graña in his article: Compilation Methods of Minimal Acyclic Finite-State Automata for Large Dictionaries and generalize some aspects to use the library in different contexts out of the scope of this paper: to store any information associated with the words, to remove the limits about the number of fields of information, to use more than one mapping table, to access information on disk and/or memory, etc.

english | galego | español
Valid XHTML 1.0 Strict Valid CSS!