The purpose of this project is to develop a C library for managing large dictionaries of many kinds efficiently and with minimal memory consumption. To do so, it uses numbered acyclic deterministic finite-state automata. In this work, a dictionary is any structure that associates its entries (words) with some kind of information. In developing this library we have followed two maxims:
The library consists of two parts: one is used to build the compiler, which compiles (compresses) word dictionaries, and the other provides access to these compiled dictionaries.
The compiler takes a list of words and the information associated with them. From this input it generates a compiled (compressed) dictionary in binary format, which any independent program can access through the second part of the library.
The key features that differentiate this library from other existing proposals are:
For word storage, the library uses a numbered acyclic deterministic finite-state automaton, which the compiler builds with the automata construction algorithm proposed by Jan Daciuk in his article Incremental Construction of Minimal Acyclic Finite-State Automata. The automaton is therefore built incrementally and kept minimal, which optimizes both memory use and word recognition speed.
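To illustrate what "numbered" means here, the following is a minimal sketch and not the library's actual code. Each state stores how many words can be recognized starting from it, which lets a lookup turn a word into its position in the sorted word list. All names (State, insert, number_states, word_to_index) are hypothetical, the alphabet is assumed to be lowercase a-z, and for brevity the automaton is an unminimized trie; the incremental minimization step of Daciuk's algorithm is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define ALPHA 26 /* assumption: words contain only 'a'..'z' */

/* Hypothetical state layout, not the library's real one. */
typedef struct State {
    struct State *next[ALPHA]; /* deterministic transitions */
    int is_final;              /* 1 if a word ends in this state */
    int count;                 /* number of words accepted from this state */
} State;

static State *state_new(void)
{
    State *s = calloc(1, sizeof *s);
    if (s == NULL) {
        perror("calloc");
        exit(EXIT_FAILURE);
    }
    return s;
}

/* Adds a word as a plain trie path (no minimization in this sketch). */
static void insert(State *root, const char *w)
{
    State *s = root;
    for (; *w != '\0'; ++w) {
        int c = *w - 'a';
        assert(c >= 0 && c < ALPHA);
        if (s->next[c] == NULL)
            s->next[c] = state_new();
        s = s->next[c];
    }
    s->is_final = 1;
}

/* Numbers the automaton: fills in count for every state reachable from s.
 * Must be called on the root after all insertions and before any lookup. */
static int number_states(State *s)
{
    int n = s->is_final;
    for (int c = 0; c < ALPHA; ++c)
        if (s->next[c] != NULL)
            n += number_states(s->next[c]);
    s->count = n;
    return n;
}

/* Returns the 0-based position of w in lexicographic order among the
 * stored words, or -1 if w is not in the dictionary. */
static int word_to_index(const State *root, const char *w)
{
    const State *s = root;
    int index = 0;
    for (; *w != '\0'; ++w) {
        int c = *w - 'a';
        if (c < 0 || c >= ALPHA || s->next[c] == NULL)
            return -1;
        if (s->is_final)
            index++; /* a shorter word ending here sorts before w */
        for (int d = 0; d < c; ++d) /* skip words under smaller letters */
            if (s->next[d] != NULL)
                index += s->next[d]->count;
        s = s->next[c];
    }
    return s->is_final ? index : -1;
}
```

With the words car, cat, do and dog inserted and number_states called on the root, word_to_index yields 0, 1, 2 and 3 respectively, and -1 for a non-word such as ca. The same numbering idea applies unchanged to a minimal automaton, where shared states simply carry one count for all the paths that reach them.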
For information storage, we have used the ideas presented by Jorge Graña in his article Compilation Methods of Minimal Acyclic Finite-State Automata for Large Dictionaries, and generalized some aspects so that the library can be used in contexts beyond the scope of that paper: storing any information associated with the words, removing the limit on the number of information fields, using more than one mapping table, accessing information on disk and/or in memory, etc.
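As a rough illustration of the mapping-table idea, the sketch below assumes that each word has already been converted to a numeric index and that a table maps this index to an offset in a pool of stored information. The layout, the sample data and the names (info_pool, info_offset, info_for_index) are all hypothetical and kept in memory only; the real compiled format may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical information pool: distinct values stored once, back to
 * back, each terminated by '\0'. */
static const char info_pool[] = "noun\0verb\0noun,verb";

/* Hypothetical mapping table: word index -> offset into info_pool.
 * Words 0 and 2 share the same entry, so "noun" is stored only once. */
static const size_t info_offset[] = { 0, 5, 0, 10 };

#define WORD_COUNT (sizeof info_offset / sizeof info_offset[0])

/* Returns the information string for a word index, or NULL if the index
 * is out of range. */
static const char *info_for_index(int index)
{
    if (index < 0 || (size_t)index >= WORD_COUNT)
        return NULL;
    assert(info_offset[index] < sizeof info_pool); /* table sanity check */
    return info_pool + info_offset[index];
}
```

Because several indices can point at the same offset, repeated information costs only one table entry per word. Supporting more than one kind of information would then amount to keeping one such table per field, under the same assumptions.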