OpenNLP
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
The goal of the OpenNLP project is to create a mature toolkit for common natural language processing tasks such as sentence detection, tokenization, and part-of-speech tagging. An additional goal is to provide a large number of pre-built models for a variety of languages, as well as the annotated text resources that those models are derived from.
The Apache OpenNLP library contains several components that can be combined to build a full natural language processing pipeline. These components include:
- sentence detector
- tokenizer
- name finder
- document categorizer
- part-of-speech tagger
- chunker
- parser
- coreference resolution
The components can be used through two interfaces:
- API
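import java.io.FileInputStream;
import java.io.InputStream;

// SomeModel, ToolName, and executeTask are placeholders for a concrete component.
try (InputStream modelIn = new FileInputStream("lang-model-name.bin")) {
    SomeModel model = new SomeModel(modelIn);   // load the trained model
    ToolName toolName = new ToolName(model);    // create the tool from the model
    String[] output = toolName.executeTask("example text");
}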
- CLI
opennlp ToolName
opennlp ToolName help
opennlp ToolName lang-model-name.bin < input.txt > output.txt
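
As a concrete instance of the generic API pattern above, the following sketch chains two components, the sentence detector and the tokenizer, into a small pipeline. It assumes the OpenNLP library is on the classpath and that suitable pre-trained models are available locally. The file names en-sent.bin and en-token.bin are assumptions, and the class name PipelineSketch and the method sentencesToTokens are hypothetical names chosen for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

class PipelineSketch {

    // Splits the text into sentences, then splits each sentence into tokens.
    // Both model paths are supplied by the caller.
    static String[][] sentencesToTokens(String sentModelPath, String tokModelPath, String text)
            throws IOException {
        try (InputStream sentIn = new FileInputStream(sentModelPath);
             InputStream tokIn = new FileInputStream(tokModelPath)) {
            // Each tool is built from its own trained model, as in the generic pattern.
            SentenceDetectorME sentenceDetector = new SentenceDetectorME(new SentenceModel(sentIn));
            TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokIn));

            String[] sentences = sentenceDetector.sentDetect(text);
            String[][] tokens = new String[sentences.length][];
            for (int i = 0; i < sentences.length; i++) {
                // The output of the first component is the input of the second.
                tokens[i] = tokenizer.tokenize(sentences[i]);
            }
            return tokens;
        }
    }

    public static void main(String[] args) throws IOException {
        // Model file names are assumptions; substitute the models you have downloaded.
        String[][] result = sentencesToTokens("en-sent.bin", "en-token.bin",
                "Pierre Vinken is 61 years old. He joined the board.");
        for (String[] sentenceTokens : result) {
            System.out.println(String.join(" ", sentenceTokens));
        }
    }
}
```

Further components such as the part-of-speech tagger or the name finder would typically consume the token arrays produced here, which is why sentence detection and tokenization usually come first in a pipeline.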