The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.

The goal of the OpenNLP project will be to create a mature toolkit for the abovementioned tasks. An additional goal is to provide a large number of pre-built models for a variety of languages, as well as the annotated text resources that those models are derived from.

The Apache OpenNLP library contains several components, enabling one to build a full natural language processing pipeline.

  • sentence detector
  • tokenizer,
  • name finder,
  • document categorizer,
  • part-of-speech tagger,
  • chunker,
  • parser,
  • coreference resolution

interface

  • API
try (InputStream modelIn = new FileInputStream("lang-model-name.bin")) {
  SomeModel model = new SomeModel(modelIn);
  ToolName toolName = new ToolName(model);
  String output[] = toolName.executeTask("example text");
}
  • CLI
opennlp ToolName
opennlp ToolName help
opennlp ToolName lang-model-name.bin < input.txt > output.txt