Title: | Apache OpenNLP Tools Interface |
---|---|
Description: | An interface to the Apache OpenNLP tools (version 1.5.3). The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text written in Java. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. See <https://opennlp.apache.org/> for more information. |
Authors: | Kurt Hornik [aut, cre] |
Maintainer: | Kurt Hornik <[email protected]> |
License: | GPL-3 |
Version: | 0.2-7 |
Built: | 2024-11-01 03:08:57 UTC |
Source: | https://github.com/cran/openNLP |
Generate an annotator which computes chunk annotations using the Apache OpenNLP Maxent chunker.
Maxent_Chunk_Annotator(language = "en", probs = FALSE, model = NULL)
language | a character string giving the ISO-639 code of the language being processed by the annotator. |
probs | a logical indicating whether the computed annotations should provide the token probabilities obtained from the Maxent model as their ‘chunk_prob’ feature. |
model | a character string giving the path to the Maxent model file to be used, or `NULL` (the default), in which case a default model file for the given language is used, if available. |
See http://opennlp.sourceforge.net/models-1.5/ for available model files. These can conveniently be made available to R by installing the respective openNLPmodels.language package from the repository at https://datacube.wu.ac.at.
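For example, the English model package could be installed along the following lines; this is a minimal sketch, assuming the repository at https://datacube.wu.ac.at provides the packages in source form (so that `type = "source"` is needed):

```r
## Install the English model package from the datacube repository
## (assumes source packages are hosted there).
install.packages("openNLPmodels.en",
                 repos = "https://datacube.wu.ac.at/",
                 type = "source")
```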
An Annotator object giving the generated chunk annotator.
See https://opennlp.apache.org for more information about Apache OpenNLP.
## Requires package 'openNLPmodels.en' from the repository at
## <https://datacube.wu.ac.at>.
require("NLP")
## Some text.
s <- paste(c("Pierre Vinken, 61 years old, will join the board as a ",
             "nonexecutive director Nov. 29.\n",
             "Mr. Vinken is chairman of Elsevier N.V., ",
             "the Dutch publishing group."),
           collapse = "")
s <- as.String(s)
## Chunking needs word token annotations with POS tags.
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
pos_tag_annotator <- Maxent_POS_Tag_Annotator()
a3 <- annotate(s,
               list(sent_token_annotator,
                    word_token_annotator,
                    pos_tag_annotator))
annotate(s, Maxent_Chunk_Annotator(), a3)
annotate(s, Maxent_Chunk_Annotator(probs = TRUE), a3)
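As a hypothetical follow-up to the example above, the chunk tags of the word tokens could be pulled out of the annotations; the feature name `chunk_tag` is an assumption here, by analogy with the ‘chunk_prob’ feature named in the argument description:

```r
## Sketch: extract the chunk tags for the word tokens
## (feature name 'chunk_tag' is assumed by analogy with 'chunk_prob').
a4 <- annotate(s, Maxent_Chunk_Annotator(), a3)
a4w <- subset(a4, type == "word")
sapply(a4w$features, `[[`, "chunk_tag")
```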
Generate an annotator which computes entity annotations using the Apache OpenNLP Maxent name finder.
Maxent_Entity_Annotator(language = "en", kind = "person", probs = FALSE, model = NULL)
language | a character string giving the ISO-639 code of the language being processed by the annotator. |
kind | a character string giving the ‘kind’ of entity to be annotated (person, date, ...). |
probs | a logical indicating whether the computed annotations should provide the token probabilities obtained from the Maxent model as their ‘prob’ feature. |
model | a character string giving the path to the Maxent model file to be used, or `NULL` (the default), in which case a default model file for the given language and kind is used, if available. |
See http://opennlp.sourceforge.net/models-1.5/ for available model files. These can conveniently be made available to R by installing the respective openNLPmodels.language package from the repository at https://datacube.wu.ac.at.
An Annotator object giving the generated entity annotator.
See https://opennlp.apache.org for more information about Apache OpenNLP.
## Requires package 'openNLPmodels.en' from the repository at
## <https://datacube.wu.ac.at>.
require("NLP")
## Some text.
s <- paste(c("Pierre Vinken, 61 years old, will join the board as a ",
             "nonexecutive director Nov. 29.\n",
             "Mr. Vinken is chairman of Elsevier N.V., ",
             "the Dutch publishing group."),
           collapse = "")
s <- as.String(s)
## Need sentence and word token annotations.
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))
## Entity recognition for persons.
entity_annotator <- Maxent_Entity_Annotator()
entity_annotator
annotate(s, entity_annotator, a2)
## Directly:
entity_annotator(s, a2)
## And slice ...
s[entity_annotator(s, a2)]
## Variant with sentence probabilities as features.
annotate(s, Maxent_Entity_Annotator(probs = TRUE), a2)
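Other kinds of entities are annotated by varying the `kind` argument. A minimal sketch, assuming the `openNLPmodels.en` package also provides a date model:

```r
## Sketch: annotate dates instead of persons (assumes a date model is
## provided by the 'openNLPmodels.en' package).
date_annotator <- Maxent_Entity_Annotator(kind = "date")
s[date_annotator(s, a2)]
```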
Generate an annotator which computes POS tag annotations using the Apache OpenNLP Maxent Part of Speech tagger.
Maxent_POS_Tag_Annotator(language = "en", probs = FALSE, model = NULL)
language | a character string giving the ISO-639 code of the language being processed by the annotator. |
probs | a logical indicating whether the computed annotations should provide the token probabilities obtained from the Maxent model as their ‘POS_prob’ feature. |
model | a character string giving the path to the Maxent model file to be used, or `NULL` (the default), in which case a default model file for the given language is used, if available. |
See http://opennlp.sourceforge.net/models-1.5/ for available model files. For languages other than English, these can conveniently be made available to R by installing the respective openNLPmodels.language package from the repository at https://datacube.wu.ac.at. For English, no additional installation is required.
An Annotator object giving the generated POS tag annotator.
See https://opennlp.apache.org for more information about Apache OpenNLP.
require("NLP")
## Some text.
s <- paste(c("Pierre Vinken, 61 years old, will join the board as a ",
             "nonexecutive director Nov. 29.\n",
             "Mr. Vinken is chairman of Elsevier N.V., ",
             "the Dutch publishing group."),
           collapse = "")
s <- as.String(s)
## Need sentence and word token annotations.
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))
pos_tag_annotator <- Maxent_POS_Tag_Annotator()
pos_tag_annotator
a3 <- annotate(s, pos_tag_annotator, a2)
a3
## Variant with POS tag probabilities as (additional) features.
head(annotate(s, Maxent_POS_Tag_Annotator(probs = TRUE), a2))
## Determine the distribution of POS tags for word tokens.
a3w <- subset(a3, type == "word")
tags <- sapply(a3w$features, `[[`, "POS")
tags
table(tags)
## Extract token/POS pairs (all of them): easy.
sprintf("%s/%s", s[a3w], tags)
## Extract pairs of word tokens and POS tags for second sentence:
a3ws2 <- annotations_in_spans(subset(a3, type == "word"),
                              subset(a3, type == "sentence")[2L])[[1L]]
sprintf("%s/%s", s[a3ws2], sapply(a3ws2$features, `[[`, "POS"))
Generate an annotator which computes sentence annotations using the Apache OpenNLP Maxent sentence detector.
Maxent_Sent_Token_Annotator(language = "en", probs = FALSE, model = NULL)
language | a character string giving the ISO-639 code of the language being processed by the annotator. |
probs | a logical indicating whether the computed annotations should provide the token probabilities obtained from the Maxent model as their ‘prob’ feature. |
model | a character string giving the path to the Maxent model file to be used, or `NULL` (the default), in which case a default model file for the given language is used, if available. |
See http://opennlp.sourceforge.net/models-1.5/ for available model files. For languages other than English, these can conveniently be made available to R by installing the respective openNLPmodels.language package from the repository at https://datacube.wu.ac.at. For English, no additional installation is required.
An Annotator object giving the generated sentence token annotator.
See https://opennlp.apache.org for more information about Apache OpenNLP.
require("NLP")
## Some text.
s <- paste(c("Pierre Vinken, 61 years old, will join the board as a ",
             "nonexecutive director Nov. 29.\n",
             "Mr. Vinken is chairman of Elsevier N.V., ",
             "the Dutch publishing group."),
           collapse = "")
s <- as.String(s)
sent_token_annotator <- Maxent_Sent_Token_Annotator()
sent_token_annotator
a1 <- annotate(s, sent_token_annotator)
a1
## Extract sentences.
s[a1]
## Variant with sentence probabilities as features.
annotate(s, Maxent_Sent_Token_Annotator(probs = TRUE))
Generate an annotator which computes word token annotations using the Apache OpenNLP Maxent tokenizer.
Maxent_Word_Token_Annotator(language = "en", probs = FALSE, model = NULL)
language | a character string giving the ISO-639 code of the language being processed by the annotator. |
probs | a logical indicating whether the computed annotations should provide the token probabilities obtained from the Maxent model as their ‘prob’ feature. |
model | a character string giving the path to the Maxent model file to be used, or `NULL` (the default), in which case a default model file for the given language is used, if available. |
See http://opennlp.sourceforge.net/models-1.5/ for available model files. For languages other than English, these can conveniently be made available to R by installing the respective openNLPmodels.language package from the repository at https://datacube.wu.ac.at. For English, no additional installation is required.
An Annotator object giving the generated word token annotator.
See https://opennlp.apache.org for more information about Apache OpenNLP.
require("NLP")
## Some text.
s <- paste(c("Pierre Vinken, 61 years old, will join the board as a ",
             "nonexecutive director Nov. 29.\n",
             "Mr. Vinken is chairman of Elsevier N.V., ",
             "the Dutch publishing group."),
           collapse = "")
s <- as.String(s)
## Need sentence token annotations.
sent_token_annotator <- Maxent_Sent_Token_Annotator()
a1 <- annotate(s, sent_token_annotator)
word_token_annotator <- Maxent_Word_Token_Annotator()
word_token_annotator
a2 <- annotate(s, word_token_annotator, a1)
a2
## Variant with word token probabilities as features.
head(annotate(s, Maxent_Word_Token_Annotator(probs = TRUE), a1))
## Can also perform sentence and word token annotations in a pipeline:
a <- annotate(s, list(sent_token_annotator, word_token_annotator))
head(a)
Generate an annotator which computes Penn Treebank parse annotations using the Apache OpenNLP chunking parser for English.
Parse_Annotator()
Using the generated annotator requires installing package openNLPmodels.en from the repository at https://datacube.wu.ac.at (which provides the Maxent model file used by the parser).
An Annotator object giving the generated parse annotator.
See https://opennlp.apache.org for more information about Apache OpenNLP.
## Requires package 'openNLPmodels.en' from the repository at
## <https://datacube.wu.ac.at>.
require("NLP")
## Some text.
s <- paste(c("Pierre Vinken, 61 years old, will join the board as a ",
             "nonexecutive director Nov. 29.\n",
             "Mr. Vinken is chairman of Elsevier N.V., ",
             "the Dutch publishing group."),
           collapse = "")
s <- as.String(s)
## Need sentence and word token annotations.
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))
parse_annotator <- Parse_Annotator()
## Compute the parse annotations only.
p <- parse_annotator(s, a2)
## Extract the formatted parse trees.
ptexts <- sapply(p$features, `[[`, "parse")
ptexts
## Read into NLP Tree objects.
ptrees <- lapply(ptexts, Tree_parse)
ptrees