
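[Java] Processing English with Stanford CoreNLP (POS tagging, lemmatization, parsing, etc.)

Date: 2020-03-18 10:02:00

This article is a hands-on introduction to natural language processing with Stanford CoreNLP.

Environment: 64-bit Windows 7, NetBeans; Java 1.8 or later is required.

CoreNLP version: 3.6.0. Download it from http://stanfordnlp.github.io/CoreNLP/ to obtain the archive stanford-corenlp-full--12-09.zip.

Stanford CoreNLP features: tokenization (tokenize), sentence splitting (ssplit), part-of-speech tagging (pos), lemmatization (lemma; not available for Chinese), named entity recognition (ner), syntactic parsing (parse), sentiment analysis (sentiment), coreference resolution, and more.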
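The names in parentheses are the identifiers that go into the `annotators` property, and their order matters because later annotators read what earlier ones produced. Below is a minimal sketch using only `java.util`. The helper class `AnnotatorOrder` is hypothetical, and the prerequisite table is an assumption based on my reading of the 3.6.0 requirements, so check it against the official annotator documentation before relying on it. The method reports the first annotator that is unknown or listed before its prerequisites.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of CoreNLP: checks annotator ordering. */
public class AnnotatorOrder {

    // Assumed prerequisites; verify against the CoreNLP documentation.
    private static final Map<String, List<String>> REQUIRES = new HashMap<>();
    static {
        REQUIRES.put("tokenize", Collections.<String>emptyList());
        REQUIRES.put("ssplit", Arrays.asList("tokenize"));
        REQUIRES.put("pos", Arrays.asList("tokenize", "ssplit"));
        REQUIRES.put("lemma", Arrays.asList("tokenize", "ssplit", "pos"));
        REQUIRES.put("ner", Arrays.asList("tokenize", "ssplit", "pos", "lemma"));
        REQUIRES.put("parse", Arrays.asList("tokenize", "ssplit"));
        REQUIRES.put("dcoref",
                Arrays.asList("tokenize", "ssplit", "pos", "lemma", "ner", "parse"));
    }

    /**
     * Returns null if every annotator appears after its prerequisites,
     * otherwise the first annotator that is unknown or out of order.
     */
    public static String firstProblem(String annotators) {
        Set<String> seen = new HashSet<>();
        for (String raw : annotators.split(",")) {
            String name = raw.trim();
            List<String> needed = REQUIRES.get(name);
            if (needed == null || !seen.containsAll(needed)) {
                return name;
            }
            seen.add(name);
        }
        return null;
    }

    public static void main(String[] args) {
        // prints "null": the order is consistent with the assumed table
        System.out.println(firstProblem("tokenize, ssplit, pos, lemma, ner, parse, dcoref"));
        // prints "pos": ssplit is missing before pos
        System.out.println(firstProblem("tokenize, pos"));
    }
}
```

A check like this only catches ordering mistakes early; the pipeline itself remains the authority on what each annotator needs.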

Supported languages: Chinese, English, French, German, Spanish, Arabic, and others.

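Usage:

1. Create a new project in NetBeans.

2. Unzip stanford-corenlp-full--12-09.zip and add the following jar files to the project's libraries (the -javadoc and -sources jars are optional; they are not needed to run the code):

slf4j-api.jar
slf4j-simple.jar
stanford-corenlp-3.6.0.jar
stanford-corenlp-3.6.0-javadoc.jar
stanford-corenlp-3.6.0-models.jar
stanford-corenlp-3.6.0-sources.jar
xom.jar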
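After adding the jars, it can help to confirm that the classes are visible at runtime before building a pipeline. The following is a minimal sketch built on the standard `Class.forName` call. The class `ClasspathCheck` and its method name are hypothetical. Of the two class names it checks, `edu.stanford.nlp.pipeline.StanfordCoreNLP` comes from the imports in step 3 and `org.slf4j.Logger` is the main interface of slf4j-api.

```java
/** Hypothetical helper: reports whether a class can be loaded from the classpath. */
public class ClasspathCheck {

    /** Returns true if the named class is visible to this class's class loader. */
    public static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "edu.stanford.nlp.pipeline.StanfordCoreNLP", // from stanford-corenlp-3.6.0.jar
            "org.slf4j.Logger"                           // from slf4j-api.jar
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isAvailable(name) ? "found" : "MISSING"));
        }
    }
}
```

This only checks that the code jars are present. A missing models jar would typically show up later, when the pipeline tries to load its model files.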

3. Create a class with the following code:

package corenlp;

/**
 * Purpose: practice using CoreNLP on English text.
 * Time: April 22, 14:03:42
 */
import java.util.List;
import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
// import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class CoreNLP {

    public static void main(String[] args) {
        /*
         * Create a StanfordCoreNLP object with seven annotators:
         * tokenize (tokenization), ssplit (sentence splitting), pos (POS tagging),
         * lemma (lemmatization), ner (named entity recognition),
         * parse (syntactic parsing), dcoref (coreference resolution).
         */
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props); // annotators run in the listed order

        String text = "This is a test."; // input text
        Annotation document = new Annotation(text); // holds only the raw text so far
        pipeline.annotate(document); // run all seven annotators on the text

        // sentences holds the per-sentence analysis results; iterate to read them.
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        System.out.println("word\tpos\tlemma\tner");
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                String word = token.get(TextAnnotation.class);            // token text
                String pos = token.get(PartOfSpeechAnnotation.class);     // POS tag
                String ne = token.get(NamedEntityTagAnnotation.class);    // named entity tag
                String lemma = token.get(LemmaAnnotation.class);          // lemma
                System.out.println(word + "\t" + pos + "\t" + lemma + "\t" + ne);
            }

            // Parse tree of the sentence
            Tree tree = sentence.get(TreeAnnotation.class);
            System.out.println(tree.toString());

            // Dependency graph of the sentence
            SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
            System.out.println(dependencies);
        }

        // Coreference chains for the whole document (retrieved but not printed here)
        Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
    }
}

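Explanation: the code passes the string text to Stanford CoreNLP, whose components (annotators) process it one after another in the order listed in the annotators property.

After processing, sentences holds the per-sentence analysis results; iterate over it to read them. The coreference chains are stored on the document itself rather than on individual sentences.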
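The "one annotator after another" idea can be illustrated without CoreNLP at all. The following is a toy sketch using only the standard library, and it is not how CoreNLP is implemented. The class `ToyPipeline`, the map keys, and both stages are invented for illustration. Each stage reads what earlier stages wrote into a shared document map and adds its own result, which is the same pattern the real annotators follow with the Annotation object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of an annotator pipeline; NOT CoreNLP's implementation. */
public class ToyPipeline {

    // A word, or a single non-word, non-space character such as punctuation
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    /** Stage 1: reads "text", writes "tokens". */
    static void tokenize(Map<String, Object> doc) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher((String) doc.get("text"));
        while (m.find()) {
            tokens.add(m.group());
        }
        doc.put("tokens", tokens);
    }

    /** Stage 2: reads "tokens", writes "lower" (a crude stand-in for a later annotator). */
    @SuppressWarnings("unchecked")
    static void lowercase(Map<String, Object> doc) {
        List<String> lower = new ArrayList<>();
        for (String token : (List<String>) doc.get("tokens")) {
            lower.add(token.toLowerCase(Locale.ROOT));
        }
        doc.put("lower", lower);
    }

    /** Runs every stage in order over one shared document map and returns it. */
    public static Map<String, Object> annotate(String text) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", text);

        List<Consumer<Map<String, Object>>> stages = new ArrayList<>();
        stages.add(ToyPipeline::tokenize);
        stages.add(ToyPipeline::lowercase); // depends on the output of tokenize

        for (Consumer<Map<String, Object>> stage : stages) {
            stage.accept(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = annotate("This is a test.");
        System.out.println(doc.get("tokens")); // [This, is, a, test, .]
        System.out.println(doc.get("lower"));  // [this, is, a, test, .]
    }
}
```

Swapping the two stages would make `lowercase` fail because "tokens" would not exist yet, which is why the order of names in the annotators property matters.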

4. Output:
