Overview
Continuing to learn ES from instructor 中华石杉; this is part six of the series.
Course link: /view/55
What if we want finer-grained control over full-text search? This post explores several ways to manually control the precision of full-text search results.
match query
Version 6.4:
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/query-dsl-match-query.html
Version 7.0:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/query-dsl-match-query.html
Data
To illustrate this part, we add a title field to the post data:
POST /forum/article/_bulk
{"update":{"_id":"1"}}
{"doc":{"title":"this is java and elasticsearch blog"}}
{"update":{"_id":"2"}}
{"doc":{"title":"this is java blog"}}
{"update":{"_id":"3"}}
{"doc":{"title":"this is elasticsearch blog"}}
{"update":{"_id":"4"}}
{"doc":{"title":"this is java, elasticsearch, hadoop blog"}}
{"update":{"_id":"5"}}
{"doc":{"title":"this is spark blog"}}
Look at one of the documents to check the title field.
Mapping:
Examples
Search for blogs whose title contains java or elasticsearch
The key point here is OR.
The match query is of type boolean. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text
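This differs from the term query used earlier. It does not search for an exact value; it performs full-text search.
The match query is the query responsible for full-text search. If the field being searched is not_analyzed or of type keyword, a match query behaves like a term query.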
The title field is mapped as:
First, let's look at how "this is java and elasticsearch blog" is analyzed:
GET /forum/_analyze
{"field": "title", "text": "this is java and elasticsearch blog"}
The text is split into the tokens this, is, java, and, elasticsearch and blog, which are stored in the inverted index.
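To make the tokenization concrete, here is a rough Python approximation of what the standard analyzer does to these titles. It is a simplification assumed for illustration: it only lowercases and splits on non-alphanumeric characters, while the real standard analyzer uses Unicode text segmentation. `simple_analyze` is a hypothetical helper, not an Elasticsearch API.

```python
import re


def simple_analyze(text: str) -> list[str]:
    """Hypothetical stand-in for the standard analyzer.

    Lowercases the text and splits on runs of non-alphanumeric characters,
    dropping empty strings. Real ES tokenization is more sophisticated.
    """
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


print(simple_analyze("this is java and elasticsearch blog"))
# ['this', 'is', 'java', 'and', 'elasticsearch', 'blog']
print(simple_analyze("this is java, elasticsearch, hadoop blog"))
# ['this', 'is', 'java', 'elasticsearch', 'hadoop', 'blog']
```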
How do we search for blogs whose title contains java or elasticsearch?
Let's look at how "java elasticsearch" is analyzed:
GET /forum/_analyze
{"field": "title", "text": "java elasticsearch"}
It yields the two tokens java and elasticsearch, so a plain match query is all we need:
GET /forum/_search
{"query": {"match": {"title": "java elasticsearch"}}}
It returns 4 documents, consistent with OR semantics:
{"took": 5,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 4,"max_score": 0.8092568,"hits": [{"_index": "forum","_type": "article","_id": "4","_score": 0.8092568,"_source": {"articleID": "QQPX-R-3956-#aD8","userID": 2,"hidden": true,"postDate": "-01-02","tag": ["java","elasticsearch"],"tag_cnt": 2,"view_cnt": 80,"title": "this is java, elasticsearch, hadoop blog"}},{"_index": "forum","_type": "article","_id": "1","_score": 0.5753642,"_source": {"articleID": "XHDK-A-1293-#fJ3","userID": 1,"hidden": false,"postDate": "-01-01","tag": ["java","hadoop"],"tag_cnt": 2,"view_cnt": 30,"title": "this is java and elasticsearch blog"}},{"_index": "forum","_type": "article","_id": "3","_score": 0.2876821,"_source": {"articleID": "JODL-X-1937-#pV7","userID": 2,"hidden": false,"postDate": "-01-01","tag": ["hadoop"],"tag_cnt": 1,"view_cnt": 100,"title": "this is elasticsearch blog"}},{"_index": "forum","_type": "article","_id": "2","_score": 0.19856805,"_source": {"articleID": "KDKE-B-9947-#kL5","userID": 1,"hidden": false,"postDate": "-01-02","tag": ["java"],"tag_cnt": 1,"view_cnt": 50,"title": "this is java blog"}}]}}
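The OR semantics can be mimicked locally with a small Python sketch over the five sample titles. This is not how Elasticsearch executes the query, since there is no inverted index or scoring here. `analyze` and `match_or` are hypothetical helpers, and `analyze` is a simplified stand-in for the standard analyzer.

```python
import re

# The five sample titles indexed by the _bulk request earlier in this post.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    # Simplified analyzer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def match_or(query):
    """Ids of docs whose title contains at least one query token (operator=or)."""
    terms = set(analyze(query))
    return sorted(doc_id for doc_id, title in TITLES.items()
                  if terms & set(analyze(title)))


print(match_or("java elasticsearch"))  # ['1', '2', '3', '4']
```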
Search for blogs whose title contains java and elasticsearch
The key point here is AND.
The operator flag can be set to or or and to control the boolean clauses (defaults to or).
If you want every search keyword to match, use and. This achieves something a plain match query, which defaults to or, cannot.
GET /forum/_search
{"query": {"match": {"title": {"query": "java elasticsearch", "operator": "and"}}}}
It returns 2 documents (ids 1 and 4), as expected.
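Switching to `operator: and` changes the test from "any query token is present" to "every query token is present". Here is a minimal sketch of that difference. It uses the same simplified assumptions as before: hypothetical `analyze` and `match` helpers and no scoring.

```python
import re

TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    # Simplified analyzer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def match(query, operator="or"):
    """Return matching ids; 'and' needs every token, 'or' needs any token."""
    terms = set(analyze(query))
    hits = []
    for doc_id, title in TITLES.items():
        tokens = set(analyze(title))
        if operator == "and":
            ok = terms <= tokens          # all query tokens present
        else:
            ok = bool(terms & tokens)     # at least one query token present
        if ok:
            hits.append(doc_id)
    return sorted(hits)


print(match("java elasticsearch", operator="or"))   # ['1', '2', '3', '4']
print(match("java elasticsearch", operator="and"))  # ['1', '4']
```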
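Search for blogs containing at least 3 of the 4 keywords java, elasticsearch, spark and hadoop
Given a set of keywords, require that at least a certain number of them match before a document is returned.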
The minimum number of optional should clauses to match can be set using the minimum_should_match parameter.
minimum_should_match reference:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/query-dsl-minimum-should-match.html
As a percentage:
GET /forum/_search
{"query": {"match": {"title": {"query": "java elasticsearch spark hadoop", "minimum_should_match": "75%"}}}}
As a number:
GET /forum/_search
{"query": {"match": {"title": {"query": "java elasticsearch spark hadoop", "minimum_should_match": 3}}}}
Both forms return a single document, the only one matching at least 3 of the keywords (75% of 4 keywords is 3).
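Here is a sketch of how minimum_should_match turns into a required clause count. It rests on several assumptions. Only the positive integer and positive percentage forms are handled, although the reference page also describes negative values and combinations. Percentages are rounded down, as that page states. Each distinct query token counts as one optional clause. `required_matches` and `match_msm` are hypothetical names.

```python
import re

TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    # Simplified analyzer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def required_matches(spec, clause_count):
    """Resolve a positive integer or positive percentage to a clause count."""
    if isinstance(spec, int):
        return spec
    if spec.endswith("%"):
        # Percentages are rounded down: 75% of 4 clauses -> 3
        return clause_count * int(spec[:-1]) // 100
    return int(spec)


def match_msm(query, spec):
    """Docs whose title matches at least the required number of query tokens."""
    terms = set(analyze(query))
    needed = required_matches(spec, len(terms))
    return sorted(doc_id for doc_id, title in TITLES.items()
                  if len(terms & set(analyze(title))) >= needed)


print(required_matches("75%", 4))                           # 3
print(match_msm("java elasticsearch spark hadoop", "75%"))  # ['4']
print(match_msm("java elasticsearch spark hadoop", 3))      # ['4']
```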
Combining multiple search conditions on title with bool
GET /forum/article/_search
{"query": {"bool": {"must": {"match": {"title": "java"}}, "must_not": {"match": {"title": "spark"}}, "should": [{"match": {"title": "hadoop"}}, {"match": {"title": "elasticsearch"}}]}}}
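match analyzes the search text into tokens and then looks up each token.
term looks up the keyword exactly as given, without analysis. In general, use match for full-text search and term for exact lookups.
The same query can also be written with exact term queries. This works here because each keyword is a single lowercase word that matches an indexed token as-is: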
GET /forum/_search
{"query": {"bool": {"must": {"term": {"title": "java"}}, "must_not": {"term": {"title": "spark"}}, "should": [{"term": {"title": "hadoop"}}, {"term": {"title": "elasticsearch"}}]}}}
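The match/term difference is easiest to see with capitalized input. In this sketch (hypothetical helpers, simplified analyzer as before), match analyzes "Java" into the token java and finds it. term looks up "Java" verbatim and misses, because the index stores the lowercase token.

```python
import re


def analyze(text):
    # Simplified analyzer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# Tokens stored in the inverted index for doc 2's title
indexed_tokens = set(analyze("this is java blog"))


def match_query(value):
    # match: analyze the query text first, then look up each resulting token
    return any(tok in indexed_tokens for tok in analyze(value))


def term_query(value):
    # term: look up the value exactly as given, with no analysis
    return value in indexed_tokens


print(match_query("Java"), term_query("Java"))  # True False
print(match_query("java"), term_query("java"))  # True True
```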
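How the relevance score is computed when bool combines multiple conditions
According to the course, the scores of the matching must and should clauses are added up and then divided by the total number of must and should clauses. This describes the older, coord-based Lucene scoring. Since Lucene 7, which Elasticsearch 6.0 and later use, the coord factor has been removed, and a bool query simply sums the scores of its matching clauses.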
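Rank 1: contains java plus both should keywords, hadoop and elasticsearch (doc 4)
Rank 2: contains java plus the should keyword elasticsearch (doc 1)
Rank 3: contains java but none of the should keywords (doc 2)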
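should clauses affect the relevance score.
must guarantees that a document contains the keyword, and the must condition also contributes to that document's relevance score for the query.
Once must is satisfied, the should conditions are optional, but the more of them a document matches, the higher its relevance score.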
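The ranking above can be reproduced with a toy model of the bool query. The score here is simply the number of matching must and should clauses. That is an assumption standing in for real BM25 scores, so only the ordering is meaningful, not the numbers. `bool_search` is a hypothetical helper.

```python
import re

TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    # Simplified analyzer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def bool_search(must, must_not, should):
    """Toy bool query: must is required, must_not excludes, should only boosts."""
    results = []
    for doc_id, title in TITLES.items():
        tokens = set(analyze(title))
        if not all(t in tokens for t in must):
            continue                      # a must clause failed
        if any(t in tokens for t in must_not):
            continue                      # a must_not clause matched
        # Placeholder score: count of matching must + should clauses
        score = len(must) + sum(t in tokens for t in should)
        results.append((doc_id, score))
    # Highest score first; ties broken by id for a stable order
    return sorted(results, key=lambda r: (-r[1], r[0]))


print(bool_search(must=["java"], must_not=["spark"],
                  should=["hadoop", "elasticsearch"]))
# [('4', 3), ('1', 2), ('2', 1)]
```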
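Search java, hadoop, spark and elasticsearch, requiring at least 3 of the keywords
By default, a document does not have to match any should clause. In the search above, for example, "this is java blog" matches none of the should conditions.
There is one exception: if there is no must (or filter) clause, at least one should clause must match.
In the search below, should has 4 conditions. By default, matching any one of them is enough for a document to be returned, but minimum_should_match lets you control exactly how many of the 4 must match.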
GET /forum/article/_search
{"query": {"bool": {"should": [{"match": {"title": "java"}}, {"match": {"title": "elasticsearch"}}, {"match": {"title": "hadoop"}}, {"match": {"title": "spark"}}], "minimum_should_match": 3}}}
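The effect of minimum_should_match on a should-only bool query can be sketched the same way. The assumptions are the same as before: a simplified analyzer, one clause per keyword, no scoring, and a hypothetical `bool_should` helper. When no minimum is given, the sketch applies the at-least-one rule for a bool query without must or filter clauses.

```python
import re

TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    # Simplified analyzer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def bool_should(should, minimum_should_match=None):
    """Should-only bool query: at least one clause by default, else the given minimum."""
    needed = 1 if minimum_should_match is None else minimum_should_match
    hits = []
    for doc_id, title in TITLES.items():
        tokens = set(analyze(title))
        if sum(t in tokens for t in should) >= needed:
            hits.append(doc_id)
    return sorted(hits)


keywords = ["java", "elasticsearch", "hadoop", "spark"]
print(bool_should(keywords))     # ['1', '2', '3', '4', '5']
print(bool_should(keywords, 3))  # ['4']
```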
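Summary
1. For full-text search on multiple values there are two approaches: a match query, or a bool query with should clauses.
2. To control the precision of search results: the and operator, or minimum_should_match.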