失眠网,内容丰富有趣,生活中的好帮手!
失眠网 > 白话Elasticsearch19-深度探秘搜索技术之混合使用match和近似匹配实现召回率(recall

白话Elasticsearch19-深度探秘搜索技术之混合使用match和近似匹配实现召回率(recall

时间:2020-10-11 08:14:46

相关推荐

白话Elasticsearch19-深度探秘搜索技术之混合使用match和近似匹配实现召回率(recall

文章目录

概述召回率recall精准度 precision分析利弊方案

概述

继续跟中华石杉老师学习ES,第19篇

课程地址: /view/55

召回率recall

举个例子 ,比如搜索一个java spark,总共有100个doc,能返回多少个doc作为结果,就是召回率,recall

精准度 precision

举个例子 ,比如搜索一个java spark,能不能尽可能让包含java spark,或者是java和spark离的很近的doc,排在最前面,precision

分析利弊

直接用match_phrase短语搜索,会导致必须所有term都在doc field中出现,而且距离在slop限定范围内,才能匹配上

match phrase,proximity match,要求doc必须包含所有的term,才能作为结果返回;如果某一个doc可能就是有某个term没有包含,那么就无法作为结果返回

比如:

java spark --> hello world java --> 就不能返回了

java spark --> hello world, java spark --> 才可以返回

近似匹配的时候,召回率比较低,因为精准度太高了

但是有时可能我们希望的是匹配到几个term中的部分,就可以作为结果出来,这样可以提高召回率。

同时我们也希望用上match_phrase根据距离提升分数的功能,让几个term距离越近分数就越高,优先返回

就是优先满足召回率,意思,java spark,包含java的也返回,包含spark的也返回,包含java和spark的也返回;同时兼顾精准度,就是包含java和spark,同时java和spark离的越近的doc排在最前面 .

方案

此时可以用bool组合match query和match_phrase query一起,来实现上述效果

我们先看下 match query的返回结果

GET /forum/article/_search{"query": {"bool": {"must": [{"match": {"content": "java spark"}}]}}}

返回数据

{"took": 3,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": 2,"max_score": 1.8166281,"hits": [{"_index": "forum","_type": "article","_id": "5","_score": 1.8166281,"_source": {"articleID": "DHJK-B-1395-#Ky5","userID": 3,"hidden": false,"postDate": "-05-01","tag": ["elasticsearch"],"tag_cnt": 1,"view_cnt": 10,"title": "this is spark blog","content": "spark is best big data solution based on scala ,an programming language similar to java spark","sub_title": "haha, hello world","author_first_name": "Tonny","author_last_name": "Peter Smith","new_author_last_name": "Peter Smith","new_author_first_name": "Tonny"}},{"_index": "forum","_type": "article","_id": "2","_score": 0.7721133,"_source": {"articleID": "KDKE-B-9947-#kL5","userID": 1,"hidden": false,"postDate": "-01-02","tag": ["java"],"tag_cnt": 1,"view_cnt": 50,"title": "this is java blog","content": "i think java is the best programming language","sub_title": "learned a lot of course","author_first_name": "Smith","author_last_name": "Williams","new_author_last_name": "Williams","new_author_first_name": "Smith"}}]}}

再看下 bool组合match query和match_phrase query的情况

GET /forum/article/_search{"query": {"bool": {"must": [{"match": {"content": "java spark"}}],"should": [{"match_phrase": {"content": {"query": "java spark","slop": 50}}}]}}}

返回数据

{"took": 3,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": 2,"max_score": 3.2468495,"hits": [{"_index": "forum","_type": "article","_id": "5","_score": 3.2468495,"_source": {"articleID": "DHJK-B-1395-#Ky5","userID": 3,"hidden": false,"postDate": "-05-01","tag": ["elasticsearch"],"tag_cnt": 1,"view_cnt": 10,"title": "this is spark blog","content": "spark is best big data solution based on scala ,an programming language similar to java spark","sub_title": "haha, hello world","author_first_name": "Tonny","author_last_name": "Peter Smith","new_author_last_name": "Peter Smith","new_author_first_name": "Tonny"}},{"_index": "forum","_type": "article","_id": "2","_score": 0.7721133,"_source": {"articleID": "KDKE-B-9947-#kL5","userID": 1,"hidden": false,"postDate": "-01-02","tag": ["java"],"tag_cnt": 1,"view_cnt": 50,"title": "this is java blog","content": "i think java is the best programming language","sub_title": "learned a lot of course","author_first_name": "Smith","author_last_name": "Williams","new_author_last_name": "Williams","new_author_first_name": "Smith"}}]}}

白话Elasticsearch19-深度探秘搜索技术之混合使用match和近似匹配实现召回率(recall)与精准度(precision)的平衡

如果觉得《白话Elasticsearch19-深度探秘搜索技术之混合使用match和近似匹配实现召回率(recall》对你有帮助,请点赞、收藏,并留下你的观点哦!

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。