本文以 Elasticsearch 5.6.2为例。

最新(截止到2018-09-23)的 Elasticsearch 是 6.4.1。5.x系列和6.x系列虽然有些区别,但基本用法是一样的。

官方文档:
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/

安装

安装比较简单。分两步:

  • 配置JDK环境
  • 安装Elasticsearch

Elasticsearch 依赖 JDK环境,需要系统先下载安装 JDK 并配置 JAVA_HOME 环境变量。JDK 版本推荐:1.8.0系列。地址:https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

安装JDk

Linux:

$ yum install -y java-1.8.0-openjdk

配置环境变量,需要修改/etc/profile, 增加:

JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el6_10.x86_64
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
JAVACMD=/usr/bin/java
export JAVA_HOME JAVACMD CLASSPATH PATH

然后使之生效:

source /etc/profile

Windows:

安装包地址:
http://download.oracle.com/otn-pub/java/jdk/8u191-b12/2787e4a523244c269598db4e85c51e0c/jdk-8u191-windows-x64.exe

下载并配置JDK环境变量

JAVA_HOME=C:\Program Files\Java\jdk1.8.0_101

CLASSPATH=.;%JAVA_HOME%\lib;.;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar;

安装Elasticsearch

Elasticsearch 安装只需要下载二进制压缩包包,解压即可使用。需要特别注意的是版本号,如果还要安装Kibana及插件,需要注意选用一样的版本号。

安装包下载:https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.2.tar.gz

这个页面有 Elasticsearch 所有版本的下载:https://www.elastic.co/downloads/past-releases

下载后解压到指定目录,进入到 bin 目录,就可以运行 Elasticsearch 了:
Linux:

./elasticsearch

Windows:

elasticsearch.bat

注: Linux/Mac环境不能使用 root 用户运行。

基础入门

我们可以使用curl或者kibana提供的Dev Tools进行API测试。

例如:
curl方式:

curl 'localhost:9200/_cat/health?format=json'

[{"epoch":"1537689647","timestamp":"16:00:47","cluster":"elasticsearch","status":"yellow","node.total":"1","node.data":"1","shards":"11","pri":"11","relo":"0","init":"0","unassign":"11","pending_tasks":"0","max_task_wait_time":"-","active_shards_percent":"50.0%"}]

Dev Tools:

GET /_cat/health?format=json

个人比较喜欢Kibana提供的Dev Tools,非常方便。

查看_cat命令:

GET _cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates

以下测试均在Dev Tools执行。

节点操作

查看健康状态

GET /_cat/health?format=json

结果:

[
  {
    "epoch": "1537689915",
    "timestamp": "16:05:15",
    "cluster": "elasticsearch",
    "status": "yellow",
    "node.total": "1",
    "node.data": "1",
    "shards": "11",
    "pri": "11",
    "relo": "0",
    "init": "0",
    "unassign": "11",
    "pending_tasks": "0",
    "max_task_wait_time": "-",
    "active_shards_percent": "50.0%"
  }
]

健康状态有3种:

  • Green – 正常(集群功能齐全)
  • Yellow – 所有数据均可用,但尚未分配一些副本(群集功能齐全)
  • Red – 某些数据由于某种原因不可用(群集部分功能可用)

注意:当群集为红色时,它将继续提供来自可用分片的搜索请求,但您可能需要尽快修复它,因为存在未分配的分片。

查看节点

GET /_cat/nodes?format=json

索引

查看所有index

GET /_cat/indices?format=json

结果:

[
  {
    "health": "yellow",
    "status": "open",
    "index": "filebeat-2018.09.23",
    "uuid": "bwWVhUkBTIe46h9QJfmZHw",
    "pri": "5",
    "rep": "1",
    "docs.count": "4231",
    "docs.deleted": "0",
    "store.size": "2.5mb",
    "pri.store.size": "2.5mb"
  },
  {
    "health": "yellow",
    "status": "open",
    "index": ".kibana",
    "uuid": "tnWbNLSMT7273UEh6RfcBg",
    "pri": "1",
    "rep": "1",
    "docs.count": "4",
    "docs.deleted": "0",
    "store.size": "23.9kb",
    "pri.store.size": "23.9kb"
  }
]

创建index

PUT /customer?pretty

删除index

DELETE /customer?pretty

查询指定 Index 的 mapping

GET /customer/_mapping?pretty

注:ElasticSearch里面有 indextype 的概念:index称为索引,type为文档类型,一个index下面有多个type,每个type的字段可以不一样。这类似于关系型数据库的 database 和 table 的概念。但是,ES中不同type下名称相同的filed最终在Lucene中的处理方式是一样的。所以后来ElasticSearch团队想去掉type,于是在6.x版本为了向下兼容,一个index只允许有一个type。预计7.x版本彻底去掉type。参考:https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html

所以,实际使用中建议一个index里面仅有一个type,名称可以和index一致,或者使用固定的doc

增删改查

按ID新增数据

type为doc:

PUT /customer/doc/1?pretty
{
  "name": "John Doe"
}
PUT /customer/doc/2?pretty
{
  "name": "yujc",
  "age":22
}

如果index不存在,直接新增数据也会同时创建index。

同时,该操作也能修改数据:

PUT /customer/doc/2?pretty
{
  "name": "yujc",
  "age":23
}

age字段会被修改,而且_version会被修改为2:

{
  "_index": "customer",
  "_type": "doc",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": false
}

按ID查询数据

GET /customer/doc/1?pretty

结果:

{
  "_index": "customer",
  "_type": "doc",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "name": "John Doe"
  }
}

直接新增数据

我们也可以不指定文档ID从而直接新增数据:

POST /customer/doc?pretty
{
  "name": "yujc",
  "age":23
}

注意这里使用的动作是POSTPUT新增数据必须指定文档ID。

更新数据

我们使用下面两种方式均能更新已有数据:

PUT /customer/doc/1?pretty
{
  "name": "yujc2",
  "age":22
}

POST /customer/doc/1?pretty
{
  "name": "yujc2",
  "age":22
}

以上操作均会覆盖现有数据。

如果只是想更新指定字段,必须使用POST加参数的形式:

POST /customer/doc/1/_update?pretty
{
  "doc":{"name": "yujc"}
}

其中_update表示更新。doc必须有,否则会报错。

增加字段:

POST /customer/doc/1/_update?pretty
{
  "doc":{"yeat": 2018}
}

就会在已有的数据基础上增加一个year字段,不会覆盖已有数据:

GET /customer/doc/1?pretty

结果:

{
  "_index": "customer",
  "_type": "doc",
  "_id": "1",
  "_version": 16,
  "found": true,
  "_source": {
    "name": "yujc",
    "age": 22,
    "yeat": 2018
  }
}

也可以使用简单脚本执行更新。此示例使用脚本将年龄增加5:

POST /customer/doc/1/_update?pretty
{
  "script":"ctx._source.age+=5"
}

结果:

{
  "_index": "customer",
  "_type": "doc",
  "_id": "1",
  "_version": 17,
  "found": true,
  "_source": {
    "name": "yujc",
    "age": 27,
    "yeat": 2018
  }
}

按ID删除数据

DELETE /customer/doc/1?pretty

批量

新增

POST /customer/doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }

该操作会新增2条记录,而不是4条。查询数据:

GET /customer/doc/2?pretty

结果:

{
  "_index": "customer",
  "_type": "doc",
  "_id": "2",
  "_version": 2,
  "found": true,
  "_source": {
    "name": "Jane Doe"
  }
}

更新、删除

POST /customer/doc/_bulk?pretty
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}

该操作会更新ID为1的文档,删除ID为2的文档。

注意:批量操作如果某条失败了,并不影响下一条继续执行。

全文检索

经过前面的基础入门,我们对ES的基本操作也会了。现在来学习ES最强大的部分:全文检索。

准备工作

批量导入数据

先需要准备点数据,然后导入:

wget https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/account/_bulk?pretty&refresh" --data-binary "@accounts.json"

这样我们就导入了1000条数据到ES。index是bank。我们可以查看现在有哪些index:

curl "localhost:9200/_cat/indices?format=json&pretty"

结果:

[
  {
    "health" : "yellow",
    "status" : "open",
    "index" : "bank",
    "uuid" : "IhyOzz3WTFuO5TNgPJUZsw",
    "pri" : "5",
    "rep" : "1",
    "docs.count" : "1000",
    "docs.deleted" : "0",
    "store.size" : "640.3kb",
    "pri.store.size" : "640.3kb"
  },
  {
    "health" : "yellow",
    "status" : "open",
    "index" : "customer",
    "uuid" : "f_nzBLypSUK2SVjL2AoKxQ",
    "pri" : "5",
    "rep" : "1",
    "docs.count" : "9",
    "docs.deleted" : "0",
    "store.size" : "31kb",
    "pri.store.size" : "31kb"
  },
  {
    "health" : "yellow",
    "status" : "open",
    "index" : ".kibana",
    "uuid" : "tnWbNLSMT7273UEh6RfcBg",
    "pri" : "1",
    "rep" : "1",
    "docs.count" : "5",
    "docs.deleted" : "0",
    "store.size" : "29.4kb",
    "pri.store.size" : "29.4kb"
  }
]

使用kibana可视化数据

该小节是可选的,如果不感兴趣,可以跳过。

该小节要求你已经搭建好了ElasticSearch + Kibana。

打开kibana web地址:http://127.0.0.1:5601,依次打开:Management
-> Kibana -> Index Patterns ,选择Create Index Pattern

a. Index pattern 输入:bank

b. 点击Create。

然后打开Discover,选择 bank 就能看到刚才导入的数据了。

我们在可视化界面里检索数据:

是不是很酷!

接下来我们使用API来实现检索。

关键字检索

模糊检索

GET /bank/_search?q="Virginia"&pretty

解释:检索关键字为”Virginia”的结果。结果示例:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 4.631368,
    "hits": [
      {
        "_index": "bank",
        "_type": "account",
        "_id": "298",
        "_score": 4.631368,
        "_source": {
          "account_number": 298,
          "balance": 34334,
          "firstname": "Bullock",
          "lastname": "Marsh",
          "age": 20,
          "gender": "M",
          "address": "589 Virginia Place",
          "employer": "Renovize",
          "email": "bullockmarsh@renovize.com",
          "city": "Coinjock",
          "state": "UT"
        }
      },
      {
        "_index": "bank",
        "_type": "account",
        "_id": "25",
        "_score": 4.6146765,
        "_source": {
          "account_number": 25,
          "balance": 40540,
          "firstname": "Virginia",
          "lastname": "Ayala",
          "age": 39,
          "gender": "F",
          "address": "171 Putnam Avenue",
          "employer": "Filodyne",
          "email": "virginiaayala@filodyne.com",
          "city": "Nicholson",
          "state": "PA"
        }
      }
    ]
  }
}

返回字段含义:

  • took – Elasticsearch执行搜索的时间(以毫秒为单位)
  • timed_out – 搜索是否超时
  • _shards – 搜索了多少个分片,以及搜索成功/失败分片的计数
  • hits – 搜索结果,是个对象
  • hits.total – 符合我们搜索条件的文档总数
  • hits.hits – 实际的搜索结果数组(默认为前10个文档)
  • hits.sort – 对结果进行排序(如果按score排序则没有该字段)
  • hits._score、max_score – 暂时忽略这些字段
GET /bank/_search?q=*&sort=account_number:asc&pretty

解释:所有结果通过account_number字段升序排列。默认只返回前10条。

下面的查询与上面的含义一致:

GET /bank/_search
{
  "query": {
        "multi_match" : {
            "query" : "Virginia",
            "fields" : ["_all"]
        }
    }
}

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

通常我们会采用传JSON方式查询。Elasticsearch提供了一种JSON样式的特定于域的语言,可用于执行查询。这被称为查询DSL。

注意:上述的查询里面我们仅指定了index,并没有指定type,那么ES将不会区分type。如果想区分,请在URI后面追加type。示例:GET /bank/account/_search

字段检索

再看按字段查询:

GET /bank/_search
{
  "query": {
        "multi_match" : {
            "query" : "Virginia",
            "fields" : ["firstname"]
        }
    }
}

GET /bank/_search
{
  "query": {
        "match" : {
            "firstname" : "Virginia"
        }
    }
}

上面2种查询是等效的,都是查询firstnameVirginia的结果。

不分词

默认检索都是分词的,如果我们希望精确匹配,可以这样实现:

GET /bank/_search
{
  "query": {
        "match" : {
            "address.keyword" : "171 Putnam Avenue"
        }
    }
}

在字段后面加上.keyword表示不分词,使用精确匹配。大家可以测试下面2种查询结果的区别:

GET /bank/_search
{
  "query": {
        "match" : {
            "address" : "Putnam"
        }
    }
}

GET /bank/_search
{
  "query": {
        "match" : {
            "address.keyword" : "Putnam"
        }
    }
}

第二种将查不到任何结果。

分页

分页使用关键字from、size,分别表示偏移量、分页大小。

GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 2
}

from默认是0,size默认是10。

字段排序

字段排序关键字是sort。支持升序(asc)、降序(desc)。

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from":0,
  "size":10
}

过滤字段

默认情况下,ES返回所有字段。这被称为源(_source搜索命中中的字段)。如果我们不希望返回所有字段,我们可以只请求返回源中的几个字段。

GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}

通过_source关键字可以实现字段过滤。

AND查询

如果我们想同时查询符合A和B字段的结果,该怎么查呢?可以使用must关键字组合。

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}


GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "account_number":136 } },
        { "match": { "address": "lane" } },
        { "match": { "city": "Urie" } }
      ]
    }
  }
}

must也等价于:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } }
      ],
      "must": [
        { "match": { "address": "lane" } }
      ]
    }
  }
}

这种相当于先查询A再查询B,而上面的则是同时查询符合A和B,但结果是一样的,执行效率可能有差异。有知道原因的朋友可以告知。

OR查询

ES使用should关键字来实现OR查询。

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "account_number":136 } },
        { "match": { "address": "lane" } },
        { "match": { "city": "Urie" } }
      ]
    }
  }
}

AND取反查

must_not关键字实现了既不包含A也不包含B的查询。

GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }

表示 address 字段需要符合既不包含 mill 也不包含 lane。

布尔组合查询

我们可以组合 must 、should 、must_not 进行复杂的查询。

  • A AND NOT B
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": 40 } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

相当于SQL:

select * from bank where age=40 and state!= "ID";
  • A AND (B OR C)
GET /bank/_search
{
    "query":{
        "bool":{
            "must":[
                {"match":{"age":39}},
                {"bool":{"should":[
                            {"match":{"city":"Nicholson"}},
                            {"match":{"city":"Yardville"}}
                        ]}
                }
            ]
        }
    }
}

相当于SQL:

select * from bank where age=39 and (city="Nicholson" or city="Yardville");

范围查询

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}

相当于SQL:

select * from bank where balance between 20000 and 30000;

聚合查询

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

结果:

{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets" : [ {
        "key" : "ID",
        "doc_count" : 27
      }, {
        "key" : "TX",
        "doc_count" : 27
      }, {
        "key" : "AL",
        "doc_count" : 25
      }, {
        "key" : "MD",
        "doc_count" : 25
      }, {
        "key" : "TN",
        "doc_count" : 23
      }, {
        "key" : "MA",
        "doc_count" : 21
      }, {
        "key" : "NC",
        "doc_count" : 21
      }, {
        "key" : "ND",
        "doc_count" : 21
      }, {
        "key" : "ME",
        "doc_count" : 20
      }, {
        "key" : "MO",
        "doc_count" : 20
      } ]
    }
  }
}

查询结果返回了ID州(Idaho)有27个账户,TX州(Texas)有27个账户。

相当于SQL:

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC

该查询意思是按照字段state分组,返回前10个聚合结果。

其中size设置为0意思是不返回文档内容,仅返回聚合结果。state.keyword表示字段精确匹配,因为使用模糊匹配性能很低,所以不支持。

多重聚合

我们可以在聚合的基础上再进行聚合,例如求和、求平均值等等。

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

上述查询实现了在前一个聚合的基础上,按州计算平均帐户余额(同样仅针对按降序排序的前10个州)。

我们可以在聚合中任意嵌套聚合,以从数据中提取所需的统计数据。

在前一个聚合的基础上,我们现在按降序排列平均余额:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

这里基于第二个聚合结果进行倒序排列。其实上一个例子隐藏了默认排序,也就是默认按照_sort(分值)倒序:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "_sort": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

此示例演示了我们如何按年龄段(20-29岁,30-39岁和40-49岁)进行分组,然后按性别分组,最后得到每个年龄段的平均帐户余额:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}

这个结果就复杂了,属于嵌套分组,结果也是嵌套的:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_age": {
      "buckets": [
        {
          "key": "20.0-30.0",
          "from": 20,
          "to": 30,
          "doc_count": 451,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "M",
                "doc_count": 232,
                "average_balance": {
                  "value": 27374.05172413793
                }
              },
              {
                "key": "F",
                "doc_count": 219,
                "average_balance": {
                  "value": 25341.260273972603
                }
              }
            ]
          }
        },
        {
          "key": "30.0-40.0",
          "from": 30,
          "to": 40,
          "doc_count": 504,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "F",
                "doc_count": 253,
                "average_balance": {
                  "value": 25670.869565217392
                }
              },
              {
                "key": "M",
                "doc_count": 251,
                "average_balance": {
                  "value": 24288.239043824702
                }
              }
            ]
          }
        },
        {
          "key": "40.0-50.0",
          "from": 40,
          "to": 50,
          "doc_count": 45,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "M",
                "doc_count": 24,
                "average_balance": {
                  "value": 26474.958333333332
                }
              },
              {
                "key": "F",
                "doc_count": 21,
                "average_balance": {
                  "value": 27992.571428571428
                }
              }
            ]
          }
        }
      ]
    }
  }
}

term与match查询

首先大家看下面的例子有什么区别:

已知条件:ES里address171 Putnam Avenue的数据有1条;addressPutnam的数据有0条。index为bank,type为account,文档ID为25。

GET /bank/_search
{
  "query": {
        "match" : {
            "address" : "Putnam"
        }
    }
}

GET /bank/_search
{
  "query": {
        "match" : {
            "address.keyword" : "Putnam"
        }
    }
}

GET /bank/_search
{
  "query": {
        "term" : {
            "address" : "Putnam"
        }
    }
}

结果:
1、第一个能匹配到数据,因为会分词查询。
2、第二个不能匹配到数据,因为不分词的话没有该条数据。
3、结果不确定。需要看实际是怎么分词的。

我们通过下列查询可以知晓该条数据字段address的分词情况:

GET /bank/account/25/_termvectors?fields=address

结果:

{
  "_index": "bank",
  "_type": "account",
  "_id": "25",
  "_version": 1,
  "found": true,
  "took": 0,
  "term_vectors": {
    "address": {
      "field_statistics": {
        "sum_doc_freq": 591,
        "doc_count": 197,
        "sum_ttf": 591
      },
      "terms": {
        "171": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 3
            }
          ]
        },
        "avenue": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "start_offset": 11,
              "end_offset": 17
            }
          ]
        },
        "putnam": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 4,
              "end_offset": 10
            }
          ]
        }
      }
    }
  }
}

可以看出该条数据字段address一共分了3个词:

171
avenue
putnam

现在可以得出第三个查询的答案:匹配不到!但值改成小写的putnam又能匹配到了!

原因是:

  • term query 查询的是倒排索引中确切的term
  • match query 会对filed进行分词操作,然后再查询

由于Putnam不在分词里(大小写敏感),所以匹配不到。match query先对filed进行分词,也就是分成putnam,再去匹配倒排索引中的term,所以能匹配到。

standard analyzer 分词器分词默认会将大写字母全部转为小写字母。

参考

1、Getting Started | Elasticsearch Reference [5.6] | Elastic
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/getting-started.html
2、Elasticsearch 5.x 关于term query和match query的认识 – wangchuanfu – 博客园
https://www.cnblogs.com/wangchuanfu/p/7444253.html

版权声明:本文为52fhy原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。
本文链接:https://www.cnblogs.com/52fhy/p/9826356.html