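Getting Started with Elasticsearch

This article uses Elasticsearch 5.6.2 as its example.

As of 2018-09-23, the latest Elasticsearch release is 6.4.1. The 5.x and 6.x series differ in some respects, but basic usage is the same.

Official documentation: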
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/

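Installation

Installation is straightforward and takes two steps:

Set up the JDK
Install Elasticsearch

Elasticsearch depends on a JDK, so install one first and set the JAVA_HOME environment variable. The recommended JDK version is the 1.8.0 series. Download: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Installing the JDK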

Linux:

$ yum install -y java-1.8.0-openjdk

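To configure the environment variables, edit /etc/profile and add the following (adjust the JAVA_HOME path to the OpenJDK build actually installed on your system):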

JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el6_10.x86_64
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
JAVACMD=/usr/bin/java
export JAVA_HOME JAVACMD CLASSPATH PATH

Then apply the change:

source /etc/profile

Windows:

Installer download:
http://download.oracle.com/otn-pub/java/jdk/8u191-b12/2787e4a523244c269598db4e85c51e0c/jdk-8u191-windows-x64.exe

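Download and run the installer, then set the JDK environment variables (adjust the path to the JDK version you actually installed):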

JAVA_HOME=C:\Program Files\Java\jdk1.8.0_101

CLASSPATH=.;%JAVA_HOME%\lib;.;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar;

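Installing Elasticsearch

To install Elasticsearch, download the binary archive and extract it; it is then ready to use. Pay close attention to the version number: if you also install Kibana or plugins, they must use the same version as Elasticsearch.

Download: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.2.tar.gz

All past Elasticsearch releases are available at: https://www.elastic.co/downloads/past-releases

After downloading, extract the archive to a directory of your choice, change into its bin directory, and start Elasticsearch: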
Linux:

./elasticsearch

Windows:

elasticsearch.bat

Note: on Linux and macOS, Elasticsearch cannot be run as the root user.

Basics

We can exercise the API with either curl or the Dev Tools console that ships with Kibana.

For example:
With curl:

curl 'localhost:9200/_cat/health?format=json'

[{"epoch":"1537689647","timestamp":"16:00:47","cluster":"elasticsearch","status":"yellow","node.total":"1","node.data":"1","shards":"11","pri":"11","relo":"0","init":"0","unassign":"11","pending_tasks":"0","max_task_wait_time":"-","active_shards_percent":"50.0%"}]

Dev Tools:

GET /_cat/health?format=json

I personally prefer the Dev Tools console in Kibana; it is very convenient.

List the available _cat commands:

GET _cat

=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates

All of the following examples are run in Dev Tools.

Node operations

Checking cluster health

GET /_cat/health?format=json

Result:

[
  {
    "epoch": "1537689915",
    "timestamp": "16:05:15",
    "cluster": "elasticsearch",
    "status": "yellow",
    "node.total": "1",
    "node.data": "1",
    "shards": "11",
    "pri": "11",
    "relo": "0",
    "init": "0",
    "unassign": "11",
    "pending_tasks": "0",
    "max_task_wait_time": "-",
    "active_shards_percent": "50.0%"
  }
]

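There are three health statuses:

Green – everything is fine (the cluster is fully functional)
Yellow – all data is available, but some replicas are not yet allocated (the cluster is fully functional)
Red – some data is unavailable for some reason (the cluster is partially functional)

In the sample output above the status is yellow because this is a single-node cluster: a replica is never allocated to the same node as its primary, so the 11 replica shards remain unassigned.

Note: when the cluster is red, it continues to serve search requests from the available shards, but you will want to fix it as soon as possible because some shards are unassigned.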
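
As a rough illustration of how a script might act on these three statuses, here is a minimal Python sketch. It parses a response shaped like the _cat/health?format=json output shown above. The function name and the summary strings are hypothetical, not part of Elasticsearch, and the sample is an abbreviated copy of that output.

```python
import json

# Hypothetical lookup table: a one-line summary per cluster status.
STATUS_SUMMARY = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, some replicas are not",
    "red": "at least one primary shard is unallocated",
}

def summarize_health(cat_health_json):
    """Return (status, summary, unassigned) from a _cat/health JSON response."""
    row = json.loads(cat_health_json)[0]  # the response is a one-element array
    status = row["status"]
    return status, STATUS_SUMMARY[status], int(row["unassign"])

# Abbreviated copy of the sample response shown earlier.
sample = '[{"cluster":"elasticsearch","status":"yellow","unassign":"11"}]'
print(summarize_health(sample))
```

Note that _cat returns every value as a string, which is why the unassigned count is converted with int().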

Listing nodes

GET /_cat/nodes?format=json

Indices

Listing all indices

GET /_cat/indices?format=json

Result:

[
  {
    "health": "yellow",
    "status": "open",
    "index": "filebeat-2018.09.23",
    "uuid": "bwWVhUkBTIe46h9QJfmZHw",
    "pri": "5",
    "rep": "1",
    "docs.count": "4231",
    "docs.deleted": "0",
    "store.size": "2.5mb",
    "pri.store.size": "2.5mb"
  },
  {
    "health": "yellow",
    "status": "open",
    "index": ".kibana",
    "uuid": "tnWbNLSMT7273UEh6RfcBg",
    "pri": "1",
    "rep": "1",
    "docs.count": "4",
    "docs.deleted": "0",
    "store.size": "23.9kb",
    "pri.store.size": "23.9kb"
  }
]

Creating an index

PUT /customer?pretty

Deleting an index

DELETE /customer?pretty

Viewing the mapping of an index

GET /customer/_mapping?pretty

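Note: Elasticsearch has the concepts of index and type. An index can contain multiple types, and each type can have different fields, which resembles the database and table concepts of a relational database. However, fields with the same name in different types of one index are backed by the same Lucene field internally, so the analogy is misleading. For that reason the Elasticsearch team decided to remove types: indices created in 6.x may contain only one type, the 7.x APIs deprecate types, and 8.0 removes them. See: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html

In practice, therefore, keep a single type per index; name it after the index or use a fixed name such as doc.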

CRUD operations

Indexing a document by ID

Using doc as the type:

PUT /customer/doc/1?pretty
{
  "name": "John Doe"
}

PUT /customer/doc/2?pretty
{
  "name": "yujc",
  "age":22
}

If the index does not exist, indexing a document creates it automatically.

The same request can also modify an existing document:

PUT /customer/doc/2?pretty
{
  "name": "yujc",
  "age":23
}

The age field is updated and _version becomes 2:

{
  "_index": "customer",
  "_type": "doc",
  "_id": "2",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": false
}

Retrieving a document by ID

GET /customer/doc/1?pretty

Result:

{
  "_index": "customer",
  "_type": "doc",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "name": "John Doe"
  }
}

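Indexing a document without an ID

We can also index a document without specifying an ID, in which case Elasticsearch generates one: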

POST /customer/doc?pretty
{
  "name": "yujc",
  "age":23
}

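Note that this request uses POST. Indexing with PUT requires an explicit document ID.

Updating a document

Either of the following two requests updates an existing document: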

PUT /customer/doc/1?pretty
{
  "name": "yujc2",
  "age":22
}

POST /customer/doc/1?pretty
{
  "name": "yujc2",
  "age":22
}

Both requests replace the entire existing document.

To update only specific fields, use POST with the _update endpoint:

POST /customer/doc/1/_update?pretty
{
  "doc":{"name": "yujc"}
}

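Here _update marks the request as a partial update. With this form the body must contain doc; otherwise the request fails.

Adding a field: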

POST /customer/doc/1/_update?pretty
{
  "doc":{"yeat": 2018}
}

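This adds a new field to the existing document without overwriting the other fields (the request spells it yeat, presumably a typo for year):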

GET /customer/doc/1?pretty

Result:

{
  "_index": "customer",
  "_type": "doc",
  "_id": "1",
  "_version": 16,
  "found": true,
  "_source": {
    "name": "yujc",
    "age": 22,
    "yeat": 2018
  }
}
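
The difference between a full replacement (PUT or POST to the document URL) and a partial update (_update with doc) can be mimicked locally with plain dictionaries. This is only a sketch of the merge semantics under the assumption that nested objects are merged recursively; the function names are hypothetical and nothing here talks to Elasticsearch.

```python
import copy

def replace_document(existing, new_source):
    """Full index request: the new body replaces the old source entirely."""
    return copy.deepcopy(new_source)

def partial_update(existing, doc):
    """_update with "doc": fields in doc are merged into the existing source."""
    merged = copy.deepcopy(existing)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)  # merge nested objects
        else:
            merged[key] = copy.deepcopy(value)
    return merged

source = {"name": "yujc2", "age": 22}
print(replace_document(source, {"name": "yujc"}))   # age is gone
print(partial_update(source, {"name": "yujc"}))     # age is kept
print(partial_update(source, {"year": 2018}))       # new field is added
```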

Updates can also be performed with simple scripts. This example uses a script to increase the age by 5:

POST /customer/doc/1/_update?pretty
{
  "script":"ctx._source.age+=5"
}

Fetching the document again shows the new age:

{
  "_index": "customer",
  "_type": "doc",
  "_id": "1",
  "_version": 17,
  "found": true,
  "_source": {
    "name": "yujc",
    "age": 27,
    "yeat": 2018
  }
}

Deleting a document by ID

DELETE /customer/doc/1?pretty

Bulk operations

Bulk indexing

POST /customer/doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }

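This request indexes 2 documents, not 4: each action line is followed by the source line it applies to. Fetch one of them: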

GET /customer/doc/2?pretty

Result:

{
  "_index": "customer",
  "_type": "doc",
  "_id": "2",
  "_version": 2,
  "found": true,
  "_source": {
    "name": "Jane Doe"
  }
}

Bulk update and delete

POST /customer/doc/_bulk?pretty
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}

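This request updates the document with ID 1 and deletes the document with ID 2.

Note: if one action in a bulk request fails, the remaining actions are still executed.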
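
The bulk body is newline-delimited JSON: one action line, optionally followed by one source line, with a newline after the last line. The Python sketch below builds such a body as a string. The helper name and the tuple layout are assumptions made for illustration; sending the body to the _bulk endpoint is left out.

```python
import json

def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a _bulk request body.

    source is None for actions such as delete that carry no second line.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")
```

Building the body from tuples makes it hard to forget that delete has no source line while index, create, and update do.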

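Full-text search

With the basics covered, we now know how to perform the fundamental operations. Next comes the most powerful part of Elasticsearch: full-text search.

Preparation

Bulk-importing data

First download some sample data, then import it: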

wget https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/account/_bulk?pretty&refresh" --data-binary "@accounts.json"

This imports 1000 documents into the bank index. We can now list the existing indices:

curl "localhost:9200/_cat/indices?format=json&pretty"

Result:

[
  {
    "health" : "yellow",
    "status" : "open",
    "index" : "bank",
    "uuid" : "IhyOzz3WTFuO5TNgPJUZsw",
    "pri" : "5",
    "rep" : "1",
    "docs.count" : "1000",
    "docs.deleted" : "0",
    "store.size" : "640.3kb",
    "pri.store.size" : "640.3kb"
  },
  {
    "health" : "yellow",
    "status" : "open",
    "index" : "customer",
    "uuid" : "f_nzBLypSUK2SVjL2AoKxQ",
    "pri" : "5",
    "rep" : "1",
    "docs.count" : "9",
    "docs.deleted" : "0",
    "store.size" : "31kb",
    "pri.store.size" : "31kb"
  },
  {
    "health" : "yellow",
    "status" : "open",
    "index" : ".kibana",
    "uuid" : "tnWbNLSMT7273UEh6RfcBg",
    "pri" : "1",
    "rep" : "1",
    "docs.count" : "5",
    "docs.deleted" : "0",
    "store.size" : "29.4kb",
    "pri.store.size" : "29.4kb"
  }
]

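Visualizing the data with Kibana

This subsection is optional; skip it if you are not interested.

It assumes you already have Elasticsearch and Kibana set up.

Open the Kibana web UI at http://127.0.0.1:5601, go to Management -> Kibana -> Index Patterns, and choose Create Index Pattern:

a. Enter bank as the index pattern;

b. Click Create.

Then open Discover and select bank to see the data we just imported.

The data can also be searched directly in this visual interface.

Pretty cool, isn't it?

Next we use the API to run searches.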

Keyword search

Free-text search

GET /bank/_search?q="Virginia"&pretty

Explanation: this returns the documents that match the keyword "Virginia" in any field. Sample result:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 4.631368,
    "hits": [
      {
        "_index": "bank",
        "_type": "account",
        "_id": "298",
        "_score": 4.631368,
        "_source": {
          "account_number": 298,
          "balance": 34334,
          "firstname": "Bullock",
          "lastname": "Marsh",
          "age": 20,
          "gender": "M",
          "address": "589 Virginia Place",
          "employer": "Renovize",
          "email": "bullockmarsh@renovize.com",
          "city": "Coinjock",
          "state": "UT"
        }
      },
      {
        "_index": "bank",
        "_type": "account",
        "_id": "25",
        "_score": 4.6146765,
        "_source": {
          "account_number": 25,
          "balance": 40540,
          "firstname": "Virginia",
          "lastname": "Ayala",
          "age": 39,
          "gender": "F",
          "address": "171 Putnam Avenue",
          "employer": "Filodyne",
          "email": "virginiaayala@filodyne.com",
          "city": "Nicholson",
          "state": "PA"
        }
      }
    ]
  }
}

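Meaning of the response fields:

took – time in milliseconds that Elasticsearch took to execute the search
timed_out – whether the search timed out
_shards – how many shards were searched, with counts of successful and failed shards
hits – the search results, as an object
hits.total – total number of documents matching the search criteria
hits.hits – the actual array of search results (the first 10 documents by default)
hits.sort – the sort key of each result (absent when sorting by score)
hits._score, max_score – ignore these fields for now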

GET /bank/_search?q=*&sort=account_number:asc&pretty

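Explanation: this returns all documents sorted by account_number in ascending order. Only the first 10 are returned by default.

The following request-body queries mean the same as the two URI searches above: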

GET /bank/_search
{
  "query": {
        "multi_match" : {
            "query" : "Virginia",
            "fields" : ["_all"]
        }
    }
}

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

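In practice we usually send queries as JSON. Elasticsearch provides a JSON-style domain-specific language for executing queries, known as the Query DSL.

Note: the queries above specify only the index, not the type, so Elasticsearch searches across all types. To restrict the search to one type, add it to the URI, for example GET /bank/account/_search.

Searching a specific field

Now let us query by field: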

GET /bank/_search
{
  "query": {
        "multi_match" : {
            "query" : "Virginia",
            "fields" : ["firstname"]
        }
    }
}

GET /bank/_search
{
  "query": {
        "match" : {
            "firstname" : "Virginia"
        }
    }
}

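These two queries are equivalent; both return documents whose firstname matches Virginia.

Exact (non-analyzed) matching

By default, text searches are analyzed (tokenized). For an exact match, query the keyword sub-field: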

GET /bank/_search
{
  "query": {
        "match" : {
            "address.keyword" : "171 Putnam Avenue"
        }
    }
}

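Appending .keyword to the field name targets the non-analyzed keyword sub-field, which is matched exactly. (The sub-field exists because dynamic mapping in 5.x indexes strings as text with a keyword sub-field.) Compare the results of the following two queries: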

GET /bank/_search
{
  "query": {
        "match" : {
            "address" : "Putnam"
        }
    }
}

GET /bank/_search
{
  "query": {
        "match" : {
            "address.keyword" : "Putnam"
        }
    }
}

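The second query returns no results.

Pagination

Pagination uses the from and size parameters, which specify the offset and the page size respectively.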

GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 2
}

from defaults to 0 and size defaults to 10.
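
Applications usually think in page numbers rather than offsets. A minimal Python sketch of the conversion, assuming 1-based page numbers (the helper name is hypothetical):

```python
def page_to_from_size(page, page_size=10):
    """Translate a 1-based page number into the from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}

print(page_to_from_size(1, 2))  # first page of two documents
print(page_to_from_size(3, 2))  # third page of two documents
```

The returned dictionary can be merged into a search body next to "query". Keep in mind that from + size is capped by the index.max_result_window setting, which is 10000 by default, so very deep pages need a different approach.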

Sorting by field

Sorting uses the sort parameter, which supports ascending (asc) and descending (desc) order.

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from":0,
  "size":10
}

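Source filtering

By default, Elasticsearch returns the full document in each hit; this is the source (the _source field of a search hit). If we do not want the whole document back, we can request only a few fields from the source.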

GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}

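The _source parameter performs this field filtering.

AND queries

How do we find documents that satisfy both condition A and condition B? Combine the conditions in the must clause of a bool query.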

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}


GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "account_number":136 } },
        { "match": { "address": "lane" } },
        { "match": { "city": "Urie" } }
      ]
    }
  }
}

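On 5.6.2 the following form, which repeats the must key, returned the same results in my tests:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } }
      ],
      "must": [
        { "match": { "address": "lane" } }
      ]
    }
  }
}

Avoid this form, though. A JSON object with duplicate keys is not portable: many parsers keep only the last occurrence, and Elasticsearch 6.0 and later reject request bodies that contain duplicate keys. Put all conditions in a single must array, as in the earlier examples.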
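
Repeating a key inside one JSON object is fragile, because what happens to the duplicates is left to the parser. The Python sketch below uses Python's standard json module, not the Elasticsearch parser, to show two common behaviors: silently keeping the last value, and rejecting the document. The reject_duplicates helper is hypothetical.

```python
import json

# The request body with the "must" key repeated inside one "bool" object.
body = """
{
  "query": {
    "bool": {
      "must": [ { "match": { "address": "mill" } } ],
      "must": [ { "match": { "address": "lane" } } ]
    }
  }
}
"""

parsed = json.loads(body)
# Python's json module keeps only the last value for a repeated key,
# so the "mill" condition silently disappears.
print(parsed["query"]["bool"]["must"])

def reject_duplicates(pairs):
    """object_pairs_hook that raises when a key occurs twice in one object."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate key in JSON object")
    return dict(pairs)

try:
    json.loads(body, object_pairs_hook=reject_duplicates)
except ValueError as error:
    print("rejected:", error)
```

Any client library or proxy that re-serializes the body with a parser of the first kind would change the meaning of the query, which is one more reason to keep all conditions in a single must array.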

OR queries

Elasticsearch implements OR with the should clause.

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "account_number":136 } },
        { "match": { "address": "lane" } },
        { "match": { "city": "Urie" } }
      ]
    }
  }
}

Negated (NOT) queries

The must_not clause finds documents that match neither A nor B.

GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

This requires that the address field contain neither mill nor lane.

Combined boolean queries

We can combine must, should, and must_not to build complex queries.

A AND NOT B

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": 40 } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

Equivalent SQL:

select * from bank where age=40 and state!= "ID";

A AND (B OR C)

GET /bank/_search
{
    "query":{
        "bool":{
            "must":[
                {"match":{"age":39}},
                {"bool":{"should":[
                            {"match":{"city":"Nicholson"}},
                            {"match":{"city":"Yardville"}}
                        ]}
                }
            ]
        }
    }
}

Equivalent SQL:

select * from bank where age=39 and (city="Nicholson" or city="Yardville");
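
Deeply nested bool queries are easier to write and review when they are assembled from small helpers. A minimal Python sketch follows; match and bool_query are hypothetical helper names, not part of any Elasticsearch client. It rebuilds the A AND (B OR C) body shown above.

```python
import json

def match(field, value):
    """Build a match clause for one field."""
    return {"match": {field: value}}

def bool_query(must=None, should=None, must_not=None):
    """Build a bool query, leaving out clauses that were not supplied."""
    clauses = {"must": must, "should": should, "must_not": must_not}
    return {"bool": {name: items for name, items in clauses.items() if items}}

# age = 39 AND (city = Nicholson OR city = Yardville)
query = bool_query(must=[
    match("age", 39),
    bool_query(should=[match("city", "Nicholson"), match("city", "Yardville")]),
])
print(json.dumps({"query": query}, indent=2))
```

The inner bool has only should clauses, so at least one of them has to match, which is what makes it behave like the parenthesized OR.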

Range queries

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}

Equivalent SQL:

select * from bank where balance between 20000 and 30000;

Aggregations

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

Result:

{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets" : [ {
        "key" : "ID",
        "doc_count" : 27
      }, {
        "key" : "TX",
        "doc_count" : 27
      }, {
        "key" : "AL",
        "doc_count" : 25
      }, {
        "key" : "MD",
        "doc_count" : 25
      }, {
        "key" : "TN",
        "doc_count" : 23
      }, {
        "key" : "MA",
        "doc_count" : 21
      }, {
        "key" : "NC",
        "doc_count" : 21
      }, {
        "key" : "ND",
        "doc_count" : 21
      }, {
        "key" : "ME",
        "doc_count" : 20
      }, {
        "key" : "MO",
        "doc_count" : 20
      } ]
    }
  }
}

The result shows 27 accounts in ID (Idaho), 27 in TX (Texas), and so on.

Equivalent SQL:

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC

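The query groups documents by the state field and returns the top 10 buckets, ordered by document count in descending order.

Setting size to 0 means that no search hits are returned, only the aggregation results. state.keyword refers to the non-analyzed keyword sub-field: terms aggregations on analyzed text fields are disabled by default, because they would require loading fielddata into memory, which is expensive.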
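
To make the bucket logic concrete, here is a small in-memory imitation of a terms aggregation in Python. It is a sketch of the idea only: the function name is hypothetical, the five documents are toy data invented for the example, and the tie-breaking rule (key ascending) is an assumption chosen because it agrees with the sample output above.

```python
from collections import Counter

def terms_aggregation(documents, field, size=10):
    """Count documents per distinct field value and return the top buckets."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    # Order by doc_count descending; ties are broken by key ascending here,
    # which is consistent with the sample output above.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]

# Toy data invented for this sketch, not taken from accounts.json.
accounts = [
    {"state": "TX"}, {"state": "ID"}, {"state": "TX"},
    {"state": "ID"}, {"state": "AL"},
]
print(terms_aggregation(accounts, "state", size=2))
```

Unlike this single-process version, the real aggregation merges per-shard results, which is why the response carries doc_count_error_upper_bound and sum_other_doc_count.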

Nested aggregations

We can run further aggregations inside an aggregation, for example sums or averages.

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

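Building on the previous aggregation, this query computes the average account balance per state (again only for the top 10 states, sorted by document count in descending order).

Aggregations can be nested arbitrarily to extract whatever summaries the data needs.

Building on the previous aggregation, we now sort by average balance in descending order: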

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

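Here the buckets are ordered by the result of the nested aggregation, in descending order. The previous example relied on the implicit default order, which is by document count (_count) descending. Written out explicitly:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "_count": "desc"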
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

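This example groups accounts by age bracket (20-29, 30-39, and 40-49), then by gender, and finally computes the average account balance for each gender within each age bracket: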

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}

The result is more complex: because the grouping is nested, the result is nested as well:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_age": {
      "buckets": [
        {
          "key": "20.0-30.0",
          "from": 20,
          "to": 30,
          "doc_count": 451,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "M",
                "doc_count": 232,
                "average_balance": {
                  "value": 27374.05172413793
                }
              },
              {
                "key": "F",
                "doc_count": 219,
                "average_balance": {
                  "value": 25341.260273972603
                }
              }
            ]
          }
        },
        {
          "key": "30.0-40.0",
          "from": 30,
          "to": 40,
          "doc_count": 504,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "F",
                "doc_count": 253,
                "average_balance": {
                  "value": 25670.869565217392
                }
              },
              {
                "key": "M",
                "doc_count": 251,
                "average_balance": {
                  "value": 24288.239043824702
                }
              }
            ]
          }
        },
        {
          "key": "40.0-50.0",
          "from": 40,
          "to": 50,
          "doc_count": 45,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "M",
                "doc_count": 24,
                "average_balance": {
                  "value": 26474.958333333332
                }
              },
              {
                "key": "F",
                "doc_count": 21,
                "average_balance": {
                  "value": 27992.571428571428
                }
              }
            ]
          }
        }
      ]
    }
  }
}
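
The nesting above (range, then terms, then avg) can be imitated with ordinary loops, which may help when reading the response. The Python sketch below is an illustration only; the function name is hypothetical and the three documents are toy data invented for the example.

```python
from collections import defaultdict

def range_then_terms_avg(documents, ranges):
    """Bucket by age range, then by gender, then average the balance."""
    result = {}
    for low, high in ranges:
        balances = defaultdict(list)
        for doc in documents:
            # "from" is inclusive and "to" is exclusive, as in the range aggregation.
            if low <= doc["age"] < high:
                balances[doc["gender"]].append(doc["balance"])
        result[f"{low}-{high}"] = {
            gender: sum(values) / len(values) for gender, values in balances.items()
        }
    return result

# Toy data invented for this sketch, not taken from accounts.json.
accounts = [
    {"age": 20, "gender": "M", "balance": 100},
    {"age": 29, "gender": "M", "balance": 300},
    {"age": 30, "gender": "F", "balance": 500},
]
print(range_then_terms_avg(accounts, [(20, 30), (30, 40)]))
```

The account aged 30 lands in the second bucket, not the first, because the upper bound of each range is exclusive.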

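term vs. match queries

First, consider how the following examples differ.

Given: exactly 1 document has the address 171 Putnam Avenue, and no document has the address Putnam. That document is in index bank, type account, with ID 25.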

GET /bank/_search
{
  "query": {
        "match" : {
            "address" : "Putnam"
        }
    }
}

GET /bank/_search
{
  "query": {
        "match" : {
            "address.keyword" : "Putnam"
        }
    }
}

GET /bank/_search
{
  "query": {
        "term" : {
            "address" : "Putnam"
        }
    }
}

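Results:
1. The first query matches, because a match query analyzes the query text.
2. The second query does not match, because no document has exactly that non-analyzed value.
3. The third result is not yet clear; it depends on how the field was actually tokenized.

The following request shows how the address field of this document was tokenized: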

GET /bank/account/25/_termvectors?fields=address

Result:

{
  "_index": "bank",
  "_type": "account",
  "_id": "25",
  "_version": 1,
  "found": true,
  "took": 0,
  "term_vectors": {
    "address": {
      "field_statistics": {
        "sum_doc_freq": 591,
        "doc_count": 197,
        "sum_ttf": 591
      },
      "terms": {
        "171": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 3
            }
          ]
        },
        "avenue": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "start_offset": 11,
              "end_offset": 17
            }
          ]
        },
        "putnam": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 4,
              "end_offset": 10
            }
          ]
        }
      }
    }
  }
}

The address field of this document was split into 3 terms:

171
avenue
putnam

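Now we can answer the third query: it does not match. Change the value to lowercase putnam, however, and it matches.

The reason:

A term query looks up the exact term in the inverted index, without analyzing the query text.
A match query first analyzes the query text with the field's analyzer, then searches for the resulting terms.

Because Putnam (with a capital P) is not among the indexed terms and term lookup is case-sensitive, the term query finds nothing. The match query first analyzes the query text, producing putnam, and then looks that term up in the inverted index, so it matches.

By default, the standard analyzer lowercases all tokens.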
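
The behavior can be reproduced with a toy inverted index. The Python sketch below is a deliberate simplification: analyze() only splits on non-word characters and lowercases, which is far cruder than the real standard analyzer, and all function names are hypothetical.

```python
import re
from collections import defaultdict

def analyze(text):
    """Very rough stand-in for the standard analyzer: split and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def build_index(documents):
    """Map each analyzed term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def term_query(index, value):
    """Look the value up as-is, with no analysis."""
    return index.get(value, set())

def match_query(index, text):
    """Analyze the query text first, then look up each resulting term (OR)."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits

index = build_index({"25": "171 Putnam Avenue"})
print(term_query(index, "Putnam"))   # no hit: the indexed term is lowercase
print(term_query(index, "putnam"))   # hit
print(match_query(index, "Putnam"))  # hit: the query text is lowercased first
```

The index ends up with the same three terms as the _termvectors output: 171, avenue, and putnam.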

References

1. Getting Started | Elasticsearch Reference [5.6] | Elastic
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/getting-started.html
2. "Elasticsearch 5.x 关于term query和match query的认识" (on term query and match query in Elasticsearch 5.x), wangchuanfu, cnblogs
https://www.cnblogs.com/wangchuanfu/p/7444253.html

Permalink: https://www.cnblogs.com/52fhy/p/9826356.html
