ES Installation, Deployment, and Basic Operations
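1. Background
Web search engines such as Baidu and Google take the keywords we type, fuzzy-match them against their content, return the relevant results, and highlight the keywords in them. Relational databases do not handle this kind of large-scale retrieval over unstructured text well, for three reasons: (1) they offer poor support for unstructured data; (2) even if indexes are built on the commonly searched fields and keywords, fuzzy queries remain slow because they usually cannot use those indexes at all; (3) search needs a degree of flexibility that they do not provide.
A full-text search engine scans every word of each document and builds an index entry for each word, recording where the word occurs and how many times. At query time it looks the words up in this index and returns the matching results.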
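The idea of indexing every word with its positions and counts can be sketched as a tiny inverted index. This is only an illustration in plain Python, not how Elasticsearch is implemented internally; the function names and the two sample documents are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for every document it occurs in."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, term):
    """Return ids of documents containing the term, most occurrences first."""
    postings = index.get(term.lower(), {})
    return sorted(postings, key=lambda doc_id: len(postings[doc_id]), reverse=True)


# Hypothetical sample documents.
docs = {
    1: "Elasticsearch is a search engine",
    2: "A search engine builds an index so that search is fast",
}
index = build_inverted_index(docs)
print(index["search"])          # positions of "search" in each document
print(search(index, "search"))  # document ids ranked by term frequency
```

A lookup only touches the entry for the queried term, which is why it stays fast no matter how much text was indexed.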
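2. What is ES
Elasticsearch (ES) is a distributed, RESTful search and data analytics engine built to handle a growing range of use cases. It is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data in near real time. It scales out to hundreds of servers and can handle petabytes of data.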
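3. Download and installation
Download link
Past Releases of Elastic Stack Software | Elastic
At the time of writing, the latest release is Elasticsearch 8.17.1 (January 22, 2025). Running the very latest release is not recommended, though; a slightly older release close to it tends to be more stable.
On Windows, unpacking the archive completes the installation.
Then add an environment variable ES_HOME whose value is the ES root directory, and add %ES_HOME%\bin to Path.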
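3.1 Main directories
Directory | Description |
---|---|
bin | Scripts and tools for starting and managing Elasticsearch |
config | Elasticsearch configuration files |
jdk | Bundled JDK; if JAVA_HOME is set on the machine, that JDK is used instead (7.x behavior; newer releases read ES_JAVA_HOME) |
lib | Dependency libraries |
logs | Log files |
modules | Core modules |
plugins | Plugins |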
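Port 9300 is used for communication between cluster nodes (the transport layer); port 9200 is the HTTP port that serves the RESTful API and is the one to open in a browser.
localhost:9200
Opening this URL in a browser should return the node's information.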
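3.2 What is RESTful
REST refers to a set of architectural constraints and principles; an application or design that satisfies them is RESTful. The most important REST principle for web applications is that the interaction between client and server is stateless between requests: every request from the client to the server must contain all the information needed to understand it. If the server restarts at any point between requests, the client is not notified. Stateless requests can also be answered by any available server, which suits environments such as cloud computing well. Clients can cache data to improve performance.
(I have not fully worked through the more detailed rules yet; I will add them later.)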
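4. HTTP operations
4.1 Indices
An index is roughly comparable to a schema in a relational database.
(1) Create an index
Send a PUT request to ES with Apipost
http://127.0.0.1:9200/test
The response is as follows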
{"acknowledged": true,"shards_acknowledged": true,"index": "test"
}
The index test was created successfully. Creating the same index again fails with the following response
{"error": {"root_cause": [{"type": "resource_already_exists_exception","reason": "index [test/M6Qd4rchT56iUlDJZWHWmA] already exists","index_uuid": "M6Qd4rchT56iUlDJZWHWmA","index": "test"}],"type": "resource_already_exists_exception","reason": "index [test/M6Qd4rchT56iUlDJZWHWmA] already exists","index_uuid": "M6Qd4rchT56iUlDJZWHWmA","index": "test"},"status": 400
}
The error shows that the index test already exists.
(2) List existing indices
Send a GET request
http://127.0.0.1:9200/_cat/indices?v
The response is as follows
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open test M6Qd4rchT56iUlDJZWHWmA 1 1 0 0 208b 208b
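Column | Description | Example |
---|---|---|
health | Index health: green (all primary and replica shards are allocated), yellow (all primary shards are allocated but at least one replica is not, which is normal on a single node), red (at least one primary shard is not allocated) | yellow |
status | Whether the index is open or closed | open |
index | Index name | test |
uuid | Unique identifier of the index | M6Qd4rchT56iUlDJZWHWmA |
pri | Number of primary shards | 1 |
rep | Number of replicas | 1 |
docs.count | Number of available documents | 0 |
docs.deleted | Number of deleted documents (soft deletes) | 0 |
store.size | Total storage used by primary and replica shards | 208b |
pri.store.size | Storage used by primary shards | 208b |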
(3) View a single index
Send a GET request
http://127.0.0.1:9200/test
The response is as follows
{"test": {"aliases": {},"mappings": {},"settings": {"index": {"creation_date": "1738905447076","number_of_shards": "1","number_of_replicas": "1","uuid": "M6Qd4rchT56iUlDJZWHWmA","version": {"created": "7080099"},"provided_name": "test"}}}
}
(4) Delete an index
Send a DELETE request
http://127.0.0.1:9200/test
The response is as follows
{"acknowledged": true
}
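The four index requests above can also be issued from a script instead of Apipost. The following is a minimal sketch using only the Python standard library; it assumes a local node at 127.0.0.1:9200 without authentication, and the helper names build_request and send are made up for the example. Note that urlopen raises urllib.error.HTTPError for error responses such as the 400 returned when the index already exists.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9200"  # assumption: local single node, no security


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


create_index = build_request("PUT", "/test")
list_indices = build_request("GET", "/_cat/indices?v")
get_index = build_request("GET", "/test")
delete_index = build_request("DELETE", "/test")

# To run one of them against a live node:
# print(send(create_index))
```

Keeping request construction separate from sending makes it possible to inspect the method, URL, and body without a running node.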
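4.2 Documents
A document can be compared to a row of a table in a relational database.
(1) Create a document
Send a POST request
http://127.0.0.1:9200/test/_doc
In the request body choose raw with type JSON; the body is as follows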
{ "title":"天选之子", "category":"老司机", "images":"", "price":100
}
The response is as follows
{"_index": "test","_type": "_doc","_id": "0usM35QBJ6wHPuS349Jo","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 0,"_primary_term": 1
}
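You can also specify the document id yourself. For example, to create the document with id 123
http://127.0.0.1:9200/test/_doc/123
(2) View a document
A document can be retrieved by its id
Send a GET request
http://127.0.0.1:9200/test/_doc/123
The response is as follows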
{"_index": "test","_type": "_doc","_id": "123","_version": 1,"_seq_no": 1,"_primary_term": 1,"found": true,"_source": {"title": "天选之子","category": "老司机","images": "","price": 100}
}
(3) Update a document
Send a POST request to overwrite the document with id 123 (the body replaces the whole document)
http://127.0.0.1:9200/test/_doc/123
The body is
{ "title":"天选之子123", "category":"老司机", "images":"", "price":100
}
The response is as follows
{"_index": "test","_type": "_doc","_id": "123","_version": 2,"result": "updated","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 2,"_primary_term": 1
}
(4) Update a field
Send a POST request to update only some fields of the document with id 123
http://127.0.0.1:9200/test/_update/123
The body is
{ "doc":{"price":111 }
}
The response is as follows
{"_index": "test","_type": "_doc","_id": "123","_version": 3,"result": "updated","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 3,"_primary_term": 1
}
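The difference between overwriting a document through _doc and updating fields through _update can be pictured with plain dictionaries. This is only a local simulation of the semantics, not a client call; the two function names are made up, and the merge here is shallow, which is enough for flat documents like the one above.

```python
def index_document(stored, new_source):
    """Mimic POST /test/_doc/123: the new body replaces the stored document."""
    return dict(new_source)


def update_document(stored, partial_doc):
    """Mimic POST /test/_update/123 with {"doc": ...}: given fields are merged in."""
    merged = dict(stored)
    merged.update(partial_doc)
    return merged


stored = {"title": "天选之子123", "category": "老司机", "images": "", "price": 100}

replaced = index_document(stored, {"price": 111})
updated = update_document(stored, {"price": 111})

print(replaced)  # only the price field remains
print(updated)   # all original fields remain, price is changed
```

This is why the body in (3) repeats every field while the body in (4) only carries price.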
(5) Delete a document
Send a DELETE request to delete the document with id 123
http://127.0.0.1:9200/test/_doc/123
The response is as follows
{"_index": "test","_type": "_doc","_id": "123","_version": 4,"result": "deleted","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 4,"_primary_term": 1
}
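Documents can also be deleted by condition (not recommended; deleting by id is safer)
Send a POST request to delete the documents whose price field is 111
http://127.0.0.1:9200/test/_delete_by_query
The request body is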
{ "query":{ "match":{ "price":111 } }
}
The response is as follows
{"took": 1033,"timed_out": false,"total": 1,"deleted": 1,"batches": 1,"version_conflicts": 0,"noops": 0,"retries": {"bulk": 0,"search": 0},"throttled_millis": 0,"requests_per_second": -1,"throttled_until_millis": 0,"failures": []
}
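4.3 Mappings
As I understand it, a mapping is the description of the documents in an index. If documents are like the rows of a two-dimensional table in a relational database, the mapping is like that table's metadata (its column definitions).
(1) Create a mapping
Send a PUT request
http://127.0.0.1:9200/test/_mapping
body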
{"properties": {"name": {"type": "text","index": true,"store": false},"sex": {"type": "text","index": false,"store": false},"age": {"type": "long","index": false,"store": false}}
}
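Field types (type)
Type | Description |
---|---|
String | text: analyzed (split into terms); keyword: not analyzed, the value is matched as a whole |
Numeric | Basic types: long, integer, short, byte, double, float, half_float; scaled_float: a floating-point value stored as a long with a fixed scaling factor |
Date | Date type |
Array | No dedicated type; any field can hold multiple values of the same type |
Object | Object |
index: whether the field is indexed. Defaults to true; only indexed fields can be searched.
store: whether the field value is stored separately from _source. Defaults to false; stored fields can be faster to fetch individually, but they take extra disk space.
analyzer: the analyzer (tokenizer) used for the field
The response to the mapping request is as follows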
{"acknowledged": true
}
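The difference between text and keyword fields described above can be sketched as follows. The analyze_text function is a very rough stand-in for the default analyzer (lowercasing and splitting into words); the real analyzer is more sophisticated, and all names here are made up for the illustration.

```python
import re


def analyze_text(value):
    """Rough stand-in for the default analyzer: lowercase and split into words."""
    return re.findall(r"\w+", value.lower())


def analyze_keyword(value):
    """A keyword field is not analyzed: the whole value is kept as one term."""
    return [value]


def term_matches(indexed_terms, query_term):
    """An exact (term-style) lookup succeeds only if the query equals an indexed term."""
    return query_term in indexed_terms


value = "Zhang San"
print(analyze_text(value))     # terms indexed for a text field
print(analyze_keyword(value))  # single term indexed for a keyword field
```

With a text field, an exact lookup for the original string "Zhang San" finds nothing because only the individual lowercase terms were indexed; with a keyword field it matches, but a lookup for "zhang" alone does not.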
(2) View a mapping
Send a GET request
http://127.0.0.1:9200/test/_mapping
The response is as follows
{"test": {"mappings": {"properties": {"age": {"type": "long","index": false},"category": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"images": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"name": {"type": "text"},"price": {"type": "long"},"sex": {"type": "text","index": false},"title": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}}}}}
}
(3) Creating an index together with its mapping
Create an index and define its mapping in the same request
Send a PUT request
http://127.0.0.1:9200/test1
body
{"settings": {},"mappings": {"properties": {"name": {"type": "text","index": true},"sex": {"type": "text","index": false},"age": {"type": "long","index": false}}}
}
The response is as follows
{"acknowledged": true,"shards_acknowledged": true,"index": "test1"
}
4.4 Queries
(1) Query all documents
Query all documents in the index test1
Send a GET request
http://127.0.0.1:9200/test1/_search
body
{"query": {"match_all": {}}
}
The response is as follows
{"took": 181,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3,"relation": "eq"},"max_score": 1,"hits": [{"_index": "test1","_type": "_doc","_id": "1001","_score": 1,"_source": {"name": "zhangsan","nickname": "zhangsan","sex": "男","age": 30}},{"_index": "test1","_type": "_doc","_id": "1002","_score": 1,"_source": {"name": "lisi","nickname": "lisi","sex": "男","age": 20}},{"_index": "test1","_type": "_doc","_id": "1003","_score": 1,"_source": {"name": "wangwu","nickname": "wangwu","sex": "女","age": 40}}]}
}
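(2) Query by field
Find the documents whose name is zhangsan (sex and age are mapped with index set to false in test1, so they cannot be used in this kind of query)
Send a GET request
http://127.0.0.1:9200/test1/_search
body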
{"query": {"match": {"name":"zhangsan" }}
}
The response is as follows
{"took": 360,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 0.9808291,"hits": [{"_index": "test1","_type": "_doc","_id": "1001","_score": 0.9808291,"_source": {"name": "zhangsan","nickname": "zhangsan","sex": "男","age": 30}}]}
}
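(3) Query across multiple fields
Find the documents in which name or nickname matches zhangsan (multi_match returns a document as soon as any of the listed fields matches)
Send a GET request
http://127.0.0.1:9200/test1/_search
body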
{"query": {"multi_match": {"query": "zhangsan","fields": ["name","nickname"]}}
}
The response is as follows
{"took": 116,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 0.9808291,"hits": [{"_index": "test1","_type": "_doc","_id": "1001","_score": 0.9808291,"_source": {"name": "zhangsan","nickname": "zhangsan","sex": "男","age": 30}}]}
}
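(4) Exact match on a field
A term query looks up the exact term without analyzing the query string (as above, only indexed fields such as name can be queried)
Send a GET request
http://127.0.0.1:9200/test1/_search
body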
{"query": {"term": {"name": {"value": "zhangsan"}}}
}
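(5) Exact match on multiple values
A terms query matches documents whose field contains any of the listed values
Send a GET request
http://127.0.0.1:9200/test1/_search
body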
{"query": {"terms": {"name": ["zhangsan","lisi"]}}
}
(6) Return only some fields
By default a search returns every field stored in _source. To return only selected fields, do the following
Send a GET request
http://127.0.0.1:9200/test1/_search
body
{"_source": ["name","nickname"],"query": {"terms": {"name": ["zhangsan"]}}
}
includes can also be used to list the fields to return, and excludes to list the fields to leave out
body
{"_source": {"includes": ["name","nickname"]},"query": {"terms": {"name": ["zhangsan"]}}
}
{"_source": {"excludes": ["age","sex"]},"query": {"terms": {"name": ["zhangsan"]}}
}
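The effect of includes and excludes on a returned _source can be simulated locally. This sketch handles exact field names only (Elasticsearch also accepts wildcard patterns), and filter_source is a made-up helper name.

```python
def filter_source(source, includes=None, excludes=None):
    """Keep the fields listed in includes (all if None), then drop those in excludes."""
    kept = {k: v for k, v in source.items() if includes is None or k in includes}
    return {k: v for k, v in kept.items() if not excludes or k not in excludes}


doc = {"name": "zhangsan", "nickname": "zhangsan", "sex": "男", "age": 30}
print(filter_source(doc, includes=["name", "nickname"]))
print(filter_source(doc, excludes=["age", "sex"]))
```

For this document both calls keep the same two fields, which mirrors the two request bodies above.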
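(7) Compound queries
Conditions are combined with must, must_not, and should
When the query also has a must clause, should clauses do not change which documents are returned, but documents that satisfy them score higher. The minimum_should_match parameter sets how many should clauses a document must satisfy; if it is not specified, it defaults to 0 when a must or filter clause is present and to 1 otherwise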
{"query": {"bool": {"must": [{"match": {"name": "zhangsan"}}],"must_not": [{"match": {"age": "40"}}],"should": [{"match": {"sex": "男"}}]}}
}
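The way must, must_not, and should combine can be sketched with plain predicates. This is a simplified local model, not the Elasticsearch implementation: clauses are Python functions rather than real queries, filter clauses are left out, scoring is reduced to counting matched should clauses, and bool_match is a made-up name.

```python
def bool_match(doc, must=(), must_not=(), should=(), minimum_should_match=None):
    """Return (matched, should_hits) for one document.

    Each clause is a predicate taking the document; it stands in for a
    real query clause such as match or range.
    """
    if minimum_should_match is None:
        # Default: 0 when a must clause exists, otherwise at least one should.
        minimum_should_match = 0 if must else (1 if should else 0)
    should_hits = sum(1 for clause in should if clause(doc))
    matched = (
        all(clause(doc) for clause in must)
        and not any(clause(doc) for clause in must_not)
        and should_hits >= minimum_should_match
    )
    return matched, should_hits


docs = [
    {"id": "1001", "name": "zhangsan", "sex": "男", "age": 30},
    {"id": "1005", "name": "zhangsan", "sex": "男", "age": 40},
    {"id": "1003", "name": "wangwu", "sex": "女", "age": 40},
]
must = [lambda d: d["name"] == "zhangsan"]
must_not = [lambda d: d["age"] == 40]
should = [lambda d: d["sex"] == "男"]

hits = [d["id"] for d in docs if bool_match(d, must, must_not, should)[0]]
print(hits)
```

Document 1005 is dropped by must_not and 1003 fails must, so only 1001 remains; the should clause merely adds to its score.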
(8) Range queries
Operator | Meaning |
---|---|
gt | > |
lt | < |
gte | >= |
lte | <= |
body
{"query": {"range": {"age": {"gte": 30,"lte": 35}}}
}
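This query failed for me because age is not indexed (index is false) in my mapping
(9) Fuzzy queries
The number of single-character changes needed to turn one term into another is called the edit distance
A fuzzy query returns the terms that lie within a given edit distance of the search term
Find documents whose name is close to zhangsan (fuzziness is not specified, so the default AUTO applies, which allows up to 2 edits for a term of this length)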
{"query": {"fuzzy": {"name": {"value": "zhangsan"}}}
}
The response
{"took": 365,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 2,"relation": "eq"},"max_score": 1.2039728,"hits": [{"_index": "test1","_type": "_doc","_id": "1001","_score": 1.2039728,"_source": {"name": "zhangsan","nickname": "zhangsan","sex": "男","age": 30}},{"_index": "test1","_type": "_doc","_id": "1004","_score": 1.0534762,"_source": {"name": "zhangsan1","nickname": "zhangsan1","sex": "女","age": 50}}]}
}
Find documents whose name is close to zhangsan (maximum edit distance 1)
{"query": {"fuzzy": {"name": {"value": "zhangsan","fuzziness": 1}}}
}
The response
{"took": 365,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 2,"relation": "eq"},"max_score": 1.2039728,"hits": [{"_index": "test1","_type": "_doc","_id": "1001","_score": 1.2039728,"_source": {"name": "zhangsan","nickname": "zhangsan","sex": "男","age": 30}},{"_index": "test1","_type": "_doc","_id": "1004","_score": 1.0534762,"_source": {"name": "zhangsan1","nickname": "zhangsan1","sex": "女","age": 50}}]}
}
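The edit distance that fuzzy queries rely on can be computed with the classic dynamic-programming algorithm. The sketch below implements plain Levenshtein distance (insertions, deletions, substitutions); Elasticsearch by default also counts a transposition of two adjacent characters as a single edit, which this sketch does not model. The function names are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def fuzzy_filter(terms, value, fuzziness):
    """Keep the terms whose edit distance to value is at most fuzziness."""
    return [t for t in terms if edit_distance(t, value) <= fuzziness]


names = ["zhangsan", "zhangsan1", "lisi", "wangwu"]
print(fuzzy_filter(names, "zhangsan", 1))
```

zhangsan1 is one insertion away from zhangsan, which is why it appears in both responses above.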
(10) Sorting
Query for name zhangsan, sorted by age descending and then by score descending
{"query": {"match": {"name": "zhangsan"}},"sort": [{"age": {"order": "desc"}},{"_score": {"order": "desc"}}]
}
The response
{"took": 3,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 2,"relation": "eq"},"max_score": null,"hits": [{"_index": "test1","_type": "_doc","_id": "1005","_score": 0.87546873,"_source": {"name": "zhangsan","nickname": "zhangsan","sex": "男","age": 40},"sort": [40,0.87546873]},{"_index": "test1","_type": "_doc","_id": "1001","_score": 0.87546873,"_source": {"name": "zhangsan","nickname": "zhangsan","sex": "男","age": 30},"sort": [30,0.87546873]}]}
}
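The two-level ordering (age first, score as tie-breaker) can be reproduced locally to see how the sort array works. The first two hits reuse values from the response above; the third one is hypothetical and only added to show the tie-break.

```python
hits = [
    {"_id": "1001", "_score": 0.87546873, "age": 30},
    {"_id": "1005", "_score": 0.87546873, "age": 40},
    {"_id": "hypothetical", "_score": 0.5, "age": 40},  # made-up hit
]


def sort_hits(hits):
    """Order by age descending, then by _score descending as a tie-breaker."""
    return sorted(hits, key=lambda h: (-h["age"], -h["_score"]))


print([h["_id"] for h in sort_hits(hits)])
```

The second sort key only matters when two documents share the same age.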
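(11) Highlighting
Adding a highlight property alongside a match query turns on highlighting (the highlighted field must be an indexed field)
Property | Description |
---|---|
pre_tags | Tag inserted before each match |
post_tags | Tag inserted after each match |
fields | Fields to highlight |
name | Declares that the name field should be highlighted; field-specific settings can follow, or the object can be left empty |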
{"query": {"match": {"name": "zhangsan"}},"highlight": {"pre_tags": "<font color='red'>","post_tags": "</font>","fields": {"name": {}}}
}
The response is as follows
{"took": 2,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 2,"relation": "eq"},"max_score": 0.87546873,"hits": [{"_index": "test1","_type": "_doc","_id": "1001","_score": 0.87546873,"_source": {"name": "zhangsan","nickname": "zhangsan","sex": "男","age": 30},"highlight": {"name": ["<font color='red'>zhangsan</font>"]}},{"_index": "test1","_type": "_doc","_id": "1005","_score": 0.87546873,"_source": {"name": "zhangsan","nickname": "zhangsan","sex": "男","age": 40},"highlight": {"name": ["<font color='red'>zhangsan</font>"]}}]}
}
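What pre_tags and post_tags do to a matched term can be imitated with a regular expression. This is a rough approximation for illustration only: the real highlighter works on analyzed terms, while this made-up highlight function simply wraps whole-word, case-insensitive occurrences.

```python
import re


def highlight(text, term, pre_tag="<font color='red'>", post_tag="</font>"):
    """Wrap every whole-word, case-insensitive occurrence of term in the tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("zhangsan", "zhangsan"))
print(highlight("zhangsan1", "zhangsan"))  # not a whole-word match, left unchanged
```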
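(12) Pagination
When a query returns many documents, the results can be paged
from: offset of the first result to return, counted from 0 (the default).
size: number of results per page
Performance drops noticeably as from grows, because Elasticsearch has to collect and discard all the records before the offset.
By default Elasticsearch limits from + size to 10,000 (index.max_result_window); a query beyond this limit fails
(Pagination is normally combined with sorting)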
{"query": {"match_all": {}},"sort": [{"age": {"order": "desc"}}],"from": 1,"size": 2
}
The response is as follows
{"took": 2,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 5,"relation": "eq"},"max_score": null,"hits": [{"_index": "test1","_type": "_doc","_id": "1003","_score": null,"_source": {"name": "wangwu","nickname": "wangwu","sex": "女","age": 40},"sort": [40]},{"_index": "test1","_type": "_doc","_id": "1005","_score": null,"_source": {"name": "zhangsan","nickname": "zhangsan","sex": "男","age": 40},"sort": [40]}]}
}
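Since from is an offset rather than a page number, a small helper that converts a 1-based page into from and size, and checks the result window, can prevent mistakes. This is a sketch with a made-up function name; the limit of 10,000 is the default mentioned above and can differ per index.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_params(page, size, max_result_window=MAX_RESULT_WINDOW):
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size
    if start + size > max_result_window:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": size}


body = {"query": {"match_all": {}}, "sort": [{"age": {"order": "desc"}}]}
body.update(page_params(page=2, size=2))
print(body)
```

With this convention the request above (from 1, size 2) does not correspond to a whole page; page 1 would be from 0 and page 2 would be from 2.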
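(13) Metric aggregations
ES also supports aggregations such as maximum, minimum, average, and sum; the corresponding keywords are max, min, avg, and sum
Average age (avg_age is an arbitrary name; it is only the key under which the result is returned)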
{"aggs": {"avg_age": {"avg": {"field": "age"}}},"size": 0
}
The response
{"took": 20,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 5,"relation": "eq"},"max_score": null,"hits": []},"aggregations": {"avg_age": {"value": 36}}
}
Distinct count
{"aggs": {"distinct_age": {"cardinality": {"field": "age"}}},"size": 0
}
The response
{"took": 112,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 5,"relation": "eq"},"max_score": null,"hits": []},"aggregations": {"distinct_age": {"value": 4}}
}
stats aggregation: returns the five metrics count, max, min, avg, and sum for a field in one request
{"aggs": {"stats_age": {"stats": {"field": "age"}}},"size": 0
}
The response
{"took": 3,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 5,"relation": "eq"},"max_score": null,"hits": []},"aggregations": {"stats_age": {"count": 5,"min": 20,"max": 50,"avg": 36,"sum": 180}}
}
(14) Bucket aggregations (group by)
Count documents grouped by age
{"aggs": {"age_groupby": {"terms": {"field": "age"}}},"size": 0
}
The response
{"took": 8,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 5,"relation": "eq"},"max_score": null,"hits": []},"aggregations": {"age_groupby": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": 40,"doc_count": 2},{"key": 20,"doc_count": 1},{"key": 30,"doc_count": 1},{"key": 50,"doc_count": 1}]}}
}
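The aggregation results above can be cross-checked by computing the same figures locally over the five ages that appear in the sample responses (documents 1001 to 1005). This is a plain Python sketch with made-up function names; note that the cardinality aggregation in Elasticsearch is approximate on large data sets, whereas this version is exact.

```python
from collections import Counter

ages = [30, 20, 40, 50, 40]  # ages of documents 1001-1005 in the responses above


def stats(values):
    """Compute the five metrics returned by the stats aggregation."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


def cardinality(values):
    """Exact distinct count."""
    return len(set(values))


def terms_buckets(values):
    """Group by value, largest bucket first, ties by ascending key."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


print(stats(ages))
print(cardinality(ages))
print(terms_buckets(ages))
```

The numbers agree with the responses shown: an average of 36, 4 distinct ages, and a bucket of two documents for age 40.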