Elasticsearch Operations in Practice


da仔

Published 6 years ago


#Information Technology

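Elasticsearch has won a large number of users and deployments thanks to its powerful, easy-to-use search features, which makes keeping clusters stable in real production environments increasingly important. This talk covers how to configure and monitor an Elasticsearch cluster from two angles: how Elasticsearch works, and the key metrics to watch in day-to-day operations.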


1. Elasticsearch Cluster Operations in Practice. Presented by Xu Peng

2. The role of Elasticsearch in the data lake

3. Elasticsearch features

4. Lucene core concepts

5. Part II: Elasticsearch clusters

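6. Elasticsearch: a distributed Lucene index
- Cluster management at the node level
- Distributed index management
  1. Schema management
  2. Index distribution
  3. Index migration
- Index queries
  - query-and-fetch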

7. Elasticsearch cluster node composition
- Node types
  - Master
  - Data
  - Coordinator
  - Ingest
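
The node types above map onto the `node.master`, `node.data`, and `node.ingest` flags that appear in the configuration slide. As a minimal sketch, assuming the pre-7.x flag style and assuming each flag defaults to true when omitted, the roles implied by a settings file could be derived like this. The function name `node_roles` is hypothetical and used only for illustration.

```python
def node_roles(settings):
    """Return the roles implied by pre-7.x style node.* flags.

    Assumption: every flag defaults to True when it is not set.
    """
    flags = {
        "master": settings.get("node.master", True),
        "data": settings.get("node.data", True),
        "ingest": settings.get("node.ingest", True),
    }
    roles = [name for name, enabled in flags.items() if enabled]
    # A node with every role disabled only routes requests and merges
    # results, which is what the slide calls a coordinator.
    return roles or ["coordinator"]


# The node from the sample configuration: master + data, no ingest.
print(node_roles({"node.master": True, "node.data": True, "node.ingest": False}))
# A dedicated coordinator node.
print(node_roles({"node.master": False, "node.data": False, "node.ingest": False}))
```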

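8. Node discovery
- Zen Discovery
  - The master node holds the cluster state information:
    1. Nodes joining or leaving
    2. Index creation, deletion, opening, and closing
    3. Shard allocation and routing information
    4. Schema changes
- curl -XGET localhost:9200/_cluster/state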

9. A minimal ES configuration

    # Config file: /etc/elasticsearch/elasticsearch.yml
    cluster.name: es_demo_cluster
    node.name: es_demo_node_1
    node.master: true
    node.ingest: false
    node.data: true
    bootstrap.memory_lock: true
    bootstrap.system_call_filter: false
    network.host: 192.168.56.101
    discovery.zen.ping.unicast.hosts: ["192.168.56.101","192.168.56.102","192.168.56.103"]
    discovery.zen.minimum_master_nodes: 2

    # X-Pack settings
    xpack.security.enabled: false
    xpack.monitoring.enabled: true
    xpack.graph.enabled: false
    xpack.watcher.enabled: false
    xpack.monitoring.exporters.my_remote.type: http
    xpack.monitoring.exporters.my_remote.host: ["http://localhost:9200"]
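
The value `discovery.zen.minimum_master_nodes: 2` in the configuration above is consistent with the usual majority rule for Zen Discovery, (master-eligible nodes / 2) + 1, applied to the three listed hosts. A small sketch of that arithmetic, assuming all three hosts are master-eligible (the slide only shows the flags for one node), might look like this. The helper name is hypothetical.

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum: floor(n / 2) + 1 master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Three master-eligible hosts, as in the unicast host list above.
print(minimum_master_nodes(3))
```

With a quorum of 2 out of 3, the cluster can lose one master-eligible node and still elect a master, while two isolated halves can never both form a majority.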

10. Analysis of the index write path

11. Shard management: shard initialization

12. Index settings

    {
      "settings": {
        "index": {
          "routing": {
            "allocation": {
              "total_shards_per_node": "3"
            }
          },
          "refresh_interval": "120s",
          "number_of_shards": "18",
          "translog": {
            "flush_threshold_size": "2g",
            "durability": "async",
            "sync_interval": "15s"
          },
          "merge": {
            "scheduler": {
              "max_thread_count": "1"
            }
          },
          "number_of_replicas": "0"
        }
      }
    }

- Checklist
  1. Avoid having all shards of one index land on the same data node.
  2. Adjust refresh_interval dynamically to fit the business scenario.
  3. Tune flush_threshold_size.
  4. Keep the disk space used by each shard between 10GB and 15GB.
  5. Keep the number of shards managed by one node below 600, at roughly 20 shards per GB of heap.
  6. Watch how often throttling occurs during indexing.
  7. Adjust index settings dynamically based on historical statistics to keep the cluster healthy and stable.
  8. Do not use Elasticsearch as the primary store for core, critical data.
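
Checklist items 4 and 5 reduce to simple arithmetic. The following is a rough sketch under stated assumptions: the 15GB target, the 20 shards per GB of heap, and the 600-shard cap come from the checklist, while the function names and the 270GB index size are hypothetical values chosen for illustration.

```python
import math


def plan_shards(index_size_gb, target_shard_gb=15):
    """Number of primary shards so each stays at or below the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def max_shards_per_node(heap_gb, shards_per_gb_heap=20, hard_cap=600):
    """Upper bound on shards per node from the heap rule and the hard cap."""
    return min(heap_gb * shards_per_gb_heap, hard_cap)


# A hypothetical 270GB index at 15GB per shard.
print(plan_shards(270))
# A node with a 16GB heap, then one with a 31GB heap (hits the 600 cap).
print(max_shards_per_node(16), max_shards_per_node(31))
```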

13. Mappings: defining the schema

    "mappings": {
      "logs": {
        "properties": {
          "builtinTimestamp": {
            "type": "date",
            "format": "YYYY-MM-dd HH:mm:ss.SSS"
          },
          "exception": {
            "type": "keyword",
            "ignore_above": 10915
          },
          "request": {
            "type": "text",
            "index": false
          },
          "response": {
            "type": "keyword"
          },
          "timestamp": {
            "type": "date"
          }
        }
      }
    }

- _source
  - enabled
  - Useful when rebuilding an index, so it is best not to disable it
- properties
  - The difference between text and keyword
  - index
  - doc_values

Notes: format for date types

    "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
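
The `||` separator in the date format note lists alternative formats that are tried in order. The sketch below only imitates that fallback behaviour in Python to make the idea concrete; it is not Elasticsearch's parser, and the `strptime` patterns are hand-translated equivalents assumed for illustration. Values without a time zone are treated as UTC here.

```python
from datetime import datetime, timezone

# Hand-translated equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd".
PATTERNS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_date(value):
    """Try each pattern in order, then fall back to epoch milliseconds."""
    text = str(value)
    for pattern in PATTERNS:
        try:
            return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    # Last alternative: epoch_millis. int() raises ValueError on bad input.
    return datetime.fromtimestamp(int(text) / 1000, tz=timezone.utc)


print(parse_date("2019-04-12 10:14:28"))
print(parse_date("2019-04-12"))
print(parse_date(86400000))
```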

14. Cluster index recovery

    "transient": {
      "cluster": {
        "routing": {
          "allocation": {
            "node_initial_primaries_recoveries": "25",
            "balance": {
              "index": "16.0f",
              "threshold": "1.0f",
              "shard": "0.02f"
            },
            "enable": "all",
            "cluster_concurrent_rebalance": "120",
            "node_concurrent_recoveries": "4",
            "exclude": {
              "_name": "",
              "_ip": "192.168.0.101,192.168.0.102"
            }
          }
        }
      }
    }

- balance.index: the larger the value, the more evenly the shards of any single index are spread across the cluster.
- balance.shard: the larger the value, the more even the number of shards on each node in the cluster.
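
The `exclude._ip` entry above is what drains shards off specific nodes. As a minimal sketch, the request body for such a change could be built as follows, using the flat key form of the same setting. The helper name is hypothetical, and the body would be sent with `PUT _cluster/settings`; sending it is left out so the example stays side-effect free.

```python
import json


def exclude_nodes_body(ips):
    """Build a transient cluster-settings body that excludes nodes by IP.

    An empty list clears the exclusion by setting an empty string.
    """
    return {
        "transient": {
            "cluster.routing.allocation.exclude._ip": ",".join(ips)
        }
    }


body = exclude_nodes_body(["192.168.0.101", "192.168.0.102"])
print(json.dumps(body, indent=2))
```

Note that the IP list is joined with an ASCII comma; a full-width comma would be treated as part of a single, non-matching value.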

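15. Key points for ES monitoring
- ES nodes
- Cluster planning
- ES instances
- Index templates
- Index shards

16. Overview of OS monitoring tools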

17. Cluster configuration and monitoring
- OS parameter settings
  - Memory

        vm.dirty_ratio
        vm.dirty_background_ratio

  - I/O scheduler (for SSDs, noop is recommended)

        echo 'cfq' > /sys/block/sd$i/queue/scheduler
        blockdev --setra 1024 /dev/sd$i

  - Disable Transparent Huge Pages

        echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
        echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled

  - Disable NUMA

        sudo sysctl -w vm.zone_reclaim_mode=0

- JVM settings
  1. Keep the heap at no more than 32GB
  2. Avoid tuning thread_pool
- Monitoring tools
  1. htop/atop
  2. sar
  3. perf
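
After applying the Transparent Huge Pages and I/O scheduler settings above, it helps to confirm which option is actually active. Both sysfs files mark the active choice with square brackets, for example `always madvise [never]`. The sketch below parses that format; the function names are hypothetical, and the file read is optional so the parser can also be used on captured output.

```python
import re
from pathlib import Path


def active_option(sysfs_text):
    """Return the bracketed (active) option from a sysfs selector line."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def read_active(path):
    """Read a sysfs selector file if it exists, else return None."""
    file = Path(path)
    return active_option(file.read_text()) if file.exists() else None


# Parsing captured output rather than touching the local system.
print(active_option("always madvise [never]"))
print(active_option("noop deadline [cfq]"))
```

On a host, `read_active("/sys/kernel/mm/transparent_hugepage/enabled")` would be expected to return `never` once the setting from the slide has been applied.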

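18. CAT API
- Cluster status
  - GET _cat/health
  - GET _cluster/health?pretty
  - GET _cluster/state
- Node status
  - GET _nodes/stats (statistics)
  - GET _nodes/node_name/stats (information for one specific node)
  - GET _nodes/ (basic configuration information)
- Index and shard information
  - GET _cat/indices
  - GET _cat/shards
  - GET _cat/segments
  - GET _stats
- Looking up the available parameters
  - GET _cat/nodes?help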
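
The response of `GET _cluster/health` listed above lends itself to a simple automated check. The following is a sketch, assuming the standard `status`, `number_of_nodes`, and `unassigned_shards` fields of that response; the function names, the `expected_nodes` parameter, and the `http://localhost:9200` address are assumptions made for illustration.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://localhost:9200"):
    """Fetch the cluster health document (needs a reachable cluster)."""
    with urlopen(f"{base_url}/_cluster/health") as response:
        return json.load(response)


def health_warnings(health, expected_nodes):
    """Turn a _cluster/health document into a list of warning strings."""
    warnings = []
    status = health.get("status")
    if status != "green":
        warnings.append(f"cluster status is {status}")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        warnings.append(f"only {nodes} of {expected_nodes} nodes are present")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        warnings.append(f"{unassigned} shards are unassigned")
    return warnings


# A hand-written sample document, not real cluster output.
sample = {"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 6}
for warning in health_warnings(sample, expected_nodes=3):
    print(warning)
```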

19. Monitoring solution: node information

20. Monitoring solution: index statistics

21. Cluster state in the log files

    [2019-04-12T10:14:28,867][INFO ][o.e.c.s.ClusterApplierService] [ctrip_flight_v5_demo_183_98] removed {{ctrip_demo_v6_data_0_185}{MNhNRMSPQpKbv2ukGewKzw}{1NU_GBxCQwS7Btxg63osvA}{192.168.0.185}{192.168.0.185:9301}{ml.machine_memory=134610358272, ml.max_open_jobs=20, xpack.installed=true, box_type=hot, ml.enabled=true},}, reason: apply cluster state (from master [master {ctrip_flight_v6_demo_183_98}
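
Log lines like the one above follow a fixed prefix of timestamp, level, logger, and node name, which makes node-removal events easy to pick out. A minimal sketch follows; the regular expression is inferred from this single sample line and is an assumption rather than a documented log grammar, and the function name is hypothetical.

```python
import re
from datetime import datetime

LINE_PATTERN = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"   # [2019-04-12T10:14:28,867]
    r"\[(?P<level>[A-Z]+)\s*\]"     # [INFO ]
    r"\[(?P<logger>[^\]]+)\]\s*"    # [o.e.c.s.ClusterApplierService]
    r"\[(?P<node>[^\]]+)\]\s*"      # [node name]
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["logger"] = fields["logger"].strip()
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%Y-%m-%dT%H:%M:%S,%f"
    )
    # Flag cluster-state changes that report a node leaving.
    fields["node_removed"] = fields["message"].startswith("removed ")
    return fields


sample = (
    "[2019-04-12T10:14:28,867][INFO ][o.e.c.s.ClusterApplierService] "
    "[ctrip_flight_v5_demo_183_98] removed {{ctrip_demo_v6_data_0_185}"
)
event = parse_log_line(sample)
print(event["level"], event["node"], event["node_removed"])
```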

22. Automated and intelligent operations
- Docker & K8s
- Grafana
- ES automated operations
- Prometheus exporter
- Prometheus

23. Q&A. Thank you
