[ES in Action] Rally: Offline Benchmarking with a Custom Track


Offline Installation

Install Rally offline

After the offline installation of Rally, leave the default configuration untouched.

The Elasticsearch cluster under test is deployed separately on other machines; it may even be a custom-modified cluster rather than one built from the official release packages.

Glossary

track
A track is the description of one or more benchmarking scenarios with a specific document corpus. It defines, for example, the involved indices, the data files, and the operations that are invoked. List the available tracks with esrally list tracks. Although Rally ships with some tracks out of the box, you should typically create your own track based on your own data.

In short: the basic building blocks of a benchmark scenario, such as the read/write operations against the cluster and the data used by the indices.

challenge
A challenge describes one benchmarking scenario, e.g. indexing documents at maximum throughput with 4 clients while two other clients issue term and phrase queries, each rate-limited to 10 queries per second. A challenge is always specified in the context of a track. See the available challenges by listing the corresponding tracks with esrally list tracks.

In short: the varying dimensions of a benchmark scenario, such as different concurrency levels.

car
A car is a specific configuration of an Elasticsearch cluster, e.g. the out-of-the-box configuration, a configuration with a specific heap size, or a custom logging configuration. List the available cars with esrally list cars.

In short: the Elasticsearch cluster configuration used in a benchmark scenario.

telemetry
Telemetry is used in Rally to gather metrics about the car, e.g. CPU usage or index size.

In short: the measurement data of a benchmark scenario.

race
A race is one invocation of the Rally binary; another name for it is "benchmark trial". During a race, Rally runs one challenge on a track with the given car.

In short: one execution of a benchmark.

tournament
A tournament is a comparison of two races. You can use Rally's tournament mode for that.

In short: a comparison between different benchmark runs.


Custom track: tutorial

Create a directory for the custom track: /opt/software/esrally/rally-tracks/tutorial

The custom track directory needs three files:

  • documents.json: the structured data for the index

    The data comes from the official test dataset allCountries.zip (around 300 MB). Unzipping it yields the file allCountries.txt, which a Python script then converts to JSON.

    Python script:

    import json
    
    cols = (("geonameid", "int", True),
            ("name", "string", True),
            ("asciiname", "string", False),
            ("alternatenames", "string", False),
            ("latitude", "double", True),
            ("longitude", "double", True),
            ("feature_class", "string", False),
            ("feature_code", "string", False),
            ("country_code", "string", True),
            ("cc2", "string", False),
            ("admin1_code", "string", False),
            ("admin2_code", "string", False),
            ("admin3_code", "string", False),
            ("admin4_code", "string", False),
            ("population", "long", True),
            ("elevation", "int", False),
            ("dem", "string", False),
            ("timezone", "string", False))
    
    
    def main():
        with open("allCountries.txt", "rt", encoding="UTF-8") as f:
            for line in f:
                tup = line.strip().split("\t")
                record = {}
                for i in range(len(cols)):
                    name, col_type, include = cols[i]
                    # skip empty fields and columns not marked for inclusion
                    if tup[i] != "" and include:
                        if col_type in ("int", "long"):
                            record[name] = int(tup[i])
                        elif col_type == "double":
                            record[name] = float(tup[i])
                        elif col_type == "string":
                            record[name] = tup[i]
                # one JSON document per line (newline-delimited JSON)
                print(json.dumps(record, ensure_ascii=False))
    
    
    if __name__ == "__main__":
        main()
    

    Run the script: python3 toJSON.py > documents.json. The first lines of documents.json look like this:

    {"geonameid": 2986043, "name": "Pic de Font Blanca", "latitude": 42.64991, "longitude": 1.53335, "country_code": "AD", "population": 0}
    {"geonameid": 2994701, "name": "Roc Mélé", "latitude": 42.58765, "longitude": 1.74028, "country_code": "AD", "population": 0}
    {"geonameid": 3007683, "name": "Pic des Langounelles", "latitude": 42.61203, "longitude": 1.47364, "country_code": "AD", "population": 0}
    {"geonameid": 3017832, "name": "Pic de les Abelletes", "latitude": 42.52535, "longitude": 1.73343, "country_code": "AD", "population": 0}
    {"geonameid": 3017833, "name": "Estany de les Abelletes", "latitude": 42.52915, "longitude": 1.73362, "country_code": "AD", "population": 0}
    {"geonameid": 3023203, "name": "Port Vieux de la Coume d’Ose", "latitude": 42.62568, "longitude": 1.61823, "country_code": "AD", "population": 0}
    {"geonameid": 3029315, "name": "Port de la Cabanette", "latitude": 42.6, "longitude": 1.73333, "country_code": "AD", "population": 0}
    {"geonameid": 3034945, "name": "Port Dret", "latitude": 42.60172, "longitude": 1.45562, "country_code": "AD", "population": 0}
    {"geonameid": 3038814, "name": "Costa de Xurius", "latitude": 42.50692, "longitude": 1.47569, "country_code": "AD", "population": 0}
    {"geonameid": 3038815, "name": "Font de la Xona", "latitude": 42.55003, "longitude": 1.44986, "country_code": "AD", "population": 0}
    {"geonameid": 3038816, "name": "Xixerella", "latitude": 42.55327, "longitude": 1.48736, "country_code": "AD", "population": 0}
    {"geonameid": 3038818, "name": "Riu Xic", "latitude": 42.57165, "longitude": 1.67554, "country_code": "AD", "population": 0}
    {"geonameid": 3038819, "name": "Pas del Xic", "latitude": 42.49766, "longitude": 1.57597, "country_code": "AD", "population": 0}
    {"geonameid": 3038820, "name": "Roc del Xeig", "latitude": 42.56068, "longitude": 1.4898, "country_code": "AD", "population": 0}
    
  • index.json: the mappings and settings for the index

    {
      "settings": {
        "index.number_of_replicas": 0
      },
      "mappings": {
        "_doc": {
          "dynamic": "strict",
          "properties": {
            "geonameid": {
              "type": "long"
            },
            "name": {
              "type": "text"
            },
            "latitude": {
              "type": "double"
            },
            "longitude": {
              "type": "double"
            },
            "country_code": {
              "type": "text"
            },
            "population": {
              "type": "long"
            }
          }
        }
      }
    }
    
  • track.json: describes the index under test and the benchmark schedule

    {
      "version": 2,
      "description": "Tutorial benchmark for Rally",
      "indices": [
        {
          "name": "geonames",
          "body": "index.json"
        }
      ],
      "corpora": [
        {
          "name": "rally-tutorial",
          "documents": [
            {
              "source-file": "documents.json",
              "document-count": 11658903,
              "uncompressed-bytes": 1544799789
            }
          ]
        }
      ],
      "schedule": [
        {
          "operation": {
            "operation-type": "delete-index"
          }
        },
        {
          "operation": {
            "operation-type": "create-index"
          }
        },
        {
          "operation": {
            "operation-type": "cluster-health",
            "request-params": {
              "wait_for_status": "green"
            },
            "retry-until-success": true
          }
        },
        {
          "operation": {
            "operation-type": "bulk",
            "bulk-size": 5000
          },
          "warmup-time-period": 120,
          "clients": 8
        },
        {
          "operation": {
            "operation-type": "force-merge"
          }
        },
        {
          "operation": {
            "name": "query-match-all",
            "operation-type": "search",
            "body": {
              "query": {
                "match_all": {}
              }
            }
          },
          "clients": 8,
          "warmup-iterations": 1000,
          "iterations": 1000,
          "target-throughput": 100
        }
      ]
    }
    

The numbers under the documents property are used to verify data integrity and to provide progress reporting: source-file is the source data file, document-count is the number of records in the file, and uncompressed-bytes is the total uncompressed size. Since the official dataset may change over time, confirm the actual values yourself: wc -l documents.json gives the correct record count, and stat -c %s documents.json gives the size in bytes.
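The same two values can also be computed with a short Python sketch (corpus_stats is a hypothetical helper, and documents.json is assumed to sit in the current directory):

```python
import os

def corpus_stats(path):
    """Return (document_count, uncompressed_bytes) for a newline-delimited JSON file."""
    with open(path, "rb") as f:
        # each document occupies exactly one line, so lines == documents
        doc_count = sum(1 for _ in f)
    # same number as `stat -c %s`: file size in bytes
    size_bytes = os.path.getsize(path)
    return doc_count, size_bytes
```

Paste the two resulting numbers into document-count and uncompressed-bytes in track.json.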

A detailed description of the track file structure is in the official docs: track reference

Running the track

Command: Rally must run in offline mode and only benchmark an existing cluster (no provisioning); the command specifies the target ES cluster and the track path, and also writes the results to a report file in a custom format.

esrally race --pipeline=benchmark-only --target-hosts=http://192.168.0.1:9200 --track-path=/opt/software/esrally/rally-tracks/tutorial --offline --report-file=/opt/software/esrally/report.csv --report-format=csv

Output

[esrally@~ tutorial]$ esrally race --pipeline=benchmark-only --target-hosts=http://192.168.0.1:9200 --track-path=/opt/software/esrally/rally-tracks/tutorial --offline --report-file=/opt/software/esrally/report.md --report-format=csv

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Race id is [87a1c0b8-314c-4531-9b23-856c7c5e107c]
[INFO] Racing on track [tutorial] and car ['external'] with version [6.8.0].

Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running cluster-health                                                         [100% done]
Running bulk                                                                   [100% done]
Running force-merge                                                            [100% done]
Running query-match-all                                                        [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

Report

Metric Task Value Unit Meaning
Cumulative indexing time of primary shards 0 min total indexing time on primary shards
Min cumulative indexing time across primary shards 0 min minimum cumulative indexing time across primary shards
Median cumulative indexing time across primary shards 0 min median cumulative indexing time across primary shards
Max cumulative indexing time across primary shards 0 min maximum cumulative indexing time across primary shards
Cumulative indexing throttle time of primary shards 0 min total indexing throttle time on primary shards
Min cumulative indexing throttle time across primary shards 0 min minimum indexing throttle time across primary shards
Median cumulative indexing throttle time across primary shards 0 min median indexing throttle time across primary shards
Max cumulative indexing throttle time across primary shards 0 min maximum indexing throttle time across primary shards
Cumulative merge time of primary shards 0 min total merge time on primary shards
Cumulative merge count of primary shards 0 total merge count on primary shards
Min cumulative merge time across primary shards 0 min minimum merge time across primary shards
Median cumulative merge time across primary shards 0 min median merge time across primary shards
Max cumulative merge time across primary shards 0 min maximum merge time across primary shards
Cumulative merge throttle time of primary shards 0 min total merge throttle time on primary shards
Min cumulative merge throttle time across primary shards 0 min minimum merge throttle time across primary shards
Median cumulative merge throttle time across primary shards 0 min median merge throttle time across primary shards
Max cumulative merge throttle time across primary shards 0 min maximum merge throttle time across primary shards
Cumulative refresh time of primary shards 0 min total refresh time on primary shards
Cumulative refresh count of primary shards 15 total refresh count on primary shards
Min cumulative refresh time across primary shards 0 min minimum refresh time across primary shards
Median cumulative refresh time across primary shards 0 min median refresh time across primary shards
Max cumulative refresh time across primary shards 0 min maximum refresh time across primary shards
Cumulative flush time of primary shards 0 min total flush time on primary shards
Cumulative flush count of primary shards 0 total flush count on primary shards
Min cumulative flush time across primary shards 0 min minimum flush time across primary shards
Median cumulative flush time across primary shards 0 min median flush time across primary shards
Max cumulative flush time across primary shards 0 min maximum flush time across primary shards
Total Young Gen GC time 2.694 s total Young Generation GC time
Total Young Gen GC count 170 total Young Generation GC count
Total Old Gen GC time 0 s total Old Generation GC time
Total Old Gen GC count 0 total Old Generation GC count
Store size 1.07E-06 GB store size on disk
Translog size 5.12E-07 GB translog size
Heap used for segments 0 MB heap memory used by segments
Heap used for doc values 0 MB heap memory used by doc values
Heap used for terms 0 MB heap memory used by terms
Heap used for norms 0 MB heap memory used by norms
Heap used for points 0 MB heap memory used by points
Heap used for stored fields 0 MB heap memory used by stored fields
Segment count 0 number of segments
Total Ingest Pipeline count 0
Total Ingest Pipeline time 0 s
Total Ingest Pipeline failed 0
error rate bulk 0 % error rate of the bulk task
Min Throughput query-match-all 100 ops/s
Mean Throughput query-match-all 100 ops/s
Median Throughput query-match-all 100 ops/s
Max Throughput query-match-all 100 ops/s
50th percentile latency query-match-all 2.518748515 ms
90th percentile latency query-match-all 3.393146186 ms
99th percentile latency query-match-all 4.929880542 ms
99.9th percentile latency query-match-all 6.498478545 ms
100th percentile latency query-match-all 8.77224002 ms
50th percentile service time query-match-all 1.522833598 ms
90th percentile service time query-match-all 1.95039534 ms
99th percentile service time query-match-all 3.240323039 ms
99.9th percentile service time query-match-all 4.757250794 ms
100th percentile service time query-match-all 6.071650889 ms
error rate query-match-all 0 %
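Because the race was started with --report-format=csv, the report file can be post-processed in Python. A minimal sketch (throughput_rows is a hypothetical helper; the exact column layout of the CSV is an assumption, so check the header of your own report file first):

```python
import csv

def throughput_rows(report_path):
    """Collect report rows whose metric name mentions throughput."""
    rows = []
    with open(report_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            # the metric name is assumed to be in the first column
            if row and "throughput" in row[0].lower():
                rows.append(row)
    return rows
```

For example, throughput_rows("/opt/software/esrally/report.csv") would pull only the throughput metrics out of the full report.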