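Background: many companies in China use Elasticsearch, including Alibaba, JD.com, DiDi, Toutiao, Xiaomi, and vivo. Beyond search, Elasticsearch is also widely used together with Kibana and Logstash, as part of the Elastic Stack, for near-real-time big data analytics such as log analysis and metrics monitoring.

This section covers how sorting works in Elasticsearch.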

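By default, Elasticsearch returns results sorted by the relevance value _score, with the most relevant documents first. Sorting comes up constantly in day-to-day business work, so this article walks through what the sort parameter means and how to use it.

1. Sorting by _score by default

To sort by relevance, relevance must be expressed as a number. In Elasticsearch, the relevance score is a floating-point number returned in the search results as _score, and the default sort order is descending by _score.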

http://localhost:9201/student/_search

Query request; for example, to fetch the document whose id is 1:

{
    "query" : {
        "bool" : {
            "filter" : {
                "term" : {
                    "id" : 1
                }
            }
        }
    }
}
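To make it concrete how such a body reaches the URL above, here is a minimal Python sketch that uses only the standard library. The helper names build_search_request and run_search are made up for illustration, and run_search assumes a cluster is actually listening at that address.

```python
import json
import urllib.request

# URL taken from the article; adjust host, port, and index for your cluster.
SEARCH_URL = "http://localhost:9201/student/_search"


def build_search_request(body, url=SEARCH_URL):
    """Build (but do not send) a POST request carrying a JSON search body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(body, url=SEARCH_URL):
    """Send the request and decode the JSON response (needs a live cluster)."""
    with urllib.request.urlopen(build_search_request(body, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# The filter query from the article: match documents whose id is 1.
filter_query = {"query": {"bool": {"filter": {"term": {"id": 1}}}}}
request = build_search_request(filter_query)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the body easy to inspect or log before anything touches the cluster.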

The query returns the following result:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1, "successful": 1, "skipped": 0, "failed": 0
    },
    "hits": {
        "total": {
            "value": 1, "relation": "eq"
        },
        "max_score": 0,
        "hits": [
            {
                "_index": "student",
                "_type": "_doc",
                "_id": "1",
                "_score": 0, // relevance score; a meaningless value here
                "_source": {
                    "love": "I like to collect rock albums",
                    "createTime": "2022-05-28 14:19:05",
                    "name": "test1",
                    "id": "1",
                    "age": 1
                }
            }
        ]
    }
}

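The relevance score above carries no real business meaning in production. Using filter signals that we only want the documents whose id is 1, not an estimate of how relevant they are. If several documents match, they come back in no meaningful order, and every one of them is scored zero.

If we want to get rid of this meaningless score of zero, we can replace bool with the constant_score keyword in the query: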

{
    "query" : {
        "constant_score" : { // constant_score replaces the earlier bool
            "filter" : {
                "term" : {
                    "id" : 1
                }
            }
        }
    }
}

The final query result is as follows:

{
    "took": 5,"max_score": 1,"_score": 1, // constant score, 1 by default
                "_source": {
                    "love": "I like to collect rock albums","age": 1
                }
            }
        ]
    }
}

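With the same filter condition as before, every returned document now has a constant _score of 1.

2. Sorting by a single field

In real business scenarios, results are usually sorted by one specific business field, such as a number or a date.

Request body: for example, to sort students by creation time in descending order, use the sort parameter.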
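As a small illustration, the constant_score wrapping can be built programmatically. This is only a sketch: the helper name constant_score_query is hypothetical, and the optional boost argument relies on the boost parameter of constant_score, which sets the constant score and defaults to 1.

```python
import json


def constant_score_query(filter_clause, boost=None):
    """Wrap a filter clause in constant_score so every hit gets the same score."""
    inner = {"filter": filter_clause}
    if boost is not None:
        # Without an explicit boost, Elasticsearch uses 1 as the constant score.
        inner["boost"] = boost
    return {"query": {"constant_score": inner}}


body = constant_score_query({"term": {"id": 1}})
print(json.dumps(body, indent=2))
```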

{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "id": 1
        }
      }
    }
  },"sort": {
    "createTime": {
      "order": "desc"
    }
  }
}

The response is as follows:

{
    "took": 2,"max_score": null, // returned as null
        "hits": [
            {
                "_index": "student","_score": null, // returned as null
                "_source": {
                    "love": "I like to collect rock albums","age": 1
                },"sort": [ // new element
                    1653747545000 // value of the sort field
                ]
            }
        ]
    }
}

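Here _score is null, which indicates that _score was not used for sorting.

The value of the createTime field is expressed as the number of milliseconds since the epoch (January 1, 1970 00:00:00 UTC) and is returned in the sort element.

Each hit contains a new sort element holding the values that were used for sorting. In this case we sorted by createTime, which is indexed internally as milliseconds since the epoch. The long value 1653747545000 is equivalent to the date string 2022-05-28 14:19:05 UTC.

Second, both _score and max_score are null. Computing _score can be costly, and it is normally needed only for sorting; when results are not sorted by relevance, tracking _score serves no purpose. If your scenario does need _score, add the track_scores parameter to the request and set it to true.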
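The relationship between the sort value and the stored date can be checked with a few lines of Python. This sketch assumes the createTime string carries no time zone and is therefore read as UTC; the function names are illustrative only.

```python
from datetime import datetime, timezone


def sort_value_to_utc(millis):
    """Convert a sort value in epoch milliseconds to a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def utc_string_to_millis(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a date string (assumed to be UTC) into epoch milliseconds."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


dt = sort_value_to_utc(1653747545000)
print(dt.strftime("%Y-%m-%d %H:%M:%S"))  # 2022-05-28 14:19:05
print(utc_string_to_millis("2022-05-28 14:19:05"))  # 1653747545000
```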

{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "id": 1
        }
      }
    }
  },"track_scores": true, // set track_scores to true
  "sort": {
    "createTime": {
      "order": "desc"
    }
  }
}

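By default, fields are sorted in ascending order, while _score is sorted in descending order.

3. Sorting by multiple fields

Suppose we want to combine createTime and _score in a query, so that matching results are sorted first by date and then by relevance.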
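The default directions and the track_scores switch can be captured in a small request builder. This is a sketch under the assumption that requests are assembled as Python dictionaries; default_order and build_sorted_search are hypothetical helper names.

```python
def default_order(field):
    """_score sorts descending by default; every other field sorts ascending."""
    return "desc" if field == "_score" else "asc"


def build_sorted_search(query, field, order=None, track_scores=False):
    """Assemble a search body that sorts on one field."""
    order = order or default_order(field)
    body = {"query": query, "sort": [{field: {"order": order}}]}
    if track_scores:
        # Ask Elasticsearch to compute _score even though it is not the sort key.
        body["track_scores"] = True
    return body


query = {"bool": {"filter": {"term": {"id": 1}}}}
body = build_sorted_search(query, "createTime", order="desc", track_scores=True)
print(body)
```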

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "love": "I like to collect rock albums"
        }
      },"filter": {
        "term": {
          "id": 1
        }
      }
    }
  },"sort": [
    {
      "createTime": {
        "order": "desc"
      }
    },{
      "_score": {
        "order": "desc"
      }
    }
  ]
}

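The order of the sort criteria matters. Results are sorted by the first criterion; the second criterion is applied only among results whose first sort value is identical, and so on.

Multi-level sorting does not have to include _score. Depending on the business scenario, you can also combine several different fields.

4. Sorting on a multi-valued field

This scenario arises when a single field holds multiple values, and those values have no inherent order. When sorting on such a field, which value should be used?

For numbers or dates, you can reduce a multi-valued field to a single value with the min, max, avg, or sum sort mode.

For example, you can sort by the earliest date in each createTime field as follows: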
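The tie-breaking rule can be modelled in memory with a tuple sort key. The hits below are invented for illustration and are not taken from the student index.

```python
# Invented hits: "a" and "c" share the same createTime, "b" is newer.
hits = [
    {"_id": "a", "createTime": 1653747545000, "_score": 0.4},
    {"_id": "b", "createTime": 1653747546000, "_score": 0.2},
    {"_id": "c", "createTime": 1653747545000, "_score": 0.9},
]

# Negating both values gives "desc" on createTime first, then "desc" on _score.
ordered = sorted(hits, key=lambda h: (-h["createTime"], -h["_score"]))
ids = [h["_id"] for h in ordered]
print(ids)  # ['b', 'c', 'a']
```

The newest document comes first regardless of its score; _score only decides the order of the two documents that share a createTime.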

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "love": "I like to collect rock albums"
        }
      }
    }
  },
  "sort": {
    "createTime": {
      "order": "asc","mode": "min"
    }
  }
}

The returned result:

{
    "took": 10,"_source": {
                    "love": "I like to collect rock albums","sort": [
                    1653747545000
                ]
            }
        ]
    }
}

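This kind of sorting is relatively rare in production; whether to use it depends on your own business needs.

5. String sorting and multi-fields

In some business scenarios we need to sort by the string value of a field. How does Elasticsearch handle this, given that full-text fields are analyzed into individual terms?

To sort on a string field, the field must be indexed without analysis: index set to not_analyzed on the string type in older versions, or the keyword type in current versions. But we still need an analyzed field so that full-text queries work.

A simple way around this is to store the same string in two fields, one analyzed for searching and one not analyzed for sorting.

However, storing the same string twice wastes space in _source. What we really want is to pass in a single field but index it in two ways. All core field types (strings, numbers, Booleans, dates) accept a fields parameter for this purpose.

When creating the mapping, it can be set as follows: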
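To see how a sort mode collapses several values into one sort key, here is an in-memory model in Python. It only mimics the idea behind min, max, avg, and sum; the documents and values are invented, and Elasticsearch's own implementation is not shown.

```python
from statistics import mean

# Each mode reduces a list of values to the single value used for ordering.
MODES = {"min": min, "max": max, "sum": sum, "avg": mean}

# Invented multi-valued numeric field: document id -> list of values.
docs = {"doc1": [3, 9], "doc2": [5, 6], "doc3": [1, 20]}


def order_by(docs, mode, descending=False):
    """Return document ids ordered by the reduced value of their field."""
    reducer = MODES[mode]
    return sorted(docs, key=lambda doc_id: reducer(docs[doc_id]), reverse=descending)


print(order_by(docs, "min"))                   # ['doc3', 'doc1', 'doc2']
print(order_by(docs, "max", descending=True))  # ['doc3', 'doc1', 'doc2']
print(order_by(docs, "avg"))                   # ['doc2', 'doc1', 'doc3']
```

The same documents can end up in a different order depending on the mode, which is why the mode should be chosen to match the business question being asked.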

// Versions before 5.x (string type)
"love": {
    "type":     "string","fields": {
        "raw": { // sub-field
            "type":  "string","index": "not_analyzed" // set to not_analyzed
        }
    }
}

// Version 5.x and later (text and keyword types)
"love": {
    "type":     "text","fields": {
        "raw": {
            "type":  "keyword"
        }
    }
}

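The love field is the same as before: an analyzed full-text field. The newly added love.raw sub-field is not analyzed.

Now, once we have reindexed our data, we can use the love field for searching and the love.raw field for sorting.

A sample request: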
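Putting the two roles together, the sketch below builds a mapping with an analyzed love field plus a keyword sub-field, and a search body that matches on one and sorts on the other. It assumes a version where the text and keyword types are available; the search phrase and the sub_field_type helper are illustrative.

```python
import json

# Mapping: "love" is analyzed for search, "love.raw" keeps the whole string.
mapping = {
    "mappings": {
        "properties": {
            "love": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            }
        }
    }
}

# Search on the analyzed field, sort on the unanalyzed sub-field.
search_body = {
    "query": {"match": {"love": "rock albums"}},
    "sort": [{"love.raw": {"order": "asc"}}],
}


def sub_field_type(mapping, field, sub_field):
    """Look up the type of a multi-field sub-field in a mapping dictionary."""
    props = mapping["mappings"]["properties"]
    return props[field]["fields"][sub_field]["type"]


print(sub_field_type(mapping, "love", "raw"))  # keyword
print(json.dumps(search_body))
```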

{
  "sort": "love.raw"
}

If the sub-field has not been created, the following error is returned:

{
    "error": {
        "root_cause": [
            {
                "type": "query_shard_exception","reason": "No mapping found for [love.raw] in order to sort on","index_uuid": "PJE50ZroS4OiTMObGhkw7Q","index": "student"
            }
        ],"type": "search_phase_execution_exception","reason": "all shards failed","phase": "query","grouped": true,"failed_shards": [
            {
                "shard": 0,"index": "student","node": "ufFZIzzWQkaNgoJXsUn3Sg","reason": {
                    "type": "query_shard_exception","index": "student"
                }
            }
        ]
    },"status": 400
}
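Client code can detect this situation by reading the root cause out of the error body. The sketch below parses a trimmed copy of the response above; the helper names are hypothetical, and matching on the reason text is only a convenience, not a stable contract.

```python
import json

# Trimmed copy of the error response shown above.
error_response = json.loads("""
{
    "error": {
        "root_cause": [
            {
                "type": "query_shard_exception",
                "reason": "No mapping found for [love.raw] in order to sort on",
                "index": "student"
            }
        ],
        "type": "search_phase_execution_exception"
    },
    "status": 400
}
""")


def root_cause_reasons(response):
    """Return the reason of every root cause, or an empty list if none exist."""
    causes = response.get("error", {}).get("root_cause", [])
    return [cause.get("reason", "") for cause in causes]


def is_missing_sort_mapping(response):
    """True when a root cause reports that the sort field has no mapping."""
    return any(
        reason.startswith("No mapping found for")
        for reason in root_cause_reasons(response)
    )


print(root_cause_reasons(error_response))
print(is_missing_sort_mapping(error_response))  # True
```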

In that case the index needs to be rebuilt with the following mapping:

{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "age": {
        "type": "integer"
      },
      "love": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      },
      "createTime": {
        "format": "yyyy-MM-dd HH:mm:ss",
        "type": "date"
      }
    }
  }
}

The final query result is as follows:

{
    "took": 6,"hits": {
        "total": {
            "value": 20,"createTime": "2022-06-03 17:37:16","name": "test9","sort": [
                    "I like to collect rock albums"
                ]
            },{
                "_index": "student","_id": "3","createTime": "2022-06-03 17:37:17","id": "3","age": 3
                },"_id": "5","id": "5","age": 5
                },"_id": "7","createTime": "2022-06-03 17:37:18","id": "7","age": 7
                },"_id": "9","id": "9","age": 9
                },"_id": "11","createTime": "2022-06-03 17:37:19","id": "11","age": 11
                },"_id": "13","id": "13","age": 13
                },"_id": "15","id": "15","age": 15
                },"_id": "17","createTime": "2022-06-03 17:37:20","id": "17","age": 17
                },"_id": "19","id": "19","age": 19
                },"sort": [
                    "I like to collect rock albums"
                ]
            }
        ]
    }
}

Part 14: Understanding Elasticsearch's Powerful Sorting Capabilities in One Article
