I have 300 million addresses in my PostgreSQL 9.3 database and I want to fuzzy-search them with pg_trgm. The ultimate goal is a search experience comparable to the Google Maps search box.
When I search these addresses with pg_trgm it takes roughly 30 seconds to get results. Many rows meet the default similarity threshold of 0.3, but I only need about 5 or 10 results. I created a trigram GiST index:
CREATE INDEX addresses_trgm_index ON addresses USING gist (address gist_trgm_ops);
This is my query:
SELECT address,similarity(address,'981 maun st') AS sml FROM addresses WHERE address % '981 maun st' ORDER BY sml DESC LIMIT 10;
The test table in production has already been dropped, so the EXPLAIN output below is from the test environment. It holds about 7 million rows and takes about 1.6 seconds to return results; with 300 million rows it takes more than 30 seconds.
ebdb=> explain analyse select address, similarity(address, '781 maun st') as sml from addresses where address % '781 maun st' order by sml desc limit 10;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=7615.83..7615.86 rows=10 width=16) (actual time=1661.004..1661.010 rows=10 loops=1)
   ->  Sort  (cost=7615.83..7634.00 rows=7268 width=16) (actual time=1661.003..1661.005 rows=10 loops=1)
         Sort Key: (similarity((address)::text, '781 maun st'::text))
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Scan using addresses_trgm_index on addresses  (cost=0.41..7458.78 rows=7268 width=16) (actual time=0.659..1656.386 rows=5241 loops=1)
               Index Cond: ((address)::text % '781 maun st'::text)
 Total runtime: 1661.066 ms
(7 rows)
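Aside, not part of the original question: the 0.3 cut-off mentioned above is pg_trgm's similarity threshold for the % operator. Raising it makes % return fewer, closer candidates, which can already shrink the set that has to be sorted. A minimal sketch of inspecting and changing it on 9.3:

SELECT show_limit();   -- current threshold, 0.3 unless changed
SELECT set_limit(0.5); -- require closer matches before % returns a row
-- On Postgres 9.6 and later the same setting is an ordinary GUC:
-- SET pg_trgm.similarity_threshold = 0.5;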
Solution
Postgresql 9.3 … Is there a good way to improve the performance or is it a good plan to do table partitioning?
Table partitioning will not help at all.
However, there is a good option: upgrade to a current version of Postgres. There have been many improvements to GiST indexes, in particular for the pg_trgm module and for big data in general. It should be substantially faster with Postgres 9.6 or the upcoming Postgres 10 (currently in beta).
Your "nearest neighbor" search looks correct, but for a small LIMIT use this equivalent query instead:
SELECT address, similarity(address, '981 maun st') AS sml FROM addresses WHERE address % '981 maun st' ORDER BY address <-> '981 maun st' LIMIT 10;
It will usually beat the first formulation when only a small number of the closest matches is wanted.
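To check that the rewritten query really gets the cheaper plan on your data, it can be run under EXPLAIN just like the first one; a minimal sketch reusing the table and index names from the question (the GiST index matters here: GIN trigram indexes cannot serve the <-> ordering, so only gist_trgm_ops lets the planner stop after the 10 nearest rows):

-- Sketch only, reusing the addresses table and its GiST trigram index from above.
EXPLAIN (ANALYZE, BUFFERS)
SELECT address, similarity(address, '981 maun st') AS sml
FROM   addresses
WHERE  address % '981 maun st'
ORDER  BY address <-> '981 maun st'   -- distance ordering served by the GiST index
LIMIT  10;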