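I searched Google for many articles comparing the performance of PostgreSQL and MS SQL Server 2008 R2, and many of them said PostgreSQL outperforms MS SQL 2008 R2. So I ran a real test using a real case from my company.
Hardware:
In ESXi 5.5 I built two VMware guests, each with the same virtual hardware specification, including (only the key items are listed):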
> RAM: 4 GB
> CPU: one virtual socket with two cores
> Virtual disk: 50 GB for the Ubuntu server, 200 GB for Windows 7 (as you know, Microsoft always takes more storage)
VM host (a single box):
> RAM: 16 GB; CPU: Intel i5, 2 cores with Hyper-Threading
> HDD: 1 TB, 2.5"
Software:
> Windows 7 Ultimate with MS SQL Server 2008 R2, both with default parameters
> Ubuntu Linux Server 12.04 with PostgreSQL 9.3, both with default parameters
> Database tool: DbVisualizer 9.0.9 (on another laptop)
> JDBC:
> PostgreSQL: postgresql-9.3-1101.jdbc4
> MS SQL 2008 R2: sqljdbc_4.0.2206.100_cht.tar
Table schema (identical on PostgreSQL and MS SQL 2008 R2, except for naming conventions)
I have three tables, as follows:
CREATE TABLE gourmet (
vendor_url CHARACTER VARYING(200),
ticket_price CHARACTER VARYING(8000),
tour_advice CHARACTER VARYING(8000),
gourmet_name_c CHARACTER VARYING(180),
tel_ext CHARACTER VARYING(5),
gourmet_name_e CHARACTER VARYING(60),
arrangement CHARACTER VARYING(8000),
data_source CHARACTER VARYING(200),
open_time CHARACTER VARYING(8000),
vendor_name CHARACTER VARYING(60),
gps_x CHARACTER VARYING(20),
gourmet_id CHARACTER VARYING(36) NOT NULL,
gps_y CHARACTER VARYING(20),
updated_by CHARACTER VARYING(20),
recom_level DOUBLE PRECISION,
fax_num CHARACTER VARYING(10),
address_c CHARACTER VARYING(240),
fax_area_code CHARACTER VARYING(4),
long_desc CHARACTER VARYING(8000),
for_search_only BOOLEAN,
status CHARACTER VARYING(1),
category CHARACTER VARYING(3),
area_code CHARACTER VARYING(4),
county CHARACTER VARYING(3),
email CHARACTER VARYING(40),
updated_date TIMESTAMP(6) WITHOUT TIME ZONE,
data_source_url CHARACTER VARYING(200),
tour_duration CHARACTER VARYING(400),
short_name CHARACTER VARYING(180),
created_by CHARACTER VARYING(20),
created_date TIMESTAMP(6) WITHOUT TIME ZONE,
town CHARACTER VARYING(3),
address_e CHARACTER VARYING(80),
parking_info CHARACTER VARYING(8000),
remark CHARACTER VARYING(8000),
location_info CHARACTER VARYING(8000),
tel_no CHARACTER VARYING(10),
CONSTRAINT gourmet_pk PRIMARY KEY (gourmet_id)
);
CREATE INDEX _dta_index_gourmet_5_574625090__k9_k4_k1_k10_k31_k2_11_14_15_33 ON gourmet (
county,
status,
gourmet_id,
town,
vendor_url,
gourmet_name_c
);
CREATE INDEX _dta_index_gourmet_7_1796201449__k9_k1_k2 ON gourmet
(
county,
gourmet_id,
gourmet_name_c
);
CREATE INDEX idx_grourmet_status ON gourmet (
status
);
=======================================================
CREATE TABLE photos (
source CHARACTER VARYING(100),
caption CHARACTER VARYING(100),
display_order INTEGER,
main_photo BOOLEAN,
bytes_original BYTEA,
folder_id CHARACTER VARYING(36),
photo_id INTEGER DEFAULT nextval('photos_photo_id_seq'::regclass) NOT NULL,
bytes_thumb BYTEA,
CONSTRAINT photos_pk PRIMARY KEY (photo_id)
);
CREATE INDEX _dta_index_photos_5_1925581898__k2_k5_1 ON photos (
folder_id,
main_photo
);
CREATE INDEX idx_photos_folder_id ON photos (
folder_id
);
CREATE INDEX _dta_index_photos_5_1925581898__k5_k2_k1 ON photos (
main_photo,
folder_id,
photo_id
);
====================================================
CREATE TABLE counters (
item_id CHARACTER VARYING(36) NOT NULL,
counter_name CHARACTER VARYING(30) NOT NULL,
target_date CHARACTER(10) NOT NULL,
total_hit BIGINT,
CONSTRAINT counters_pk PRIMARY KEY (item_id, counter_name, target_date)
);
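Table gourmet has 4,108 rows, table photos has 37,451 rows, and table counters has 11,659,353 rows.
To keep result-set transfer time out of the measurement, I only count the rows. The PostgreSQL version of the query: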
SELECT count(*) FROM (
SELECT a.gourmet_id item_id, a.gourmet_name_c item_name, a.vendor_url external_url, 'fnf-item.aspx?pid=' || a.gourmet_id fun_taiwan_url,
b.photo_id,coalesce(SUM(c.total_hit),0) hit_count,to_char(a.created_date, 'YYYY/MM/DD HH24:MI:SS') created_date,
'1' is_gourmet, a.for_search_only, a.area_code || '-' || a.tel_no tel_no, a.tel_ext, a.county, a.town, a.address_c , 'gourmet.png' icon_name ,
Now() as updated_date, 'NonorderScheduler' as updated_by
FROM gourmet a
LEFT JOIN photos b ON b.folder_id=a.gourmet_id AND b.main_photo='1'
LEFT JOIN counters c ON c.item_id=a.gourmet_id AND c.counter_name='FnfHit'
WHERE a.status='A'
GROUP BY a.gourmet_id, a.gourmet_name_c, length(a.vendor_url),a.vendor_url, 'fnf-item.aspx?pid=' || a.gourmet_id,b.photo_id, to_char(a.created_date,'YYYY/MM/DD HH24:MI:SS'),
a.for_search_only, a.area_code || '-' || a.tel_no, a.tel_ext,a.county, a.town, a.address_c
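) t;

The MS SQL 2008 R2 version, run after flushing the buffer cache:
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS; -- flush clean data pages from the buffer pool (cold data cache)
GO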
SELECT count(*) FROM (
SELECT a.GourmetId ItemId, a.GourmetNameC ItemName, a.vendorUrl ExternalUrl, 'fnf-item.aspx?pid=' + a.GourmetId FunTaiwanUrl,
b.PhotoId,ISNULL(SUM(c.TotalHit),0) HitCount, CONVERT(VARCHAR,a.CreatedDate,111) CreatedDate,
'1' IsGourmet, a.ForSearchOnly, a.AreaCode + '-' + a.TelNo TelNo, a.TelExt, a.County, a.Town, a.AddressC , 'gourmet.png' IconName ,
GetDate() as UpdatedDate, 'NonorderScheduler' as UpdatedBy
FROM Gourmet a
LEFT JOIN Photos b ON b.FolderId=a.GourmetId AND b.MainPhoto=1
LEFT JOIN Counters c ON c.ItemId=a.GourmetId AND c.CounterName='FnfHit'
WHERE a.[Status]='A'
GROUP BY a.GourmetId, a.GourmetNameC, len(a.vendorUrl),a.vendorUrl, 'fnf-item.aspx?pid=' + a.GourmetId,b.PhotoId, CONVERT(VARCHAR,a.CreatedDate,111),
a.ForSearchOnly, a.AreaCode + '-' + a.TelNo, a.TelExt,a.County, a.Town, a.AddressC
) t
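Test method:
I executed the SQL statements in DbVisualizer from another laptop. Only the two VMware guests run on the VMware host.
While I tested PostgreSQL on the Ubuntu server, no foreground software was running on the Windows 7 Ultimate guest, and vice versa.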
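Using the same table schema, the same rows, and the same SQL statement on MS SQL Server 2008 R2 under Windows 7 Ultimate, I got results very different from what the articles claiming PostgreSQL is faster than MS SQL 2008 R2 would suggest.
Counting the result set took 0.016 seconds on MS SQL 2008 R2 and 132 seconds on PostgreSQL. As you can see, on MS SQL 2008 R2 I cleared the data cache (DBCC DROPCLEANBUFFERS) before executing the SQL, but I did not clear any cache on PostgreSQL.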
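Because one server's cache was flushed and the other's was not, a single timing from each side is hard to compare. A minimal sketch of a fairer harness, assuming a Python DB-API 2.0 driver (for example psycopg2 for PostgreSQL or pyodbc for SQL Server; neither is part of the original setup, which used DbVisualizer over JDBC): run the same query several times and report the first run separately from the warm-cache median. The `time_query` helper is hypothetical, and the in-memory SQLite table only stands in for the real databases.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, runs=5):
    """Run `sql` several times on a DB-API connection and return timings in seconds.

    The first run is reported separately because it is the most likely to hit a
    cold cache; the median of the remaining runs approximates warm-cache speed.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(sql)
        cur.fetchall()  # make sure every result row has actually been produced
        timings.append(time.perf_counter() - start)
        cur.close()
    return {
        "first": timings[0],
        "warm_median": statistics.median(timings[1:]) if runs > 1 else timings[0],
        "all": timings,
    }


if __name__ == "__main__":
    # Stand-in database for demonstration only; connect to the real servers
    # instead to compare PostgreSQL and SQL Server.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
    result = time_query(conn, "SELECT count(*) FROM t", runs=5)
    print(f"first run: {result['first']:.4f}s, warm median: {result['warm_median']:.4f}s")
```

Reporting both numbers makes it visible whether a difference comes from caching or from the execution plan itself.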
My question is: in this situation, how should I tune PostgreSQL so that it can run this query faster than MS SQL 2008 R2?
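I know some people may ask me to tune the SQL statement using EXPLAIN ANALYZE. But basically, I don't want to tune SQL statements until the server parameters (OS and PostgreSQL) have been tuned as well as possible, because I plan to move all of my company's data from MS SQL 2008 R2 to PostgreSQL, so it is impossible to tune every SQL statement in every system.
If my problem can be solved by tuning OS and PostgreSQL parameters, I would rather not tune SQL statements in the short term. As you know, that would be a huge amount of work.
Thanks for your help.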
Attached is the EXPLAIN ANALYZE result from PostgreSQL:
HashAggregate  (cost=305063.90..328086.20 rows=920892 width=126) (actual time=4595.501..4598.155 rows=4019 loops=1)
  ->  Nested Loop Left Join  (cost=450.60..272832.68 rows=920892 width=126) (actual time=4.304..2567.521 rows=1690393 loops=1)
        ->  Hash Right Join  (cost=450.04..1230.97 rows=4132 width=118) (actual time=4.223..20.423 rows=4019 loops=1)
              Hash Cond: ((b.folder_id)::text = (a.gourmet_id)::text)
              ->  Bitmap Heap Scan on photos b  (cost=200.44..848.93 rows=12149 width=13) (actual time=0.896..5.782 rows=12166 loops=1)
                    Filter: main_photo
                    ->  Bitmap Index Scan on _dta_index_photos_5_1925581898__k5_k2_k1  (cost=0.00..197.41 rows=12149 width=0) (actual time=0.840..0.840 rows=12166 loops=1)
                          Index Cond: (main_photo = true)
              ->  Hash  (cost=199.36..199.36 rows=4019 width=114) (actual time=3.305..3.305 rows=4019 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 603kB
                    ->  Seq Scan on gourmet a  (cost=0.00..199.36 rows=4019 width=114) (actual time=0.012..1.768 rows=4019 loops=1)
                          Filter: ((status)::text = 'A'::text)
                          Rows Removed by Filter: 90
        ->  Index Scan using counters_pk on counters c  (cost=0.56..60.72 rows=223 width=16) (actual time=0.029..0.132 rows=421 loops=4019)
              Index Cond: (((item_id)::text = (a.gourmet_id)::text) AND ((counter_name)::text = 'FnfHit'::text))
Total runtime: 4600.750 ms
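The PostgreSQL plan shows where the work goes: the nested loop probes counters once per gourmet row (loops=4019), and each probe returns about 421 rows, so roughly 1.7 million joined rows have to be collapsed back to 4,019 groups by the HashAggregate. The quick arithmetic below uses only numbers printed in that plan; `rows=421` is an average that EXPLAIN rounds, so the product is approximate.

```python
# Figures copied from the PostgreSQL EXPLAIN ANALYZE output.
outer_rows = 4019               # gourmet rows with status = 'A' (loops=4019)
avg_counter_rows = 421          # average counters rows per loop, as printed (rounded)
joined_rows_reported = 1690393  # actual rows out of the Nested Loop Left Join
grouped_rows = 4019             # actual rows out of the HashAggregate

estimated_join_rows = outer_rows * avg_counter_rows
rows_per_group = joined_rows_reported / grouped_rows

print(f"estimated join output: {estimated_join_rows:,}")          # 1,691,999
print(f"rows collapsed per output group: {rows_per_group:.0f}")   # 421
```

The estimate lands within about 0.1% of the reported join output, which confirms that the aggregation step, not the scans, handles the bulk of the rows.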
The MS SQL 2008 R2 execution plan:
  |--Compute Scalar(DEFINE:([Expr1013]='fnf-item.aspx?pid='+[SlowTravel].[dbo].[Gourmet].[GourmetId] as [a].[GourmetId], [Expr1014]=isnull([Expr1012],(0)), [Expr1015]='1', [Expr1016]='gourmet.png', [Expr1017]=getdate(), [Expr1018]='NonorderScheduler'))
       |--Compute Scalar(DEFINE:([Expr1012]=CASE WHEN [Expr1027]=(0) THEN NULL ELSE [Expr1028] END))
            |--Stream Aggregate(GROUP BY:([a].[GourmetId], [Expr1008], [Expr1009], [b].[PhotoId], [Expr1010], [Expr1011]) DEFINE:([Expr1027]=COUNT_BIG([SlowTravel].[dbo].[Counters].[TotalHit] as [c].[TotalHit]), [Expr1028]=SUM([SlowTravel].[dbo].[Counters].[TotalHit] as [c].[TotalHit]), [a].[GourmetNameC]=ANY([SlowTravel].[dbo].[Gourmet].[GourmetNameC] as [a].[GourmetNameC]), [a].[vendorUrl]=ANY([SlowTravel].[dbo].[Gourmet].[vendorUrl] as [a].[vendorUrl]), [a].[ForSearchOnly]=ANY([SlowTravel].[dbo].[Gourmet].[ForSearchOnly] as [a].[ForSearchOnly]), [a].[TelExt]=ANY([SlowTravel].[dbo].[Gourmet].[TelExt] as [a].[TelExt]), [a].[County]=ANY([SlowTravel].[dbo].[Gourmet].[County] as [a].[County]), [a].[Town]=ANY([SlowTravel].[dbo].[Gourmet].[Town] as [a].[Town]), [a].[AddressC]=ANY([SlowTravel].[dbo].[Gourmet].[AddressC] as [a].[AddressC])))
                 |--Nested Loops(Left Outer Join, OUTER REFERENCES:([a].[GourmetId], [Expr1026]) WITH UNORDERED PREFETCH)
                      |--Merge Join(Left Outer Join, MERGE:([a].[GourmetId])=([b].[FolderId]), RESIDUAL:([SlowTravel].[dbo].[Photos].[FolderId] as [b].[FolderId]=[SlowTravel].[dbo].[Gourmet].[GourmetId] as [a].[GourmetId]))
                      |    |--Sort(ORDER BY:([a].[GourmetId] ASC))
                      |    |    |--Compute Scalar(DEFINE:([Expr1008]=len([SlowTravel].[dbo].[Gourmet].[vendorUrl] as [a].[vendorUrl]), [Expr1009]='fnf-item.aspx?pid='+[SlowTravel].[dbo].[Gourmet].[GourmetId] as [a].[GourmetId], [Expr1010]=CONVERT(varchar(30),[SlowTravel].[dbo].[Gourmet].[CreatedDate] as [a].[CreatedDate],111), [Expr1011]=([SlowTravel].[dbo].[Gourmet].[AreaCode] as [a].[AreaCode]+'-')+[SlowTravel].[dbo].[Gourmet].[TelNo] as [a].[TelNo]))
                      |    |         |--Table Scan(OBJECT:([SlowTravel].[dbo].[Gourmet] AS [a]), WHERE:([SlowTravel].[dbo].[Gourmet].[Status] as [a].[Status]='A'))
                      |    |--Index Seek(OBJECT:([SlowTravel].[dbo].[Photos].[_dta_index_Photos_5_1925581898__K5_K2_K1] AS [b]), SEEK:([b].[MainPhoto]=(1)) ORDERED FORWARD)
                      |--Clustered Index Seek(OBJECT:([SlowTravel].[dbo].[Counters].[Counters_PK] AS [c]), SEEK:([c].[CounterName]='FnfHit' AND [c].[ItemId]=[SlowTravel].[dbo].[Gourmet].[GourmetId] as [a].[GourmetId]) ORDERED FORWARD)
How can I avoid the Sort in the explain result? Actually, I didn't ask for any sort in the SQL statement.
Thanks for your help.
Answer:
How do I avoid sort in the explain result? Actually I didn't ask for any sort in the SQL statement.
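You have asked for rows to be aggregated. One way to do that is to sort the data set and then scan it to collapse duplicates. That can be faster than a hash aggregate, which is the other way PostgreSQL knows how to do grouping.
So although you did not explicitly say "sort the rows", it is still sorting them because of what you asked for.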
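To illustrate the two grouping strategies, here is a toy Python version of GROUP BY key with SUM(value): one groups by sorting and then scanning runs of equal keys (the idea behind a sort-based aggregate), the other uses a hash table (the idea behind HashAggregate). It is a conceptual sketch only, not PostgreSQL's implementation, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def sort_aggregate(rows):
    """Group (key, value) pairs by sorting first, then summing adjacent runs."""
    ordered = sorted(rows, key=itemgetter(0))  # this is the implicit "sort"
    return {key: sum(v for _, v in run) for key, run in groupby(ordered, key=itemgetter(0))}


def hash_aggregate(rows):
    """Group (key, value) pairs with a hash table, without sorting."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


rows = [("g2", 5), ("g1", 3), ("g2", 1), ("g3", 0), ("g1", 4)]
print(sort_aggregate(rows))  # {'g1': 7, 'g2': 6, 'g3': 0}
print(hash_aggregate(rows))  # {'g2': 6, 'g1': 7, 'g3': 0}
```

Both produce the same totals; only the sort-based version needs the whole input ordered first, which is why a Sort node can appear even though the query never says ORDER BY.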
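The immediate problem is that PostgreSQL is very conservative about how much memory it uses for sorting:
Sort Method: external merge  Disk: 317656kB
so it is doing a 300 MB on-disk sort. You can see this clearly if you take a look at the plan on explain.depesz.com.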
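"External merge" means the input did not fit in work_mem, so it was sorted in memory-sized runs that were written to temporary files and then merged. A simplified Python sketch of that idea, where `mem_limit` counts items rather than bytes and stands in for work_mem; PostgreSQL's actual sort code is considerably more elaborate, and the helper names here are hypothetical.

```python
import heapq
import tempfile


def _spill(sorted_run):
    """Write one sorted run to a temporary file and rewind it for reading."""
    fh = tempfile.TemporaryFile(mode="w+")
    for value in sorted_run:
        fh.write(f"{value}\n")
    fh.seek(0)
    return fh


def external_sort(values, mem_limit):
    """Sort integers while holding at most `mem_limit` of them per in-memory run.

    Returns the merged output (collected into a list for demonstration) and the
    number of runs spilled to temporary files.
    """
    run_files, buffer = [], []
    for value in values:
        buffer.append(value)
        if len(buffer) >= mem_limit:
            run_files.append(_spill(sorted(buffer)))
            buffer = []
    if not run_files:
        # Everything fit within the limit: a plain in-memory sort, no temp files.
        return sorted(buffer), 0
    if buffer:
        run_files.append(_spill(sorted(buffer)))
    try:
        # Runs are read back lazily; heapq.merge holds only one item per run at a time.
        streams = [(int(line) for line in fh) for fh in run_files]
        return list(heapq.merge(*streams)), len(run_files)
    finally:
        for fh in run_files:
            fh.close()


data = [5, 3, 9, 1, 7, 2, 8]
print(external_sort(data, mem_limit=3))    # ([1, 2, 3, 5, 7, 8, 9], 3)
print(external_sort(data, mem_limit=100))  # ([1, 2, 3, 5, 7, 8, 9], 0)
```

With a large enough limit the same input is sorted entirely in memory, which is the effect a bigger work_mem has on the real query.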
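If you run:
SET work_mem = '400MB';
in the session before the query, the sort should fit in memory and the query should run much faster.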
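Unfortunately, changing this in the server configuration is not simple, because PostgreSQL is not very smart about resource management. According to the documentation on work_mem, each session may use up to work_mem bytes per sort or join operation. So if you have max_connections = 50 and are running complex queries, you may find yourself using many gigabytes of working memory, exceeding the available RAM and pushing the server into swap, which you really do not want.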
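A back-of-the-envelope sketch of that risk. It assumes, purely for illustration, that all 50 connections run a query with two sort or hash operations at the same time while work_mem is 400MB; `worst_case_work_mem_mb` is a hypothetical helper, and real workloads rarely hit the maximum, so treat the result as an upper bound rather than a prediction.

```python
def worst_case_work_mem_mb(max_connections, work_mem_mb, ops_per_query):
    """Rough upper bound on sort/hash memory if every connection runs a query
    with `ops_per_query` sort or hash operations at the same time."""
    return max_connections * work_mem_mb * ops_per_query


# max_connections = 50 and work_mem = 400MB come from the text; 2 operations per query is an assumption.
total_mb = worst_case_work_mem_mb(max_connections=50, work_mem_mb=400, ops_per_query=2)
print(f"{total_mb:,} MB (about {total_mb / 1024:.1f} GB)")  # 40,000 MB (about 39.1 GB)
```

Against the 4 GB of RAM in each test VM, that bound is roughly ten times too large, which is why raising work_mem only in the session that runs the heavy query is often the safer choice.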
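It also seems to be doing a seqscan on counters, but since about a quarter of the rows match, that is probably the right thing to do; an index scan would likely be slower.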
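A deliberately crude model of why a sequential scan can beat an index scan when about a quarter of the rows match. It borrows PostgreSQL's default planner constants seq_page_cost = 1.0 and random_page_cost = 4.0, but the formulas and the 100-rows-per-page figure are simplifying assumptions, not PostgreSQL's real cost functions (which also account for CPU costs, caching and physical row ordering).

```python
SEQ_PAGE_COST = 1.0     # PostgreSQL default for seq_page_cost
RANDOM_PAGE_COST = 4.0  # PostgreSQL default for random_page_cost


def seq_scan_cost(total_rows, rows_per_page):
    """Simplified: read every page of the table once, sequentially."""
    pages = -(-total_rows // rows_per_page)  # ceiling division
    return pages * SEQ_PAGE_COST


def index_scan_cost(total_rows, selectivity):
    """Simplified worst case: every matching row costs one random page read."""
    return total_rows * selectivity * RANDOM_PAGE_COST


rows = 11_659_353    # row count of counters from the question
rows_per_page = 100  # hypothetical
print(seq_scan_cost(rows, rows_per_page))  # 116594.0
print(index_scan_cost(rows, 0.25))         # 11659353.0
```

Even in this toy model, fetching a quarter of the table through random reads costs about a hundred times more than reading it sequentially once.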
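I find the default work_mem far too conservative and tend to set it to at least 100MB on any reasonably large system. I also prefer running PostgreSQL with PgBouncer in front of it and a low max_connections, which lets me devote more resources to each individual connection.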
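Turning the max_connections reasoning around shows why a pooler helps: with a fixed memory budget, fewer server-side connections leave more work_mem for each one. A sketch with hypothetical numbers (a 2048 MB budget for sorts and hashes, two such operations per query); `per_connection_work_mem_mb` is an illustrative helper, not a PostgreSQL formula.

```python
def per_connection_work_mem_mb(ram_budget_mb, max_connections, ops_per_query):
    """Largest work_mem (MB) that keeps the worst case within a fixed RAM budget."""
    return ram_budget_mb // (max_connections * ops_per_query)


budget_mb = 2048  # hypothetical share of RAM set aside for sorts and hashes
for conns in (100, 20):
    print(conns, per_connection_work_mem_mb(budget_mb, conns, ops_per_query=2))
# 100 10
# 20 51
```

With a pooler such as PgBouncer, many client connections can share a small number of server connections, so the higher per-connection figure becomes usable without risking swap.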
Frankly, I would love to know how MS SQL Server executes this, because the numbers you report are amazing for a query like this.