sql-server – PostgreSQL 9.3 on Ubuntu Server 12.04 vs. MS SQL Server 2008 R2 on Windows 7 Ultimate

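I searched Google for many articles comparing PostgreSQL and MS SQL Server 2008 R2 performance, and many of them say PostgreSQL performs better than MS SQL Server 2008 R2. So I ran a real test using a real case from my company.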

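Hardware:
I built two VMware guest VMs on ESXi 5.5. Both guests have identical virtual hardware, including (key items only):

> Memory: 4GB
> CPU: one virtual socket with two cores
> Virtual disk: 50GB for Ubuntu Server, 200GB for Windows 7 (as you know, Microsoft always takes more storage)

VM host (a single box):

> RAM: 16GB; CPU: Intel i5, 2 cores with Hyper-Threading
> HDD: 1TB, 2.5-inch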

Software:

> Windows 7 Ultimate and MS SQL Server 2008 R2, both with default parameters
> Ubuntu Linux Server 12.04 and PostgreSQL 9.3, both with default parameters
> Database tool: DbVisualizer 9.0.9 (on another laptop)
> JDBC:

> PostgreSQL: postgresql-9.3-1101.jdbc4
> MS SQL 2008 R2: sqljdbc_4.0.2206.100_cht.tar

Table schema (identical on PostgreSQL and MS SQL 2008 R2, except for naming conventions)

I have three tables, as follows:

CREATE TABLE gourmet (
  vendor_url CHARACTER VARYING(200),
  ticket_price CHARACTER VARYING(8000),
  tour_advice CHARACTER VARYING(8000),
  gourmet_name_c CHARACTER VARYING(180),
  tel_ext CHARACTER VARYING(5),
  gourmet_name_e CHARACTER VARYING(60),
  arrangement CHARACTER VARYING(8000),
  data_source CHARACTER VARYING(200),
  open_time CHARACTER VARYING(8000),
  vendor_name CHARACTER VARYING(60),
  gps_x CHARACTER VARYING(20),
  gourmet_id CHARACTER VARYING(36) NOT NULL,
  gps_y CHARACTER VARYING(20),
  updated_by CHARACTER VARYING(20),
  recom_level DOUBLE PRECISION,
  fax_num CHARACTER VARYING(10),
  address_c CHARACTER VARYING(240),
  fax_area_code CHARACTER VARYING(4),
  long_desc CHARACTER VARYING(8000),
  for_search_only BOOLEAN,
  status CHARACTER VARYING(1),
  category CHARACTER VARYING(3),
  area_code CHARACTER VARYING(4),
  county CHARACTER VARYING(3),
  email CHARACTER VARYING(40),
  updated_date TIMESTAMP(6) WITHOUT TIME ZONE,
  data_source_url CHARACTER VARYING(200),
  tour_duration CHARACTER VARYING(400),
  short_name CHARACTER VARYING(180),
  created_by CHARACTER VARYING(20),
  created_date TIMESTAMP(6) WITHOUT TIME ZONE,
  town CHARACTER VARYING(3),
  address_e CHARACTER VARYING(80),
  parking_info CHARACTER VARYING(8000),
  remark CHARACTER VARYING(8000),
  location_info CHARACTER VARYING(8000),
  tel_no CHARACTER VARYING(10),
  CONSTRAINT gourmet_pk PRIMARY KEY (gourmet_id)
);

CREATE INDEX _dta_index_gourmet_5_574625090__k9_k4_k1_k10_k31_k2_11_14_15_33 ON gourmet (
  county,
  status,
  gourmet_id,
  town,
  vendor_url,
  gourmet_name_c
);

CREATE INDEX _dta_index_gourmet_7_1796201449__k9_k1_k2 ON gourmet (
  county,
  gourmet_id,
  gourmet_name_c
);

CREATE INDEX idx_grourmet_status ON gourmet (
  status
);
=======================================================

CREATE TABLE photos (
  source CHARACTER VARYING(100),
  caption CHARACTER VARYING(100),
  display_order INTEGER,
  main_photo BOOLEAN,
  bytes_original BYTEA,
  folder_id CHARACTER VARYING(36),
  photo_id INTEGER DEFAULT nextval('photos_photo_id_seq'::regclass) NOT NULL,
  bytes_thumb BYTEA,
  CONSTRAINT photos_pk PRIMARY KEY (photo_id)
);

CREATE INDEX _dta_index_photos_5_1925581898__k2_k5_1 ON photos (
  folder_id,
  main_photo
);

CREATE INDEX idx_photos_folder_id ON photos (
  folder_id
);

CREATE INDEX _dta_index_photos_5_1925581898__k5_k2_k1 ON photos (
  main_photo,
  folder_id,
  photo_id
);

====================================================

CREATE TABLE counters (
  item_id CHARACTER VARYING(36) NOT NULL,
  counter_name CHARACTER VARYING(30) NOT NULL,
  target_date CHARACTER(10) NOT NULL,
  total_hit BIGINT,
  CONSTRAINT counters_pk PRIMARY KEY (item_id, counter_name, target_date)
);

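Table gourmet has 4108 records, table photos has 37451 records, and table counters has 11659353 records.

To keep the time spent transferring the result set out of the measurement, I only count the records.

SQL statement used in PostgreSQL: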

SELECT count(*) FROM (
  SELECT        a.gourmet_id item_id, a.gourmet_name_c item_name, a.vendor_url external_url,  'fnf-item.aspx?pid=' || a.gourmet_id fun_taiwan_url,                    
        b.photo_id,coalesce(SUM(c.total_hit),0) hit_count,to_char(a.created_date, 'YYYY/MM/DD HH24:MI:SS') created_date,                                                  
        '1' is_gourmet, a.for_search_only, a.area_code || '-' || a.tel_no tel_no, a.tel_ext, a.county, a.town, a.address_c , 'gourmet.png' icon_name    ,            
        Now() as updated_date, 'NonorderScheduler' as updated_by                                                                                                                                         
  FROM      gourmet a                                                                                                                                                                                          
  LEFT JOIN photos b ON b.folder_id=a.gourmet_id AND b.main_photo='1'                                                                                       
  LEFT JOIN counters c ON c.item_id=a.gourmet_id AND c.counter_name='FnfHit'                                                                              
  WHERE     a.status='A'                                                                                                                               
  GROUP BY  a.gourmet_id, a.gourmet_name_c, length(a.vendor_url),a.vendor_url, 'fnf-item.aspx?pid=' || a.gourmet_id,b.photo_id, to_char(a.created_date,'YYYY/MM/DD HH24:MI:SS'),
        a.for_search_only, a.area_code || '-' || a.tel_no, a.tel_ext,a.county, a.town, a.address_c
) t

SQL statement used in MS SQL 2008 R2:

CHECKPOINT; 
GO 
DBCC DROPCLEANBUFFERS; -- remove clean data pages from the buffer pool (cold data cache)
GO
SELECT count(*) FROM (
  SELECT        a.GourmetId ItemId, a.GourmetNameC ItemName, a.vendorUrl ExternalUrl,  'fnf-item.aspx?pid=' + a.GourmetId FunTaiwanUrl,                    
        b.PhotoId,ISNULL(SUM(c.TotalHit),0) HitCount, CONVERT(VARCHAR,a.CreatedDate,111) CreatedDate,                                                  
        '1' IsGourmet, a.ForSearchOnly, a.AreaCode + '-' + a.TelNo TelNo, a.TelExt, a.County, a.Town, a.AddressC , 'gourmet.png' IconName   ,            
        GetDate() as UpdatedDate, 'NonorderScheduler' as UpdatedBy                                                                                                                                       
  FROM      Gourmet a                                                                                                                                                                                          
  LEFT JOIN Photos b ON b.FolderId=a.GourmetId AND b.MainPhoto=1                                                                                       
  LEFT JOIN Counters c ON c.ItemId=a.GourmetId AND c.CounterName='FnfHit'                                                                              
  WHERE     a.[Status]='A'                                                                                                                               
  GROUP BY  a.GourmetId, a.GourmetNameC, len(a.vendorUrl),a.vendorUrl, 'fnf-item.aspx?pid=' + a.GourmetId,b.PhotoId,    CONVERT(VARCHAR,a.CreatedDate,111),
        a.ForSearchOnly, a.AreaCode + '-' + a.TelNo, a.TelExt,a.County, a.Town, a.AddressC 
) t

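Apart from syntax and functions, the two SQL statements are identical.

Test method:

I executed the SQL statements in DbVisualizer from another laptop. The VMware host runs only these two guest VMs.
While I tested PostgreSQL, no foreground software was running on the Windows 7 Ultimate guest, and vice versa.

Using the same table schema, the same records, and the same SQL statement on MS SQL Server 2008 R2 on Windows 7 Ultimate, I got results very different from those articles claiming PostgreSQL is faster than MS SQL 2008 R2.

Counting the result set took 0.016 seconds on MS SQL 2008 R2 and 132 seconds on PostgreSQL. As you can see, before running the SQL on MS SQL 2008 R2 I first cleared its cache, whereas I did not clear any cache on PostgreSQL.

My question is: in this case, how should I tune PostgreSQL so that it runs this query faster than MS SQL 2008 R2?

I know some people may suggest tuning the SQL statement with the EXPLAIN ANALYZE command. But I would rather not tune SQL statements until the server parameters (OS and PostgreSQL) have been tuned as well as possible, because I plan to move all of my company's data from MS SQL 2008 R2 to PostgreSQL. Tuning every SQL statement of every system is therefore not feasible.

If my problem can be solved by tuning OS and PostgreSQL parameters, I would rather not tune SQL statements in the short term. As you know, that would be a huge amount of work.

Thanks for your help.

Attached is the EXPLAIN ANALYZE output from PostgreSQL: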

    HashAggregate  (cost=305063.90..328086.20 rows=920892 width=126) (actual time=4595.501..4598.155 rows=4019 loops=1)
      ->  Nested Loop Left Join  (cost=450.60..272832.68 rows=920892 width=126) (actual time=4.304..2567.521 rows=1690393 loops=1)
            ->  Hash Right Join  (cost=450.04..1230.97 rows=4132 width=118) (actual time=4.223..20.423 rows=4019 loops=1)
                  Hash Cond: ((b.folder_id)::text = (a.gourmet_id)::text)
                  ->  Bitmap Heap Scan on photos b  (cost=200.44..848.93 rows=12149 width=13) (actual time=0.896..5.782 rows=12166 loops=1)
                        Filter: main_photo
                        ->  Bitmap Index Scan on _dta_index_photos_5_1925581898__k5_k2_k1  (cost=0.00..197.41 rows=12149 width=0) (actual time=0.840..0.840 rows=12166 loops=1)
                              Index Cond: (main_photo = true)
                  ->  Hash  (cost=199.36..199.36 rows=4019 width=114) (actual time=3.305..3.305 rows=4019 loops=1)
                        Buckets: 1024  Batches: 1  Memory Usage: 603kB
                        ->  Seq Scan on gourmet a  (cost=0.00..199.36 rows=4019 width=114) (actual time=0.012..1.768 rows=4019 loops=1)
                              Filter: ((status)::text = 'A'::text)
                              Rows Removed by Filter: 90
            ->  Index Scan using counters_pk on counters c  (cost=0.56..60.72 rows=223 width=16) (actual time=0.029..0.132 rows=421 loops=4019)
                  Index Cond: (((item_id)::text = (a.gourmet_id)::text) AND ((counter_name)::text = 'FnfHit'::text))
    Total runtime: 4600.750 ms
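When reading the plan above, note that for a node executed several times, EXPLAIN ANALYZE reports `rows` as a per-loop average, so the index scan on counters produces roughly rows × loops rows in total. The small Python check below only redoes that arithmetic with figures copied from the plan; `total_rows` is a hypothetical helper name used for illustration.

```python
def total_rows(rows_per_loop: int, loops: int) -> int:
    """Approximate total rows emitted by a plan node executed `loops` times.

    EXPLAIN ANALYZE reports `rows` as a rounded per-loop average, so this
    product is only an approximation of the real total.
    """
    return rows_per_loop * loops


# Figures copied from the PostgreSQL plan above.
counters_rows_per_loop = 421     # Index Scan using counters_pk ... rows=421
counters_loops = 4019            # ... loops=4019 (one probe per gourmet row)
nested_loop_rows = 1690393       # Nested Loop Left Join ... rows=1690393
groups = 4019                    # HashAggregate ... rows=4019

estimate = total_rows(counters_rows_per_loop, counters_loops)
print(estimate)                                                    # 1691999
print(abs(estimate - nested_loop_rows) / nested_loop_rows < 0.01)  # True
print(nested_loop_rows // groups)                                  # 420
```

In other words, about 1.7 million joined rows (roughly 420 per group on average) are fed into the HashAggregate to produce 4019 groups. The plan's own timings point the same way: the Hash Right Join over gourmet and photos finishes after about 20 ms, the nested loop at about 2.6 s, and the HashAggregate at about 4.6 s.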

MS SQL 2008 R2 execution plan:

  |--Compute Scalar(DEFINE:([Expr1013]='fnf-item.aspx?pid='+[SlowTravel].[dbo].[Gourmet].[GourmetId] as [a].[GourmetId], [Expr1014]=isnull([Expr1012],(0)), [Expr1015]='1', [Expr1016]='gourmet.png', [Expr1017]=getdate(), [Expr1018]='NonorderScheduler'))
       |--Compute Scalar(DEFINE:([Expr1012]=CASE WHEN [Expr1027]=(0) THEN NULL ELSE [Expr1028] END))
            |--Stream Aggregate(GROUP BY:([a].[GourmetId], [Expr1008], [Expr1009], [b].[PhotoId], [Expr1010], [Expr1011]) DEFINE:([Expr1027]=COUNT_BIG([SlowTravel].[dbo].[Counters].[TotalHit] as [c].[TotalHit]), [Expr1028]=SUM([SlowTravel].[dbo].[Counters].[TotalHit] as [c].[TotalHit]), [a].[GourmetNameC]=ANY([SlowTravel].[dbo].[Gourmet].[GourmetNameC] as [a].[GourmetNameC]), [a].[vendorUrl]=ANY([SlowTravel].[dbo].[Gourmet].[vendorUrl] as [a].[vendorUrl]), [a].[ForSearchOnly]=ANY([SlowTravel].[dbo].[Gourmet].[ForSearchOnly] as [a].[ForSearchOnly]), [a].[TelExt]=ANY([SlowTravel].[dbo].[Gourmet].[TelExt] as [a].[TelExt]), [a].[County]=ANY([SlowTravel].[dbo].[Gourmet].[County] as [a].[County]), [a].[Town]=ANY([SlowTravel].[dbo].[Gourmet].[Town] as [a].[Town]), [a].[AddressC]=ANY([SlowTravel].[dbo].[Gourmet].[AddressC] as [a].[AddressC])))
                 |--Nested Loops(Left Outer Join, OUTER REFERENCES:([a].[GourmetId], [Expr1026]) WITH UNORDERED PREFETCH)
                      |--Merge Join(Left Outer Join, MERGE:([a].[GourmetId])=([b].[FolderId]), RESIDUAL:([SlowTravel].[dbo].[Photos].[FolderId] as [b].[FolderId]=[SlowTravel].[dbo].[Gourmet].[GourmetId] as [a].[GourmetId]))
                      |    |--Sort(ORDER BY:([a].[GourmetId] ASC))
                      |    |    |--Compute Scalar(DEFINE:([Expr1008]=len([SlowTravel].[dbo].[Gourmet].[vendorUrl] as [a].[vendorUrl]), [Expr1009]='fnf-item.aspx?pid='+[SlowTravel].[dbo].[Gourmet].[GourmetId] as [a].[GourmetId], [Expr1010]=CONVERT(varchar(30),[SlowTravel].[dbo].[Gourmet].[CreatedDate] as [a].[CreatedDate],111), [Expr1011]=([SlowTravel].[dbo].[Gourmet].[AreaCode] as [a].[AreaCode]+'-')+[SlowTravel].[dbo].[Gourmet].[TelNo] as [a].[TelNo]))
                      |    |         |--Table Scan(OBJECT:([SlowTravel].[dbo].[Gourmet] AS [a]), WHERE:([SlowTravel].[dbo].[Gourmet].[Status] as [a].[Status]='A'))
                      |    |--Index Seek(OBJECT:([SlowTravel].[dbo].[Photos].[_dta_index_Photos_5_1925581898__K5_K2_K1] AS [b]), SEEK:([b].[MainPhoto]=(1)) ORDERED FORWARD)
                      |--Clustered Index Seek(OBJECT:([SlowTravel].[dbo].[Counters].[Counters_PK] AS [c]), SEEK:([c].[CounterName]='FnfHit' AND [c].[ItemId]=[SlowTravel].[dbo].[Gourmet].[GourmetId] as [a].[GourmetId]) ORDERED FORWARD)

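How do I avoid the Sort in the plan? Actually, I didn't ask for any sort in the SQL statement.
Thanks for your help.

Solution:

How do I avoid the sort in the explain result? Actually, I didn't ask for any sort in the SQL statement.

You asked for the rows to be aggregated. One way to do that is to sort the data set and then scan it, collapsing runs of rows that belong to the same group. That can be faster than hash aggregation, which is the other way PostgreSQL knows how to do grouping.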

So although you never explicitly said "sort the rows", it is still sorting them because of what you asked for.
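To illustrate the two grouping strategies described above, here is a toy Python sketch (not PostgreSQL's actual implementation) that computes `SUM(total_hit)` per `item_id` both ways: by sorting and then collapsing adjacent equal keys, and by accumulating totals in a hash table. The sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def group_sum_by_sort(rows):
    """Sort-based aggregation: sort on the key, then collapse runs of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))          # the "Sort" step
    return {key: sum(hit for _, hit in run)            # the scan that collapses each group
            for key, run in groupby(ordered, key=itemgetter(0))}


def group_sum_by_hash(rows):
    """Hash-based aggregation: one pass, keeping a running total per key."""
    totals = {}
    for key, hit in rows:
        totals[key] = totals.get(key, 0) + hit
    return totals


# Hypothetical (item_id, total_hit) rows, as a join might produce them.
rows = [("g2", 5), ("g1", 3), ("g2", 1), ("g3", 0), ("g1", 4)]

print(group_sum_by_sort(rows))   # {'g1': 7, 'g2': 6, 'g3': 0}
print(group_sum_by_hash(rows))   # {'g2': 6, 'g1': 7, 'g3': 0}
```

Both return the same groups. The sort-based variant has to order the entire input before it can emit anything; in PostgreSQL, a sort whose input does not fit in work_mem becomes an on-disk external merge sort.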

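The immediate problem is that PostgreSQL is very conservative about how much memory it uses for sorting:

Sort Method: external merge  Disk: 317656kB

It is doing a roughly 300MB sort on disk. You can see this quite clearly if you take a look at the plan on explain.depesz.com.

If you run:

SET work_mem = '400MB';

before running the query, it should make a big difference.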

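Unfortunately, changing the configuration is not that simple, because PostgreSQL isn't very smart about resource management. According to the documentation for work_mem, each session can use up to work_mem bytes per sort or hash join, and a single complex query may run several of them at once. So if you have max_connections = 50 and are running complex queries, you may find yourself using many gigabytes of working memory, exceeding the available RAM and running into swapping, which you really don't want.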
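The worst case described above is plain multiplication. A rough Python sketch, assuming every connection is running a query at the same moment and each query has a chosen, hypothetical number of sort/hash nodes that all use their full work_mem allowance (real usage is usually lower):

```python
def worst_case_work_mem_gb(max_connections: int, work_mem_mb: int,
                           nodes_per_query: int) -> float:
    """Upper bound, in GB, on memory used by sort/hash nodes.

    Assumes every connection runs a query at the same moment and every
    sort or hash node in each query uses its full work_mem allowance.
    """
    return max_connections * work_mem_mb * nodes_per_query / 1024


# work_mem = '400MB' as suggested above, with max_connections = 50.
print(worst_case_work_mem_gb(50, 400, 1))   # 19.53125
print(worst_case_work_mem_gb(50, 400, 3))   # 58.59375
```

Even the one-node case is several times the 4GB of RAM given to each guest VM in the question. The `nodes_per_query` values are illustrative only; actual usage depends on each plan and on how many connections are busy at once.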

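It also seems to be doing a seqscan on counters, but since it finds that about 1/4 of the rows match, that is probably the right choice; an index scan would likely be slower.

I find the default work_mem far too conservative and tend to set it to at least 100MB on any reasonably large system. I also prefer to run PostgreSQL with PgBouncer in front of it and a low max_connections, which lets me give each individual connection more resources.

Frankly, I'd be curious to know how MS SQL Server executes this, because the figures you report are astonishing for a query like this.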
