Exadata、Greenplum、Netezza配置大对比

Exadata vs Greenplum vs Netezza

 

Exadata实体图

Oracle-Sun  Exadata X2 数据库一体机实体图:

 

 

 

Exadata Smart Flash Logging新特性

从Exadata Storage Software 11.2.2.4开始引入了Exadata Smart Flash Logging的新特性,该特性允许LGWR进程将redo同时并行写入flash cache 和 disk controller 中, 只要在flash cache 和 disk controller中有任意一个率先写完成就会通知RDBMS数据库继续工作, 该特性着眼于改善了Exadata对Redo写入的响应时间和吞吐量。(The feature allows redo writes to be written to both flash cache and disk controller cache, with an acknowledgement sent to the RDBMS as soon as either of these writes complete; this improves response times and thoughput.);特别有利于改善log file sync等待事件是主要性能瓶颈的系统。

当频繁写redo重做日志的IO成为Exadata一体机的主要性能瓶颈的时候,Exadata开发部门自然想到了通过DBM上已有的flashcache来减少响应时间的办法。但是又要保证不会因为flashcache的引入而导致redo关键日志信息的丢失:

 

The main problem that number of writes in redo logs is very high, even there are no activity in database. Therefore Flash Cache disk will reach his write limit very fast – some days or months (I am not see exact test results). In this way you will lost flash cache disk and all data on them. But losing redo logs is very unpleasant case of database unavailability, which can lie in big downtime and possible data loss.
As you already know, 11.2.2.4.0 introduced the Smart Flash Log feature. For customers that are not in 11.2.2.4.0, don’t suggest putting the redo logs on a diskgroup that uses griddisks carved from flashdisks. There are different issues when using redo logs on the flashcache in previous versions and those should be avoided.

 

 

只要是安装过Exadata Storage Software 11.2.2.4补丁的系统都会隐式地启用该Exadata Smart Flash Logging特性,但是它同时也要求数据库版本要大于Database 11.2.0.2 Bundle Patch 11。

Metalink目前没有介绍如何在已经启用Exadata Smart Flash Logging的DBM上禁用(disable)该特性。

 

实际每个cell会分配512MB的flashcache用于Smart Flash Logging,因此现在每个cell的可用flash空间为 364.75Gb 。

 

不仅局限于Online Redo Log可以受益于Smart Flash Logging,Standby Redo Log 也可以从该特性中得到性能提升,前提是满足必要的软件版本组合cell patch 11.2.2.4 and Database 11.2.0.2 Bundle Patch 11 or greate。

可以通过CellCLI 命令行了解现有的Smart Flash Logging配置,若有输出则说明配置了Smart Flash Logging。

 

CellCLI> LIST FLASHLOG DETAIL

 

更多信息可以参考文档”Exadata Smart Flash Logging Explained”,引用如下:

 

Smart Flash Logging works as follows. When receiving a redo log write request, Exadata will do
parallel writes to the on-disk redo logs as well as a small amount of space reserved in the flash
hardware. When either of these writes has successfully completed the database will be
immediately notified of completion. If the disk drives hosting the logs experience slow response
times, then the Exadata Smart Flash Cache will provide a faster log write response time.
Conversely, if the Exadata Smart Flash Cache is temporarily experiencing slow response times
(e.g., due to wear leveling algorithms), then the disk drive will provide a faster response time.
Given the speed advantage the Exadata flash hardware has over disk drives, log writes should be
written to Exadata Smart Flash Cache, almost all of the time, resulting in very fast redo write
performance. This algorithm will significantly smooth out redo write response times and provide
overall better database performance.

The Exadata Smart Flash Cache is not used as a permanent store for redo data – it is just a
temporary store for the purpose of providing fast redo write response time. The Exadata Smart
Flash Cache is a cache for storing redo data until this data is safely written to disk. The Exadata
Storage Server comes with a substantial amount of flash storage. A small amount is allocated for
database logging and the remainder will be used for caching user data. The best practices and
configuration of redo log sizing, duplexing and mirroring do not change when using Exadata
Smart Flash Logging. Smart Flash Logging handles all crash and recovery scenarios without
requiring any additional or special administrator intervention beyond what would normally be
needed for recovery of the database from redo logs. From an end user perspective, the system
behaves in a completely transparent manner and the user need not be aware that flash is being
used as a temporary store for redo. The only behavioral difference will be consistently low
latencies for redo log writes.

By default, 512 MB of the Exadata flash is allocated to Smart Flash Logging. Relative to the 384
GB of flash in each Exadata cell this is an insignificant investment for a huge performance
benefit. This default allocation will be sufficient for most situations. Statistics are maintained to
indicate the number and frequency of redo writes serviced by flash and those that could not be
serviced, due to, for example, insufficient flash space being allocated for Smart Flash Logging.
For a database with a high redo generation rate, or when many databases are consolidated on to
one Exadata Database Machine, the size of the flash allocated to Smart Flash Logging may need
to be enlarged. In addition, for consolidated deployments, the Exadata I/O Resource Manager
(IORM) has been enhanced to enable or disable Smart Flash Logging for the different databases
running on the Database Machine, reserving flash for the most performance critical databases.

 

以及<Exadata Smart Flash Cache Features and the Oracle Exadata Database Machine>官方白皮书,公开的文档地址

 

 

Download (PDF, 351KB)

Exadata X2-2 vs EMC Greenplum DCA vs Netezza TwinFin 12主要配置大对比

下图列出了Oracle Exadata X2-2 vs EMC Greenplum DCA vs Netezza TwinFin 12 三种一体机的主要配置对比:

 

Oracle Sun Exadata V2 ,X2-2,X2-8 主要配置对比

  v2 Full Rack x2-2 Full Rack x2-8 Full Rack
Database servers 8 x Sun Fire x4170 1U 8 x Sun Fire x4170 M2 1U 2 x Sun Fire x4800 5U
Database CPUs Xeon E5540 quad core 2.53GHz Xeon X5670 six cores 2.93GHz Xeon X7560 eight cores 2.26GHz
database cores 64 96 128
database RAM 576GB 768GB 2TB
Storage cells 14 x SunFire X4275 14 x SunFire X4270 M2 14 x SunFire X4270 M2
storage cell CPUs Xeon E5540 quad core 2.53GHz Xeon L5640 six cores 2.26GHz Xeon L5640 six cores 2.26GHz
storage cells CPU cores 112 168 168
IO performance & capacity 15K RPM 600GB SAS or 2TB SATA 7.2K RPM disks 15K RPM 600GB SAS (HP model – high performance) or 2TB SAS 7.2K RPM disks (HC model – high capacity) 15K RPM 600GB SAS (HP model – high performance) or 2TB SAS 7.2K RPM disks (HC model – high capacity)
Note that 2TB SAS are the same old 2 TB drives with new SAS electronics. (Thanks Kevin Closson for ref) Note that 2TB SAS are the same old 2 TB drives with new SAS electronics. (Thanks Kevin Closson for ref)
Flash Cache 5.3TB 5.3TB 5.3TB
Database Servers networking 4 x 1GbE x 8 servers = 32 x 1GbE 4 x 1GbE x 8 servers + 2 x 10GbE x 8 servers = 32 x 1Gb + 16 x 10GbEE 8 x 1GbE x 2 servers + 8 x 10GbE x 2 servers = 16 x 1Gb + 16 x 10GbEE
InfiniBand Switches QDR 40Gbit/s wire QDR 40Gbit/s wire QDR 40Gbit/s wire
InfiniBand ports on database servers (total) 2 ports x 8 servers = 16 ports 2 ports x 8 servers = 16 ports 8 ports x 2 servers = 16 ports
Database Servers OS Oracle Linux only Oracle Linux (possible Solaris later, still unclear) Oracle Linux or Solaris x86

Exadata Database Machine Host的操作系统OS版本

之前有同事问我Exadata用的是什么操作系统这个问题?

最早Oracle与HP合作的Exadata V1采用的是Oracle Enterprise Linux,而Oracle-Sun Exadata V2则目前还仅提供OEL,但是已经通过了Solaris -11 Express在 Exadata V2上的测试, 所以很快Exadata V2将会有Solaris的选择。

目前现有的Exadata X2-2 和 X2-8 绝大多数采用2个OEL 5的小版本:

较早出厂的使用OEL 5.3
# cat /etc/enterprise-release
Enterprise Linux Enterprise Linux Server release 5.3 (Carthage)

近期出场的使用OEL 5.5

# cat /etc/enterprise-release
Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)

# uname -a
Linux vrh1.us.oracle.com 2.6.18-128.1.16.0.1.el5 #1 SMP Tue x86_64 x86_64 x86_64 GNU/Linux

 

The IB should be one of the compatible cards specified in Note 888828.1
If you build a backup server machine it is best tro build as close a clone of the Exadata Compute nodes as you can get.
I.e. install OEL 5 Update 5 and one of the IB cards specified in the note and you will have the correct ofed versions and kernel
This will guarantee interoperabilty and correct operation with the kernel and ofed drivers
From the doc
InfiniBand OFED Software
Exadata Storage Servers and database servers will interoperate with different InfiniBand OFED software versions, however, Oracle recommends that all versions be the same unless performing a rolling upgrade. Review Note 1262380.1 for database server software and firmware guidelines.

InfiniBand HCA
Exadata Storage Servers and database servers will interoperate with different InfiniBand host channel adapter (HCA) firmware versions, however, Oracle recommends that all versions be the same unless performing a rolling upgrade. Review Note 1262380.1 for database server software and firmware guidelines.

For a complete list of the Oracle QDR Infinband adaptors see here:

http://www.oracle.com/technetwork/documentation/oracle-net-sec-hw-190016.html#infinibandadp

For the compute nodes all firmware updates must be done via the bundle patches descibed in Doc 888828.1
So I would advise upgrading to the latest supported bundel patch.

For you backup server choose the same model card that came with the X2 compute nodes.
Install Oracle Eterprise Linux Release 5 Update 5
Upgrade the firmware to the same firmware an on the X2 or higher if not already the same,

Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions [ID 888828.1]

Exadata混合列压缩如何处理INSERT和UPDATE

Hybrid Columnar Compression混合列压缩是Exadata数据库一体机的核心功能之一,与普通的高级压缩特性(advanced compression)不同,Hybrid columnar compression (HCC) 仅仅在Exadata平台上可用。使用HCC的情况下数据压缩存放在CU(compression unit压缩单位中),一个CU单位包括多个数据库块,这是出于单数据块不利于以列值压缩算法的考量所决定的,当一个CU包含多个block时可以实现较优的列值压缩算法。

同时对于普通的INSERT/UPDATE操作,需要造成对行级数据的压缩降级,即在经历UPDATE/INSERT后原本HCC压缩的行可能变成普通高级压缩的水平。

 

hybrid columnar compression与数据仓库行为的批量初始化导入(bulk initial load)配合,直接路径导入(direct load)例如ALTER TABLE MOVE, IMPDP或直接路径插入(append INSERT),使用HCC的前提是这些数据将不会被频繁修改或从不被修改。

 

当你更新混合列压缩启动的表中的数据行时,相关整个的压缩单位CU中的数据将被锁住。 被更新的这些数据将不得不从原HCC压缩级别降级到例如无压缩或for OLTP压缩的水准。

 

我们来看以下例子:

 

 

SQL*Plus: Release 11.2.0.2.0 Production on Wed Sep 12 06:14:53 2012

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> grant dba to scott;

Grant succeeded.

SQL> conn scott/oracle
Connected.
SQL> 
SQL> create table hcc_maclean tablespace users compress for query high as select * from dba_objects;

Table created.

  1* select rowid,owner,object_name,dbms_rowid.rowid_block_number(rowid) from hcc_maclean where owner='MACLEAN'
SQL> /

ROWID                          OWNER                          OBJECT_NAME          DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------ ------------------------------ -------------------- ------------------------------------
AAAThuAAEAAAHTJAOI             MACLEAN                        SALES                                               29897
AAAThuAAEAAAHTJAOJ             MACLEAN                        MYCUSTOMERS                                         29897
AAAThuAAEAAAHTJAOK             MACLEAN                        MYCUST_ARCHIVE                                      29897
AAAThuAAEAAAHTJAOL             MACLEAN                        MYCUST_QUERY                                        29897
AAAThuAAEAAAHTJAOh             MACLEAN                        COMPRESS_QUERY                                      29897
AAAThuAAEAAAHTJAOi             MACLEAN                        UNCOMPRESS                                          29897
AAAThuAAEAAAHTJAOj             MACLEAN                        CHAINED_ROWS                                        29897
AAAThuAAEAAAHTJAOk             MACLEAN                        COMPRESS_QUERY1                                     29897

8 rows selected.

select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from hcc_maclean where owner='MACLEAN';

session A:

update hcc_maclean set OBJECT_NAME=OBJECT_NAME||'DBM' where rowid='AAAThuAAEAAAHTJAOI';

session B:

update hcc_maclean set OBJECT_NAME=OBJECT_NAME||'DBM' where rowid='AAAThuAAEAAAHTJAOJ';

SQL> select sid,wait_event_text,BLOCKER_SID from v$wait_chains;

       SID WAIT_EVENT_TEXT                                                  BLOCKER_SID
---------- ---------------------------------------------------------------- -----------
        13 enq: TX - row lock contention                                            136
       136 SQL*Net message from client

可以看到session A block B,这验证了HCC压缩后update row所在CU会造成整个CU被锁住的说法

SQL> alter system checkpoint;

System altered.

SQL> /     

System altered.

SQL> alter system dump datafile 4 block 29897
  2  ;

  Block header dump:  0x010074c9
 Object id on Block? Y
 seg/obj: 0x1386e  csc: 0x00.1cad7e  itc: 3  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x10074c8 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.001cabfa
0x02   0x000a.00a.00000430  0x00c051a7.0169.17  ----    1  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

avsp=0x14
tosp=0x14
        r0_9ir2=0x0
        mec_kdbh9ir2=0x0
                      76543210
        shcf_kdbh9ir2=----------
                  76543210
        flag_9ir2=--R-----      Archive compression: Y
                fcls_9ir2[0]={ }
0x16:pti[0]     nrow=1  offs=0
0x1a:pri[0]     offs=0x30
block_row_dump:
tab 0, row 0, @0x30
tl: 8016 fb: --H-F--N lb: 0x2  cc: 1          ==>整个CU指向ITL 0x02
nrid:  0x010074ca.0
col  0: [8004]
Compression level: 02 (Query High)
 Length of CU row: 8004
kdzhrh: ------PC CBLK: 1 Start Slot: 00
 NUMP: 01
 PNUM: 00 POFF: 7984 PRID: 0x010074ca.0
CU header:
CU version: 0   CU magic number: 0x4b445a30
CU checksum: 0xf8faf86e
CU total length: 8694
CU flags: NC-U-CRD-OP
ncols: 15
nrows: 995
algo: 0
CU decomp length: 8487   len/value length: 100111
row pieces per row: 1
num deleted rows: 1
deleted rows: 904,
START_CU:

 

 

我们可以使用如下方式衡量row的压缩情况:

 

 

SQL> select DBMS_COMPRESSION.GET_COMPRESSION_TYPE('SCOTT','HCC_MACLEAN','AAAThuAAEAAAHTJAOk') from dual;

DBMS_COMPRESSION.GET_COMPRESSION_TYPE('SCOTT','HCC_MACLEAN','AAATHUAAEAAAHTJAOK'
--------------------------------------------------------------------------------
                                                                               4

 

COMP_NOCOMPRESS CONSTANT NUMBER := 1;
COMP_FOR_OLTP CONSTANT NUMBER := 2;
COMP_FOR_QUERY_HIGH CONSTANT NUMBER := 4;
COMP_FOR_QUERY_LOW CONSTANT NUMBER := 8;
COMP_FOR_ARCHIVE_HIGH CONSTANT NUMBER := 16;
COMP_FOR_ARCHIVE_LOW CONSTANT NUMBER := 32;

COMP_RATIO_MINROWS CONSTANT NUMBER := 1000000;
COMP_RATIO_ALLROWS CONSTANT NUMBER := -1;

上表列出了压缩类型的常数值,例如COMP_FOR_QUERY_HIGH是4,COMP_FOR_QUERY_LOW 是8

这里我们从上述查询GET_COMPRESSION_TYPE指定rowid的情况下得到的是4说明该列以COMP_FOR_QUERY_HIGH形式压缩:



SQL>  update hcc_maclean set OBJECT_NAME=OBJECT_NAME||'DBM' where owner='MACLEAN';

8 rows updated.

SQL> commit;

Commit complete.




SQL>  select DBMS_COMPRESSION.GET_COMPRESSION_TYPE('SCOTT','HCC_MACLEAN',rowid) from HCC_MACLEAN where owner='MACLEAN';

DBMS_COMPRESSION.GET_COMPRESSION_TYPE('SCOTT','HCC_MACLEAN',ROWID)
------------------------------------------------------------------
                                                                 1
                                                                 1
                                                                 1
                                                                 1
                                                                 1
                                                                 1
                                                                 1
                                                                 1

8 rows selected.

以上更新一定量数据后可以看到COMPRESSION_TYPE由COMP_FOR_QUERY_HIGH降级为COMP_NOCOMPRESS,这说明这张表虽然compress for query high但部分数据在更新后实际不再被压缩。

在11g中这些非压缩态复萌的数据行不会自动升级成HCC状态。必要的时候手动作 ALTER TABLE MOVE或在线重定义以便将非压缩态的数据转换回HCC状态。



SQL>  ALTER TABLE hcc_MACLEAN move COMPRESS FOR ARCHIVE HIGH;

Table altered.

SQL> select DBMS_COMPRESSION.GET_COMPRESSION_TYPE('SCOTT','HCC_MACLEAN',rowid) from HCC_MACLEAN where owner='MACLEAN';

DBMS_COMPRESSION.GET_COMPRESSION_TYPE('SCOTT','HCC_MACLEAN',ROWID)
------------------------------------------------------------------
                                                                16
                                                                16
                                                                16
                                                                16
                                                                16
                                                                16
                                                                16
                                                                16

8 rows selected.

数据仓库:Oracle Exadata和Netezza的比较

虽然都是王婆卖瓜,但胜在他山之石可以攻玉,了解一下Exadata 之外的 一体机选择。

 

oracle_exadata_netezzaTwinfin_compared

 

 

沪公网安备 31010802001379号

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569