How to Obtain Exadata Cell Hidden Parameters, and Reverse Offloading

Original link: http://www.dbaleet.org/get_exadata_cell_hidden_parameter_and_exadata_reverse_offloading/

An important feature of offloading on Exadata is that the DB nodes and the cell nodes are aware of each other's load.

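For example, when the database nodes make heavy use of smart scan, the work is offloaded to the cells. In some cases this drives CPU usage on the cell nodes very high, even to the point of overload, while the DB nodes, having offloaded their work to the cells, sit relatively idle.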

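This is where reverse offloading comes in. When a cell detects that its CPU utilization is too high, it stops applying smart I/O filtering to some of the queries that would otherwise use smart scan and instead returns ordinary block I/O to the DB node, which relieves the CPU load on the cell. Once the cell's load drops, it goes back to smart scan. This process is called reverse offloading. If the DB node's CPU is also high at that moment, however, the cell does not send back block I/O. The whole process is transparent to the database.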

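So in some situations, do not believe the rumor that Exadata needs no indexes, even if every scan is a smart scan!
Nor does it follow that a query must be forced to use smart scan just because it can. Excessive smart scanning can push cell CPU very high, and in that case creating a suitable index and not using smart scan may well be the wiser choice.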

A dedicated statistic, "cell physical IO bytes sent directly to DB node to balance CPU", tracks reverse offloading.

For example:

SQL> select name, value from v$sysstat where name like '%balance%';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes sent directly to DB node to balance CPU            0

The description of this statistic is:

The number of I/O bytes sent back to the database server for processing due to CPU usage on Oracle Exadata Storage Server.

In other words, it is the number of I/O bytes that the cell returned to the DB node as ordinary block I/O because its own CPU usage was too high.
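To put the raw byte count in context, it can be compared with the volume of I/O that was eligible for offload. The sketch below is a rough indicator only: it assumes the two values have been copied by hand from v$sysstat, the sample numbers are hypothetical, and the ratio itself is an illustration rather than an Oracle-defined metric.

```python
# Statistic names as they appear in v$sysstat.
PUSHBACK_STAT = "cell physical IO bytes sent directly to DB node to balance CPU"
ELIGIBLE_STAT = "cell physical IO bytes eligible for predicate offload"


def pushback_percent(stats):
    """Return pushed-back bytes as a percentage of offload-eligible bytes."""
    eligible = stats.get(ELIGIBLE_STAT, 0)
    if eligible == 0:
        return 0.0  # no offload-eligible I/O, so nothing could be pushed back
    return 100.0 * stats.get(PUSHBACK_STAT, 0) / eligible


# Hypothetical values, as if copied from two rows of v$sysstat.
sample = {ELIGIBLE_STAT: 8_000_000_000, PUSHBACK_STAT: 400_000_000}
print(f"{pushback_percent(sample):.1f}% of eligible bytes were pushed back")
```

A value that stays at zero, as in the query output above, suggests the cells have not needed to push work back since instance startup.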

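So which parameter controls this process? Unfortunately, although I knew the feature existed, I could not recall the name offhand; I only remembered that it was a hidden parameter set on the cell side. If all the cell hidden parameters were laid out in front of me, I would certainly recognize it at a glance. But I did not have a list of all the cell hidden parameters, nor did I know how to obtain one. At least for now, I am not aware of any command or view that lists all the hidden parameters in use on the cell.

After thinking it over for a while...

It occurred to me that since a systemstate dump on the cell dumps the memory of the cell storage software, the cellsrv dump should contain the current values of all parameters.

So I tried a systemstate dump on the cell. I described the method in an earlier post, "How to do a systemstate dump on an Exadata cell node".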

[root@dm01cel01 ~]# cellcli
CellCLI: Release 12.1.1.1.0 - Production on Tue Oct 08 02:24:45 MDT 2013

Copyright (c) 2007, 2013, Oracle. All rights reserved.
Cell Efficiency Ratio: 589

CellCLI> alter cell events="immediate cellsrv.cellsrv_statedump(0,0)"
Dump sequence #2 has been written to /opt/oracle/cell/log/diag/asm/cell/dm01cel01/trace/svtrc_27579_59.trc
Cell dm01cel01 successfully altered
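When the dump is scripted, the trace file path has to be picked out of the CellCLI output. A minimal sketch, assuming the message always has the "Dump sequence #N has been written to PATH" form seen above; the helper name find_dump_trace is made up for this example.

```python
import re

# Matches the confirmation line printed by CellCLI after the statedump.
DUMP_LINE = re.compile(r"Dump sequence #(\d+) has been written to (\S+)")


def find_dump_trace(cellcli_output):
    """Return (sequence, path) from CellCLI output, or None if absent."""
    match = DUMP_LINE.search(cellcli_output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


output = (
    "Dump sequence #2 has been written to "
    "/opt/oracle/cell/log/diag/asm/cell/dm01cel01/trace/svtrc_27579_59.trc\n"
    "Cell dm01cel01 successfully altered\n"
)
print(find_dump_trace(output))
```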

 

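Open /opt/oracle/cell/log/diag/asm/cell/dm01cel01/trace/svtrc_27579_59.trc and search for "parameter" to get a list of the cell's current hidden parameters (unfortunately, descriptions of these parameters are not available for now), as shown below: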

+ Parameters for process at dump time

Dumping configuration parameter values
Unable to lookup value for parameter local_ipaddresses
ipaddress1 = 192.168.10.11/22 (default = NULL)
Unable to lookup value for parameter ipaddress2
Unable to lookup value for parameter ipaddress3
Unable to lookup value for parameter ipaddress4
version = 0.0 (default = )
_cell_max_pll_pred_writes = 36
_cell_pred_writes_autotune_enabled = TRUE
_cell_max_pll_pred_reads = 36
_cell_pred_reads_autotune_enabled = TRUE
_cell_max_flash_largeios = 48
_cell_num_threads_in_short_wait = 40
_cell_max_pll_pred_filters = 24 (default = 0)
_cell_pred_filters_autotune_enabled = TRUE
_cell_pred_filter_max_iosize = 1073741824
_cell_num_threads = 100
_cell_num_buffers = 5000
_cell_num_1mb_buffers = 8000 (default = 0)
_cell_num_1mb_bwr_buffers = 180
_cell_num_1mb_brr_buffers = 180
_cell_max_dynbufs_memsize = 3072 (default = 0)
_cell_listener_port = 5042
_cell_listener_backlog = 1000
_cell_listener_pll_jobs = 23
_cell_listener_req_batch = 100
_cell_num_0_byte_recv_ports = 4
_ms_cell_ioctl_timeout = 30000
_cell_iorm_test_mode = FALSE
_cell_iorm_perf_stats = FALSE
_cell_iorm_wl_mode = 0
_cell_iorm_hipri_alloc = 0
_cell_iorm_medpri_alloc = 0
_cell_iorm_lowpri_alloc = 0
_cell_iorm_asm_alloc = 0
_cell_iorm_lutil_limit = 0
_cell_iorm_hints_enabled = FALSE
_iorm_hint0 = -1
_iorm_priority0 = -1
_iorm_hint1 = -1
_iorm_priority1 = -1
_iorm_hint2 = -1
_iorm_priority2 = -1
_iorm_hint3 = -1
_iorm_priority3 = -1
_iorm_hint4 = -1
_iorm_priority4 = -1
_iorm_hint5 = -1
_iorm_priority5 = -1
_iorm_hint6 = -1
_iorm_priority6 = -1
_iorm_hint7 = -1
_iorm_priority7 = -1
_cell_iorm_pri_catidx = -1
_cell_iorm_pri_dbidx = -1
_cell_iorm_pri_cgidx = -1
_cell_iorm_enable = TRUE
_cell_iorm_max_io = 0
_cell_iorm_max_lio = 0
_cell_iorm_conc_writes = 0
_cell_iorm_deadline = 0
_cell_iorm_fake_dbs = 0
_cell_hard_disable = FALSE
_cell_raise_softassert_on_harderr = FALSE
_cell_enable_ossnet_checksum = FALSE
_cell_enable_skgxp_stats = TRUE
_skgxp_udp_use_tcb = TRUE
_skgxp_udp_use_tcb_client = TRUE
_cell_memory_tracing = TRUE
_cell_dmpsga_enabled = FALSE
_cell_enable_dynamic_credits = TRUE
_cell_num_ios_per_predjob = 10
_cell_num_pred_flashio_corrupt_retries = 1000
_cell_pred_dump_disk_onclose = FALSE
_cell_pred_polling_ctl_enabled = TRUE
_cell_pred_sim_block_byteord_conv = FALSE
_cell_max_kuty_failure_diagnostics = 0
_cell_print_all_params = FALSE
_cell_pred_disable_destbuf_refill = FALSE
_cell_smartio_passthru_enabled = FALSE
_cell_pred_no_predio_limit = FALSE
_cell_pred_enable_io_buffer_eviction = TRUE
_cell_pred_enable_dest_buffer_eviction = TRUE
_cell_pred_enable_flashio = TRUE
_cell_snapshot_bufsize = 1
_cell_snapshot_interval = 100
_cell_gen_time_stats_level = 1
_cell_gen_time_stats_timer_level = 0
_cell_force_split_gdisk = FALSE
_cell_testlevel = 0
_cell_max_receive_buffers_per_port = 600
_cell_num_8k_buffers = 10000
_cell_num_16k_buffers = 5000
_cell_num_32k_buffers = 5000
_cell_num_64k_buffers = 5000
_cell_max_receive_buffers_8k_port = 1000
_cell_max_receive_buffers_1mb_port = 50
_cell_crash_on_error = 0
_cell_crash_on_error_skip_n = 0
_cell_1mb_buffers_hugepage_support = TRUE
_skgxp_udp_interface_detection_time_secs = 4
_skgxp_gen_ant_ping_misscount = 2
_disable_diskmon_tcp_monitor = FALSE
_disable_diskmon_subnet_manager_query = FALSE
_skgxp_min_zcpy_len = 2147483647
_skgxp_min_rpc_rcv_zcpy_len = 2147483647
_skgxp_zcpy_flags = 2147483647
_skgxp_ctx_flags1 = 0
_skgxp_ctx_flags1mask = 0
_skgxp_dynamic_protocol = 0
_skgxp_inets = 0
_skgxpg_last_parameter = 27
_skgxp_ant_options = 0
_libcell_enable_libcell_interrupts = 1
_cell_rcvport_hist_size = 0
_skgxp_gen_rpc_no_path_check_in_sec = 1
_skgxp_gen_rpc_timeout_in_sec = 300
_skgxp_gen_ant_off_rpc_timeout_in_sec = 30
_reconnect_to_cell_freq_in_sec = 2
_reconnect_to_cell_attempts = 7
_disconnect_to_cell_attempts = 2
_reconnect_controls_reset_interval = 60
_dskm_disable_reconnect_to_cell = FALSE
_cell_disable_resource_leak_check = FALSE
_cell_disable_ant_check_reid = FALSE
_cell_disable_proactive_drop = FALSE
_cell_server_event = 
_cell_client_event = 
_cell_reserve_hugepage_memory_mb = 24
_cell_tolerates_max_backward_drift_microsecs = 300000
_cell_num_sched_log_entries = 8192
_cell_storage_index_columns = 8 (default = 0)
_cell_storage_index_partial_reads_threshold_percent = 85
_cell_storage_index_partial_rd_sectors = 512
_cell_enable_storage_index_for_loads = TRUE
_cell_enable_storage_index_for_writes = TRUE
_cell_storage_index_diag_mode = 0
_cell_storage_index_sizing_factor = 2
_cell_pred_max_smartio_sessions = 3820 (default = 0)
_cell_pred_max_con_ccfilter = 23 (default = 14)
_cell_pred_max_con_filters = 0
_cell_pred_num_ios_toissue_keptobj = 2
_cell_pred_max_cus_per_filter = 1
_cell_load_timezone_during_boot = TRUE
_cell_sendport_private_rqh_pool_size = 10
_cell_sendport_global_rqh_num_pools = 512
_cell_sendport_global_rqh_pool_maxincr = 150
_cell_capability_version = 0
_cell_iolat_stats_disable = FALSE
_cell_pred_mapelem_split_size = -1
_cell_perf_flags = 0
_cell_enable_sbuf_check = FALSE
_cell_disable_crash_dump_enhancement = FALSE
_cell_buffer_expiration_hours = 48
_cell_object_expiration_hours = 24
_cell_pred_max_num_outstanding_ios = 1000
_cell_mutex_stats = 0
_cell_port_activity_threshold = 300000
_cell_ant_port_activity_threshold = 1800000
_cell_ant_port_noopen_threshold = 60000
_cell_in_lrg_testing = FALSE
_cell_io_hang_reboot = TRUE
_cell_iohang_wtfc_reboot = FALSE
_cell_io_hang_time = 90
_cell_io_hang_kill_time = 95
_cell_io_hang_disable = FALSE
_cell_write_simulate_hard_error_freq = 0
_cell_assert_on_flash_data_corruption = 0
_cell_memalloc_analysis = disabled
_cell_flashcache_diag_reads_frequency = 0
_cell_read_flash_data_verif_level = 3
_cell_read_flash_gdisk_verif_level = 3
_cell_max_retry_on_read_flash_gdisk_verif_err = 2
_cell_enable_read_verif_on_these_gdisks = 
_cell_enable_read_verif_on_gdisk_first_N_MB = -1
_cell_flash_cache_sanity_checking = 0
_cellrsdef_fast_restart = 1
_cell_max_memory = 22355 (default = 0)
_cell_max_connections = 1500 (default = 0)
_cell_sga_lowmem_threshold_size = 1024 (default = 0)
_cell_nomem_threshold_enabled = TRUE
_cell_sga_lowmem_threshold_enabled = TRUE
_cell_disable_heap_summary = FALSE
_cell_disable_flashcache_hung_io_handling = FALSE
_cell_flashcache_aging_writes_enabled = TRUE
_cell_flashcache_lru_max_hot_percent = 50
_cell_flashcache_max_FDOM_outst_ios = 70
_cell_populate_flash_max_FDOM_outst_wr_ios = 100
_cell_auto_close_fd_interval = 120
_cell_dump_sga_on_oom_exception = FALSE
_cell_quarantine_manager_disabled = FALSE
_cell_qm_disable_sql_step_quarantine = FALSE
_cell_qm_disable_disk_region_quarantine = FALSE
_cell_qm_db_quarantine_threshold = 3
_cell_qm_offload_quarantine_threshold = 3
_cell_thread_max_trace_file_size = -1
_cell_redolog_fast_ack = FALSE
_cell_max_tsld_hd_svctm_ms = 500
_cell_max_tsld_fd_svctm_ms = 100
_cell_max_rltv_svctm_ratio = 10
_cell_svctm_ratio_wt = 100
_cell_err_num_wt = 20
_cell_iopoor_perf_disable = FALSE
_cell_disable_flashcache_db_blk_chksum = FALSE
_cell_disable_flash_gdisk_db_blk_chksum = FALSE
_cell_auto_dump_errstack = TRUE
_cell_perf_max_hd_proa_fail = 0
_cell_perf_max_fd_proa_fail = 16
_cell_max_hd_hung_reboot = 2
_cell_max_fd_hung_reboot = 9
_cell_si_max_num_diag_mode_dumps = 20
_cell_fc_persistence_max_io_retry = 4
_cell_fc_persistence_state = 0
_cell_fc_md_shadow_paging_enabled = FALSE
_cell_fc_bootstrap_timeout = 20000000
_cell_fc_cache_mirror_writes = 1
_cell_fc_dw_batch_size = 1
_cell_simulate_railroad_crashes = FALSE
_cell_qm_max_simulated_railroad_crashes = 2
_cell_latency_warning_threshold = 
_cell_latency_threshold_check_interval = 360000
_cell_latency_threshold_print_warning = FALSE
_cell_si_expensive_debug_tracing = FALSE
_cell_poor_perf_schedule_time = 5
_cell_iohang_schedule_time = 1
_cell_io_hang_drop_flash = TRUE
_cell_io_hang_drop_hard = FALSE
_cell_assert_unsafe_allocmem = FALSE
_cell_fplib_fix_control = 0
Unable to lookup value for parameter _lost_cache_detect
_cell_num_vers_check_fail_messages = 0
_cell_qm_db_quarantine_time_threshold = 86400
_cell_flashlog_flags = 0
_cell_flashlog_max_active_table_size = 1024
_cell_secure_erase_power = 5
_cell_mpp_cpu_freq = 2
_ms_listener_port = 5043
_cell_mpp_threshold = 90
_cell_mpp_max_pushback = 50
_si_write_diag_disable = FALSE
_cell_max_cellsrvstat_sessions = 3
_cell_si_diag_mode_force = FALSE
_cell_tracefile_max_size = 1610612736
_cell_bwrite_si_build_disabled = FALSE
_cell_state_dump_options = 0
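A list this long is easier to search once it is parsed. The sketch below turns the "name = value (default = ...)" lines into a dictionary and skips the "Unable to lookup value" lines. It assumes the line format shown in this particular dump; other cell software versions may print it differently, and the function name parse_cell_params is made up for this example.

```python
import re

# "name = value" with an optional "(default = ...)" suffix; value may be empty.
PARAM_LINE = re.compile(r"^(\w+) = ?(.*?)(?: \(default = ?(.*)\))?$")


def parse_cell_params(lines):
    """Map each parameter name to a (value, default) pair of strings.

    default is None when the dump prints no "(default = ...)" suffix.
    Lines that are not in "name = value" form are ignored.
    """
    params = {}
    for line in lines:
        match = PARAM_LINE.match(line.strip())
        if match:
            name, value, default = match.groups()
            params[name] = (value, default)
    return params


# A few lines taken from the dump above.
sample = [
    "Unable to lookup value for parameter ipaddress2",
    "version = 0.0 (default = )",
    "_cell_server_event = ",
    "_cell_max_memory = 22355 (default = 0)",
    "_cell_mpp_threshold = 90",
    "_cell_mpp_max_pushback = 50",
]
params = parse_cell_params(sample)
for name in sorted(params):
    if "mpp" in name:
        print(name, "=", params[name][0])
```

For a real trace file, the same function can be fed the lines of the file, for example `parse_cell_params(open(path))`.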

With this list in hand, I could recognize it.
The hidden parameters that control this feature are:

_cell_mpp_threshold = 90
_cell_mpp_max_pushback = 50

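_cell_mpp_threshold is the cell CPU threshold, 90% by default, and _cell_mpp_max_pushback is the maximum proportion of I/O returned as block I/O, 50% by default.
In other words, by default, once CPU usage on a cell node exceeds 90%, the cell starts returning block I/O that has not been filtered by smart scan to the DB node, and this I/O may account for at most 50% of the total.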
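The rule described above can be written down as a toy model. This is not how cellsrv is implemented: the source only gives the threshold, the cap, and the fact that nothing is pushed back when the DB node is also busy, so the function returns the cap rather than the share cellsrv would actually choose. The function name and the boolean db_cpu_high flag are assumptions made for illustration.

```python
def pushback_cap(cell_cpu_pct, db_cpu_high, threshold=90, max_pushback=50):
    """Toy model: upper bound (in percent) on I/O returned unfiltered.

    threshold    stands in for _cell_mpp_threshold
    max_pushback stands in for _cell_mpp_max_pushback
    """
    if db_cpu_high:
        return 0  # DB node is busy too, so the cell keeps filtering
    if cell_cpu_pct <= threshold:
        return 0  # cell CPU has not exceeded the threshold
    return max_pushback


print(pushback_cap(95, db_cpu_high=False))                # cell overloaded
print(pushback_cap(80, db_cpu_high=False))                # below default threshold
print(pushback_cap(80, db_cpu_high=False, threshold=75))  # lowered threshold
```

Lowering the threshold, as in the last call, makes the cell start pushing work back earlier.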

The defaults can be changed in the cellinit.ora parameter file. The value currently in memory can also be changed as follows:

CellCLI> alter cell events = "immediate cellsrv.cellsrv_setparam('_cell_mpp_threshold','75')"

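Disclaimer:

The procedure and hidden parameters above are for reference only. Do not set these hidden parameters without guidance from Oracle Support. I accept no responsibility for any loss caused by any negative impact that results.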
