oracle 19c X$MESSAGE

oracle 19c X$MESSAGE

 

 

SQL> select description , dest from x$messages order by dest;
DESCRIPTION															 DEST
-------------------------------------------------------------------------------------------------------------------- 
run all timeout and interrupt actions for this background
KSB action for X-instance calls 												 *
KSB action for bast checking													 *
generic shutdown background													 *
ksim instance group membership notifier 											 *
KSB action for ksbxic() calls													 *
enter / exit graph test specified wait												 *
Monitor Cleanup 														 *
svr actn for shrd grp reg/dereg 												 *
attach processes to dlm 													 *
init function for LCK1														 *
check for KJCI cross-instance requests												 *
Scumnt mount lock														 *
Poll system events broadcast channel												 *
database close in progress													 *
kfk FD cleanup															 *
Auto BMR message														 ABMR
Auto BMR Listener														 ABMR
ACMS Watchdog Task														 ACMS
ACMS initialization														 ACMS
ASM to slave BG msg														 ARB*
ARCH initialization														 ARC*
Archiver wakeup 														 ARC*
Archiver disconnect														 ARCH
Archiver shutdown														 ARCH
Job Queue Interrupt														 CJQ*
Job Queue Interrupt														 CJQ*
Job Queue Interrupt														 CJQ*
Job Queue Timout														 CJQ*
Shutdown Job Queue Process													 CJQ*
Check for async in-memory job messages												 CJQ0
Check for async messages from other instances											 CJQ0
Coordinator send broadcast timeout												 CJQ0
kcbz update TSE bh														 CKPT
CKPT reuse range call continuation												 CKPT
update recovery-based i/o statistics												 CKPT
Db Checkpt Request check													 CKPT
Db Checkpt Compl check														 CKPT
Get Proxy Lock															 CKPT
read CCID on standby														 CKPT
readable standby CKPT channel													 CKPT
start change tracking in ckpt													 CKPT
hold alert level														 CKPT
Flashback barrier														 CKPT
subscribe/unsubscribe to flashback broadcast channel										 CKPT
Cleanup move operations 													 CKPT
get (XR,4,0) enqueue														 CKPT
release (XR,4,0) enqueue													 CKPT
mira CKPT channel														 CKPT
KKX: drop ncomp dll action													 CKPT
Compile Environment Monitor													 CKPT
get RAC load enq in NULL													 CKPT
kfc CKPT mount disk group													 CKPT
kfc CKPT dismount disk group													 CKPT
kfc CKPT driver 														 CKPT
ASM-DB communication interrupt action												 CKPT
ASM-DB communication timeout action												 CKPT
unlock an extent map														 CKPT
CKPT stat update timeout action 												 CKPT
CKPT reuse call completion action												 CKPT
CKPT reuse range call														 CKPT
CKPT reuse call continuation													 CKPT
CKPT subscribe reuse block range channel											 CKPT
CKPT subscribe object channel													 CKPT
close control file														 CKPT
CPU dynamic reconfiguration													 CKPT
identify control file														 CKPT
change tracking message 													 CTWR
change tracking timeout action													 CTWR
shutdown DBRM															 DBRM
do RM action in DBRM														 DBRM
subscribe to quiesce channel													 DBRM
unsubscribe to quiesce channel													 DBRM
DBRM ADG in-memory state refresh												 DBRM
DBRM Timeout Actions														 DBRM
RM running count check														 DBRM
Check for MDBRM to LDBRM message												 DBRM
Check for LDBRM to MDBRM message												 DBRM
DBRM Active Session Limit Event 												 DBRM
Check queuing policy														 DBRM
SQL Memory Management Calculation												 DBRM
check for quiesce messages													 DBRM
unquiesce the instance during database close											 DBRM
mount/dismount all db files													 DBW*|BW*
get/release open thread enqueue 												 DBW*|BW*
DBWR monitor checkpoint 													 DBW*|BW*
DBWR write buffers														 DBW*|BW*
Refresh Snapshot														 DBW0
sync PDB															 DBW0
Release startup spfile lock													 DBW0
Get startup spfile lock 													 DBW0
kfcb Poke DBW0															 DBW0
kfcb force dismount disk group													 DBW0
kfc mount disk group														 DBW0
kfc dismount disk group 													 DBW0
kfc invalidate file extent													 DBW0
Reserve lock name space lock													 DBW0
Release lock name space lock													 DBW0
complete Release space call													 DBW0
periodic shadow lost write													 DBW0
unit test															 DBW0
close and unlock db file													 DBW0
offline db file 														 DBW0
verify/invalidate all db files													 DBW0
foreign key invalidation													 DBW0
recovery db file verification													 DBW0
DBWO timeout kcbifc														 DBW0
Message to flush IMU txns													 DBW0
Db Instance Lock Mgmt														 DBW0
Db File check															 DBW0
Db mount lock															 DBW0
identify db file														 DBW0
lock db file															 DBW0
acquire enq during pdbopen by HARIM												 DBW0
Process new DBs that join ASM locally												 DIA*
poradebug commands														 DIAG
perform process diagnostic													 DIAG
hanganalyze request														 DIAG
Clusterwise dump request													 DIAG
DMON do critical instance eval and registration 										 DMON
DMON Wakeup															 DMON
DMON Interrupt Routine														 DMON
DMON shutdown															 DMON
DMON Verify Standby shutdown for PM violation											 DMON
DMON Execute Task Message													 DMON
DSKM check for message from master DiskMon											 DSKM
IORM action															 DSKM
DSKM action															 DSKM
DSKM fini															 DSKM
DSKM init															 DSKM
DSKM procures HCA loadavg and computes offloaded write thresh									 DSKM
SAGE CC Action															 DSKM
FSFP Wakeup															 FSFP
FSFP Register Observer Message													 FSFP
FSFP Shutdown Message														 FSFP
Inactive Account Time Job													 GEN0
Heartbeat action for HSM connectivity												 GEN0
Periodic Remove Old Key From Standby												 GEN0
GEN0 Master Check														 GEN0
XStream timeout action														 GEN0
Diskgroup Resource Action													 GEN0
kfk timeout action in GEN0													 GEN0
ASM UFG health check timeout													 GEN0
kill client															 GEN0
initiate block repair														 GEN0
ASM kffil check timeout 													 GEN0
Volume Resource Action														 GEN0
UTS Async Dump															 GEN0
PDB SGA init															 GEN0
PDB close abort 														 GEN0
GWM RAC affinity action 													 GEN0
pdb event stats action														 GEN0
Wait Outlier thresholds 													 GEN0
Process wait outlier														 GEN0
KSU GUID MAC Address update													 GEN0
event nfy timeout action													 GEN0
KSM SGA slaves spawn														 GEN0
UM Background action														 GEN0
Parameter spfile sync														 GEN0
check for parameters from other instances											 GEN0
test kspset() from within bg													 GEN0
Action-Based process Test													 GEN0
Spawn processes on behalf of someone else											 GEN0
Direct NFSv4 RENEW lease operation												 GEN0
DTrace based Kernel IO	Outlier  Processing											 GEN0
Auxilary ipc init gen0 action													 GEN0
Auxilary ipc finish gen0 action 												 GEN0
Auxilary ipc timedout gen0 action												 GEN0
Auxilary ipc intr gen0 action													 GEN0
Auxilary ipc message gen0 action												 GEN0
prespawn init check														 GEN0
prespawn timeout check														 GEN0
prespawn clean check														 GEN0
Start Exadata specific backgrounds												 GEN0
GEN0 willl check for DSKM health												 GEN0
KSVMSVC timeout action														 GEN0
quarantine cdb actions														 GEN0
quarantine per-pdb actions													 GEN0
event nfy timeout action													 GEN0
KSNS background action														 GEN0
check instance runtime														 GEN0
ITM background action														 GEN0
Hang Manager parameters 													 GEN0
SGA memory tuning parameter update - GEN0											 GEN0
kcb L2 cache verify file header 												 GEN0
kcbz background redodump													 GEN0
kcb DW object cooling														 GEN0
kcb Cloud ENCTS check														 GEN0
kcl initiate persistent read mostly												 GEN0
kcl update persistent read mostly info												 GEN0
kcl downconvert object locks													 GEN0
periodic PDB tasks														 GEN0
clear preserved buffers and sessions												 GEN0
BA container GEN0 action													 GEN0
get quiesce enqueue														 GEN0
release quiesce enqueue 													 GEN0
GEN0 Timeout KCQ cleanup													 GEN0
prcoessing object/range redo markers												 GEN0
Prepare flashback log														 GEN0
GEN0
Fetch recent RCVID list 													 GEN0
CLI Create All Slave Tasks													 GEN0
CLI Create One Slave Task													 GEN0
MGA CMI background action													 GEN0|IPC0
AWR Raw Metrics Capture 													 GEN1
session param update action													 GEN1
MGA Stat refresh background action												 GEN1
Create vatopa mapping														 GEN1
KSB GEN1 init															 GEN1
ksusehst kill slave														 GEN1
KFFG sync timeout action													 GEN1
pcs timeout action														 GEN1
kspcs Oradism init														 GEN1
kspcs Oradism close														 GEN1
ksusehst record session usage action												 GEN1
ksusehst predict session usage action												 GEN1
session update action														 GEN2
GMON async DB messaging interrupt action											 GMON
GMON async DB messaging timeout action												 GMON
update PST															 GMON
update PST status														 GMON
check PST															 GMON
query PST															 GMON
update PST on reconf														 GMON
PST dismount disk group 													 GMON
set heartbeat sensing														 GMON
sense a heartbeat														 GMON
timeout 															 GMON
refresh 															 GMON
shutdown process														 GMON
Run unit test on all mounted groups												 GMON
emulate i/o errors on a disk													 GMON
Dump PST															 GMON
Global Txn SHUTdown														 GTX*
K2Q Timeout action														 GTX*
free instance lock														 GTX*
convert instance lock														 GTX*
get instance lock														 GTX*
IMCO cycle action														 IMCO
IMCO init															 IMCO
IMCO action															 IMCO
IMCO FastStart Defer Write Scheduler												 IMCO
IMCO Trickle Repopulate 													 IMCO
IMCO global dictionary action													 IMCO
IMCO ADO action 														 IMCO
IMCO worker action														 IMCO
INSV Shutdown Message														 INSV
INSV Receive Message														 INSV
INSV Interrupt															 INSV
KSGL timeout															 IPC0
PMON notify IPC0 of process failure												 IPC0
KSGL node join															 IPC0
KSGL node exit															 IPC0
KSRMA mount															 IPC0
KSRMA Recovery Log Allocation													 IPC0
KSRMA RMA OP															 IPC0
KSWL PMON notify IPC0 of process failure											 IPC0
KTURMA Message action														 IPC0
KSGL mount in IPC0														 IPC0
KSGL initialize service 													 IPC0
MGA CMI background init action													 IPC0
KSIPC MGA Segment Check 													 IPC0
KSIPC finish action														 IPC0
KSIPC Grp Refresh action													 IPC0
KSIPC reconfig action														 IPC0
KSIPC shutdown action														 IPC0
KSIPC msg action														 IPC0
KSIPC interrupt action														 IPC0
KSIPC initialize server 													 IPC0
KSIPC timeout action														 IPC0
KJFM update process heartbeat													 LCK*|DIA*|LGWR|CKPT|DBRM|IPC0|ARC*
kxfp signal recv function													 LCK0
ksim reg/dereg instance group													 LCK0
ksim query instance group													 LCK0
ksim polling interrupt action													 LCK0
ksim get value															 LCK0
ksim cache line update														 LCK0
KSB Rolling Migration interrupt action												 LCK0
KSR Control message service routine												 LCK0
KSXP look for slow sends													 LCK0
Cross-instance broadcast message												 LCK0
KSXR remote instance died													 LCK0
KSXR finialize															 LCK0
Get java patching locks 													 LCK0
Free java patching locks													 LCK0
Check SCN adjust														 LCK0
gc global ckpt timeout														 LCK0
gc policy timeout														 LCK0
gc policy interrupt														 LCK0
kcl stop gcscn bcast														 LCK0
kcl start gcscn bcast														 LCK0
get TS enq for sort segment													 LCK0
release TS enq for sort segment 												 LCK0
sequence bckgrnd instance lock													 LCK0
Get instance lock														 LCK0
Escalate Lock															 LCK0
Get instance lock for recovery													 LCK0
Release instance lock														 LCK0
Recalled lock															 LCK0
kea signal recv function													 LCK0
ksim LCK0 functions														 LCK0
get and hold global enqueue													 LCK0
KQLM interrupt action														 LCK1
KQLM invalidation instance lock operation											 LCK1
kfr Poke LGWR															 LGWR
kfr Incr Ckpt															 LGWR
kfr ACD relocation														 LGWR
kfr Disable Reloc CIC														 LGWR
LGWR Set DCON bit														 LGWR
LGWR failure															 LGWR
LGWR re-eval member locks													 LGWR
Redo writer IO's														 LGWR
Redo writer post action 													 LGWR
Redo writer log switch operations												 LGWR
Redo writer heartbeat														 LGWR
Redo writer generate offline immed marker											 LGWR
Cache all logfiles														 LGWR
Advance RTA boundary														 LGWR
Suspend Redo Gen														 LGWR
Suspend Redo Gen														 LGWR
LGWR initialization														 LGWR
LGWR flush workers														 LGWR
Open/close/mount/dismount thread												 LGWR
readable standby LGWR channel													 LGWR
Standby media recovery info cleanup												 LGWR
Switchover/PDB relocate message channel subscribe										 LGWR
Redo writer quiesce IMC on standby												 LGWR
Redo writer advance query scn on standby											 LGWR
Redo Writer end query scn on standby												 LGWR
action to cleanup mira context													 LGWR
dml_locks = 0 global enforcement												 LGWR
get KTU instance lock														 LGWR
convert KTU instance lock													 LGWR
free KTU instance lock														 LGWR
LGWR to Start DMON														 LGWR
kfr mount disk group														 LGWR
kfr Dismount disk group 													 LGWR
KJBFP PBR writer main														 LMFC
KJBFP increment PRI														 LMFC
Complete deferred initialization of components											 MMAN
Reset advisory pool when advisory turned ON											 MMAN
Handle sga_target deferred request delete											 MMAN
Handle mem_target resize													 MMAN
Handle sga_target resize													 MMAN
SGA deferred allocated granules Initialization											 MMAN
Memory Management														 MMAN
SGA deferred allocated granules move												 MMAN
tune undo retention														 MMNL
Get current scheduling delay													 MMNL
MMNL Periodic MQL Selector													 MMNL
AWR Raw Metrics Capture 													 MMNL
MMNL DB close															 MMNL
ASH Sampler (KEWA)														 MMNL
Sweep PL/SQL incidents														 MMON
Process staged incidents													 MMON
DDE Periodic Dump Scheduler													 MMON
DDE MMON action to schedule async action slaves 										 MMON
Execute GWM MMON task														 MMON
AWR Auto Flush Task														 MMON
AWR PDB Auto Flush Task 													 MMON
AWR UMF Auto Flush Task 													 MMON
AWR Auto Purge Task														 MMON
AWR PDB Auto Purge Task 													 MMON
AWR Auto DBFUS Task														 MMON
AWR Auto CPU USAGE Task 													 MMON
KEWR SlavePool Test MMON Main													 MMON
ASH Emergency Flusher (KEWA)													 MMON
ASH Progressive Flusher (KEWA)													 MMON
Advisor delete expired tasks													 MMON
MMON Initialization action													 MMON
MMON Post-Initialization action 												 MMON
MMON Remote action Listener													 MMON
MMON Local action Listener													 MMON
MMON Completion Callback Dispatcher												 MMON
MMON set edition interrupt action												 MMON
MMON ping action														 MMON
MMON testing action 2														 MMON
MMON testing slave														 MMON
run-once action driver														 MMON
shutdown MMON															 MMON
MMON DB open															 MMON
MMON DB close															 MMON
MMON periodic cleanup														 MMON
MMON Autotask Action														 MMON
metrics monitoring														 MMON
threshold reconciliation													 MMON
threshold reloading														 MMON
ADDM (KEH)															 MMON
sql tuning hard kill defense													 MMON
autotask status check														 MMON
Maintain AWR Baseline Thresholds Task												 MMON
WCR: Record Action Switcher													 MMON
WCR: Record State Check 													 MMON
WCR: Record SharedDir Check													 MMON
WCR: Capture Stats Update													 MMON
WCR: Abort capture on IO error													 MMON
KEC Create directory														 MMON
WCR: Replay Action Switcher													 MMON
WCR: Replay State Check 													 MMON
WCR: Replay Deadlock Detection													 MMON
WCR: Replay Stats Update													 MMON
WCR: Replay Admin Alive Check													 MMON
WCR: Replay Send References													 MMON
WCR: Replay remote clock propagation												 MMON
WCR: Replay remote posts													 MMON
SQL Monitoring Garbage Collector												 MMON
Report Capture Daemon														 MMON
Report Capture Test (KERPI)													 MMON
Real-Time ADDM Trigger														 MMON
Index usage tracking statistics flush												 MMON
Execute on-demand tuning task (KESTS)												 MMON
Check for sync messages from other instances											 MMON
Check for autostart messages from other instances										 MMON
Coordinator autostart timeout													 MMON
Job Autostart action force													 MMON
ADR Auto Purge Task														 MMON
ADR Space Management Statistics Flush												 MMON
ADR PDB Auto Purge Task 													 MMON
Schedule slave to update incident meter 											 MMON
Update KGSTM Translation													 MMON
Update shared pool advice stats 												 MMON
Refresh active service cache													 MMON
reload failed KSPD callbacks													 MMON
reconfiguration MMON action													 MMON
MMON request to purge LTXID history table											 MMON
SGA memory tuning parameter update												 MMON
SGA memory tuning														 MMON
flashcache object keep monitor													 MMON
deferred controlfile autobackup action												 MMON
recovery area alert action													 MMON
Flashback Marker														 MMON
SMON_SCN_TIME Copy to PDBs													 MMON
undo usage															 MMON
UNDO MMON ACTION														 MMON
Block Cleanout Optim, Undo Segment Scan 											 MMON
BCO:																 MMON
Flashback Archive RAC Health Check												 MMON
Flashback Archive RAC action													 MMON
CLI AutoPartition														 MMON
ILM segment access flush													 MMON
ILM row access flush														 MMON
tune undo retention lob 													 MMON
MMON Periodic LOB  MQL Selector 												 MMON
MMON Periodic LOB Spc Analyze													 MMON
tablespace alert monitor													 MMON
ILM check															 MMON
ILM cleanup															 MMON
Column-Level Statistics flush													 MMON
Cleanup of unpinned KGL handles 												 MMON
OLS Cleanup															 MMON
Triton Session Cleanup														 MMON
ASM Audittrail cleanup														 MMON
dblink logon table cleanup													 MMON
Sample Shared Server Activity													 MMON
JAVAVM poll for JIT														 MMON
Shutdown JAVAVM JIT														 MMON
Compute cache stats in background												 MMON
SPM: Auto-purge expired SQL plan baselines											 MMON
SPM: Check SMB size														 MMON
SPM: Delete excess sqllog$ batches												 MMON
flushing workload information for optimizer											 MMON
cleaning up workload information for optimizer											 MMON
light-weight checks for optimizer statistics advisor										 MMON
SQLQ: Auto-purge expired SQL quarantine configurations										 MMON
PQ: Adjust Slave Pool														 MMON
KSXM Advance DML Frequencies													 MMON
KSXM Broadcast DML Frequencies													 MMON
PGA Max Size (QESMM)														 MMON
Cleanup client cache server state in background 										 MMON
MMON TSM Cleanup														 MMON
alert message cleanup														 MMON
alert message purge														 MMON
Network Server shutdown 													 NSS*
Network Server forced														 NSS*
Network Server wakeup														 NSS*
NetSlave Shutdown Message													 NSV*
NetSlave Initialize Message													 NSV*
NetSlave Interrupt Function													 NSV*
NetSlave Wakeup Message 													 NSV*
pmon dtp init															 PMON|CLMN
scrub for timedout processes													 PSP0
measure scheduling delay													 PSP0
KSO initialize PSP0														 PSP0
fork new processes														 PSP0
KSO finalize PSP0														 PSP0
kxfp remote slave spawn recv function												 PXMN
free PX memory chunks in background												 PXMN
periodic PDB tasks														 PXMN
DSKM to master BG msg														 RBAL
register to node local process group												 RBAL
Interrupt Action														 RBAL
Timeout interrupt action													 RBAL
ASM to master BG msg														 RBAL
ASM to BG mesg															 RBAL|GMON
BG load lib msg 														 RBAL|PSP0
distributed recovery shutdown													 RECO
distributed recovery wakeup													 RECO
RMON BG Driver															 RMON
Payload action to BG														 RMON
RMON Init Action														 RMON
shutdown RMON process														 RMON
action for buddy instance													 RMS0
do work for new gpnp instances													 RMS0
action to cleanup buddy instance context											 RMS0
RSM Wakeup															 RSM0
RSM Set Parameter														 RSM0
RSM Interrupt Routine														 RSM0
shut down															 RSMN
main																 RSMN
Test Driver wakeup														 RTTD
RTTD initialization														 RTTD
Open/close flashback thread													 RVWR
RVWR IO's															 RVWR
SMCO Action															 SMCO
SMCO Timout															 SMCO
ASM instance recovery														 SMON
Reliable Test Dummy Call													 TEST
Test managed actions														 TEST
TEST initialization														 TEST
Check TEST initialization complete												 TEST
Initiate KSBCITST														 TEST
Monitor initialization														 TMON
Monitor wakeup															 TMON
TPZ initialization														 TPZ*
Test Process wakeup														 TPZ*
Multi procs per DTP														 UTMU
Action to process msgs from instance processes											 XDMG
Diskgroup timeout actions for Exadata disks auto manage module									 XDMG
Infrequent timeout actions for Exadata disks auto manage module 								 XDMG
Interrupt actions for Exadata disks auto manage module										 XDMG
530 rows selected.
SQL> select * from v$version;
BANNER
----------------------------------------------------------------------------------------------------------------------------------------------------------------
BANNER_FULL
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BANNER_LEGACY																			     CON_ID
---------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production													  0

prm dul supports all version oracle pluggable database

prm oracle recovery software supports oracle 12cR1 12cR2 18c 19C PDB CDB Container database pluggable database

PRM-SCAN extract datafile from corrupted asm disk group

PRM-SCAN extract datafile from corrupted asm disk group

let ‘s go !

asmcmd
lsdsk
lsdg

let corrupt the asm disk header by manual

dd if=/dev/zero of=/dev/vdb1 bs=1024k count=10 conv=notrunc

this will write zero to asm header 10M

SQL> alter diskgroup data mount;
alter diskgroup data mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup “DATA” cannot be mounted
ORA-15040: diskgroup is incomplete

fail to mount the corrupted asm disk group

ok

use PRMSCAN to scan the failed asm diskgroup’s disk

this only require read-only permission

it will take long time to scan , up to the disk size

java -jar PRMScan.jar –outputsh o1.sh

this command will output lots of dd command , help you extract datafiles blocks and merge them into datafiles

let see it

some datafiles may need to be ignored , find out your datafiles !!

execute the shell will generate datafiles under your current directory !

if you have multiple db’s system01.dbf store on same diskgroup , you may need to distinguish them by yourself, or you can call parnassusData service

let’s extract data from the datafile using prm !!

prm scan can work on

1. corrupted asm diskgroup ,failed to mount
2. corrupted / formatted filesystem
3. deleted datafile from asm diskgroup
4. deleted datafile from filesystem

filesystem(ntfs,ext 2/3/4,xfs whatever filesystem type)

we provide prm scan as a product (bundle with prm) or a service .

contact us service@parnassusdata.com !!

2019年1月的清迈之旅

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

RHEL/CENTOS 7中需要注意的OS设置 会影响ORACLE数据库实例运行

 

 

 

 

1、 Oracle Linux 7和Redhat Linux 7:/var/tmp/.oracle中的socket文件被删除

Oracle Database – Enterprise Edition – 版本 11.2.0.4 和更高版本
Linux x86-64

症状

Oracle Linux 7和Redhat Linux 7:/var/tmp/.oracle中的socket文件被神秘删除.

更改

原因

Oracle Linux 7和Redhat Linux 7都有一个内核服务systemd-tmpfiles-clean.service,由systemd管理并删除临时位置的文件。

上述服务删除:

  1. 在/tmp 中的文件/目录超过10天没有访问的(在tmp.conf中定义)
  2. 在/var/tmp中的文件/目录超过30天没有访问的(在tmp.conf中定义)

通过检查文件/目录的所有atime/mtime/ctime来确定“没有访问”。

 

解决方案

排除套接字文件被内核服务systemd-tmpfiles-clean.service删除

要排除tmp目录中的套接字文件被tempfile clean服务删除,请更改/usr/lib/tmpfiles.d/tmp.conf的内容并添加

x /tmp/.oracle*

x /var/tmp/.oracle*

x /usr/tmp/.oracle*

上面的“x”选项指示systemd-tmpfiles-clean.service排除列出目录中的文件。

注意:目录/var/tmp/.oracle包含许多“特殊”套接字文件,本地客户端使用这些文件通过IPC协议(sqlnet)连接到各种Oracle进程,包括TNS监听器,CSS,CRS和EVM守护进程甚至是数据库或ASM实例。在Clusterware运行时删除套接字文件时  ,会出现Doc ID 391790.1的症状

 

 

 

 

 

ALERT: Setting RemoveIPC=yes on Redhat 7.2 and higher Crashes ASM and Database Instances as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM) (Doc ID 2081410.1)

 

ontroled by the option RemoveIPC in the /etc/systemd/logind.conf configuration file,
see man logind.conf(5) for details.

The default value for RemoveIPC in RHEL7.2 and higher is yes.

As a result, when the last oracle or grid user disconnects, the OS removes shared memory segments and semaphores for those users.
As Oracle ASM and Databases use shared memory segments for SGA, removing shared memory segments will crash the Oracle ASM and database instances.

Please refer to the Redhat bug 1264533  – https://bugzilla.redhat.com/show_bug.cgi?id=1264533

OCCURRENCE

The problem affects all applications including Oracle Databases that use the shared memory segments and semaphores; thus, both, Oracle ASM and database instances are affected.

Oracle Linux 7.2 avoids this problem by setting RemoveIPC to no explicitly on /etc/systemd/logind.conf configuration file,
but if /etc/systemd/logind.conf is touched or modified before the upgrade started, the yum/update will write the correct/new configuration file (with RemoveIPC=no) as logind.conf.rpmnew,
and if user retains their original configuration file, then most likely the failures described in this note will occur.
To avoid this problem, after the upgrade be sure to edit the logind.conf and set RemoveIPC=no.  This is documented in the Oracle Linux 7.2 release notes.

SYMPTOMS

1) Installing 11.2 and 12c GI/CRS fails, because ASM crashes towards the end of the installation.

2) Upgrading to 11.2 and 12c GI/CRS fails.

3) After Redhat Linux is upgraded to 7.2 and higher, 11.2 and 12c ASM and database instances crash.

 

The removal of the IPC objects by systemd-logind may happen at any time, as such the failure patterns can vary greatly, here are some examples of how failures may look like:

 

Most common error that occurs is that the following is found in the asm or database alert.log:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1

 

The second observed error occurs during installation and upgrade when asmca fails with the following error:
KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
KFOD-00105: Could not open pfile ‘init@.ora’

 

The third observed error occurred during installation and upgrade:
Creation of ASM password file failed. Following error occurred: Error in Process: $GRID_HOME/bin/orapwdEnter password for SYS:

OPW-00009: Could not establish connection to Automatic Storage Management instance

2015/11/20 21:38:45 CLSRSC-184: Configuration of ASM failed
2015/11/20 21:38:46 CLSRSC-258: Failed to configure and start ASM

 

The fourth observed error is the following message is found in the /var/log/messages file around the time that asm or database instance crashed:
Nov 20 21:38:43 testc201 kernel: traps: oracle[24861] trap divide error
ip:3896db8 sp:7ffef1de3c40 error:0 in oracle[400000+ef57000]

 

WORKAROUND

1) Set RemoveIPC=no in /etc/systemd/logind.conf

2) Reboot the server or restart systemd-logind as follows:
# systemctl daemon-reload
# systemctl restart systemd-logind

PATCHES

Migrating to Oracle Linux 7.2 and higher from Redhat 7.2 and higher resolves this problem.

If migrating to Oracle Linux 7.2 is not possible, please use the above workaround by setting RemoveIPC=no in /etc/systemd/logind.conf

prmscan recover deleted oracle datafiles on windows

prmscan recover deleted oracle datafiles on windows

 

 

 

caution , don't write anything on the disk/partition ,
which stored the deleted oracle datafiles
caution , don't write anything to the disk/partition ,
which stored the deleted oracle datafiles
caution , don't write anything to the disk/partition ,
which stored the deleted oracle datafiles
shutdown your oracle instance first!!
run prmscan , it will scan the disk/partition ,
and merge blocks.
don't put the software and result on the disk/partition 
which stored the deleted oracle datafiles
don't put the software and result on the disk/partition 
which stored the deleted oracle datafiles
install jdk 1.8 and cmder first !!
run cmder as administrator !!
scan disk E partition
java -jar prmscan.jar --scan \\.\E:
merge block , make datafiles
java -jar prmscan.jar --outputsh new.sh
replace \\.\ with \\\.\ in the new.sh
sh new.sh
use prm-dul , dictionary-mode , unload the data !!
select name from v$datafile where ts#=0;
C:\APP\VC\ORADATA\ORCL\DATAFILE\O1_MF_SYSTEM_GYK0O766_.DBF
C:\Users\vc\Desktop\prmscan.0.2.6\PD_DBF7.dbf
ok! it works !

在windows上使用prmscan恢复误删除的oracle数据文件中的数据

oracle datafile数据文件被误删除 或者 文件系统被误格式化 /损坏, 均可以采用prmscan软件极大程度上扫描磁盘上残存的数据块,并合并成数据文件,来达到恢复数据的目的。

 

prmscan支持windows,linux和unix

 

prm scan 恢复Windows下损坏的文件系统或被从文件系统上删除的oracle数据文件中的数据。

 

 

 

 

 

 

误删除oracle数据文件的情况下
建议先关数据库实例,然后不要在误删除数据文件所在磁盘上写入任何新数据!!
建议先关数据库实例,然后不要在误删除数据文件所在磁盘上写入任何新数据!!
建议先关数据库实例,然后不要在误删除数据文件所在磁盘上写入任何新数据!!
使用prmscan恢复该数据文件
注意不要将软件和扫描结果存放在和被删除的数据文件一个磁盘上!!
注意不要将软件和扫描结果存放在和被删除的数据文件一个磁盘上!!
注意不要将软件和扫描结果存放在和被删除的数据文件一个磁盘上!!
否则会导致覆盖数据,而无法恢复数据
否则会导致覆盖数据,而无法恢复数据
否则会导致覆盖数据,而无法恢复数据
先安装JDK 1.8和cmder !!!
先安装JDK 1.8和cmder !!!
先安装JDK 1.8和cmder !!!
1.扫描磁盘 java -jar prmscan.jar --scan XX
2.碎片合并被删除的文件 java -jar prmscan.jar --outputsh xx
==> 注意不要在被删除文件所在磁盘上作以上操作!!!
注意要以管理员权限启动cmder 
注意要以管理员权限启动cmder 
注意要以管理员权限启动cmder
cd C:\Users\vc\Desktop\prmscan.0.2.6
C:
java -jar PRMScan.jar --scan \\.\E:
\\.\E: ==>数据在e盘 就写\\.\E:, 在F盘就写 \\.\F:
打开生成的sh文件 替换 \\.\ 为 \\\.\
然后执行
sh new.sh
会生成一个 对应的数据文件
select name from v$datafile where ts#=0;
C:\APP\VC\ORADATA\ORCL\DATAFILE\O1_MF_SYSTEM_GYK0O766_.DBF
C:\Users\vc\Desktop\prmscan.0.2.6\PD_DBF6.dbf ==>对应的数据文件
解压prm软件,启动程序,字典模式,加入上面查出来的system01.dbf和新生成的文件
然后就能恢复被删除文件中的数据了!!
ok!

使用pg_filedump恢复PostgreSQL中无备份的被误删除delete的行数据

如果自己搞不定可以找诗檀软件专业PostgreSQL数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

在postgreSQL中少量数据的删除时可以通过pg_filedump工具恢复被误删除的数据行的。

 

其原理是虽然数据被删除了,但在数据文件中其仅仅是被标记为删除,而并未被清空;只要在auto vaccum之前都可以通过pg_filedump工具恢复这部分数据。

 

下面是一个例子:

 

 

create database testa;
\c testa
create table novels (name varchar(200), id int);
insert into novels values('三国演义',1);
insert into novels values('水浒传',2);
insert into novels values('西游记',3);
insert into novels values('红楼梦',4);
select oid from pg_database where datname='testa';
oid  
-------
17420
select oid,relfilenode from pg_class where relname='novels';
oid  | relfilenode 
-------+-------------
17421 |       17421
(1 row)
testa=# delete from novels;
DELETE 4
testa=# select * from novels;
name | id 
------+----
(0 rows)
postgres@vultr:~$ ls -l /var/lib/postgresql/10/main/base/17420/17421 
-rw------- 1 postgres postgres 8192 Nov 25 05:11 /var/lib/postgresql/10/main/base/17420/17421
root@vultr:~# chmod 700 pg_filedump_rc
root@vultr:~# ./pg_filedump_rc -D charn,int /var/lib/postgresql/10/main/base/17420/17421
*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 11.0
*
* File: /var/lib/postgresql/10/main/base/17420/17421
* Options used: -D charn,int 
*
* Dump created on: Mon Nov 25 05:17:40 2019
*******************************************************************
Block    0 ********************************************************
----- Block Offset: 0x00000000 Offsets: Lower 40 (0x0028) Block: Size 8192 Version 4 Upper 8024 (0x1f58) LSN: logid 0 recoff 0x07db2630 Special 8192 (0x2000) Items: 4 Free Space: 7984 Checksum: 0xd24c Prune XID: 0x0000037b Flags: 0x0000 () Length (including item array): 40 ------ Item 1 -- Length: 44 Offset: 8144 (0x1fd0) Flags: NORMAL COPY: 三国演义 1 Item 2 -- Length: 40 Offset: 8104 (0x1fa8) Flags: NORMAL COPY: 水浒传 2 Item 3 -- Length: 40 Offset: 8064 (0x1f80) Flags: NORMAL COPY: 西游记 3 Item 4 -- Length: 40 Offset: 8024 (0x1f58) Flags: NORMAL COPY: 红楼梦 4 *** End of File Encountered. Last Block Read: 0 *** root@vultr:~# ./pg_filedump_rc -D charn,int /var/lib/postgresql/10/main/base/17420/17421|grep COPY COPY: 三国演义 1 COPY: 水浒传 2 COPY: 西游记 3 COPY: 红楼梦 4

PostgreSQL FATAL: could not read block 0 in file Input/output error错误

FATAL: could not read block 0 in file  Input/output error
ERROR:  could not read block 84 in file "base/16386/670007": Input/output error
ERROR:  could not read block 4707 of relation 1663/16384/16564: Success

 

如果PG的登陆出现上述错误 则一般说明PG数据文件出现了磁盘故障,导致无法登陆数据库,可以尝试用以下步骤解决

 

步骤1 设置 zero_damaged_pages 参数 并尝试重建系统索引,zero_damaged_pages加入到postgre配置文件中,并reload或重启pg服务进程:

 

 

 

reindexdb -p 5433 --system dbname

 

步骤2 设置 ignore_system_indexes=true , 以尝试忽略system index的影响:

 

 

pg_ctl -D /data -o '-c ignore_system_indexes=true' start
并连接数据库
psql $dbname

 

步骤3 重建全库 索引

reindex database "dbname";

步骤4 vacuum 全表库

vacuum full analyze verbose;
vacuum full verbose;

步骤5 禁用约束

update pg_class set relchecks=0 where relname='tablename';

步骤六 重建索引

reindex database "dbname";

最后建议用pg_dump 导出全库 后 重建数据库。

如果做了上述步骤还不行, 可以考虑用pg_filedump 抽取所有文件内的数据后再重建数据库。

PostgreSQL checksum

文章来源: https://yq.aliyun.com/articles/675942

在计算机系统中,checksum 通常用于校验数据在传输或存取过程中是否发生错误。PostgreSQL 从 9.3 开始支持 checksum,以发现数据因磁盘、 I/O 损坏等原因造成的数据异常。本文介绍 PostgreSQL 中 checksum 的使用及其实现原理。

 

概述

PostgreSQL 从 9.3 开始支持数据页的 checksum,可以在执行 initdb 时指定 -k 或 --data-checksums 参数开启 checksum,但开启 checksum 可能会对系统性能有一定影响,官网描述如下:

Use checksums on data pages to help detect corruption by the I/O system that would otherwise be silent. Enabling checksums may incur a noticeable performance penalty. This option can only be set during initialization, and cannot be changed later. If set, checksums are calculated for all objects, in all databases.

启用 checksum 后,系统会对每个数据页计算 checksum,从存储读取数据时如果检测 checksum 失败,则会发生错误并终止当前正在执行的事务,该功能使得 PostgreSQL 自身拥有了检测 I/O 或硬件错误的能力。

Checksum 引入一个 GUC 参数 ignore_checksum_failure,该参数若设置为 true,checksum 校验失败后不会产生错误,而是给客户端发送一个警告。当然,checksum 失败意味着磁盘上的数据已经损坏,忽略此类错误可能导致数据损坏扩散甚至导致系统奔溃,此时宜尽早修复,因此,若开启 checksum,该参数建议设置为 false

 

实现原理

 

设置 checksum

 

数据页的 checksum 在从 Buffer pool 刷到存储时才设置,当页面再此读取至 Buffer pool 时进行检测。

PostgreSQL 中 Buffer 刷盘的逻辑集中在 FlushBuffer 中,其中设置 checksum 的逻辑如下:

 

/*
* Update page checksum if desired.  Since we have only shared lock on the
* buffer, other processes might be updating hint bits in it, so we must
* copy the page to private storage if we do checksumming.
*/
bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);

 

 

比较有意思的是,其他进程可能会在只加 content-lock 共享锁的情况下并发修改 page 的 Hint Bits,从而导致 checksum 值发生变化,为确保 page 的内容及其 checksum 保持一致,PostgreSQL 采用了 先复制页,然后计算 checksum 的方式,如下:

 

 

/*
* We allocate the copy space once and use it over on each subsequent
* call.  The point of palloc'ing here, rather than having a static char
* array, is first to ensure adequate alignment for the checksumming code
* and second to avoid wasting space in processes that never call this.
*/
if (pageCopy == NULL)
pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
memcpy(pageCopy, (char *) page, BLCKSZ);
((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);

 

 

即先将数据页的内容拷贝一份,拷贝的数据自然不会被其他进程修改,然后基于该拷贝页计算并设置 checksum 值。

 

 

checksum 算法

数据页的 checksum 算法基于 FNV-1a hash 改造而来,其结果为 32 位无符号整型。由于 PageHeaderData 中 pd_checksum 是 16 位无符号整型,因此将其截取 16 位作为数据页的 checksum 值,如下:

 

/*
* Save pd_checksum and temporarily set it to zero, so that the checksum
* calculation isn't affected by the old checksum stored on the page.
* Restore it after, because actually updating the checksum is NOT part of
* the API of this function.
*/
save_checksum = cpage->phdr.pd_checksum;
cpage->phdr.pd_checksum = 0;
checksum = pg_checksum_block(cpage);
cpage->phdr.pd_checksum = save_checksum;
/* Mix in the block number to detect transposed pages */
checksum ^= blkno;
/*
* Reduce to a uint16 (to fit in the pd_checksum field) with an offset of
* one. That avoids checksums of zero, which seems like a good idea.
*/
return (checksum % 65535) + 1;

 

pg_checksum_block 函数计算数据页的 32 位 checksum 值,具体算法可以参考源码,在此不详述。

 

 

检测 checksum

 

 

PostgreSQL 会在页面从存储读入内存时检测其是否可用,调用函数为 PageIsVerified,该函数不仅会检测正常初始化过的页(non-zero page),还会检测 全零页(all-zero page)

为什么会出现 全零页 呢?
在特定场景下表中可能出现 全零页,比如有进程扩展了一个表,即在该表中添加了一个新页,但在 WAL 日志写入存储之前,进程崩溃了。此时新加的页可能已经在表文件中,下次重启时就会读取到。

对于 non-zero page,检测其 checksum 是否一致以及 page header 信息是否正确,若 checksum 失败,但 header 信息正确,此时会根据 ignore_checksum_failure 值判断验证是否通过;对于 all-zero page,如果为全零,则验证通过。

若验证失败,两种处理方式:

  • 若读取数据的模式为 RBM_ZERO_ON_ERROR 且 GUC 参数 zero_damaged_pages 为 true,则将该页全部置 0
  • 报错,invalid page

 

checksum 与 Hint bits

 

 

数据页写至存储时,如果写失败,可能会导致破碎的页(torn page),PostgreSQL 通过 full_page_writes 特性解决此类写失败导致数据不可用的问题。

Hint Bits 是数据页中用于标识事务状态的标记位,一般情况下,作为提示位,不是很重要。但如果使用了 checksum,Hint Bits 的变化会导致 checksum 值发生改变。设想如果一个页面发生部分写,恰好把某些 Hint Bits 写错,此页面可能并不影响正常使用,但 checksum 会抛出异常,此时应如何恢复呢?

在 checksum 的实现中,checkpoint 后,如果页面因更新 Hint Bits 第一次被标记为 dirty,需要记录一个 Full Page Image 至 WAL 日志中,以应对以上提到的因 Hint Bits 更新丢失导致 checksum 失败的问题,具体实现可参考 MarkBufferDirtyHint。对于已经是 dirty 的页,更新 Hint Bits 则不需要记录 WAL 日志,因为在 checkpoint 后,第一次将该页标记为 dirty 时已经写入了对应的 Full Page Image

可见,在启用 checksum 的情况下,checkpoint 后页面的第一次修改如果是更新 Hint Bits, 会写 Full Page Image 至 WAL 日志,这会导致 WAL 日志占用更多的存储空间。

关于 PostgreSQL checksum 和 Full Page Image 的关系,可以参考 stackoverflow 上这个问题

 

 

查看 checksum

PostgreSQL 10 在 pageinspect 插件中添加了函数 page_checksum() 用来查看 page 的 checksum,当然使用 page_header() 也可以查看 page 的 checksum,如下:

 

postgres=# SELECT page_checksum(get_raw_page('pg_class', 0), 0);
page_checksum
---------------
17448
(1 row)
postgres=# SELECT * FROM page_header(get_raw_page('pg_class', 0));
lsn     | checksum | flags | lower | upper | special | pagesize | version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
0/78A1E918 |    17448 |     0 |   200 |   368 |    8192 |     8192 |       4 |         0
(1 row)

总结

Checksum 使 PostgreSQL 具备检测因硬件故障或传输导致数据不一致的能力,一旦发生异常,通常会报错并终止当前事务,用户可以尽早察觉数据异常并予以恢复。当然,开启 checksum 也会引入一些开销,体现在两个方面:

  • 计算数据页的 checksum 会引入一些 CPU 开销,具体开销取决于 checksum 算法的效率
  • checkpoint 后,若因更新 Hint Bits 将页面第一次置为 dirty 会写一条记录 Full Page Image 的 WAL 日志,以用于恢复因更新 Hint Bits 产生的破碎页。

对于数据可用性要求较高的场景,通常建议将 full_page_writes 和 checksum 都打开,前者用于避免写失败导致的数据缺失,后者用于尽早发现因硬件或传输导致数据不一致的场景,一旦发现,可以利用 full_page_writes 和 checksum 记录在 WAL 日志中的 Full Page Image 进行数据恢复。

References

沪ICP备14014813号

沪公网安备 31010802001379号