prm dul supports all version oracle pluggable database

prm oracle recovery software supports oracle 12cR1 12cR2 18c 19C PDB CDB Container database pluggable database

PRM-SCAN extract datafile from corrupted asm disk group

PRM-SCAN extract datafile from corrupted asm disk group

let ‘s go !


let corrupt the asm disk header by manual

dd if=/dev/zero of=/dev/vdb1 bs=1024k count=10 conv=notrunc

this will write zero to asm header 10M

SQL> alter diskgroup data mount;
alter diskgroup data mount
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup “DATA” cannot be mounted
ORA-15040: diskgroup is incomplete

fail to mount the corrupted asm disk group


use PRMSCAN to scan the failed asm diskgroup’s disk

this only require read-only permission

it will take long time to scan , up to the disk size

java -jar PRMScan.jar –outputsh

this command will output lots of dd command , help you extract datafiles blocks and merge them into datafiles

let see it

some datafiles may need to be ignored , find out your datafiles !!

execute the shell will generate datafiles under your current directory !

if you have multiple db’s system01.dbf store on same diskgroup , you may need to distinguish them by yourself, or you can call parnassusData service

let’s extract data from the datafile using prm !!

prm scan can work on

1. corrupted asm diskgroup ,failed to mount
2. corrupted / formatted filesystem
3. deleted datafile from asm diskgroup
4. deleted datafile from filesystem

filesystem(ntfs,ext 2/3/4,xfs whatever filesystem type)

we provide prm scan as a product (bundle with prm) or a service .

contact us !!

















RHEL/CENTOS 7中需要注意的OS设置 会影响ORACLE数据库实例运行





1、 Oracle Linux 7和Redhat Linux 7:/var/tmp/.oracle中的socket文件被删除

Oracle Database – Enterprise Edition – 版本 和更高版本
Linux x86-64


Oracle Linux 7和Redhat Linux 7:/var/tmp/.oracle中的socket文件被神秘删除.



Oracle Linux 7和Redhat Linux 7都有一个内核服务systemd-tmpfiles-clean.service,由systemd管理并删除临时位置的文件。


  1. 在/tmp 中的文件/目录超过10天没有访问的(在tmp.conf中定义)
  2. 在/var/tmp中的文件/目录超过30天没有访问的(在tmp.conf中定义)





要排除tmp目录中的套接字文件被tempfile clean服务删除,请更改/usr/lib/tmpfiles.d/tmp.conf的内容并添加

x /tmp/.oracle*

x /var/tmp/.oracle*

x /usr/tmp/.oracle*


注意:目录/var/tmp/.oracle包含许多“特殊”套接字文件,本地客户端使用这些文件通过IPC协议(sqlnet)连接到各种Oracle进程,包括TNS监听器,CSS,CRS和EVM守护进程甚至是数据库或ASM实例。在Clusterware运行时删除套接字文件时  ,会出现Doc ID 391790.1的症状






ALERT: Setting RemoveIPC=yes on Redhat 7.2 and higher Crashes ASM and Database Instances as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM) (Doc ID 2081410.1)


ontroled by the option RemoveIPC in the /etc/systemd/logind.conf configuration file,
see man logind.conf(5) for details.

The default value for RemoveIPC in RHEL7.2 and higher is yes.

As a result, when the last oracle or grid user disconnects, the OS removes shared memory segments and semaphores for those users.
As Oracle ASM and Databases use shared memory segments for SGA, removing shared memory segments will crash the Oracle ASM and database instances.

Please refer to the Redhat bug 1264533  –


The problem affects all applications including Oracle Databases that use the shared memory segments and semaphores; thus, both, Oracle ASM and database instances are affected.

Oracle Linux 7.2 avoids this problem by setting RemoveIPC to no explicitly on /etc/systemd/logind.conf configuration file,
but if /etc/systemd/logind.conf is touched or modified before the upgrade started, the yum/update will write the correct/new configuration file (with RemoveIPC=no) as logind.conf.rpmnew,
and if user retains their original configuration file, then most likely the failures described in this note will occur.
To avoid this problem, after the upgrade be sure to edit the logind.conf and set RemoveIPC=no.  This is documented in the Oracle Linux 7.2 release notes.


1) Installing 11.2 and 12c GI/CRS fails, because ASM crashes towards the end of the installation.

2) Upgrading to 11.2 and 12c GI/CRS fails.

3) After Redhat Linux is upgraded to 7.2 and higher, 11.2 and 12c ASM and database instances crash.


The removal of the IPC objects by systemd-logind may happen at any time, as such the failure patterns can vary greatly, here are some examples of how failures may look like:


Most common error that occurs is that the following is found in the asm or database alert.log:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1


The second observed error occurs during installation and upgrade when asmca fails with the following error:
KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
KFOD-00105: Could not open pfile ‘init@.ora’


The third observed error occurred during installation and upgrade:
Creation of ASM password file failed. Following error occurred: Error in Process: $GRID_HOME/bin/orapwdEnter password for SYS:

OPW-00009: Could not establish connection to Automatic Storage Management instance

2015/11/20 21:38:45 CLSRSC-184: Configuration of ASM failed
2015/11/20 21:38:46 CLSRSC-258: Failed to configure and start ASM


The fourth observed error is the following message is found in the /var/log/messages file around the time that asm or database instance crashed:
Nov 20 21:38:43 testc201 kernel: traps: oracle[24861] trap divide error
ip:3896db8 sp:7ffef1de3c40 error:0 in oracle[400000+ef57000]



1) Set RemoveIPC=no in /etc/systemd/logind.conf

2) Reboot the server or restart systemd-logind as follows:
# systemctl daemon-reload
# systemctl restart systemd-logind


Migrating to Oracle Linux 7.2 and higher from Redhat 7.2 and higher resolves this problem.

If migrating to Oracle Linux 7.2 is not possible, please use the above workaround by setting RemoveIPC=no in /etc/systemd/logind.conf

prmscan recover deleted oracle datafiles on windows

prmscan recover deleted oracle datafiles on windows




caution , don't write anything on the disk/partition ,
which stored the deleted oracle datafiles

caution , don't write anything to the disk/partition ,
which stored the deleted oracle datafiles

caution , don't write anything to the disk/partition ,
which stored the deleted oracle datafiles

shutdown your oracle instance first!!

run prmscan , it will scan the disk/partition ,
and merge blocks.

don't put the software and result on the disk/partition 
which stored the deleted oracle datafiles

don't put the software and result on the disk/partition 
which stored the deleted oracle datafiles

install jdk 1.8 and cmder first !!

run cmder as administrator !!

scan disk E partition

java -jar prmscan.jar --scan \.\E:

merge block , make datafiles

java -jar prmscan.jar --outputsh

replace \.\ with \\.\ in the


use prm-dul , dictionary-mode , unload the data !!

select name from v$datafile where ts#=0;


ok! it works !


oracle datafile数据文件被误删除 或者 文件系统被误格式化 /损坏, 均可以采用prmscan软件极大程度上扫描磁盘上残存的数据块,并合并成数据文件,来达到恢复数据的目的。




prm scan 恢复Windows下损坏的文件系统或被从文件系统上删除的oracle数据文件中的数据。











先安装JDK 1.8和cmder !!!
先安装JDK 1.8和cmder !!!
先安装JDK 1.8和cmder !!!

1.扫描磁盘 java -jar prmscan.jar --scan XX

2.碎片合并被删除的文件 java -jar prmscan.jar --outputsh xx

==> 注意不要在被删除文件所在磁盘上作以上操作!!!


cd C:\Users\vc\Desktop\prmscan.0.2.6
java -jar PRMScan.jar --scan \.\E:

\.\E: ==>数据在e盘 就写\.\E:, 在F盘就写 \.\F:

打开生成的sh文件 替换 \.\ 为 \\.\



会生成一个 对应的数据文件

select name from v$datafile where ts#=0;

C:\Users\vc\Desktop\prmscan.0.2.6\PD_DBF6.dbf ==>对应的数据文件







服务热线 : 13764045638   QQ号:47079569    邮箱




其原理是虽然数据被删除了,但在数据文件中其仅仅是被标记为删除,而并未被清空;只要在auto vaccum之前都可以通过pg_filedump工具恢复这部分数据。





create database testa;
\c testa

create table novels (name varchar(200), id int);

insert into novels values('三国演义',1);
insert into novels values('水浒传',2);
insert into novels values('西游记',3);
insert into novels values('红楼梦',4);

select oid from pg_database where datname='testa';

select oid,relfilenode from pg_class where relname='novels';

  oid  | relfilenode 
 17421 |       17421
(1 row)

testa=# delete from novels;

testa=# select * from novels;
 name | id 
(0 rows)

postgres@vultr:~$ ls -l /var/lib/postgresql/10/main/base/17420/17421 
-rw------- 1 postgres postgres 8192 Nov 25 05:11 /var/lib/postgresql/10/main/base/17420/17421

root@vultr:~# chmod 700 pg_filedump_rc

root@vultr:~# ./pg_filedump_rc -D charn,int /var/lib/postgresql/10/main/base/17420/17421

* PostgreSQL File/Block Formatted Dump Utility - Version 11.0
* File: /var/lib/postgresql/10/main/base/17420/17421
* Options used: -D charn,int 
* Dump created on: Mon Nov 25 05:17:40 2019

Block    0 ********************************************************
----- Block Offset: 0x00000000 Offsets: Lower 40 (0x0028) Block: Size 8192 Version 4 Upper 8024 (0x1f58) LSN: logid 0 recoff 0x07db2630 Special 8192 (0x2000) Items: 4 Free Space: 7984 Checksum: 0xd24c Prune XID: 0x0000037b Flags: 0x0000 () Length (including item array): 40 ------ Item 1 -- Length: 44 Offset: 8144 (0x1fd0) Flags: NORMAL COPY: 三国演义 1 Item 2 -- Length: 40 Offset: 8104 (0x1fa8) Flags: NORMAL COPY: 水浒传 2 Item 3 -- Length: 40 Offset: 8064 (0x1f80) Flags: NORMAL COPY: 西游记 3 Item 4 -- Length: 40 Offset: 8024 (0x1f58) Flags: NORMAL COPY: 红楼梦 4 *** End of File Encountered. Last Block Read: 0 *** root@vultr:~# ./pg_filedump_rc -D charn,int /var/lib/postgresql/10/main/base/17420/17421|grep COPY COPY: 三国演义 1 COPY: 水浒传 2 COPY: 西游记 3 COPY: 红楼梦 4

PostgreSQL FATAL: could not read block 0 in file Input/output error错误

FATAL: could not read block 0 in file  Input/output error

ERROR:  could not read block 84 in file "base/16386/670007": Input/output error
ERROR:  could not read block 4707 of relation 1663/16384/16564: Success


如果PG的登陆出现上述错误 则一般说明PG数据文件出现了磁盘故障,导致无法登陆数据库,可以尝试用以下步骤解决


步骤1 设置 zero_damaged_pages 参数 并尝试重建系统索引,zero_damaged_pages加入到postgre配置文件中,并reload或重启pg服务进程:




reindexdb -p 5433 --system dbname


步骤2 设置 ignore_system_indexes=true , 以尝试忽略system index的影响:



pg_ctl -D /data -o '-c ignore_system_indexes=true' start


psql $dbname


步骤3 重建全库 索引

reindex database "dbname";

步骤4 vacuum 全表库

vacuum full analyze verbose;
vacuum full verbose;

步骤5 禁用约束

update pg_class set relchecks=0 where relname='tablename';

步骤六 重建索引

reindex database "dbname";

最后建议用pg_dump 导出全库 后 重建数据库。

如果做了上述步骤还不行, 可以考虑用pg_filedump 抽取所有文件内的数据后再重建数据库。

PostgreSQL checksum


在计算机系统中,checksum 通常用于校验数据在传输或存取过程中是否发生错误。PostgreSQL 从 9.3 开始支持 checksum,以发现数据因磁盘、 I/O 损坏等原因造成的数据异常。本文介绍 PostgreSQL 中 checksum 的使用及其实现原理。



PostgreSQL 从 9.3 开始支持数据页的 checksum,可以在执行 initdb 时指定 -k 或 --data-checksums 参数开启 checksum,但开启 checksum 可能会对系统性能有一定影响,官网描述如下:

Use checksums on data pages to help detect corruption by the I/O system that would otherwise be silent. Enabling checksums may incur a noticeable performance penalty. This option can only be set during initialization, and cannot be changed later. If set, checksums are calculated for all objects, in all databases.

启用 checksum 后,系统会对每个数据页计算 checksum,从存储读取数据时如果检测 checksum 失败,则会发生错误并终止当前正在执行的事务,该功能使得 PostgreSQL 自身拥有了检测 I/O 或硬件错误的能力。

Checksum 引入一个 GUC 参数 ignore_checksum_failure,该参数若设置为 true,checksum 校验失败后不会产生错误,而是给客户端发送一个警告。当然,checksum 失败意味着磁盘上的数据已经损坏,忽略此类错误可能导致数据损坏扩散甚至导致系统奔溃,此时宜尽早修复,因此,若开启 checksum,该参数建议设置为 false




设置 checksum


数据页的 checksum 在从 Buffer pool 刷到存储时才设置,当页面再此读取至 Buffer pool 时进行检测。

PostgreSQL 中 Buffer 刷盘的逻辑集中在 FlushBuffer 中,其中设置 checksum 的逻辑如下:


 * Update page checksum if desired.  Since we have only shared lock on the
 * buffer, other processes might be updating hint bits in it, so we must
 * copy the page to private storage if we do checksumming.
bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);



比较有意思的是,其他进程可能会在只加 content-lock 共享锁的情况下并发修改 page 的 Hint Bits,从而导致 checksum 值发生变化,为确保 page 的内容及其 checksum 保持一致,PostgreSQL 采用了 先复制页,然后计算 checksum 的方式,如下:



 * We allocate the copy space once and use it over on each subsequent
 * call.  The point of palloc'ing here, rather than having a static char
 * array, is first to ensure adequate alignment for the checksumming code
 * and second to avoid wasting space in processes that never call this.
if (pageCopy == NULL)
    pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);

memcpy(pageCopy, (char *) page, BLCKSZ);
((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);



即先将数据页的内容拷贝一份,拷贝的数据自然不会被其他进程修改,然后基于该拷贝页计算并设置 checksum 值。



checksum 算法

数据页的 checksum 算法基于 FNV-1a hash 改造而来,其结果为 32 位无符号整型。由于 PageHeaderData 中 pd_checksum 是 16 位无符号整型,因此将其截取 16 位作为数据页的 checksum 值,如下:


* Save pd_checksum and temporarily set it to zero, so that the checksum
* calculation isn't affected by the old checksum stored on the page.
* Restore it after, because actually updating the checksum is NOT part of
* the API of this function.
save_checksum = cpage->phdr.pd_checksum;
cpage->phdr.pd_checksum = 0;
checksum = pg_checksum_block(cpage);
cpage->phdr.pd_checksum = save_checksum;

/* Mix in the block number to detect transposed pages */
checksum ^= blkno;

* Reduce to a uint16 (to fit in the pd_checksum field) with an offset of
* one. That avoids checksums of zero, which seems like a good idea.
return (checksum % 65535) + 1;


pg_checksum_block 函数计算数据页的 32 位 checksum 值,具体算法可以参考源码,在此不详述。



检测 checksum



PostgreSQL 会在页面从存储读入内存时检测其是否可用,调用函数为 PageIsVerified,该函数不仅会检测正常初始化过的页(non-zero page),还会检测 全零页(all-zero page)

为什么会出现 全零页 呢?
在特定场景下表中可能出现 全零页,比如有进程扩展了一个表,即在该表中添加了一个新页,但在 WAL 日志写入存储之前,进程崩溃了。此时新加的页可能已经在表文件中,下次重启时就会读取到。

对于 non-zero page,检测其 checksum 是否一致以及 page header 信息是否正确,若 checksum 失败,但 header 信息正确,此时会根据 ignore_checksum_failure 值判断验证是否通过;对于 all-zero page,如果为全零,则验证通过。


  • 若读取数据的模式为 RBM_ZERO_ON_ERROR 且 GUC 参数 zero_damaged_pages 为 true,则将该页全部置 0
  • 报错,invalid page


checksum 与 Hint bits



数据页写至存储时,如果写失败,可能会导致破碎的页(torn page),PostgreSQL 通过 full_page_writes 特性解决此类写失败导致数据不可用的问题。

Hint Bits 是数据页中用于标识事务状态的标记位,一般情况下,作为提示位,不是很重要。但如果使用了 checksum,Hint Bits 的变化会导致 checksum 值发生改变。设想如果一个页面发生部分写,恰好把某些 Hint Bits 写错,此页面可能并不影响正常使用,但 checksum 会抛出异常,此时应如何恢复呢?

在 checksum 的实现中,checkpoint 后,如果页面因更新 Hint Bits 第一次被标记为 dirty,需要记录一个 Full Page Image 至 WAL 日志中,以应对以上提到的因 Hint Bits 更新丢失导致 checksum 失败的问题,具体实现可参考 MarkBufferDirtyHint。对于已经是 dirty 的页,更新 Hint Bits 则不需要记录 WAL 日志,因为在 checkpoint 后,第一次将该页标记为 dirty 时已经写入了对应的 Full Page Image

可见,在启用 checksum 的情况下,checkpoint 后页面的第一次修改如果是更新 Hint Bits, 会写 Full Page Image 至 WAL 日志,这会导致 WAL 日志占用更多的存储空间。

关于 PostgreSQL checksum 和 Full Page Image 的关系,可以参考 stackoverflow 上这个问题



查看 checksum

PostgreSQL 10 在 pageinspect 插件中添加了函数 page_checksum() 用来查看 page 的 checksum,当然使用 page_header() 也可以查看 page 的 checksum,如下:


postgres=# SELECT page_checksum(get_raw_page('pg_class', 0), 0);
(1 row)

postgres=# SELECT * FROM page_header(get_raw_page('pg_class', 0));
    lsn     | checksum | flags | lower | upper | special | pagesize | version | prune_xid
 0/78A1E918 |    17448 |     0 |   200 |   368 |    8192 |     8192 |       4 |         0
(1 row)


Checksum 使 PostgreSQL 具备检测因硬件故障或传输导致数据不一致的能力,一旦发生异常,通常会报错并终止当前事务,用户可以尽早察觉数据异常并予以恢复。当然,开启 checksum 也会引入一些开销,体现在两个方面:

  • 计算数据页的 checksum 会引入一些 CPU 开销,具体开销取决于 checksum 算法的效率
  • checkpoint 后,若因更新 Hint Bits 将页面第一次置为 dirty 会写一条记录 Full Page Image 的 WAL 日志,以用于恢复因更新 Hint Bits 产生的破碎页。

对于数据可用性要求较高的场景,通常建议将 full_page_writes 和 checksum 都打开,前者用于避免写失败导致的数据缺失,后者用于尽早发现因硬件或传输导致数据不一致的场景,一旦发现,可以利用 full_page_writes 和 checksum 记录在 WAL 日志中的 Full Page Image 进行数据恢复。


postgreSQL 坏块与checksum使用验证

 su - postgres
initdb -k -D $PGDATA
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

fixing permissions on existing directory /var/lib/postgresql/10/main ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Etc/UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/10/main -l logfile start

pg_ctl -D /var/lib/postgresql/10/main -l logfile start
waiting for server to start.... done
server started
psql (10.10 (Ubuntu 10.10-0ubuntu0.18.04.1))
Type "help" for help.



tar -zxvf dellstore2-normal-1.0.tar.gz

createdb mac
psql mac -f dellstore2-normal-1.0/dellstore2-normal-1.0.sql  

postgres@vultr:~$ psql mac
psql (10.10 (Ubuntu 10.10-0ubuntu0.18.04.1))
Type "help" for help.

mac=# \d
                    List of relations
 Schema |           Name           |   Type   |  Owner   
 public | categories               | table    | postgres
 public | categories_category_seq  | sequence | postgres
 public | cust_hist                | table    | postgres
 public | customers                | table    | postgres
 public | customers_customerid_seq | sequence | postgres
 public | inventory                | table    | postgres
 public | orderlines               | table    | postgres
 public | orders                   | table    | postgres
 public | orders_orderid_seq       | sequence | postgres
 public | products                 | table    | postgres
 public | products_prod_id_seq     | sequence | postgres
 public | reorder                  | table    | postgres
(12 rows)

psql mac -c "SELECT relname, relpages, reltuples, relfilenode FROM pg_class
  WHERE relkind = 'r' AND relname NOT LIKE 'pg%'
  ORDER BY relpages DESC
  LIMIT 1;"
  relname  | relpages | reltuples | relfilenode 
 customers |      488 |     20000 |       16478
(1 row)

psql mac -c "SELECT datname, oid FROM pg_database;"

postgres@vultr:~$ psql mac -c "SELECT datname, oid FROM pg_database;"
  datname  |  oid  
 postgres  | 13055
 mac       | 16457
 template1 |     1
 template0 | 13054
(4 rows)


pg_ctl stop -D $PGDATA

waiting for server to shut down.... done

perl $PGDATA/base/16457/16478 5

pg_ctl start -D $PGDATA

psql mac -c "SELECT * FROM customers;"

postgres@vultr:~$ psql mac -c "SELECT * FROM customers;"
2019-11-20 07:11:42.487 UTC [24934] WARNING:  page verification failed, calculated checksum 56287 but expected 28524
WARNING:  page verification failed, calculated checksum 56287 but expected 28524
2019-11-20 07:11:42.487 UTC [24934] ERROR:  invalid page in block 1 of relation base/16457/16478
2019-11-20 07:11:42.487 UTC [24934] STATEMENT:  SELECT * FROM customers;
ERROR:  invalid page in block 1 of relation base/16457/16478

postgres@vultr:~$ psql mac 
psql (10.10 (Ubuntu 10.10-0ubuntu0.18.04.1))
Type "help" for help.

SHOW zero_damaged_pages ;
SHOW ignore_checksum_failure;
SET zero_damaged_pages to on;
set ignore_checksum_failure to on;

postgres=#  SELECT * FROM customers;



SET ignore_checksum_failure to on ;
ALTER SYSTEM SET ignore_checksum_failure = on;
SELECT pg_reload_conf();


沪公网安备 31010802001379号