Migrated from my 2007 blog: datafile block extract lab

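Yesterday, on the Linux server at home, I tried writing a C example that extracts data blocks. The system calls involved are trivial, just standard file reads and writes. Reading the block header went smoothly. Most of the block_type values are unfamiliar to me, simply because I have not seen enough applications: I have only met tables and indexes, never an IOT or a cluster. In fairness, very few sites in China probably use such advanced features. I joined an Oracle mailing list, and the people abroad really are tireless in their research at the DB layer. If a technology has an advantage, they have the skill and the nerve to put it to use, and that spirit is admirable.

After the block header come the transaction slots (ITL entries). Like the row directory, their number varies, which is why row data is stored back to front, from the tail of the block toward the head. I suspect most current databases work the same way. Much of Oracle's elegance comes from how these transaction slots work together with the rollback segments to implement read consistency. On the other hand, Oracle carries a huge burden because of it, and reaching today's performance was anything but easy. No wonder latches and the like resort to assembly instructions, although that in turn makes porting to other platforms harder.

The last part of the row directory holds the offset of each row. With offsets = sizeof(head) + phead->itc * ITL_SIZE, the relative address of row j is pri[j] + offsets.

With that done, extracting multiple rows kept producing misaligned output. After several hours of going back and forth, I suddenly realized that the format I had been using as a reference was the 9i one. No wonder the rows never came out clean.

Coming back to the point: Oracle's file format holds essentially no mystery today. But in shared_pool management theory, the SQL optimizer, the architecture and so on, Oracle fully deserves its place as the leader. Overtaking it would be extremely hard, and even a technically better product would struggle to eat into so large an installed base. The datablock format was worked out byte by byte by those who came before, and their dedication deserves praise.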
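The offset formula above can be sketched in C as follows. This is only an illustration under stated assumptions: the sizes (20-byte block header, 24-byte transaction header, 24-byte ITL entry) are the ones observed in the 10.1.0 dump analysed later in this post, and the names `CACHE_HEAD_SIZE`, `TRANS_HEAD_SIZE`, `data_area_offset` and `row_offset_in_block` are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes, taken from the dump analysed in this post (10.1.0, 8k block). */
enum {
    CACHE_HEAD_SIZE = 20, /* block header (kcbh)                         */
    TRANS_HEAD_SIZE = 24, /* fixed transaction header (typ, seg/obj, ...) */
    ITL_SIZE        = 24  /* one ITL entry                                */
};

/* Start of the data area: offsets = sizeof(head) + itc * ITL_SIZE,
 * where "head" is taken to be both fixed headers together (44 bytes). */
static size_t data_area_offset(unsigned itc)
{
    return CACHE_HEAD_SIZE + TRANS_HEAD_SIZE + (size_t)itc * ITL_SIZE;
}

/* Position of row j inside the block: pri[j] + offsets. */
static size_t row_offset_in_block(uint16_t pri_j, unsigned itc)
{
    return data_area_offset(itc) + pri_j;
}
```

With itc = 2 and pri[0] = 0x1f9b from the dump, the row starts at byte 8183 of the 8192-byte block, so its 5 bytes (tl: 5) end exactly where the 4-byte block tail begins.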

Offset 00014000:  type  frmt  spare1/2_kcbh  rdba  scn  seq  flg

1 : 20 bytes
type:	0x06=trans data defined in kcb.h
frmt:	0x02 in 8i through 9i. In 10.1.0: 2k: 0x62, 4k: 0x82, 8k: 0xa2, 16k: 0xc2 (logfile: 0x22, 512 bytes)
spare1/2_kcbh:	ub1 spare1_kcbh: this field is no longer used (old inc#, now always 0). ub1 spare2_kcbh: this field is no longer used (old ts#, now always 0).
rdba:	0x0140000a. Written in binary, the first 10 bits are the file id and the last 22 bits are the block id. It follows that a tablespace can have up to 1023 datafiles and each datafile up to 4M blocks. For the big datafile that appeared in 10g, the whole value is the block id and there is no file id. Tested on 9.2.0: a tablespace can have 1023 datafiles, and one object can be stored across 1023 datafiles.
scn:	scn: 0x0000.0043890e
seq:	A sequence number incremented for each change to a block at the same SCN. A new SCN is allocated if the sequence number wraps.
    If changes at the same SCN affect more than 254 rows in this block, a new SCN is allocated for the transaction. A statement such as "delete from table_name" can affect more than 254 rows of one block at the same SCN. The count (at most 254) is stored as 0x01 through 0xfe; when this byte is 0xff, the block is flagged as corrupt ---> ora-01578.
    Sequence number:
      SEQ -> 0               /* non-logged changes - do not advance seq# */
      SEQ -> (UB1MAXVAL-1)   /* maximum possible sequence number */
      SEQ -> (UB1MAXVAL)     /* seq# to indicate a block is corrupt, equal to FF. soft corrupt */
    0xff: When present it indicates that the block has been marked as corrupt by Oracle, either by the db_block_checking functionality or the equivalent events (10210 for data blocks, 10211 for index blocks, and 10212 for cluster blocks) when making a database change, or by the DBMS_REPAIR.FIX_CORRUPT_BLOCKS procedure, or by PMON after an unsuccessful online block recovery attempt while recovering a failed process, or by RMAN during a BACKUP, COPY or VALIDATE command with the CHECK LOGICAL option. Logical corruptions are normally due to either recovery through a NOLOGGING operation, or an Oracle software bug.
flg:	as defined in kcbh.h
    #define KCBHFNEW 0x01 /* new block - zeroed data area */
    #define KCBHFDLC 0x02 /* Delayed Logging Change advance SCN/seq */
    #define KCBHFCKV 0x04 /* ChecK Value saved-block xor's to zero */
    #define KCBHFTMP 0x08 /* Temporary block */
    The values can be combined: a value of 6 means that 2 and 4 are both set.
    Block structure as defined in kcbh.h:
    struct kcbh {
        ub1   type_kcbh;    /* Block type */
        ub1   frmt_kcbh;    /* #define KCBH_FRMT8 2 */
        ub1   spare1_kcbh;
        ub1   spare2_kcbh;
        krdba rdba_kcbh;    /* relative DBA */
        ub4   bas_kcbh;     /* base of SCN */
        ub2   wrp_kcbh;     /* wrap of SCN */
        ub1   seq_kcbh;     /* sequence # of changes at same scn */
        ub1   flg_kcbh;
        ub2   chkval_kcbh;
    };
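As an illustration of the 20-byte header just described, here is a minimal C sketch that reads the fields out of a raw buffer and splits the rdba into file id and block id. It assumes a little-endian datafile (as on my x86 Linux box); `struct block_head`, `parse_block_head`, `rdba_file` and `rdba_block` are names invented for this example, not Oracle's.

```c
#include <stdint.h>

/* Hypothetical mirror of the 20-byte block header; fields follow kcbh. */
struct block_head {
    uint8_t  type, frmt, spare1, spare2;
    uint32_t rdba;
    uint32_t scn_base;
    uint16_t scn_wrap;
    uint8_t  seq, flg;
    uint16_t chkval, spare3;
};

/* Little-endian readers (assumption: the file was written on x86). */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fill h from the first 20 bytes of a block. */
static void parse_block_head(const uint8_t *b, struct block_head *h)
{
    h->type     = b[0];
    h->frmt     = b[1];
    h->spare1   = b[2];
    h->spare2   = b[3];
    h->rdba     = rd32(b + 4);
    h->scn_base = rd32(b + 8);
    h->scn_wrap = rd16(b + 12);
    h->seq      = b[14];
    h->flg      = b[15];
    h->chkval   = rd16(b + 16);
    h->spare3   = rd16(b + 18);
}

/* rdba: top 10 bits = file id, low 22 bits = block id. */
static unsigned rdba_file(uint32_t rdba)  { return (unsigned)(rdba >> 22); }
static unsigned rdba_block(uint32_t rdba) { return (unsigned)(rdba & 0x3FFFFFu); }
```

For the rdba 0x0140000a used in this post, the split gives file 5 and block 10, which agrees with the "(5/10)" printed by the block dump.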

Offset 00014010:  chkval  spare3_kcbh  typ  seg/obj  csc

spare3_kcbh :	ub2 spare3_kcbh
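2 : 24 bytes (44 bytes in total)
typ :	1 - DATA, 2 - index. Changing it to 3 on 10.1.0 caused ora-600[2032], followed by ORA-27101: shared memory realm does not exist. When running a query, Oracle determines the object type from the obj$ table, not from this typ. In other words, if you change this flag in a block belonging to a table, the data can still be queried, but dumping the block fails with ORA-00600: internal error code, arguments: [4555], [0], [], [], [], [], [], []. The [0] in the error is the value stored in typ. In 10g, after changing it, an update of data in this block can be committed, but a rollback raises an error.
?	I have seen other values here, but changing this value with an editor makes no visible difference in the dump file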
seg/obj:	0xd254
csc :	0x00.43890a The SCN at which the last full cleanout was performed on the block

Offset 00014020:  csc  itc  flg  fsl  fnx  xid

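3 : 24 bytes per ITL (92 bytes in total with 2 ITLs)
?	I have seen other values here, but changing this value with an editor makes no visible difference in the dump file
itc	Number of ITL entries, max 255; exceeding it raises ORA-02207. ORA-00060 and ORA-00054 may be caused by having no space left to allocate an ITL entry, or by contention for the ITL entries. In 8i the INITRANS default is 1; in 9.2.0 the INITRANS default is 2.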
flg	indicates that the block is on a freelist. Otherwise the flag is -. Under 9i ASSM this value is E. ixora says it occupies 2 bytes, but my experiments differ somewhat from that result. What I observed is:
    Object id on Block? Y
    flg: O ver: 0x01
    These three items are stored in one and the same byte. In the dump files I looked at, flg, ver and "Object id on Block" share this single byte, and the pattern is:
    byte   flg   ver    Object id on Block?
    0x00   -     0x00   N
    0x01   O     0x00   N
    0x02   -     0x01   Y
    0x03   O     0x01   Y
    0x04   -     0x02   Y
    0x05   O     0x02   Y
    0x06   -     0x03   Y
    0x07   O     0x03   Y
    0x08   -     0x04   N
    0x09   O     0x04   N
    0x0a   -     0x05   Y
    0x0b   O     0x05   Y
    0x0c   -     0x06   Y
    0x0d   O     0x06   Y
    0x0e   -     0x07   Y
    0x0f   O     0x07   Y
    0x10   ...   the cycle above repeats
    This changed in 9i with the arrival of ASSM.
fsl :	Index to the first slot on the ITL freelist. ITL TX freelist slot
fnx :	Address of the next block on the free list. Null if this block is not on a freelist. Example of a populated value: fnx: 0x1000029
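The flg / ver / "Object id on Block" byte described above can be decoded with a few bit operations. The sketch below is only a rule that reproduces my observed pre-ASSM values for 0x00 through 0x0f; it is inferred from those observations, not taken from any Oracle header, and `struct flg_byte` and `decode_flg_byte` are names invented for the example.

```c
#include <stdint.h>

/* Decoded view of the shared byte (hypothetical type for this example). */
struct flg_byte {
    char     flg;          /* 'O' = on freelist, '-' otherwise */
    unsigned ver;          /* 0x00 .. 0x07                     */
    char     obj_on_block; /* 'Y' or 'N'                       */
};

/* Inferred rule: bit 0 is the freelist flag, bits 1-3 are ver, and
 * "Object id on Block" was N exactly when ver was 0 or 4. */
static struct flg_byte decode_flg_byte(uint8_t b)
{
    struct flg_byte r;
    r.flg          = (b & 0x01u) ? 'O' : '-';
    r.ver          = (b >> 1) & 0x07u;
    r.obj_on_block = (r.ver & 0x03u) ? 'Y' : 'N';
    return r;
}
```

Under this rule the byte 0x03 decodes to flg: O, ver: 0x01, Object id on Block? Y, which is the combination shown in the dump in this post.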

Offset 00014030:  xid  uba  Lck Flag  Scn/Fsc

xid :	Transaction ID (UndoSeg.Slot.Wrap). The values can be looked up with select XIDUSN, XIDSLOT, XIDSQN from v$transaction; This is comprised of the rollback segment number (2 bytes), the slot number in the transaction table of that rollback segment (2 bytes), and the number of times use of that transaction table has wrapped (4 bytes).
uba :	Undo address (UndoDBA.SeqNo.RecordNo) The location of the undo for the most recent change to this block by this transaction. This is comprised of the DBA of the rollback segment block (4 bytes), the sequence number (2 bytes), and the record number for the change in that undo block (1 byte), plus 1 unused byte.
Lck Flag:	Lck is the number of locked rows; it also uses data from the next byte. The value 2 is 0010 in binary, which matches the --U- shown in the dump file.
    flag (1 nibble): C = Committed; U = Commit Upper Bound; T = Active at CSC; B = Rollback of this UBA gives before image of the ITL.
      ---- = transaction is active, or committed pending cleanout
      C--- = transaction has been committed and locks cleaned out
      -B-- = this undo record contains the undo for this ITL entry
      --U- = transaction committed (maybe long ago); SCN is an upper bound
      ---T = transaction was still active at block cleanout SCN
    Lck (3 nibbles): The number of row-level locks held in the block by this transaction.
Scn/Fsc :	If the transaction has been cleaned out, this is the commit SCN or an upper bound thereof. Otherwise the leading two bytes contain the free space credit for the transaction, that is, the number of bytes freed in the block by the transaction. Scn = SCN of committed TX; Fsc = Free space credit (bytes)
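To make the Lck/Flag layout concrete, here is a small C sketch that decodes the two bytes into the lock count and the four-character flag string printed by the dump. It assumes the two bytes are read as one little-endian 16-bit word, with the flag in the top nibble (bit order C, B, U, T) and Lck in the low 12 bits; the function names are invented for this example.

```c
#include <stdint.h>

/* Low 12 bits (3 nibbles): number of row locks held by the transaction. */
static unsigned itl_lock_count(uint16_t w)
{
    return w & 0x0FFFu;
}

/* Top nibble: flag bits, rendered like the dump does ("--U-" etc.).
 * Assumed bit values: C = 8, B = 4, U = 2, T = 1. */
static void itl_flag_string(uint16_t w, char out[5])
{
    unsigned f = (w >> 12) & 0x0Fu;
    out[0] = (f & 0x8u) ? 'C' : '-';
    out[1] = (f & 0x4u) ? 'B' : '-';
    out[2] = (f & 0x2u) ? 'U' : '-';
    out[3] = (f & 0x1u) ? 'T' : '-';
    out[4] = '\0';
}
```

For the first ITL entry in this post the bytes on disk would be 01 20, i.e. the word 0x2001, which decodes to the flag string --U- and a lock count of 1, the same as the dump line.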

Offset 00014040:  Scn/Fsc  (second ITL entry, unused here)

Offset 00014050:  (second ITL entry, unused here)  flag  ntab  nrow

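4 : 14 bytes. The data area starts at this flag position, which is also the base address for the row offsets below.
flag :	N=pctfree hit(clusters), F=don't put on free list, K=flushable cluster keys. There are other flags as well: A ...
ntab :	How many tables have data in this block; with a cluster this can be greater than 1
nrow :	How many rows the block holds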

Offset 00014060:  frre  fsbo  fseo  avsp  tosp  offs  nrow  row offs

frre :	First free row index entry. -1=you have to add one.
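fsbo :	Free Space Begin offset. The start of the space after the row directory that can hold data; it can also be read as the space occupied from the start of this area ("flag") through the last "row offs".
fseo :	Free Space End offset. (9.2.0) Takes part in the db_block_checking free-space calculation. During a select, Oracle does not locate a row simply by its offset; this value also takes part in locating the row.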
avsp :	Available space in the block (pctfree and pctused) ORA-01578
tosp :	Total available space when all TXs commit. (9.2.0) Takes part in db_block_checking.
offs :	Offset; a non-trivial value shows up when a cluster is used
nrow :	How many rows this table has
row offs :	Relative start position of this row; after delete & commit it is 0xffff
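The data header fields listed above can be read with a few lines of C. This is a sketch under assumptions: little-endian 2-byte fields, the 14-byte fixed part in the order flag, ntab, nrow, frre, fsbo, fseo, avsp, tosp, then one 4-byte table directory entry (offs, nrow) per table, then 2-byte row offsets. That ordering matches the 0xe:pti[0] and 0x12:pri[0] positions in the dump, but `struct data_head`, `parse_data_head` and `row_dir_entry` are names I made up.

```c
#include <stdint.h>

#define ROW_DELETED 0xFFFFu /* row offs value after delete & commit */

/* Hypothetical mirror of the 14-byte fixed data header. */
struct data_head {
    uint8_t  flag, ntab;
    uint16_t nrow;
    int16_t  frre; /* -1 = no free row index entry */
    uint16_t fsbo, fseo, avsp, tosp;
};

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* d points at the "flag" byte, i.e. the start of the data area. */
static void parse_data_head(const uint8_t *d, struct data_head *h)
{
    h->flag = d[0];
    h->ntab = d[1];
    h->nrow = le16(d + 2);
    h->frre = (int16_t)le16(d + 4);
    h->fsbo = le16(d + 6);
    h->fseo = le16(d + 8);
    h->avsp = le16(d + 10);
    h->tosp = le16(d + 12);
}

/* Row directory entry j: offset of the row relative to d, or ROW_DELETED.
 * The row directory is assumed to follow ntab 4-byte table entries. */
static uint16_t row_dir_entry(const uint8_t *d, unsigned ntab, unsigned j)
{
    return le16(d + 14 + 4u * ntab + 2u * j);
}
```

A caller would add the returned offset to the position of the flag byte to reach the row, and skip entries equal to ROW_DELETED.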

Offset 00015FF0:  length  data  block tail

5 : user data
6 : 4 bytes block tail
fb :	K = Cluster Key (Flags may change meaning if this is set to show HASH cluster)
    C = Cluster table member
    H = Head piece of row
    D = Deleted row
    F = First data piece
    L = Last data piece
    P = First column continues from previous piece
    N = Last column continues in next piece
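lb :	Corresponds to the lck of the ITL above; shows whether this row is locked
cc :	Number of columns. Only 255 columns can be represented here; beyond that the row is chained
length :	Length of this column's data. 0xfa means 250 bytes; in fact 0xfb, 0xfc and 0xfd also mean 250 bytes. In 0xfe fb 00, the 0xfb 00 means 251 bytes and the 0xfe indicates that the length exceeds 250 bytes. 0xff represents a NULL number; this is also how Oracle represents NULL, and NULL sorts as the largest value. When a column holds more than 250 bytes, 3 bytes are used for its length. Even for a long column, however long it is, the part stored in this block cannot exceed 64K, so 3 bytes are enough to express the length. Anything longer becomes a chained row.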
data :	'a'
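block tail :	Changing any of the last 4 bytes of the block is certain to produce ora-1578. Byte 1: matches the seq at the start of the block. Byte 2: matches the type at the start of the block. Bytes 3-4: match the low 2 bytes of the scn at the start of the block. In a control file this holds the control seq.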
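Two of the rules above, the column length prefix and the block tail, translate directly into code. The C sketch below is illustrative only: it follows the byte values I observed (including treating 0xfb through 0xfd as 250), assumes little-endian storage, and uses invented names (`COL_NULL`, `col_length`, `expected_tail`).

```c
#include <stddef.h>
#include <stdint.h>

#define COL_NULL (-1L) /* returned for the 0xff NULL marker */

/* Decode a column length prefix at p.
 * Stores the number of prefix bytes in *used and returns the data length. */
static long col_length(const uint8_t *p, size_t *used)
{
    if (p[0] == 0xFF) {          /* NULL column */
        *used = 1;
        return COL_NULL;
    }
    if (p[0] == 0xFE) {          /* long form: 0xfe + 2-byte length */
        *used = 3;
        return (long)(p[1] | (p[2] << 8));
    }
    *used = 1;
    /* 0xfb..0xfd were observed to mean 250 as well */
    return p[0] > 0xFA ? 250L : (long)p[0];
}

/* Tail value as printed by the dump: low 2 bytes of the SCN base,
 * then the block type, then seq. */
static uint32_t expected_tail(uint32_t scn_base, uint8_t type, uint8_t seq)
{
    return ((scn_base & 0xFFFFu) << 16) | ((uint32_t)type << 8) | seq;
}
```

For the block in this post (scn base 0x0043890e, type 0x06, seq 0x05) the function yields 0x890e0605, the tail shown in the dump. Comparing that value against the last 4 bytes of a block is a cheap consistency check before trusting the rest of its contents.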

10.1.0~lgone@ONE.LG.OK> create table a(v varchar2(4000)) TABLESPACE t;

Table created.

10.1.0~lgone@ONE.LG.OK> insert into a values('a');

1 row created.

Start dump data blocks tsn: 17 file#: 5 minblk 10 maxblk 10
buffer tsn: 17 rdba: 0x0140000a (5/10)
   //// buffer tsn:
         The number of the tablespace the datafile belongs to. It is only recorded in the dump file;
         the block itself does not store a tablespace number.

scn: 0x0000.0043890e seq: 0x05 flg: 0x02 tail: 0x890e0605
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump:  0x0140000a
 Object id on Block? Y
 seg/obj: 0xd254  csc: 0x00.43890a  itc: 2  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0004.00c.00001850  0x00801496.07b9.01  --U-    1  fsc 0x0000.0043890e
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

data_block_dump,data header at 0x87e125c
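   ////  data_block_dump,data header at 0x87e125c
         This block is in fact not dumped directly from the data buffer. The address is where
         the block's data area actually started at dump time, that is, where the section below begins.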
         
===============        ////  tsiz:    hsiz:   pbl:   bdba: none of these are stored in the datafile
tsiz: 0x1fa0           //// Total data area size
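                     8k block: 8192 - 20 (block head) - 24 (transaction header) - 24*2 (ITL entries, 24 bytes each) - 4 (block tail) = 8096 (0x1fa0)
hsiz: 0x14             //// Data header size: 14-byte data header + 4-byte table directory entry (pti[0]) + 2-byte row directory entry (pri[0]) = 20 bytes (0x14)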
pbl: 0x087e125c        //// Pointer to buffer holding the block
bdba: 0x0140000a
     76543210

flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f9b
avsp=0x1f83
tosp=0x1f83
0xe:pti[0]  nrow=1  offs=0
0x12:pri[0] offs=0x1f9b
block_row_dump:
tab 0, row 0, @0x1f9b
tl: 5 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 1]  61
end_of_block_dump
End dump data blocks tsn: 17 file#: 5 minblk 10 maxblk 10

A corrupt block can also raise:
    ORA-600 (4519) Cache layer block type is incorrect
    ORA-600 (4393) Check for Type for Segment header with free list
    ORA-600 (4136) Check Rollback segment block
    ORA-600 (4154) Check Rollback segment block

    Ora-600[kcbzpb_1],[d],[kind],[chk] gets signaled when the block got corrupted in memory.
    The only way it should be bad is if a stray store into memory destroyed the header or tail.
    d = blocknumber, kind= kind of corruption detected,chk = checksum flag

    ora-600[3398] and ora-600[3339]
    ora-600[3398] is not in oracle 8.
    ora-600[3398] means it failed a verification check before writing back to disk,  so it must
        be an in-memory corruption.
    ora-600[3339] comes with ora-1578 and means either disk corruption or in memory corruption after read.
    ora-600 [3339] has been removed from 7.2+
    From 7.2+  ora-600 [3398] has become ora-600 [3374] with some checks added.

Binary storage format
               ALTER SESSION SET EVENTS '10289 trace name context forever, level 1';
               ALTER SESSION SET EVENTS '10289 trace name context off';