Insert duplicated values into table with primary keyeth

Problem: a single duplicated record causes millions of uncommitted records to rollback

One customer regularly loads large number of rows using INSERT .. SELECT into a table with a primary key. However, if any of records that contains a duplicated value on primary key column, the whole INSERT will be rolled back, as shown in an example below, which sometimes means millions of records. It’s definitely not a desired behavior.

create a source table


create table source_t as select rownum rec_id, 'string_' || to_char(rownum) rec_text from dual connect by level < 100;

insert a duplicated row into it


insert into source_t select * from source_t where rec_id = 99;
commit;

create a target table with a primary key on column ‘rec_id’


create table target_t as select * from source_t where 1=0;

create unique index pk_target_id on target_t(rec_id);

alter table target_t add constraint pk_target_id primary key(rec_id)  using index;

The insert fails because of the duplicated row

insert into target_t select * from source_t;
insert into target_t select * from source_t
*
ERROR at line 1:
ORA-00001: unique constraint (PK_TARGET_ID) violated

We can avoid the whole insert procedure rolling back caused by a few duplicated rows in 2 manners.

Method 1) using INSERT ALL .. WHEN THEN .. ELSE .. SELECT in place of INSERT .. SELECT

The INSERT ALL statement is available since 9i and was an commonly ignored feature. It can be used to conditionally insert rows into multiple tables based on selected values. In this case, we can insert all rows that comply with primary key constraint into the target table, while recording all duplicated rows into an exception table which will then be dealt with sequentially. I demonstrate how it works.

create an duplication table containing only the duplicated rows

create table dupilcation_t as select s.*, sysdate timestamp from source_t s where 1=0;

Currently the source table has 98 unique rec_ids and 2 duplicated rec_ids

select count(rec_id), cnt
  from (select rec_id, count(*) over (partition by rec_id order by rec_id) cnt from source_t)
group by cnt;

COUNT(REC_ID)        CNT
------------- ----------
           98          1
            2          2

Let’s insert the 98 unique rec_ids into target_t and record the 2 duplicated rec_ids into duplication_t

insert all
 when cnt =1 then into target_t values (rec_id, rec_text)
  else             into dupilcation_t values (rec_id, rec_text, sysdate)
select rec_id, rec_text, count(*) over (partition by rec_id order by rec_id) cnt
  from source_t;

100 rows created.

select rec_id, rec_text from duplication_t;

   REC_ID REC_TEXT
--------- ---------------------
       99 string_99
       99 string_99

Method 2: using exceptions table

create an exceptions table

SQL> @?/rdbms/admin/utlexpt1.sql
Table created.

If the exceptions table has already been created, make a cleanup of historical records

truncate table exceptions;

Temporarily disable the primary key constraint

alter table target_t disable constraint pk_target_id;
drop index pk_target_id;

insert into target_t with duplicated rows

insert into target_t select * from source_t;
100 rows created.

commit;

get duplicated rows

create or replace procedure get_dups
as
begin
    execute immediate 'alter table target_t enable constraint pk_target_id exceptions into exceptions';
exception
    when others then
        for x in (select rowid r from exceptions )
        loop
            dbms_output.put_line( 'rid = ' || x.r );
        end loop;
end;
/

exec get_dups
rid = AAATSDAAGAAAASUAAA
rid = AAATSDAAGAAAASUAAB

delete one of the duplicated rows


select * from TARGET_T where rowid in (CHARTOROWID('AAATSFAAFAAAAp8ABj'), CHARTOROWID('AAATSFAAFAAAAp8ABi'));

    REC_ID REC_TEXT
---------- -----------------------------------------------
        99 string_99
        99 string_99

delete from target_t
 where rowid in (select rid
                   from ( select t.rowid rid, row_number() over (partition by t.rec_id order by t.rowid) rn
                            from target_t t, exceptions e
                           where t.rowid = e.row_id                  
                         )  
                  where rn <> 1                 
                ); 

1 row deleted.

commit;
select rec_id, count(*) from target_t group by rec_id having count(*) <> 1;

no rows selected

alter table target_t enable constraint pk_target_id ;

Table altered.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: