Killed Sessions Remain For A Long Time

Sometimes when we kill sessions through ‘alter system kill session ‘sid,serial#”, the sessions are marked as ‘KILLED’ and it takes quite a long time to quit or even remain forever as “STATUS=KILLED” and “SERVER=PSEUDO” !

1) some examples of long-time remaining killed sessions


SQL> select SID, SERIAL#, USERNAME, PADDR, LOGON_TIME, STATUS, SERVER, OSUSER
from v$session where status='KILLED';

SID    SERIAL# USERNAME   PADDR            LOGON_TIME          STATUS   SERVER  EVENT
---------- ---------- ---------- ---------------- ------------------- -------- ----------------------------
464      43825 STKPROD    070000020C43CBC0 2011-06-28 05:13:26 KILLED   PSEUDO  SQL*Net message from client
474      44714 STKPROD    070000020C43CBC0 2011-07-07 08:18:09 KILLED   PSEUDO  SQL*Net message from client
490       7644 STKPROD    070000020C43CBC0 2011-06-24 06:48:57 KILLED   PSEUDO  SQL*Net message from client

These sessions were KILLED several days ago!

Maybe pmon is still rolling back transactions and releasing locks of them before they’re eventually removed?

2) check if there are on-going rollback processes against the killed sessions
Run this query:


select USED_UBLK from v$transaction where ses_addr in (select addr from v$session where status='KILLED');

no rows selected.

this is not the case and we cen move to step 3.

However, if we found any rollback segments is still occupied by these killed sessions, we’ll then check out how much rollback is still going to proceed and estimate how much time is still needed.

After 60 secs, run the query again


SQL> select USED_UBLK from v$transaction where ses_addr in (select addr from v$session where status='KILLED');

Time remaining is approximately like used_ublk_later/(used_ublk_before-used_ublk_later) = remaining mins

pmon will wake up every 60 secs to rollback remaining transactions. PMON will only rollback a certain number of transactions for a given connection. This number is determined by the CLEANUP_ROLLBACK_ENTRIES parameter in the INIT.ORA file. By default, this value is 20. This means that it will rollback 20 updates each wake-up period.

For example, if a process goes rogue and generates, say, 1000 uncommitted updates to the database, it will rollback CLEANUP_ROLLBACK_ENTRIES transactions each time its periodic timer expires. At the rate of 20 each time, and a wake-up period of 60 seconds (at 3 seconds, it cleans up the connections latches and at 60 it cleans up the remaining stuff), it could take PMON as long as 50 minutes to rollback all of the entries.

you can force pmon to make the cleanup immediately by oradebug.


select pid from v$process p, v$bgprocess b
 where b.paddr = p.addr
 and name='PMON';

PID
------
5

now use this pid to woke pmon immediately.


connect as sysdba
SQL> oradebug setmypid
SQL> oradebug wakeup 5

3) at this point, we know that no more transaction to rollback, why hasn’t pmon  finished cleanup and removed the killed sessions?

As explained in metalink note 1020720.102. the simplest (and probably most common) reason the session stays around is because the process is still around. The reason the process is still around is because it is waiting on “SQLNet message from client”. If it does ever get a message, it will then respond with an ORA-28 “Your session has been killed” error number. At that point the session should go away. The dedicated server process may remain alive until the client disconnects or exits.

PMON may take ownership of the session while it is cleaning up any resources held by the session at the time it was killed. If it cannot clean everything up immediately it will leave the session under the PSEUDO process while performing other tasks.

By finding the spid you can then force the process to be killed.

When issuing the ‘kill’ command be sure that you kill “DEDICATED SERVER PROCESSES”, those called:

oracle<SID> (local=NO)

where <SID> is the ORACLE_SID.

Be sure you do not kill processes such as:

ora_d000_<SID>
ora_s000_<SID>
ora_pmon_<SID>

3.1) find the process spid of the killed sessions and then kill the server processes on OS

as mentioned in bug 5453737, after killing a session, paddr is changed only in v$session, it’s not possible to find process spid by join v$process and v$session any more.

Solution:


SQL> select spid, program from v$process
where program!= 'PSEUDO'
and addr not in (select paddr from v$session)
and addr not in (select paddr from v$bgprocess)
and addr not in (select paddr from v$shared_server);

SPID         PROGRAM
------------ -------------------------------------------
1974480      oracle@aixprd02 (D000)
1405124      oracle@aixprd02
782484       oracle@aixprd02
1069188      oracle@aixprd02

$> ps -ef |egrep "1974480|1405124|782484|1069188" |grep -v grep
glporacl  782484       1   0   Jun 24      -  2:45 oracleGOLDPROD (LOCAL=NO)
glporacl 1069188       1   0   Jul 07      -  1:30 oracleGOLDPROD (LOCAL=NO)
glporacl 1405124       1   0   Jun 28      -  0:39 oracleGOLDPROD (LOCAL=NO)
glporacl 1974480       1   0   Apr 06      -  0:11 ora_d000_GOLDPROD

As we noticed, the startup time of OS processes (LOCAL=NO) with pid 782484,1069188 and 1405124 match logon_time of the killed sessions

-- processes
glporacl  782484       1   0   Jun 24      -  2:45
glporacl 1069188       1   0   Jul 07      -  1:30
glporacl 1405124       1   0   Jun 28      -  0:39

-- sessions
SID    SERIAL# USERNAME   PADDR            LOGON_TIME          STATUS
--- ---------- ---------- ---------------- ------------------- -------
464      43825 STKPROD    070000020C43CBC0 2011-06-28 05:13:26 KILLED
474      44714 STKPROD    070000020C43CBC0 2011-07-07 08:18:09 KILLED
490       7644 STKPROD    070000020C43CBC0 2011-06-24 06:48:57 KILLED

Since the dedicated server processes 782484,1069188 and 1405124 have not received any messages from client processes and are still waiting for “SQL*Net message from client”, we can safely kill the OS processes

4) kill dedicated server processes

$> kill -9 782484
$> kill -9 1069188
$> kill -9 1405124

After kill OS processes, we find that pmon finally removed the killed sessions

SQL> select SID, SERIAL#, USERNAME, PADDR, LOGON_TIME, STATUS, SERVER, EVENT
     from v$session where status='KILLED'

no rows selected.

Summary

When an Oracle session is killed, its dedicated server process attempts to inform client process “your session has been killed” and then pmon will remove sessions as KILLED from database after it rolls backup transaction and released locks.

However, if the client process was already terminated abnormally (e.g coredump or killed by Ctl-C), the dedicated server process keeps waiting on “SQL*Net message from client”, it has no chance to receive any message from client, then the session marked as KILLED remains until next reboot of instance.

Solution:

in this case, kill the dedicated server processes of the KILLED sessions.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: