Troubleshooting errors caused by OS resource limits on AIX, HP-UX, Solaris, and Linux

  • 2021/01/18
  • ORACLE 9i-23c, system-related
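  • OS resource limits can prevent the applications running on a host from forking new processes or opening files, which leads to failed connections or an instance crash. This is most likely when the database process count has been raised substantially but the OS kernel resource limits set at install time were not raised along with it. On Linux, shell commands may then report "fork: retry: No child processes", and new database connections fail with ORA-12518 or ORA-12537.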

    CASE 1, HP-UX 11.31: new connections through the listener fail

    TNS-12518: TNS:listener could not hand off client connection
    TNS-12536: TNS:operation would block
    TNS-12560: TNS:protocol adapter error
    TNS-00506: Operation would block
    HPUX Error: 246: Operation would block
    

    The errors above show that the OS process limit was reached. Check nproc and maxuprc against the values Oracle recommends for this environment:

    Tunable            NODE1          Oracle recommended
    aio_max_ops                       >= 2048
    executable_stack
    filecache_min
    filecache_max
    ksi_alloc_max      131072         >= nproc*8
    max_async_ports    16384          >= nproc
    max_thread_proc                   >= 1024
    maxdsiz            1073741824     >= 1073741824
    maxdsiz_64bit      137438953472   >= 2147483648
    maxssiz            134217728      >= 134217728
    maxssiz_64bit      2147483648     >= 1073741824
    maxuprc            20000          >= ((nproc*9)/10)+1
    msgmni             16384          >= nproc
    msgtql             16384          >= nproc
    ncsize             134144         8*nproc+3072
    nflocks            16384          >= nproc
    oracle@anbob:/home/oracle> kcusage                                                                                                                                                                                     
    Tunable                 Usage / Setting      
    =============================================
    filecache_max     33312395264 / 39194697728
    maxdsiz             352256000 / 1073741824
    maxdsiz_64bit       239075328 / 137438953472
    maxfiles_lim            23564 / 65535
    maxssiz                131072 / 134217728
    maxssiz_64bit         2097152 / 2147483648
    maxtsiz              13484032 / 100663296
    maxtsiz_64bit       771751936 / 1073741824
    maxuprc                 15551 / 16384
    max_thread_proc           385 / 1200
    msgmbs                      0 / 8
    msgmni                      2 / 16384
    msgtql                      0 / 16384
    nflocks                   101 / 16384
    ninode                  10403 / 1157120
    nkthread                18576 / 28688
    nproc                   16382 / 21000
    npty                        2 / 60
    nstrpty                    12 / 60
    nstrtel                     0 / 60
    nswapdev                    2 / 32
    nswapfs                     0 / 32
    semmni                    116 / 1024
    semmns                  22095 / 307200
    shmmax           161061273600 / 274877906944
    shmmni                     39 / 4096
    shmseg                      4 / 512
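
A quick way to read a long kcusage listing is to print only the tunables that are close to their setting. The sketch below is an illustration, not part of HP-UX: the function names kcusage_hot and nproc_derived and the 90% threshold are assumptions. It also evaluates three of the nproc-based formulas from the recommendation table; with nproc = 16384 they give the ksi_alloc_max and ncsize values listed for NODE1.

```shell
#!/bin/sh
# Print tunables whose usage is at or above a percentage of the setting.
# Input: kcusage-style lines "name  usage / setting" on stdin.
# $1: threshold in percent (assumed default: 90).
kcusage_hot() {
    awk -v pct="${1:-90}" '
        # data lines: name, numeric usage, "/", numeric non-zero setting
        NF == 4 && $3 == "/" && $2 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $4 + 0 > 0 {
            used = int($2 * 100 / $4)
            if (used >= pct) printf "%s %d%%\n", $1, used
        }'
}

# Minimum values derived from nproc, using integer arithmetic.
nproc_derived() {
    n=$1
    echo "maxuprc >= $(( n * 9 / 10 + 1 ))"
    echo "ksi_alloc_max >= $(( n * 8 ))"
    echo "ncsize $(( 8 * n + 3072 ))"
}

# Three lines copied from the kcusage output above; only maxuprc is reported.
kcusage_hot 90 <<'EOF'
Tunable                 Usage / Setting
maxuprc                 15551 / 16384
max_thread_proc           385 / 1200
nproc                   16382 / 21000
EOF

nproc_derived 16384
```

On a live system the same filter could be fed directly, for example with kcusage | kcusage_hot 85.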
    

    CASE 2, AIX: an application fails at run time

    ORA-04030:  (TCHK^9d12ad4,eavp:kkestRCHistgrm)
    

    The call stack contains "kghnospc" (kernel generic heap manager: no space available in the heap, signal an error).
    Trace dump:

    =======================================
    PRIVATE MEMORY SUMMARY FOR THIS PROCESS
    ---------------------------------------
    ******************************************************
    PRIVATE HEAP SUMMARY DUMP
    111 MB total:   # this process uses 111 MB of PGA
       111 MB commented, 605 KB permanent
        47 KB free (0 KB in empty extents),
         103 MB,   1 heap:    "session heap   "
    ------------------------------------------------------
    Summary of subheaps at depth 1
    110 MB total:
        35 MB commented, 109 KB permanent
        75 MB free (30 MB in empty extents),
          45 MB,   1 heap:    "kolr heap ds i "            44 MB free held
          28 MB,   3 heaps:   "koh dur heap d "            1056 KB free held
    -------------------------
    Top 10 processes:
    -------------------------
    (percentage is of 1697 MB total allocated memory)
     7% pid 201: 111 MB used of 112 MB allocated  # CURRENT PROC: the current process uses the most, 111 MB
     4% pid 204: 56 MB used of 63 MB allocated (5696 KB freeable)
     4% pid 13: 60 MB used of 62 MB allocated
     4% pid 14: 59 MB used of 62 MB allocated
     3% pid 202: 53 MB used of 59 MB allocated (6016 KB freeable)
     3% pid 200: 52 MB used of 58 MB allocated (5824 KB freeable)
     3% pid 40: 52 MB used of 56 MB allocated (832 KB freeable)
     3% pid 37: 41 MB used of 55 MB allocated
     3% pid 12: 50 MB used of 52 MB allocated
     3% pid 173: 42 MB used of 49 MB allocated (5888 KB freeable)
    ================
    SWAP INFORMATION
    ----------------
    swap info: free_mem = 22096.49M rsv = 192.00M
               alloc = 112.52M avail = 49152.00M swap_free = 49039.48M
    ----- End of Customized Incident Dump(s) -----
    

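    This is similar to a case reported on ITPUB. Check the PGA settings (_pga_max_size and _smm_max_size), the OS limits reported by "ulimit -a", and the limits of the running process itself.

    On AIX, dbx can show the limits of a running process. Attach to a LOCAL=NO server process or to the LISTENER process: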

    # dbx -a [pid]
    Type 'help' for help.
    reading symbolic information ...
    stopped in read at 0x90000000003c260 ($t1)
    0x90000000003c260 (read+0x260) e8410028             ld   r2,0x28(r1)
    (dbx) proc rlimit
    rlimit name:          rlimit_cur               rlimit_max       (units)
     RLIMIT_CPU:         (unlimited)             (unlimited)        sec
     RLIMIT_FSIZE:       (unlimited)             (unlimited)        bytes
     RLIMIT_DATA:          134217728             (unlimited)        bytes  
     RLIMIT_STACK:          33554432              4294967296        bytes
     RLIMIT_CORE:        (unlimited)             (unlimited)        bytes
     RLIMIT_RSS:            33554432             (unlimited)        bytes
     RLIMIT_AS:          (unlimited)             (unlimited)        bytes
     RLIMIT_NOFILE:           100000             (unlimited)        descriptors
     RLIMIT_THREADS:     (unlimited)             (unlimited)        per process
     RLIMIT_NPROC:       (unlimited)             (unlimited)        per user
    (dbx) 
    

    Note:
    In the dbx rlimit output, rlimit_cur is the soft limit and rlimit_max is the hard limit. The soft limit is the one actually enforced, so the problem can persist while rlimit_cur is not unlimited, even if rlimit_max is unlimited. In that case the instance may need to be restarted for rlimit_cur to pick up the new value.

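    Processes that are not created through the listener are not affected, because they inherit the limits of the oracle user. A process created through the listener inherits the listener's limits, and the listener in turn inherits the limits of the user that started it: root if the LISTENER was started by OHASD/CRS, or grid if it was started manually by the grid user, in which case check the grid user's limits. It is also possible that the OS limits were changed but the process has not been restarted since, so it still runs with the old values.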

    CASE 3, Solaris (SunOS): ORA-04030 caused by insufficient swap

    Use plimit to view the limits of a process:

    # plimit [PID]
    # sort the vmstat samples by swap
    $ awk '/^zzz/{t=$5;next}/^\s*[0-9]/{print t,$4,$5}' xxxxxx_vmstat_16.10.31.1500.dat | sort -k2,2rn
    How to Configure Swap Space (Doc ID 286388.1) recommends swap space = 75% of OS memory
    How does the Solaris Operating System Calculate Available Swap? (Doc ID 1010585.1)
    When a process calls the malloc()/sbrk() commands, only virtual swap is allocated.
    The operating system allocates the memory from physical disk-based swap first.
    If disk-based swap is exhausted or unconfigured, the reservation is allocated from physical memory.
    If both resources are exhausted then the malloc() call fails.
    To ensure malloc() won't fail due to lack of virtual swap, configure a large physical disk-based swap
    facility in the form of a device or swapfile.  You can monitor swap reservation via "swap -s" and "vmstat:swap",
    as described above.
    Follow the guidelines below to calculate amount of virtual swap usage:
    Virtual swap = Physical Memory + Fixed Disk swap
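
The two sizing rules quoted above reduce to simple arithmetic. The sketch below works in MB; the function names are hypothetical helpers, not Solaris commands, and the 64 GB host with 48 GB of disk swap is an assumed example.

```shell
#!/bin/sh
# 75% of physical memory, following the Doc ID 286388.1 guideline quoted above.
recommended_swap_mb() {
    echo $(( $1 * 75 / 100 ))
}

# Virtual swap = physical memory + fixed disk swap.
virtual_swap_mb() {
    echo $(( $1 + $2 ))
}

recommended_swap_mb 65536        # prints 49152
virtual_swap_mb 65536 49152      # prints 114688
```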
    

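    CASE 4, Linux: running out of processes on hosts with software such as the OEM agent installed
    On Linux there are several ways to view the limits of a running process, for example through the proc filesystem: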

    $ cat /proc/PID/limits
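
When only one value is needed, for example in a monitoring script, the limits file can be filtered. This is a minimal sketch: limit_of is a hypothetical helper name, and it assumes the name passed in matches the first column exactly as the kernel prints it.

```shell
#!/bin/sh
# Print "soft hard" for one limit from /proc/<pid>/limits-style text on stdin.
# $1: limit name as shown in the first column, e.g. "Max open files".
limit_of() {
    awk -v name="$1" '
        index($0, name) == 1 {
            rest = substr($0, length(name) + 1)   # drop the name column
            split(rest, f, " ")                   # f[1] = soft, f[2] = hard
            print f[1], f[2]
        }'
}

# Sample input in the format of /proc/PID/limits; prints "65536 65536".
limit_of "Max open files" <<'EOF'
Limit                     Soft Limit           Hard Limit           Units
Max processes             16384                16384                processes
Max open files            65536                65536                files
EOF
```

On a live system the call would be limit_of "Max processes" < /proc/PID/limits, with PID replaced by the process of interest.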
    

    nproc is defined at the operating system level and limits the number of processes per user. The Oracle 11.2.0.4 documentation recommends:

    oracle soft nproc 2047
    oracle hard nproc 16384
    

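    If an OEM agent is running, this may be somewhat low. To check whether you are over the limit, use "ps". Note, however, that "ps" does not list everything by default: on Linux each thread of a multithreaded program is implemented as a lightweight process (LWP), and you must pass "-L" to see them all. For example, grouped by user: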

    $ ps h -Led -o user | sort | uniq -c | sort -n
    

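    Without "-L" you can also use "ps -o nlwp,pid,lwp,args -u oracle | sort -n". In some environments a running Oracle 12c EM agent starts more than 1,000 threads. Once the nproc limit is reached, the user can no longer create new processes: the clone() call returns EAGAIN, which Oracle reports as: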
    ORA-27300: OS system dependent operation:fork failed with status: 11
    ORA-27301: OS failure message: Resource temporarily unavailable
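
The per-user count can be compared with the limit in one step. In this sketch users_at_nproc is a hypothetical helper name, and it assumes the same nproc value applies to every user in the input; in practice you would pass the soft limit of the user you care about.

```shell
#!/bin/sh
# Read one user name per line (the output of: ps h -Led -o user) and print
# "user count limit" for every user whose thread count has reached the limit.
# $1: nproc soft limit to compare against.
users_at_nproc() {
    sort | uniq -c | awk -v lim="$1" '$1 + 0 >= lim + 0 { print $2, $1, lim }'
}

# Sample input; prints "oracle 3 3".
printf 'oracle\noracle\nroot\noracle\n' | users_at_nproc 3
```

On a live system: ps h -Led -o user | users_at_nproc 2047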

    The test below reproduces this with a short fork program by Franck, slightly modified:

    [root@anbob ~]# ps h -Led -o user | sort | uniq -c | sort -n
          1 chrony
          1 dbus
          1 oracle
          1 rpc
          7 polkitd
        133 root
    [oracle@oel7db1 ~]$ ulimit -u 500
    [oracle@oel7db1 ~]$ cat fockp.c
    #include <errno.h>
    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main( int argc, char *argv[] )
    {
            int i;
            pid_t p[3001];   // indexes 1..3000 are used
            // get nproc limit
            struct rlimit rl;
            if ( getrlimit( RLIMIT_NPROC , &rl) != 0 ) {
                    printf("getrlimit() failed with errno=%d\n", errno);
                    return 255;
            }
            // fork up to 3000 times; each new child performs the next fork
            for( i=1 ; i<= 3000 ; i++ ) {
                    p[i] = fork();
                    if ( p[i] >= 0 ) {
                            if (  p[i] == 0 ) {
                                    // fork() returned 0: this is the new child, which keeps looping
                                    printf("parent says fork number %d successful \n" , i );
                            } else {
                                    // fork() returned a pid: this process stops forking and waits
                                    printf(" child says fork number %d pid %d \n" , i , (int)p[i] );
                                    sleep(100);
                                    break;
                            }
                    } else {
                            printf("parent says fork number %d failed (nproc: soft=%lu hard=%lu) with errno=%d\n", i, (unsigned long)rl.rlim_cur , (unsigned long)rl.rlim_max , errno);
                            return 255;
                    }
            }
            return 0;
    }
    [oracle@anbob ~]$ ./fockp
     child says fork number 1 pid 2442
    parent says fork number 1 successful
     child says fork number 2 pid 2443
    parent says fork number 2 successful
     child says fork number 3 pid 2444
    parent says fork number 3 successful
     child says fork number 4 pid 2445
    parent says fork number 4 successful
    ...
    parent says fork number 497 successful
     child says fork number 498 pid 2941
    parent says fork number 498 successful
    parent says fork number 499 failed (nproc: soft=500 hard=500) with errno=11
    # check as root
    [root@anbob ~]# ps h -Led -o user | sort | uniq -c | sort -n
          1 chrony
          1 dbus
          1 rpc
          7 polkitd
        133 root
        500 oracle
    

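    On older Linux systems the limits were set in /etc/limits.conf and checked with "ulimit -u". According to the official RHEL documentation, on RHEL 5 through 8 the file to edit is /etc/security/limits.conf.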

    How to set ulimit values
    Environment
    Red Hat Enterprise Linux (RHEL) 5, 6, 7, 8
    Issue
    How to set ulimit values
    Resolution
    Settings in /etc/security/limits.conf take the following form:
    # vi /etc/security/limits.conf
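    *               -       core             <value>
    *               -       data             <value>
    *               -       priority         <value>
    *               -       fsize            <value>
    *               soft    sigpending       <value>  eg:57344
    *               hard    sigpending       <value>  eg:57444
    *               -       memlock          <value>
    *               -       nofile           <value>  eg:1024
    *               -       msgqueue         <value>  eg:819200
    *               -       locks            <value>
    *               soft    core             <value>
    *               hard    nofile           <value>
    @<group>        hard    nproc            <value>
    <user>          soft    nproc            <value>
    %<group>        hard    nproc            <value>
    <user>          hard    nproc            <value>
    @<group>        -       maxlogins        <value>
    <user>          hard    cpu              <value>
    <user>          soft    cpu              <value>
    <user>          hard    locks            <value>
    <domain> can be: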
    a user name
    a group name, with @group syntax
    the wildcard *, for default entry
    the wildcard %, can be also used with %group syntax, for maxlogin limit
    <type> can have two values:
    soft for enforcing the soft limits
    hard for enforcing hard limits
    <item> can be one of the following:
    core - limits the core file size (KB)
    data - max data size (KB)
    fsize - maximum filesize (KB)
    memlock - max locked-in-memory address space (KB)
    nofile - max number of open files
    rss - max resident set size (KB)
    stack - max stack size (KB)
    cpu - max CPU time (MIN)
    nproc - max number of processes (see note below)
    as - address space limit (KB)
    maxlogins - max number of logins for this user
    maxsyslogins - max number of logins on the system
    priority - the priority to run user process with
    locks - max number of file locks the user can hold
    sigpending - max number of pending signals
    msgqueue - max memory used by POSIX message queues (bytes)
    nice - max nice priority allowed to raise to values: [-20, 19]
    rtprio - max realtime priority
    Exit and re-login from the terminal for the change to take effect.
    

    The Red Hat article "Setting nproc in /etc/security/limits.conf has no effect in Red Hat Enterprise Linux" explains why an nproc entry in that file may not take effect:
    Resolution
    Add the desired entry in /etc/security/limits.d/90-nproc.conf instead of /etc/security/limits.conf.
    Root Cause
    For limits, the PAM stack is moving to a modular configuration. This includes the introduction of /etc/security/limits.d/90-nproc.conf, which sets the maximum number of processes to 1024 for non-root users. This was done in part to prevent fork-bombs.

    After reading /etc/security/limits.conf, individual files from the /etc/security/limits.d/ directory are read. Only files with *.conf extension will be read from this directory.

    So if you configure the Oracle environment with the Oracle preinstall RPM, you will find that it also writes its settings to /etc/security/limits.d/oracle-rdbms-server-12cR1-preinstall.conf, which overrides /etc/security/limits.conf.
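
Because several files can set nproc, it helps to list every active entry in the order the files are read. This is a minimal sketch: nproc_entries is a hypothetical helper name, and it assumes the shell expands the glob in the same sorted order that pam_limits uses (run it under LC_ALL=C to be safe). It only lists the entries and does not work out which one wins for a given user.

```shell
#!/bin/sh
# List active nproc entries: limits.conf first, then limits.d/*.conf in glob order.
# $1: directory holding limits.conf and limits.d (default: /etc/security).
nproc_entries() {
    dir=${1:-/etc/security}
    for f in "$dir/limits.conf" "$dir"/limits.d/*.conf; do
        [ -f "$f" ] || continue
        # skip comment lines; print "file: entry" when the item column is nproc
        awk -v file="$f" '!/^[ \t]*#/ && $3 == "nproc" { print file ": " $0 }' "$f"
    done
}

nproc_entries
```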

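    One advantage of Linux is that almost everything about it can be controlled, which gives system administrators fine-grained control over their systems and better use of system resources. You can even change the limits of a program that is already running, which is useful when, for example, an application server has no restart window and must be changed online. On Linux systems with kernel >= 2.6.36 and util-linux >= 2.21, you can use the prlimit command to set the resource limits of a process (somewhat like plimit on Solaris).

    The following shows how to change a limit of a running program: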

    [root@anbob ~]# ps -ef|grep lsnr
    oracle   15837     1  0 00:02 ?        00:00:00 /u01/app/oracle/product/19.2.0/db_1/bin/tnslsnr LISTENER -inherit
    root     16128 16100  0 00:07 pts/1    00:00:00 grep --color=auto lsnr
    [root@anbob ~]# cat /proc/15837/limits
    Limit                     Soft Limit           Hard Limit           Units
    Max cpu time              unlimited            unlimited            seconds
    Max file size             unlimited            unlimited            bytes
    Max data size             unlimited            unlimited            bytes
    Max stack size            10485760             33554432             bytes
    Max core file size        0                    unlimited            bytes
    Max resident set          unlimited            unlimited            bytes
    Max processes             16384                16384                processes
    Max open files            65536                65536                files
    Max locked memory         137438953472         137438953472         bytes
    Max address space         unlimited            unlimited            bytes
    Max file locks            unlimited            unlimited            locks
    Max pending signals       14595                14595                signals
    Max msgqueue size         819200               819200               bytes
    Max nice priority         0                    0
    Max realtime priority     0                    0
    Max realtime timeout      unlimited            unlimited            us
    [oracle@anbob ~]$ gdb
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    (gdb) attach 15837
    Attaching to process 15837
    (gdb) set $rlim = &{0ll, 0ll}
    (gdb) print getrlimit(7, $rlim)
    $1 = 0
    (gdb) print *$rlim
    $2 = {65536, 65536}
    # resource numbers passed to getrlimit()/setrlimit(); 7 is Max open files
            Limit                 
         0  Max cpu time          
         1  Max file size         
         2  Max data size         
         3  Max stack size        
         4  Max core file size    
         5  Max resident set      
         6  Max processes         
         7  Max open files        
         8  Max locked memory     
         9  Max address space     
        10  Max file locks        
        11  Max pending signals   
        12  Max msgqueue size     
        13  Max nice priority     
        14  Max realtime priority 
        15  Max realtime timeout  
    # modify with gdb
    (gdb) set *$rlim[0] = 1024*4
    (gdb) print *$rlim
    $3 = {4096, 65536}
    (gdb) print setrlimit(7, $rlim)
    $4 = 0
    [root@anbob ~]# cat /proc/15837/limits|grep "open files"
    Limit                     Soft Limit           Hard Limit           Units
    Max open files            4096                 65536                files
    [root@anbob ~]# prlimit  --nofile --output RESOURCE,SOFT,HARD --pid 15837
    RESOURCE SOFT  HARD
    NOFILE   4096 65536
    # modify with prlimit
    [root@anbob ~]# prlimit --nofile=1024:8192 --pid 15837
    [root@anbob ~]# cat /proc/15837/limits |grep "open files"
    Limit                     Soft Limit           Hard Limit           Units
    Max open files            1024                 8192                 files
    (gdb) print getrlimit(7, $rlim)
    $5 = 0
    (gdb) print *$rlim
    $6 = {1024, 8192}
    (gdb)
    

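    Note:
    Resource limits come as a soft value and a hard value. The soft limit is the one actually enforced; the hard limit is only the ceiling up to which the soft limit can be raised with the limit commands.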
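
The soft/hard distinction can be seen with the shell's own ulimit builtin. The sketch below lowers the soft open-files limit inside a subshell, so the calling shell is untouched; show_soft_hard is a hypothetical helper name, and the example assumes the hard limit is at least 256.

```shell
#!/bin/sh
# Set the soft open-files limit to $1 in a subshell and report both values.
show_soft_hard() {
    (
        # lowering the soft limit is allowed as long as it stays at or below the hard limit
        ulimit -S -n "$1" || exit 1
        echo "soft=$(ulimit -S -n) hard=$(ulimit -H -n)"
    )
}

show_soft_hard 256
```

The hard value in the output is unchanged, and the calling shell still reports its original soft limit afterwards.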
