2016-04-14 21:27:13,174 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 20542ms
GC pool 'ParNew' had collection(s): count=1 time=0ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=2 time=20898ms
2016-04-14 21:27:13,174 WARN [regionserver60020.periodicFlusher] util.Sleeper: We slept 20936ms instead of 100ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2246)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1496)
2016-04-14 21:32:40,173 WARN [B.DefaultRpcServer.handler=125,queue=5,port=60020] ipc.RpcServer: RpcServer.respondercallId: 7540 service: ClientService methodName: Multi size: 100.2 K connection: 10.160.247.139:56031: output error
2016-04-14 21:32:40,173 WARN [B.DefaultRpcServer.handler=125,queue=5,port=60020] ipc.RpcServer: B.DefaultRpcServer.handler=125,queue=5,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
(concurrent mode failure): 45876255K->21800674K(46137344K), 10.0625300 secs] 48792749K->21800674K(49283072K), [CMS Perm : 43274K->43274K(262144K)], 10.2083040 secs] [Times: user=12.02 sys=0.00, real=10.20 secs]
2016-04-14 21:22:43,990 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 10055ms
GC pool 'ParNew' had collection(s): count=2 time=244ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=10062ms
2016-04-14 21:22:44,006 WARN [ResponseProcessor for block BP-632656502-10.160.173.93-1448595094942:blk_1073941840_201226] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-632656502-10.160.173.93-1448595094942:blk_1073941840_201226
java.io.IOException: Bad response ERROR for block BP-632656502-10.160.173.93-1448595094942:blk_1073941840_201226 from datanode 10.160.173.93:50010
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
2016-04-14 21:22:43,789 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-632656502-10.160.173.93-1448595094942:blk_1073941840_201226 received exception java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.160.173.94:50010 remote=/10.160.173.94:30110]
2016-04-14 21:22:43,779 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hz-hbase4.photo.163.org:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.160.173.94:30123 dest: /10.160.173.94:50010
java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.160.173.94:50010 remote=/10.160.173.94:30123]
We stumbled over here by a difnereft page and thought I should check things out. I like what I see so now i am following you. Look forward to finding out about your web page yet again.
我的hbase集群的多个regionserver比较频繁的遇到这样一个情况,就是jvm停止工作,但是当时并没有GC发生。
日志如下:
2017-05-26 18:42:10,607 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 48721ms
No GCs detected
2017-05-26 19:12:10,790 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 49500ms
No GCs detected
3. 经过了一些参数调优,No GC 的Jvm停顿时间有所降低,但还是会发生。
以下是一个例子(2017-06-05 19:36):
2017-06-05 19:36:40,932 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 10030ms
No GCs detected
当时段的gc日志如下:
2017-06-05T19:36:03.369+0800: 797616.040: Total time for which application threads were stopped: 0.0137030 seconds
2017-06-05T19:36:22.091+0800: 797634.762: [GC2017-06-05T19:36:22.091+0800: 797634.762: [ParNew
Desired survivor size 89456640 bytes, new threshold 10 (max 10)
– age 1: 1033520 bytes, 1033520 total
– age 2: 236264 bytes, 1269784 total
– age 3: 211400 bytes, 1481184 total
– age 4: 260000 bytes, 1741184 total
– age 5: 173448 bytes, 1914632 total
– age 6: 159216 bytes, 2073848 total
– age 7: 173896 bytes, 2247744 total
– age 8: 179192 bytes, 2426936 total
– age 9: 182104 bytes, 2609040 total
– age 10: 191624 bytes, 2800664 total
: 702804K->3678K(873856K), 0.0122400 secs] 17881044K->17182126K(18699648K), 0.0123650 secs] [Times: user=0.21 sys=0.00, real=0.01 secs]
2017-06-05T19:36:22.104+0800: 797634.774: Total time for which application threads were stopped: 0.0134570 seconds
2017-06-05T19:36:23.323+0800: 797635.993: Total time for which application threads were stopped: 0.0010760 seconds
2017-06-05T19:36:23.323+0800: 797635.994: Total time for which application threads were stopped: 0.0002370 seconds
2017-06-05T19:36:40.931+0800: 797653.602: Total time for which application threads were stopped: 10.2617260 seconds
2017-06-05T19:36:40.931+0800: 797653.602: Total time for which application threads were stopped: 0.0002360 seconds
2017-06-05T19:36:40.932+0800: 797653.602: Total time for which application threads were stopped: 0.0002140 seconds
2017-06-05T19:36:40.932+0800: 797653.603: Total time for which application threads were stopped: 0.0002160 seconds
2017-06-05T19:36:40.934+0800: 797653.604: Total time for which application threads were stopped: 0.0003790 seconds
2017-06-05T19:36:40.934+0800: 797653.605: Total time for which application threads were stopped: 0.0002830 seconds
2017-06-05T19:36:40.935+0800: 797653.605: Total time for which application threads were stopped: 0.0001910 seconds
2017-06-05T19:36:40.935+0800: 797653.606: Total time for which application threads were stopped: 0.0001910 seconds
2017-06-05T19:36:40.935+0800: 797653.606: Total time for which application threads were stopped: 0.0001820 seconds
2017-06-05T19:36:40.951+0800: 797653.622: Total time for which application threads were stopped: 0.0003420 seconds
2017-06-05T19:36:40.952+0800: 797653.622: Total time for which application threads were stopped: 0.0001750 seconds
2017-06-05T19:36:40.952+0800: 797653.622: Total time for which application threads were stopped: 0.0001520 seconds
2017-06-05T19:36:40.952+0800: 797653.623: Total time for which application threads were stopped: 0.0001410 seconds
2017-06-05T19:36:45.881+0800: 797658.552: [GC2017-06-05T19:36:45.881+0800: 797658.552: [ParNew
Desired survivor size 89456640 bytes, new threshold 10 (max 10)
– age 1: 1593696 bytes, 1593696 total
– age 2: 244032 bytes, 1837728 total
– age 3: 210744 bytes, 2048472 total
– age 4: 211184 bytes, 2259656 total
– age 5: 193208 bytes, 2452864 total
– age 6: 173448 bytes, 2626312 total
– age 7: 159168 bytes, 2785480 total
– age 8: 173896 bytes, 2959376 total
– age 9: 179168 bytes, 3138544 total
– age 10: 182152 bytes, 3320696 total
: 702799K->4120K(873856K), 0.0133430 secs] 17881247K->17182824K(18699648K), 0.0134700 secs] [Times: user=0.22 sys=0.00, real=0.02 secs]
2017-06-05T19:36:45.895+0800: 797658.565: Total time for which application threads were stopped: 0.0145820 seconds
2017-06-05T19:36:53.935+0800: 797666.606: Total time for which application threads were stopped: 0.0010200 seconds