Doris FE fails to start after a crash
13671653088 posted on 2021-05-17 · Views: 597 · Replies: 1
Last edited on 2021-05-18

Doris version: 0.14.7

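Over the weekend we ran into an FE startup problem. We have 3 follower nodes, and two of them suddenly went down (we have not yet found out why). The remaining one, the master, was still running, but its port 9030 could no longer be reached. We started working on it about 15 minutes after the problem appeared.
We started the two failed FE nodes with sh start_fe.sh --daemon and found that only port 9010 was listening; port 9030 was not. We retried 4-5 times with the same result.
We then rebooted the CentOS system and tried 2 more times, after which it worked. The attachment is the log from one of the failed starts, when only port 9010 was listening.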
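
To make the "only 9010 is listening" observation repeatable across restart attempts, here is a minimal sketch that probes the FE ports over TCP. The port numbers are the usual fe.conf defaults as I understand them (`http_port` 8030, `rpc_port` 9020, `query_port` 9030, `edit_log_port` 9010) and are an assumption for this cluster; the helper names `is_listening` and `check_fe_ports` are made up for illustration.

```python
import socket

# Assumed fe.conf defaults; adjust if your fe.conf overrides them.
FE_PORTS = {
    "http_port": 8030,
    "rpc_port": 9020,
    "query_port": 9030,      # MySQL-protocol port used by clients
    "edit_log_port": 9010,   # metadata (edit log) communication between FEs
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def check_fe_ports(host: str = "127.0.0.1", ports: dict = FE_PORTS) -> dict:
    """Map each port name to whether something accepts connections on it."""
    return {name: is_listening(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in check_fe_ports().items():
        print(f"{name}: {'listening' if up else 'not listening'}")
```

Run on the FE host after each start attempt, this shows which ports have come up. It only tells you that a port accepts connections, not why another one does not.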

I would like to understand why an FE startup fails like this, leaving only port 9010 listening.

The attached log follows:

OpenJDK 64-Bit Server VM (25.282-b08) for linux-amd64 JRE (1.8.0_282-b08), built on Jan 22 2021 15:39:19 by "mockbuild" with gcc 4.8.5 20150623 (Red Hat 4.8.5-44)
Memory: 4k page, physical 65806460k(62372796k free), swap 0k(0k free)
CommandLine flags: -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:-CMSParallelRemarkEnabled -XX:InitialHeapSize=1052903360 -XX:MaxHeapSize=42949672960 -XX:MaxNewSize=1134141440 -XX:MaxTenuringThreshold=7 -XX:OldPLABSize=16 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=8 -XX:+UseConcMarkSweepGC -XX:+UseMembar -XX:+UseParNewGC 
2021-05-15T00:31:17.864+0800: 1.548: [GC (Allocation Failure) 2021-05-15T00:31:17.864+0800: 1.548: [ParNew: 274752K->24867K(309056K), 0.0207680 secs] 274752K->24867K(995840K), 0.0209012 secs] [Times: user=0.15 sys=0.05, real=0.02 secs] 
2021-05-15T00:31:17.885+0800: 1.569: [GC (CMS Initial Mark) [1 CMS-initial-mark: 0K(686784K)] 30032K(995840K), 0.0017796 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2021-05-15T00:31:17.887+0800: 1.571: [CMS-concurrent-mark-start]
2021-05-15T00:31:17.891+0800: 1.575: [CMS-concurrent-mark: 0.004/0.004 secs] [Times: user=0.05 sys=0.00, real=0.00 secs] 
2021-05-15T00:31:17.891+0800: 1.576: [CMS-concurrent-preclean-start]
2021-05-15T00:31:17.894+0800: 1.578: [CMS-concurrent-preclean: 0.003/0.003 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
2021-05-15T00:31:17.894+0800: 1.578: [CMS-concurrent-abortable-preclean-start]
2021-05-15T00:31:18.133+0800: 1.817: [GC (Allocation Failure) 2021-05-15T00:31:18.133+0800: 1.817: [ParNew: 299619K->28738K(309056K), 0.0409400 secs] 299619K->42531K(995840K), 0.0410328 secs] [Times: user=0.17 sys=0.04, real=0.04 secs] 
2021-05-15T00:31:18.174+0800: 1.858: [CMS-concurrent-abortable-preclean: 0.112/0.280 secs] [Times: user=0.89 sys=0.08, real=0.28 secs] 
2021-05-15T00:31:18.174+0800: 1.858: [GC (CMS Final Remark) [YG occupancy: 33943 K (309056 K)]2021-05-15T00:31:18.174+0800: 1.858: [Rescan (non-parallel) 2021-05-15T00:31:18.174+0800: 1.858: [grey object rescan, 0.0104227 secs]2021-05-15T00:31:18.185+0800: 1.869: [root rescan, 0.0103963 secs]2021-05-15T00:31:18.195+0800: 1.879: [visit unhandled CLDs, 0.0000069 secs]2021-05-15T00:31:18.195+0800: 1.879: [dirty klass scan, 0.0003229 secs], 0.0211846 secs]2021-05-15T00:31:18.195+0800: 1.879: [weak refs processing, 0.0000205 secs]2021-05-15T00:31:18.195+0800: 1.879: [class unloading, 0.0036127 secs]2021-05-15T00:31:18.199+0800: 1.883: [scrub symbol table, 0.0036245 secs]2021-05-15T00:31:18.203+0800: 1.887: [scrub string table, 0.0003619 secs][1 CMS-remark: 13793K(686784K)] 47737K(995840K), 0.0288981 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 
2021-05-15T00:31:18.203+0800: 1.887: [CMS-concurrent-sweep-start]
2021-05-15T00:31:18.208+0800: 1.892: [CMS-concurrent-sweep: 0.005/0.005 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2021-05-15T00:31:18.208+0800: 1.892: [CMS-concurrent-reset-start]
2021-05-15T00:31:18.420+0800: 2.104: [CMS-concurrent-reset: 0.211/0.211 secs] [Times: user=0.56 sys=0.19, real=0.21 secs] 
2021-05-15T00:31:18.518+0800: 2.202: [GC (Allocation Failure) 2021-05-15T00:31:18.518+0800: 2.202: [ParNew: 303490K->34304K(309056K), 0.0707860 secs] 317283K->114766K(995840K), 0.0708973 secs] [Times: user=0.55 sys=0.03, real=0.07 secs] 
2021-05-15T00:31:18.961+0800: 2.645: [GC (Allocation Failure) 2021-05-15T00:31:18.961+0800: 2.645: [ParNew: 309056K->34304K(309056K), 0.0346510 secs] 389518K->147175K(995840K), 0.0347442 secs] [Times: user=0.36 sys=0.01, real=0.03 secs] 
2021-05-15T00:31:19.207+0800: 2.891: [GC (Allocation Failure) 2021-05-15T00:31:19.208+0800: 2.892: [ParNew: 309056K->34304K(309056K), 0.0688900 secs] 421927K->223599K(995840K), 0.0689837 secs] [Times: user=0.47 sys=0.02, real=0.07 secs] 
2021-05-15T00:31:19.481+0800: 3.165: [GC (Allocation Failure) 2021-05-15T00:31:19.481+0800: 3.165: [ParNew: 309056K->34304K(309056K), 0.0666752 secs] 498351K->301482K(995840K), 0.0668004 secs] [Times: user=0.47 sys=0.02, real=0.07 secs] 
2021-05-15T00:31:19.752+0800: 3.436: [GC (Allocation Failure) 2021-05-15T00:31:19.752+0800: 3.436: [ParNew: 309056K->34304K(309056K), 0.0653357 secs] 576234K->379590K(995840K), 0.0654305 secs] [Times: user=0.49 sys=0.02, real=0.07 secs] 
2021-05-15T00:31:20.018+0800: 3.702: [GC (Allocation Failure) 2021-05-15T00:31:20.018+0800: 3.702: [ParNew: 309056K->34304K(309056K), 0.0629972 secs] 654342K->458275K(995840K), 0.0630949 secs] [Times: user=0.46 sys=0.02, real=0.06 secs] 
2021-05-15T00:31:20.081+0800: 3.765: [GC (CMS Initial Mark) [1 CMS-initial-mark: 423971K(686784K)] 463733K(995840K), 0.0045196 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] 
2021-05-15T00:31:20.086+0800: 3.770: [CMS-concurrent-mark-start]
2021-05-15T00:31:20.273+0800: 3.957: [CMS-concurrent-mark: 0.187/0.187 secs] [Times: user=0.93 sys=0.01, real=0.19 secs] 
2021-05-15T00:31:20.273+0800: 3.957: [CMS-concurrent-preclean-start]
2021-05-15T00:31:20.285+0800: 3.969: [CMS-concurrent-preclean: 0.011/0.012 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
2021-05-15T00:31:20.285+0800: 3.969: [CMS-concurrent-abortable-preclean-start]
2021-05-15T00:31:20.288+0800: 3.972: [GC (Allocation Failure) 2021-05-15T00:31:20.288+0800: 3.972: [ParNew: 309056K->34304K(309056K), 0.0664470 secs] 733027K->535488K(995840K), 0.0665184 secs] [Times: user=0.53 sys=0.03, real=0.06 secs] 
2021-05-15T00:31:20.558+0800: 4.242: [GC (Allocation Failure) 2021-05-15T00:31:20.558+0800: 4.243: [ParNew: 309056K->34304K(309056K), 0.0828381 secs] 810240K->626965K(995840K), 0.0829344 secs] [Times: user=0.55 sys=0.04, real=0.08 secs] 
2021-05-15T00:31:20.718+0800: 4.402: [CMS-concurrent-abortable-preclean: 0.279/0.433 secs] [Times: user=1.62 sys=0.08, real=0.44 secs] 
2021-05-15T00:31:20.718+0800: 4.402: [GC (CMS Final Remark) [YG occupancy: 143896 K (309056 K)]2021-05-15T00:31:20.718+0800: 4.403: [Rescan (non-parallel) 2021-05-15T00:31:20.718+0800: 4.403: [grey object rescan, 0.0015164 secs]2021-05-15T00:31:20.720+0800: 4.404: [root rescan, 0.0540443 secs]2021-05-15T00:31:20.774+0800: 4.458: [visit unhandled CLDs, 0.0000162 secs]2021-05-15T00:31:20.774+0800: 4.458: [dirty klass scan, 0.0005677 secs], 0.0562005 secs]2021-05-15T00:31:20.775+0800: 4.459: [weak refs processing, 0.0001411 secs]2021-05-15T00:31:20.775+0800: 4.459: [class unloading, 0.0054771 secs]2021-05-15T00:31:20.780+0800: 4.464: [scrub symbol table, 0.0038367 secs]2021-05-15T00:31:20.784+0800: 4.468: [scrub string table, 0.0004146 secs][1 CMS-remark: 592661K(686784K)] 736557K(995840K), 0.0661653 secs] [Times: user=0.07 sys=0.00, real=0.06 secs] 
2021-05-15T00:31:20.785+0800: 4.469: [CMS-concurrent-sweep-start]
2021-05-15T00:31:20.908+0800: 4.592: [GC (Allocation Failure) 2021-05-15T00:31:20.908+0800: 4.592: [ParNew: 309056K->34304K(309056K), 0.0701146 secs] 898823K->701745K(995840K), 0.0702118 secs] [Times: user=0.51 sys=0.04, real=0.07 secs] 
2021-05-15T00:31:21.049+0800: 4.733: [CMS-concurrent-sweep: 0.194/0.264 secs] [Times: user=0.89 sys=0.04, real=0.27 secs] 
2021-05-15T00:31:21.050+0800: 4.734: [CMS-concurrent-reset-start]
2021-05-15T00:31:21.153+0800: 4.837: [CMS-concurrent-reset: 0.103/0.103 secs] [Times: user=0.20 sys=0.00, real=0.10 secs] 
2021-05-15T00:31:21.177+0800: 4.861: [GC (Allocation Failure) 2021-05-15T00:31:21.177+0800: 4.861: [ParNew: 309056K->34304K(309056K), 0.0701840 secs] 976497K->782754K(1421460K), 0.0702793 secs] [Times: user=0.50 sys=0.03, real=0.07 secs] 
2021-05-15T00:31:21.247+0800: 4.931: [GC (CMS Initial Mark) [1 CMS-initial-mark: 748450K(1112404K)] 788244K(1421460K), 0.0070456 secs] [Times: user=0.03 sys=0.01, real=0.00 secs] 
2021-05-15T00:31:21.255+0800: 4.939: [CMS-concurrent-mark-start]
2021-05-15T00:31:21.455+0800: 5.139: [GC (Allocation Failure) 2021-05-15T00:31:21.455+0800: 5.139: [ParNew: 309056K->34304K(309056K), 0.0684591 secs] 1057506K->860223K(1421460K), 0.0685595 secs] [Times: user=0.53 sys=0.03, real=0.07 secs] 
2021-05-15T00:31:21.661+0800: 5.345: [CMS-concurrent-mark: 0.338/0.407 secs] [Times: user=2.21 sys=0.03, real=0.41 secs] 
2021-05-15T00:31:21.661+0800: 5.345: [CMS-concurrent-preclean-start]
2021-05-15T00:31:21.747+0800: 5.431: [CMS-concurrent-preclean: 0.082/0.085 secs] [Times: user=0.18 sys=0.01, real=0.09 secs] 
2021-05-15T00:31:21.747+0800: 5.431: [CMS-concurrent-abortable-preclean-start]
2021-05-15T00:31:21.875+0800: 5.559: [GC (Allocation Failure) 2021-05-15T00:31:21.875+0800: 5.559: [ParNew: 309056K->34304K(309056K), 0.0706172 secs] 1134975K->933311K(1421460K), 0.0707116 secs] [Times: user=0.56 sys=0.03, real=0.07 secs] 
2021-05-15T00:31:22.391+0800: 6.075: [CMS-concurrent-abortable-preclean: 0.553/0.645 secs] [Times: user=2.97 sys=0.15, real=0.64 secs] 
2021-05-15T00:31:22.392+0800: 6.076: [GC (CMS Final Remark) [YG occupancy: 187393 K (309056 K)]2021-05-15T00:31:22.392+0800: 6.076: [Rescan (non-parallel) 2021-05-15T00:31:22.392+0800: 6.076: [grey object rescan, 0.0015576 secs]2021-05-15T00:31:22.393+0800: 6.077: [root rescan, 0.0424950 secs]2021-05-15T00:31:22.436+0800: 6.120: [visit unhandled CLDs, 0.0000191 secs]2021-05-15T00:31:22.436+0800: 6.120: [dirty klass scan, 0.0006902 secs], 0.0448208 secs]2021-05-15T00:31:22.437+0800: 6.121: [weak refs processing, 0.0000225 secs]2021-05-15T00:31:22.437+0800: 6.121: [class unloading, 0.0050313 secs]2021-05-15T00:31:22.442+0800: 6.126: [scrub symbol table, 0.0050928 secs]2021-05-15T00:31:22.447+0800: 6.131: [scrub string table, 0.0005154 secs][1 CMS-remark: 899007K(1112404K)] 1086400K(1421460K), 0.0556885 secs] [Times: user=0.07 sys=0.00, real=0.06 secs] 
2021-05-15T00:31:22.448+0800: 6.132: [CMS-concurrent-sweep-start]
2021-05-15T00:31:22.683+0800: 6.367: [GC (Allocation Failure) 2021-05-15T00:31:22.683+0800: 6.367: [ParNew: 309056K->34304K(309056K), 0.0349852 secs] 1201878K->965695K(1421460K), 0.0351054 secs] [Times: user=0.26 sys=0.01, real=0.04 secs] 
2021-05-15T00:31:22.806+0800: 6.490: [CMS-concurrent-sweep: 0.312/0.359 secs] [Times: user=3.01 sys=0.12, real=0.36 secs] 
2021-05-15T00:31:22.808+0800: 6.492: [CMS-concurrent-reset-start]
2021-05-15T00:31:22.914+0800: 6.598: [CMS-concurrent-reset: 0.106/0.106 secs] [Times: user=1.16 sys=0.03, real=0.10 secs] 
2021-05-15T00:31:23.073+0800: 6.757: [GC (Allocation Failure) 2021-05-15T00:31:23.073+0800: 6.757: [ParNew: 309056K->20081K(309056K), 0.0047677 secs] 1240447K->951472K(1861376K), 0.0048667 secs] [Times: user=0.07 sys=0.01, real=0.01 secs] 
2021-05-15T00:31:23.301+0800: 6.985: [GC (Allocation Failure) 2021-05-15T00:31:23.301+0800: 6.985: [ParNew: 294833K->23836K(309056K), 0.0041716 secs] 1226224K->955228K(1861376K), 0.0042559 secs] [Times: user=0.05 sys=0.00, real=0.00 secs] 
2021-05-15T00:31:23.504+0800: 7.188: [GC (Allocation Failure) 2021-05-15T00:31:23.504+0800: 7.188: [ParNew: 298588K->27797K(309056K), 0.0088044 secs] 1229980K->965872K(1861376K), 0.0089022 secs] [Times: user=0.08 sys=0.01, real=0.01 secs] 
2021-05-15T00:31:23.744+0800: 7.428: [GC (Allocation Failure) 2021-05-15T00:31:23.744+0800: 7.428: [ParNew: 302549K->33130K(309056K), 0.0059663 secs] 1240624K->976405K(1861376K), 0.0060644 secs] [Times: user=0.04 sys=0.01, real=0.01 secs] 
2021-05-15T00:31:24.176+0800: 7.860: [GC (Allocation Failure) 2021-05-15T00:31:24.176+0800: 7.860: [ParNew: 307882K->27645K(309056K), 0.0052660 secs] 1251157K->977978K(1861376K), 0.0053507 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] 
2021-05-15T00:31:24.587+0800: 8.271: [GC (Allocation Failure) 2021-05-15T00:31:24.587+0800: 8.271: [ParNew: 302397K->20084K(309056K), 0.0024872 secs] 1252730K->970417K(1861376K), 0.0025589 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
2021-05-15T00:31:25.020+0800: 8.704: [GC (Allocation Failure) 2021-05-15T00:31:25.020+0800: 8.704: [ParNew: 294836K->18418K(309056K), 0.0025574 secs] 1245169K->968750K(1861376K), 0.0026411 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
2021-05-15T00:54:26.927+0800: 1390.611: [GC (Allocation Failure) 2021-05-15T00:54:26.927+0800: 1390.611: [ParNew: 293170K->18453K(309056K), 0.0029046 secs] 1243502K->968785K(1861376K), 0.0029900 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] 
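
For reading a log like the one above, here is a rough sketch that pulls JVM uptime and whole-heap occupancy out of the ParNew lines and reports the longest interval between two young collections. The regular expression is written against the line format shown in this post and may not match output produced with other JVM flags; `parse_parnew` and `largest_gap` are hypothetical helper names.

```python
import re

# Matches a ParNew line and captures the whole-heap figures that follow it.
PARNEW = re.compile(
    r"(?P<uptime>\d+\.\d+): \[GC \(Allocation Failure\).*?"
    r"\[ParNew: \d+K->\d+K\(\d+K\), [\d.]+ secs\] "
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<capacity>\d+)K\), "
    r"(?P<pause>[\d.]+) secs\]"
)


def parse_parnew(line):
    """Return a dict for a ParNew line, or None for any other line."""
    m = PARNEW.search(line)
    if m is None:
        return None
    return {
        "uptime_s": float(m["uptime"]),
        "heap_before_kb": int(m["before"]),
        "heap_after_kb": int(m["after"]),
        "heap_capacity_kb": int(m["capacity"]),
        "pause_s": float(m["pause"]),
    }


def largest_gap(events):
    """Return (gap_seconds, from_uptime, to_uptime) for the longest interval."""
    gaps = [
        (b["uptime_s"] - a["uptime_s"], a["uptime_s"], b["uptime_s"])
        for a, b in zip(events, events[1:])
    ]
    return max(gaps) if gaps else None


# The last two entries of the log in this post, copied verbatim.
SAMPLE = [
    "2021-05-15T00:31:25.020+0800: 8.704: [GC (Allocation Failure) 2021-05-15T00:31:25.020+0800: 8.704: [ParNew: 294836K->18418K(309056K), 0.0025574 secs] 1245169K->968750K(1861376K), 0.0026411 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]",
    "2021-05-15T00:54:26.927+0800: 1390.611: [GC (Allocation Failure) 2021-05-15T00:54:26.927+0800: 1390.611: [ParNew: 293170K->18453K(309056K), 0.0029046 secs] 1243502K->968785K(1861376K), 0.0029900 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]",
]

if __name__ == "__main__":
    events = [e for e in map(parse_parnew, SAMPLE) if e is not None]
    gap, start, end = largest_gap(events)
    print(f"largest gap: {gap:.3f} s (uptime {start} s to {end} s)")
```

On the two sample lines the gap is about 1381.9 s, with heap occupancy nearly unchanged (968750K, then 968785K, of 1861376K). That suggests the JVM stayed alive and allocated very little during that time. The GC log alone cannot say what the process was waiting on.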
1 reply in total. Last edited by IamStrangers on 2021-05-18
#2 IamStrangers replied on 2021-05-17

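What you posted is the Java GC log, not the FE system log. Look in fe.log at the entries for the period right after the FE starts and check for anything abnormal.

Also, for the two FEs that went down, check fe.out for errors at the corresponding time.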
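
As one way to follow this advice, here is a small sketch that filters fe.log for WARN, ERROR, and FATAL entries inside a time window. It assumes each entry starts with a timestamp and level in the form `2021-05-15 00:31:20,123 WARN ...`, which is how I understand the FE log layout; check it against your own fe.log. The function name `suspicious_lines` and the path and time window in the usage comment are placeholders.

```python
import re
from datetime import datetime

# Assumed entry prefix: "YYYY-MM-DD HH:MM:SS,mmm LEVEL "
ENTRY = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} (\w+) ")


def suspicious_lines(lines, start, end, levels=("WARN", "ERROR", "FATAL")):
    """Yield entries with a timestamp in [start, end] and a level in levels."""
    for line in lines:
        m = ENTRY.match(line)
        if m is None:
            # Continuation lines (for example stack traces) are skipped here.
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if start <= ts <= end and m.group(2) in levels:
            yield line.rstrip("\n")


# Example use (path and window are placeholders, not taken from the thread):
# with open("fe.log", encoding="utf-8", errors="replace") as f:
#     window = (datetime(2021, 5, 15, 0, 31), datetime(2021, 5, 15, 0, 55))
#     for hit in suspicious_lines(f, *window):
#         print(hit)
```

Because stack-trace lines are skipped, treat the output as pointers into the file and read the surrounding lines of fe.log directly. fe.out typically holds raw stdout/stderr without this prefix, so read it by hand.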
