13671653088
Points: 105
Likes received: 0
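Large number of unhealthy tablets after Flink is started
Their reply: Reducing the number of jobs is not really feasible right now. Is there any parameter for replica lag that can be tuned to speed up synchronization? Flink's stream load currently reports the following errors: [code]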
2
Stream load import error
Their reply: From be.INFO, here is the log of one stream load that failed with error -215. Could you take a look and suggest a good way to resolve this kind of error?
I0520 15:51:04.691452 2801 stream_load.cpp:214] new income streaming load request.id=d94952cdffe4ff08-131b39ecc22d6988, job_id=-1, txn_id=-1, label=audit_20210520_155104_6fe859081c2a46918e5a74bf3d71332e, db=ods_dental, tbl=ods_dental_operationlog
I0520 15:51:04.696164 2801 stream_load_executor.cpp:53] begin to execute job. label=audit_20210520_155104_6fe859081c2a46918e5a74bf3d71332e, txn_id=11178426, query_id=d94952cdffe4ff08-131b39ecc22d6988
I0520 15:51:04.696190 2801 plan_fragment_executor.cpp:76] Prepare(): query_id=d94952cdffe4ff08-131b39ecc22d6988 fragment_instance_id=d94952cdffe4ff08-131b39ecc22d6989 backend_num=0 c002baa6e8f138e, version: 0
I0520 15:51:04.696252 2801 plan_fragment_executor.cpp:140] Using query memory limit: 2.00 GB
I0520 15:51:04.696797 2596 plan_fragment_executor.cpp:239] Open(): fragment_instance_id=d94952cdffe4ff08-131b39ecc22d6989 c002baa6e8f138e, version: 0
I0520 15:51:04.697333 2759 tablets_channel.cpp:59] open tablets channel: (id=d94952cdffe4ff08-131b39ecc22d6988,index_id=2527663), tablets num: 3, timeout(s): 36000
I0520 15:51:04.699055 2749 tablets_channel.cpp:141] close tablets channel: (id=d94952cdffe4ff08-131b39ecc22d6988,index_id=2527663), sender id: 0 c002baa6e8f138e, version: 0
I0520 15:51:04.699074 20369 tablet_sink.cpp:979] all node channels are stopped(maybe finished/offending/cancelled), consumer thread exit.
I0520 15:51:04.699285 2749 txn_manager.cpp:250] commit transaction to engine successfully. partition_id: 308886, transaction_id: 11178426, tablet: 2527680.39817555.a14fc33f88e02df4-29c9453fca0a2196, rowsetid: 0200000000dd21cd6a432b7fb01433bddc002baa6e8f138e, version: 0 ; , load_id=d94952cdffe4ff08-131b39ecc22d6988
I0520 15:51:04.699295 2749 delta_writer.cpp:343] close delta writer for tablet: 2527680, stats: (flush time(ms)=0, flush count=1, flush bytes: 4096, flush disk bytes: 0) t_id=2527704, txn_id=11178426, err=-215
I0520 15:51:04.699322 2749 txn_manager.cpp:250] commit transaction to engine successfully. partition_id: 308886, transaction_id: 11178426, tablet: 2527668.39817555.754bf6c720eb9f87-282e5c4b1f464694, rowsetid: 0200000000dd21ce6a432b7fb01433bddter write failed, tablet_id=2527704, txn_id=11178c002baa6e8f138e, version: 0
I0520 15:51:04.699327 2749 delta_writer.cpp:343] close delta writer for tablet: 2527668, stats: (flush time(ms)=0, flush count=1, flush bytes: 4096, flush disk bytes: 0)
I0520 15:51:04.699347 2749 txn_manager.cpp:250] commit transaction to engine successfully. partition_id: 308886, transaction_id: 11178426, tablet: 2527672.39817555.0f4841f222bb3bb5-8b4b4268a0aeecac, rowsetid: 0200000000dd21cf6a432b7fb01433bdd)(1)} {10003:(17)(1)} {10005:(0)(1)} {10008:(0)(1c002baa6e8f138e, version: 0
I0520 15:51:04.699350 2749 delta_writer.cpp:343] close delta writer for tablet: 2527672, stats: (flush time(ms)=0, flush count=1, flush bytes: 4096, flush disk bytes: 0) , txn_id=11178426, err=-215
I0520 15:51:04.699422 2749 load_channel_mgr.cpp:152] removing load channel d94952cdffe4ff08-131b39ecc22d6988 because it's finished
I0520 15:51:04.699430 2749 load_channel.cpp:38] load channel mem peak usage=4096, info=limit: 2147483648; consumption: 0; label: LoadChannel:d94952cdffe4ff08-131b39ecc22d6988; all tracker size: 3; limit trackers size: 2; parent is null: false; , load_id=d94952cdffe4ff08-131b39ecc22d6988
W0520 15:51:04.699900 2759 tablet_sink.cpp:168] NodeChannel[2527663-10008] add batch req success but status isn't ok, load_id=d94952cdffe4ff08-131b39ecc22d6988, txn_id=11178426, node=10.188.3.155:8060, errmsg=tablet writer write failed, tablet_id=2527704, txn_id=11178426, err=-215
W0520 15:51:04.700928 2596 tablet_sink.cpp:733] NodeChannel[2527663-10008]: close channel failed, load_id=d94952cdffe4ff08-131b39ecc22d6988, txn_id=11178426. error_msg=close wait failed coz rpc error. node=10.188.3.155:8060, errmsg=tablet writer write failed, tablet_id=2527704, txn_id=11178426, err=-215
I0520 15:51:04.718088 2596 tablet_sink.cpp:749] total mem_exceeded_block_ns=0, total queue_push_lock_ns=0, total actual_consume_ns=298548
I0520 15:51:04.718101 2596 tablet_sink.cpp:780] finished to close olap table sink. load_id=d94952cdffe4ff08-131b39ecc22d6988, txn_id=11178426, node add batch time(ms)/num: {10006:(17)(1)} {10009:(0)(1)} {10002:(0)(1)} {10007:(0)(1)} {10010:(0)(1)} {10003:(17)(1)} {10005:(0)(1)} {10008:(0)(1)} {10011:(0)(1)} {10004:(0)(1)}
W0520 15:51:04.718353 2596 fragment_mgr.cpp:230] Got error while opening fragment d94952cdffe4ff08-131b39ecc22d6989: Internal error: close wait failed coz rpc error. node=10.188.3.155:8060, errmsg=tablet writer write failed, tablet_id=2527704, txn_id=11178426, err=-215
I0520 15:51:04.718505 2596 plan_fragment_executor.cpp:583] Fragment d94952cdffe4ff08-131b39ecc22d6989:(Active: 20.468ms, non-child: 0.00%)
6
Stream load import error
Their reply:
 - AverageThreadTokens: 0.00
 - FragmentCpuTime: 492.766us
 - MemoryLimit: 2.00 GB
 - PeakMemoryUsage: 1.21 MB
 - PeakReservation: 0
 - PeakUsedReservation: 0
 - RowsProduced: 1
BlockMgr:
 - BlockWritesOutstanding: 0
 - BlocksCreated: 0
 - BlocksRecycled: 0
 - BufferedPins: 0
 - BytesWritten: 0
 - MaxBlockSize: 8.00 MB
 - TotalBufferWaitTime: 0.000ns
 - TotalEncryptionTime: 0.000ns
 - TotalIntegrityCheckTime: 0.000ns
 - TotalReadBlockTime: 0.000ns
OlapTableSink:(Active: 21.307ms, non-child: 100.00%)
 - CloseWaitTime: 20.370ms
 - ConvertBatchTime: 0.000ns
 - MaxAddBatchExecTime: 17.989ms
 - NonBlockingSendTime: 1.409ms
 - NonBlockingSendWorkTime: 298.548us
 - SerializeBatchTime: 23.098us
 - NumberBatchAdded: 10
 - NumberNodeChannels: 10
 - OpenTime: 744.186us
 - RowsFiltered: 0
 - RowsRead: 1
 - RowsReturned: 1
 - SendDataTime: 13.745us
 - WaitMemLimitTime: 0.000ns
 - TotalAddBatchExecTime: 39.369ms
 - ValidateDataTime: 3.116us
BROKER_SCAN_NODE (id=0):(Active: 38.553us, non-child: 0.19%)
 - BytesDecompressed: 0
 - BytesRead: 284.00 B
 - DecompressTime: 0.000ns
 - FileReadTime: 5.751us
 - MaterializeTupleTime(*): 11.861us
 - NumDiskAccess: 0 , txn_id=11178426, err=-215, id=d94952cdffe4ff08-
 - PeakMemoryUsage: 1.02 MB
 - RowsRead: 1 1178426, err=-215
 - RowsReturned: 1
 - RowsReturnedRate: 25.94 K/sec
 - TotalRawReadTime(*): 32.899us
 - TotalReadThroughput: 0.00 /sec
 - WaitScannerTime: 0.000ns
W0520 15:51:04.718528 2596 stream_load_executor.cpp:90] fragment execute failed, query_id=d94952cdffe4ff08-131b39ecc22d6988, err_msg=close wait failed coz rpc error. node=10.188.3.155:8060, errmsg=tablet writer write failed, tablet_id=2527704, txn_id=11178426, err=-215, id=d94952cdffe4ff08-131b39ecc22d6988, job_id=-1, txn_id=11178426, label=audit_20210520_155104_6fe859081c2a46918e5a74bf3d71332e
W0520 15:51:04.718605 2801 stream_load.cpp:142] handle streaming load failed, id=d94952cdffe4ff08-131b39ecc22d6988, errmsg=close wait failed coz rpc error. node=10.188.3.155:8060, errmsg=tablet writer write failed, tablet_id=2527704, txn_id=11178426, err=-215
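When many stream loads fail like this, it can help to tally which node and tablet the warning lines point at. The following is a minimal sketch, not part of the original post: the function name `summarize_failures` and the regular expression are assumptions built only from the message shape visible in the log above (`node=..., errmsg=tablet writer write failed, tablet_id=..., txn_id=..., err=...`).

```python
import re
from collections import Counter

# Pattern derived from the warning lines quoted in the post; it is an
# assumption that other failures in be.INFO follow the same shape.
FAILURE_RE = re.compile(
    r"node=(?P<node>[\d.]+:\d+), errmsg=tablet writer write failed, "
    r"tablet_id=(?P<tablet>\d+), txn_id=(?P<txn>\d+), err=(?P<err>-?\d+)"
)


def summarize_failures(lines):
    """Count (node, tablet_id, err) triples across W-level log lines."""
    counts = Counter()
    for line in lines:
        if not line.startswith("W"):  # glog-style prefix: W marks a warning
            continue
        for m in FAILURE_RE.finditer(line):
            counts[(m["node"], int(m["tablet"]), int(m["err"]))] += 1
    return counts


# Three lines copied from the log in the post: two warnings and one info line.
sample = [
    "W0520 15:51:04.699900 2759 tablet_sink.cpp:168] NodeChannel[2527663-10008] "
    "add batch req success but status isn't ok, "
    "load_id=d94952cdffe4ff08-131b39ecc22d6988, txn_id=11178426, "
    "node=10.188.3.155:8060, errmsg=tablet writer write failed, "
    "tablet_id=2527704, txn_id=11178426, err=-215",
    "W0520 15:51:04.718605 2801 stream_load.cpp:142] handle streaming load failed, "
    "id=d94952cdffe4ff08-131b39ecc22d6988, errmsg=close wait failed coz rpc error. "
    "node=10.188.3.155:8060, errmsg=tablet writer write failed, "
    "tablet_id=2527704, txn_id=11178426, err=-215",
    "I0520 15:51:04.699422 2749 load_channel_mgr.cpp:152] removing load channel "
    "d94952cdffe4ff08-131b39ecc22d6988 because it's finished",
]

for key, n in summarize_failures(sample).items():
    print(key, n)
```

On the lines quoted here, every failure points at tablet 2527704 on node 10.188.3.155:8060, which suggests looking at that one replica rather than at the load as a whole.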
6
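Stream load import error
Their reply: [code] This is the current version state of the cluster: there is almost no base compaction (BC), only cumulative compaction (CC). We currently plan to apply the following two parameters. Are there any other parameters or values you would recommend adjusting?
echo "compaction_task_num_per_disk=5" >> /etc/doris/be/conf/be.conf
echo "max_cumulative_compaction_num_singleton_deltas=500" >> /etc/doris/be/conf/be.conf
Also, when inspecting version compaction, we found a large number of versions with size 0. What causes this? Failed data loads? [code]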
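One practical note on the `echo ... >>` approach: running it more than once leaves duplicate keys in be.conf. A minimal sketch of an idempotent alternative follows; the helper `set_conf_value` is hypothetical, it works on the file contents as a string, and the starting value `2` in the example is made up for illustration only.

```python
def set_conf_value(text, key, value):
    """Return config text in which `key=value` appears exactly once.

    An existing uncommented assignment of `key` is replaced in place,
    later duplicates are dropped, and the key is appended if missing.
    """
    out, replaced = [], False
    for line in text.splitlines():
        stripped = line.strip()
        is_assignment = "=" in stripped and not stripped.startswith("#")
        if is_assignment and stripped.split("=", 1)[0].strip() == key:
            if not replaced:
                out.append(f"{key}={value}")
                replaced = True
            continue  # skip duplicate assignments of the same key
        out.append(line)
    if not replaced:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Hypothetical starting contents; the real be.conf will differ.
conf = "# be.conf\ncompaction_task_num_per_disk=2\n"
conf = set_conf_value(conf, "compaction_task_num_per_disk", 5)
conf = set_conf_value(conf, "max_cumulative_compaction_num_singleton_deltas", 500)
print(conf)
```

Applying the same call again returns the same text, so the change can be rerun safely across BE nodes.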
6
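BE memory limit has no effect
Their reply: I am on version 0.14.7. How should an OOM caused by a load be resolved, and is there any way to control it? Running an insert select of 8 million rows from table A into table B caused the BE to hit OOM and get killed by Linux.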
5
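left join query performance issue
Their reply: Looking forward to the next release arriving soon; it sounds like it resolves the various problems we have run into recently.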
11
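Request for advice on scaling out a Doris cluster
Their reply: Thanks for the detailed answer. On question 1: thanks for the suggestion, we will adopt it and first scale the cluster vertically, using hardware to mitigate the OOM problem. On question 2: we are already using SSDs, and according to Grafana monitoring, BE disk read/write utilization stays below 20%. On question 3: this sounds like a very useful feature. I would like to know which version will include resource tags and roughly when it is planned for release. Also, do resource tags mainly isolate BE compute resources, or are both FE and BE isolated?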
4
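Doris 0.14.12 upgrade issue
Their reply: Some updates: on June 9 we tried again to upgrade the Doris BEs from 0.14.7 to 0.14.12. Before upgrading, we asked colleagues to pause all Flink writes. The result was the same: by the time we upgraded the fourth machine, BE nodes that had already been upgraded started going down, essentially the same as in the first upgrade attempt.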
4
Question about row_number over
Their reply: In the view: create view vw_XXX as select *, row_number() over(partition by v1,v2 order by k1) rn from table. Outer query: select * from vw_XXX limit 100. The row_number operation is performed first, and only then is limit 100 applied.
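The evaluation order described in this reply can be illustrated with a small emulation. This is a plain-Python sketch of the semantics only, not Doris planner code: the function `row_number_then_limit` and the sample rows are hypothetical, and the column names `v1`, `v2`, `k1` are taken from the view definition above.

```python
from itertools import groupby


def row_number_then_limit(rows, partition_keys, order_key, limit):
    """Number every row within its partition, then keep the first `limit` rows.

    Returns the limited rows plus how many rows the window step had to process.
    """
    def part(row):
        return tuple(row[k] for k in partition_keys)

    numbered = []
    for _, group in groupby(sorted(rows, key=part), key=part):
        ordered = sorted(group, key=lambda r: r[order_key])
        for rn, row in enumerate(ordered, start=1):
            numbered.append({**row, "rn": rn})
    return numbered[:limit], len(numbered)


rows = [
    {"v1": "a", "v2": 1, "k1": 3},
    {"v1": "a", "v2": 1, "k1": 1},
    {"v1": "b", "v2": 2, "k1": 2},
]
limited, processed = row_number_then_limit(rows, ["v1", "v2"], "k1", limit=2)
print(limited, processed)
```

The window step touches all three rows even though only two are returned, which is why a `limit 100` on the outer query does not reduce the cost of the `row_number()` in the view.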
2
ODBC MySQL external table issue
Their reply: Sorry, I forgot to include the version; it is 0.14.7.
5
How to manage queries from the client
Their reply: Is there a way to terminate a running SELECT before it finishes?
4
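Errors after scaling out Doris
Their reply: Thanks for the reply, Mingyu. After investigating, the cause turned out to be that Alibaba Cloud's basic network settings impose a 910s timeout on connections across security groups.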
2
Doris cluster migration question
Their reply: Understood, thanks.
3