Spark Load fails during the load phase
7丶s1ence posted on 2021-08-12 Views: 589 Replies: 1

Has anyone run into this error?

The log is below. The key line is: The first read record. NotImplemented: This class cannot yet iterate chunked arrays

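When Spark Load reads the Parquet files to import them into the BE, it fails to read one Parquet file. The other 119 Parquet files in the same batch have already been loaded, so the job is stuck at 99% with only this one file left.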

I0812 03:40:01.785815 115443 engine_batch_load_task.cpp:339] success to push delta, transaction_id=4261944 tablet=15169284.215178049.274c11a88e027b15-5e54935f6019d3b6, cost=7s689ms
I0812 03:40:01.785914 115443 engine_batch_load_task.cpp:270] Push finish, cost time: 7
I0812 03:40:01.786116 115443 task_worker_pool.cpp:272] finish task success.
I0812 03:40:01.786134 115443 task_worker_pool.cpp:258] remove task info. type=REALTIME_PUSH, signature=4262400, queue_size=7
I0812 03:40:01.786644 115443 task_worker_pool.cpp:603] get push task. signature: 4262410 priority: 0 push_type: 3
I0812 03:40:01.786656 115443 engine_batch_load_task.cpp:291] begin to process push. transaction_id=4261944 tablet_id=15169300, version=-1
I0812 03:40:01.786660 115443 push_handler.cpp:60] begin to realtime push. tablet=15169300.215178049.f243bdd9d66cdefa-cfb44faab158ebae, transaction_id=4261944
I0812 03:40:01.786703 115443 push_handler.cpp:318] tablet=15169300.215178049.f243bdd9d66cdefa-cfb44faab158ebae, file path=hdfs://XXX/V1.mart_waimai_dw_ad__topic_flow_ad_addata_as_testb_analy_d__1628698225__cantor606466896__1.13732483.15168979.13736359.20.215178049.parquet, file size=3304391440

W0812 03:40:35.698889 115443 parquet_reader.cpp:86] The first read record. NotImplemented: This class cannot yet iterate chunked arrays
W0812 03:40:35.699018 115443 push_handler.cpp:1092] Scanner get next tuple failed
W0812 03:40:35.699031 115443 push_handler.cpp:357] read next row failed. res=-910 read_rows=0
I0812 03:40:35.699100 115443 push_handler.cpp:1121] PushBrokerReader:
- MaterializeTupleTime(*): 0.000ns
- MemoryLimit: 2.00 GB
- PeakMemoryUsage: 8.00 KB
- PeakReservation: 0
- PeakUsedReservation: 0
- RowsRead: 0
- TotalRawReadTime(*): 33s911ms
W0812 03:40:35.701568 115443 push_handler.cpp:210] fail to convert tmp file when realtime push. res=-910, failed to process realtime push., tablet=15169300.215178049.f243bdd9d66cdefa-cfb44faab158ebae, transaction_id=4261944
I0812 03:40:35.731787 115443 txn_manager.cpp:379] rollback transaction from engine successfully. partition_id: 15168979, transaction_id: 4261944, tablet: 15169300.215178049.f243bdd9d66cdefa-cfb44faab158ebae
W0812 03:40:35.731817 115443 engine_batch_load_task.cpp:333] fail to push delta, transaction_id=4261944 tablet=15169300.215178049.f243bdd9d66cdefa-cfb44faab158ebae, cost=33s945ms
I0812 03:40:35.731967 115443 engine_batch_load_task.cpp:270] Push finish, cost time: 34
W0812 03:40:35.731976 115443 engine_batch_load_task.cpp:79] push internal error, need retry.signature: 4262410
W0812 03:40:35.731987 115443 task_worker_pool.cpp:643] push failed, error_code: -1, signature: 4262410
I0812 03:40:35.732393 115443 task_worker_pool.cpp:272] finish task success.
I0812 03:40:35.732409 115443 task_worker_pool.cpp:258] remove task info. type=REALTIME_PUSH, signature=4262410, queue_size=6
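
For anyone digging through a similar BE log, here is a minimal sketch for pulling out only the warning and error lines. It assumes the glog-style layout seen above (severity letter, MMDD, time, thread id, source file:line, then the message). The names `GLOG_LINE`, `warnings_and_errors`, and `SAMPLE` are made up for this example and are not part of Doris.

```python
import re

# Assumed layout: <level><MMDD> <HH:MM:SS.micros> <thread> <file:line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[\w.]+:\d+)\] (?P<message>.*)$"
)


def warnings_and_errors(lines):
    """Return (time, source, message) for every W/E/F line that matches the layout."""
    hits = []
    for line in lines:
        match = GLOG_LINE.match(line.strip())
        if match and match.group("level") in "WEF":
            hits.append(
                (match.group("time"), match.group("source"), match.group("message"))
            )
    return hits


# Three lines copied from the log in this thread.
SAMPLE = [
    "I0812 03:40:01.786116 115443 task_worker_pool.cpp:272] finish task success.",
    "W0812 03:40:35.698889 115443 parquet_reader.cpp:86] The first read record. "
    "NotImplemented: This class cannot yet iterate chunked arrays",
    "W0812 03:40:35.699031 115443 push_handler.cpp:357] read next row failed. "
    "res=-910 read_rows=0",
]

for time, source, message in warnings_and_errors(SAMPLE):
    print(time, source, message)
```

Continuation lines such as the `- RowsRead: 0` counters do not match the pattern and are skipped, which is usually fine when the goal is to find the first failing source location.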

 

1 reply in total. Last reply by 7丶s1ence on 2021-09-24
#2 7丶s1ence replied on 2021-09-24

The current workaround is to set enableDictionary to false in ParquetWriter, so that dictionary encoding is turned off.
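
One way to confirm that regenerated files really have dictionary encoding turned off is to inspect the column chunk metadata. The sketch below is only an illustration and not something used in this thread: it assumes pyarrow is installed and that the file has been copied to a local path, and the helper names `uses_dictionary` and `dictionary_columns` are hypothetical.

```python
# Encoding names the Parquet format uses for dictionary-encoded data pages.
DICTIONARY_ENCODINGS = {"PLAIN_DICTIONARY", "RLE_DICTIONARY"}


def uses_dictionary(encodings):
    """Return True if any encoding name in the iterable is a dictionary encoding."""
    return any(name in DICTIONARY_ENCODINGS for name in encodings)


def dictionary_columns(path):
    """List (row_group_index, column_path) pairs that still use dictionary encoding."""
    # Assumption: pyarrow is available; imported lazily so the helper above
    # can be used without it.
    import pyarrow.parquet as pq

    metadata = pq.ParquetFile(path).metadata
    found = []
    for rg in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg)
        for col in range(row_group.num_columns):
            chunk = row_group.column(col)
            if uses_dictionary(chunk.encodings):
                found.append((rg, chunk.path_in_schema))
    return found
```

An empty list from `dictionary_columns("/tmp/sample.parquet")` (a hypothetical local copy) would suggest the setting took effect. A plain `RLE` entry in the encodings does not count, since that name also covers level encoding rather than dictionary data pages.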
