Doris reader_init_time 慢
7丶s1ence 发布于2021-05 浏览:2502 回复:3
0
收藏
最后编辑于2021-05

Doris 版本 : Doris version trunk (0.12)

100 个线程并发测试Doris 分区表的查询能力

查看Profile 发现查询计划下发到Be后,多个Instances中必定存在某一两个 Instance 的Scan Node很慢超过5s,其他Instance 都在毫秒级别返回

慢 Instance Profile 见附录1, 最开始怀疑是慢Be,把慢Be 加入到slow_be_addresses 后又会出现其他慢Be

在慢instance 的OlapScanNode中reader_init_time 占到2s多 想请教下大佬们 这种情况可能的原因有哪些

补充一点: 上述这种情况只会在查历史分区的时候出现(2天前),如果查最近这两天刚导入的分区就都很快没有慢Instance 这种情况

 

正常Instance Profile 见附录2

 

附录1:慢Instance Profile

Fragment 2:
      Instance 6f2c1b2e5bd5408b-9ca88adb9b7eb890 (host=TNetworkAddress(hostname:10.16.166.23, port:9060)):(Active: 7s13ms, % non-child: 0.02%)
         - MemoryLimit: 2.00 GB
         - FragmentCpuTime: 15.635ms
         - PeakUsedReservation: 0.00 
         - PeakMemoryUsage: 37.02 MB
         - RowsProduced: 93
         - AverageThreadTokens: 4611385778452229600.00
         - ExecuteDelayAfterRpc: 173.0ms
         - PeakReservation: 34.00 MB
         - GetLockTime: 764ns
         - PrepareTime: 142.223ms
        BlockMgr:
           - BlocksCreated: 0
           - BlockWritesOutstanding: 0
           - BytesWritten: 0.00 
           - TotalEncryptionTime: 0ns
           - BufferedPins: 0
           - TotalReadBlockTime: 0ns
           - TotalBufferWaitTime: 0ns
           - BlocksRecycled: 0
           - TotalIntegrityCheckTime: 0ns
           - MaxBlockSize: 8.00 MB
        DataStreamSender (dst_id=6):(Active: 62.548us, % non-child: 0.00%)
           - OverallThroughput: 51.184444427490234 MB/sec
           - PeakMemoryUsage: 66.64 KB
           - SerializeBatchTime: 47.432us
           - IgnoreRows: 0
           - BytesSent: 3.28 KB
           - NetworkThroughput(*): 0.0 /sec
           - ThriftTransmitTime(*): 0ns
           - UncompressedRowBatchSize: 5.49 KB
        AGGREGATION_NODE (id=3):(Active: 7s11ms, % non-child: 0.05%)
          ExecOption: Streaming Preaggregation
           - ReductionFactorEstimate: 0.00
           - PeakMemoryUsage: 36.43 MB
           - LargestPartitionPercent: 0
           - GetResultsTime: 62.531us
           - RowsReturned: 93
           - PartitionsCreated: 16
           - RowsPassedThrough: 0
           - ReductionFactorThresholdToExpand: 0.00
           - RowsReturnedRate: 13
           - HashBuckets: 32.768K (32768)
           - HTResizeTime: 184.352us
           - StreamingTime: 2.867ms
          HASH_JOIN_NODE (id=2):(Active: 7s7ms, % non-child: 0.12%)
             - ProbeTime: 8.856ms
             - ProbeRows: 115.739K (115739)
             - PeakMemoryUsage: 569.38 KB
             - LoadFactor: 4604536993823588400.00
             - BuildBuckets: 8.192K (8192)
             - PushDownTime: 0ns
             - BuildRows: 9.913K (9913)
             - RowsReturned: 3.479K (3479)
             - BuildTime: 1.472ms
             - RowsReturnedRate: 496
             - PushDownComputeTime: 0ns
            EXCHANGE_NODE (id=5):(Active: 252.658ms, % non-child: 2.98%)
               - BytesReceived: 90.38 KB
               - DeserializeRowBatchTimer: 187.351us
               - ConvertRowBatchTime: 161.871us
               - PeakMemoryUsage: 32.00 KB
               - SendersBlockedTotalTimer(*): 0ns
               - RowsReturned: 9.913K (9913)
               - FirstBatchArrivalWaitTime: 233.874ms
               - DataArrivalWaitTime: 252.439ms
               - RowsReturnedRate: 39.234K /sec
            OLAP_SCAN_NODE (id=0):(Active: 6s744ms, % non-child: 79.61%)
               - CompressedBytesRead: 3.55 MB
               - MaxWaitScanTime: 373.8ms
               - ScannerBlockPutTimer: 37.247us
               - ScanCpuTime: 177.891ms
               - BitmapIndexFilterTimer: 0ns
               - NumScanners: 1
               - RowsStatsFiltered: 0
               - OpenTime: 8.300us
               - VectorPredEvalTime: 309.673us
               - BlockLookupCacheTime: 0ns
               - RowsDelFiltered: 0
               - RowsetReaderInitTime: 663.739us
               - BlockSeekPosPageTime: 0ns
               - NumDiskAccess: 0
               - GetNextTime: 6s744ms
               - TotalPagesNum: 0
               - BlockLoadTime: 41.205ms
               - CachedPagesNum: 0
               - BlocksLoad: 222
               - TabletCount : 1
               - DecompressorTimer: 5.63ms
               - RowsVectorPredFiltered: 0
               - ReaderInitTime: 2s685ms
               - RowsRead: 191.621K (191621)
               - BytesRead: 11.05 MB
               - ScannerMaxPendingTimer: 3s886ms
               - TotalRawReadTime(*): 0ns
               - PeakMemoryUsage: 0.00 
               - RowsPushedCondFiltered: 0
               - ScanRangesComplete: 0
               - ScanTime: 177.910ms
               - BitmapIndexFilterCount: 0
               - BlockSeekTime: 2s348ms
               - StartScanTime: 289.97us
               - RawRowsRead: 193.022K (193022)
               - ScannerThreadsVoluntaryContextSwitches: 0
               - IndexLoadTime: 1.533ms
               - ScannerThreadsTotalWallClockTime: 0ns
                 - MaterializeTupleTime(*): 0ns
                 - ScannerThreadsUserTime: 0ns
                 - ScannerThreadsSysTime: 0ns
               - BlockReadDecompressTime: 0ns
               - BlockVerifyChecksumTime: 0ns
               - RowsReturnedRate: 17.16K /sec
               - UncompressedBytesRead: 6.06 MB
               - RowsBloomFilterFiltered: 0
               - RowsReturned: 115.739K (115739)
               - RowsetNum: 0
               - ScannerThreadsInvoluntaryContextSwitches: 0
               - BlockPutCacheTime: 0ns
               - BlockSeekPageIterTime: 0ns
               - PerReadThreadRawHdfsThroughput: 0.0 /sec
               - BlockFetchTime: 2s740ms
               - ShowHintsTime: 76.128us
               - TotalReadThroughput: 0.0 /sec
               - IOTimer: 2.52ms
               - BlockReadPageTime: 0ns
               - BlockConvertTime: 15.87ms
               - SegmentNum: 0
               - BlockSeekCount: 0
          Buffer pool:
             - PeakUnpinnedBytes: 0.00 
             - PeakUsedReservation: 34.00 MB
             - ReservationLimit: 80.00 GB
             - AllocTime: 19.110us
             - CumulativeAllocationBytes: 34.00 MB
             - CumulativeAllocations: 17
             - PeakReservation: 34.00 MB

 

 

附录2 正常Instance Profile 情况

Instance 6f2c1b2e5bd5408b-9ca88adb9b7eb88f (host=TNetworkAddress(hostname:10.16.196.10, port:9060)):(Active: 988.455ms, % non-child: 0.02%)
         - MemoryLimit: 2.00 GB
         - FragmentCpuTime: 12.92ms
         - PeakUsedReservation: 0.00 
         - PeakMemoryUsage: 36.89 MB
         - RowsProduced: 91
         - AverageThreadTokens: 4610184818551597600.00
         - ExecuteDelayAfterRpc: 91.0ms
         - PeakReservation: 34.00 MB
         - GetLockTime: 462ns
         - PrepareTime: 184.704ms
        BlockMgr:
           - BlocksCreated: 0
           - BlockWritesOutstanding: 0
           - BytesWritten: 0.00 
           - TotalEncryptionTime: 0ns
           - BufferedPins: 0
           - TotalReadBlockTime: 0ns
           - TotalBufferWaitTime: 0ns
           - BlocksRecycled: 0
           - TotalIntegrityCheckTime: 0ns
           - MaxBlockSize: 8.00 MB
        DataStreamSender (dst_id=6):(Active: 69.690us, % non-child: 0.00%)
           - OverallThroughput: 44.939969062805176 MB/sec
           - PeakMemoryUsage: 66.64 KB
           - SerializeBatchTime: 57.438us
           - IgnoreRows: 0
           - BytesSent: 3.21 KB
           - NetworkThroughput(*): 0.0 /sec
           - ThriftTransmitTime(*): 0ns
           - UncompressedRowBatchSize: 5.37 KB
        AGGREGATION_NODE (id=3):(Active: 986.636ms, % non-child: 0.05%)
          ExecOption: Streaming Preaggregation
           - ReductionFactorEstimate: 0.00
           - PeakMemoryUsage: 36.30 MB
           - LargestPartitionPercent: 0
           - GetResultsTime: 80.590us
           - RowsReturned: 91
           - PartitionsCreated: 16
           - RowsPassedThrough: 0
           - ReductionFactorThresholdToExpand: 0.00
           - RowsReturnedRate: 92
           - HashBuckets: 32.768K (32768)
           - HTResizeTime: 107.622us
           - StreamingTime: 3.49ms
          HASH_JOIN_NODE (id=2):(Active: 982.247ms, % non-child: 0.10%)
            ExecOption: Hash Table Built Asynchronously
             - ProbeTime: 6.751ms
             - ProbeRows: 116.102K (116102)
             - PeakMemoryUsage: 569.38 KB
             - LoadFactor: 4604536993823588400.00
             - BuildBuckets: 8.192K (8192)
             - PushDownTime: 0ns
             - BuildRows: 9.913K (9913)
             - RowsReturned: 3.582K (3582)
             - BuildTime: 1.836ms
             - RowsReturnedRate: 3.646K /sec
             - PushDownComputeTime: 0ns
            EXCHANGE_NODE (id=5):(Active: 241.235ms, % non-child: 2.85%)
               - BytesReceived: 90.38 KB
               - DeserializeRowBatchTimer: 220.75us
               - ConvertRowBatchTime: 154.816us
               - PeakMemoryUsage: 32.00 KB
               - SendersBlockedTotalTimer(*): 0ns
               - RowsReturned: 9.913K (9913)
               - FirstBatchArrivalWaitTime: 220.854ms
               - DataArrivalWaitTime: 241.19ms
               - RowsReturnedRate: 41.092K /sec
            OLAP_SCAN_NODE (id=0):(Active: 732.300ms, % non-child: 8.64%)
               - CompressedBytesRead: 3.57 MB
               - MaxWaitScanTime: 207.670us
               - ScannerBlockPutTimer: 32.941us
               - ScanCpuTime: 154.120ms
               - BitmapIndexFilterTimer: 0ns
               - NumScanners: 1
               - RowsStatsFiltered: 0
               - OpenTime: 9.472us
               - VectorPredEvalTime: 277.373us
               - BlockLookupCacheTime: 0ns
               - RowsDelFiltered: 0
               - RowsetReaderInitTime: 1.872ms
               - BlockSeekPosPageTime: 0ns
               - NumDiskAccess: 0
               - GetNextTime: 732.309ms
               - TotalPagesNum: 0
               - BlockLoadTime: 43.325ms
               - CachedPagesNum: 0
               - BlocksLoad: 212
               - TabletCount : 1
               - DecompressorTimer: 5.299ms
               - RowsVectorPredFiltered: 0
               - ReaderInitTime: 583.623ms
               - RowsRead: 192.361K (192361)
               - BytesRead: 11.12 MB
               - ScannerMaxPendingTimer: 138.885us
               - TotalRawReadTime(*): 0ns
               - PeakMemoryUsage: 0.00 
               - RowsPushedCondFiltered: 0
               - ScanRangesComplete: 0
               - ScanTime: 154.136ms
               - BitmapIndexFilterCount: 0
               - BlockSeekTime: 576.145ms
               - StartScanTime: 274.938us
               - RawRowsRead: 194.34K (194340)
               - ScannerThreadsVoluntaryContextSwitches: 0
               - IndexLoadTime: 1.546ms
               - ScannerThreadsTotalWallClockTime: 0ns
                 - MaterializeTupleTime(*): 0ns
                 - ScannerThreadsUserTime: 0ns
                 - ScannerThreadsSysTime: 0ns
               - BlockReadDecompressTime: 0ns
               - BlockVerifyChecksumTime: 0ns
               - RowsReturnedRate: 158.544K /sec
               - UncompressedBytesRead: 6.11 MB
               - RowsBloomFilterFiltered: 0
               - RowsReturned: 116.102K (116102)
               - RowsetNum: 0
               - ScannerThreadsInvoluntaryContextSwitches: 0
               - BlockPutCacheTime: 0ns
               - BlockSeekPageIterTime: 0ns
               - PerReadThreadRawHdfsThroughput: 0.0 /sec
               - BlockFetchTime: 641.70ms
               - ShowHintsTime: 52.977us
               - TotalReadThroughput: 0.0 /sec
               - IOTimer: 1.857ms
               - BlockReadPageTime: 0ns
               - BlockConvertTime: 17.966ms
               - SegmentNum: 0
               - BlockSeekCount: 0
          Buffer pool:
             - PeakUnpinnedBytes: 0.00 
             - PeakUsedReservation: 34.00 MB
             - ReservationLimit: 80.00 GB
             - AllocTime: 23.533us
             - CumulativeAllocationBytes: 34.00 MB
             - CumulativeAllocations: 17
             - PeakReservation: 34.00 MB

 

收藏
点赞
0
个赞
共3条回复 最后由7丶s1ence回复于2021-05
#47丶s1ence回复于2021-05

定位了问题 应该是storage v1 导致的异常,BlockSeekTime 都要两秒多,换到v2 后一切正常 感谢

 

1
#37丶s1ence回复于2021-05

好的 感谢

我也觉得很奇怪 明明应该是实时导入的 没有compaction 应该慢才对

先升级之后再测下试试

0
#2IamStrangers回复于2021-05

首先,0.12 这个版本有些古老了,Doris 后面也做过大量的这方便的优化。

reader init 耗时长有可能是 IO 资源问题,也有可能是因为 tablet 对应的数据版本过多的问题。

你这个场景比较奇怪,通常查询新导入的数据可能会更慢一点(因为部分版本还没有合并),而历史数据应该会更快(因为合并度更高。)

建议先进行版本升级,再做测试。

0
快速回复
TOP
切换版块