最后编辑于2021-05
Doris 版本 : Doris version trunk (0.12)
100 个线程并发测试Doris 分区表的查询能力
查看Profile 发现查询计划下发到Be后,多个Instances中必定存在某一两个 Instance 的Scan Node很慢超过5s,其他Instance 都在毫秒级别返回
慢 Instance Profile 见附录1, 最开始怀疑是慢Be,把慢Be 加入到slow_be_addresses 后又会出现其他慢Be
在慢instance 的OlapScanNode中reader_init_time 占到2s多 想请教下大佬们 这种情况可能的原因有哪些
补充一点: 上述这种情况只会在查历史分区的时候出现(2天前),如果查最近这两天刚导入的分区就都很快没有慢Instance 这种情况
正常Instance Profile 见附录2
附录1:慢Instance Profile
Fragment 2:
Instance 6f2c1b2e5bd5408b-9ca88adb9b7eb890 (host=TNetworkAddress(hostname:10.16.166.23, port:9060)):(Active: 7s13ms, % non-child: 0.02%)
- MemoryLimit: 2.00 GB
- FragmentCpuTime: 15.635ms
- PeakUsedReservation: 0.00
- PeakMemoryUsage: 37.02 MB
- RowsProduced: 93
- AverageThreadTokens: 4611385778452229600.00
- ExecuteDelayAfterRpc: 173.0ms
- PeakReservation: 34.00 MB
- GetLockTime: 764ns
- PrepareTime: 142.223ms
BlockMgr:
- BlocksCreated: 0
- BlockWritesOutstanding: 0
- BytesWritten: 0.00
- TotalEncryptionTime: 0ns
- BufferedPins: 0
- TotalReadBlockTime: 0ns
- TotalBufferWaitTime: 0ns
- BlocksRecycled: 0
- TotalIntegrityCheckTime: 0ns
- MaxBlockSize: 8.00 MB
DataStreamSender (dst_id=6):(Active: 62.548us, % non-child: 0.00%)
- OverallThroughput: 51.184444427490234 MB/sec
- PeakMemoryUsage: 66.64 KB
- SerializeBatchTime: 47.432us
- IgnoreRows: 0
- BytesSent: 3.28 KB
- NetworkThroughput(*): 0.0 /sec
- ThriftTransmitTime(*): 0ns
- UncompressedRowBatchSize: 5.49 KB
AGGREGATION_NODE (id=3):(Active: 7s11ms, % non-child: 0.05%)
ExecOption: Streaming Preaggregation
- ReductionFactorEstimate: 0.00
- PeakMemoryUsage: 36.43 MB
- LargestPartitionPercent: 0
- GetResultsTime: 62.531us
- RowsReturned: 93
- PartitionsCreated: 16
- RowsPassedThrough: 0
- ReductionFactorThresholdToExpand: 0.00
- RowsReturnedRate: 13
- HashBuckets: 32.768K (32768)
- HTResizeTime: 184.352us
- StreamingTime: 2.867ms
HASH_JOIN_NODE (id=2):(Active: 7s7ms, % non-child: 0.12%)
- ProbeTime: 8.856ms
- ProbeRows: 115.739K (115739)
- PeakMemoryUsage: 569.38 KB
- LoadFactor: 4604536993823588400.00
- BuildBuckets: 8.192K (8192)
- PushDownTime: 0ns
- BuildRows: 9.913K (9913)
- RowsReturned: 3.479K (3479)
- BuildTime: 1.472ms
- RowsReturnedRate: 496
- PushDownComputeTime: 0ns
EXCHANGE_NODE (id=5):(Active: 252.658ms, % non-child: 2.98%)
- BytesReceived: 90.38 KB
- DeserializeRowBatchTimer: 187.351us
- ConvertRowBatchTime: 161.871us
- PeakMemoryUsage: 32.00 KB
- SendersBlockedTotalTimer(*): 0ns
- RowsReturned: 9.913K (9913)
- FirstBatchArrivalWaitTime: 233.874ms
- DataArrivalWaitTime: 252.439ms
- RowsReturnedRate: 39.234K /sec
OLAP_SCAN_NODE (id=0):(Active: 6s744ms, % non-child: 79.61%)
- CompressedBytesRead: 3.55 MB
- MaxWaitScanTime: 373.8ms
- ScannerBlockPutTimer: 37.247us
- ScanCpuTime: 177.891ms
- BitmapIndexFilterTimer: 0ns
- NumScanners: 1
- RowsStatsFiltered: 0
- OpenTime: 8.300us
- VectorPredEvalTime: 309.673us
- BlockLookupCacheTime: 0ns
- RowsDelFiltered: 0
- RowsetReaderInitTime: 663.739us
- BlockSeekPosPageTime: 0ns
- NumDiskAccess: 0
- GetNextTime: 6s744ms
- TotalPagesNum: 0
- BlockLoadTime: 41.205ms
- CachedPagesNum: 0
- BlocksLoad: 222
- TabletCount : 1
- DecompressorTimer: 5.63ms
- RowsVectorPredFiltered: 0
- ReaderInitTime: 2s685ms
- RowsRead: 191.621K (191621)
- BytesRead: 11.05 MB
- ScannerMaxPendingTimer: 3s886ms
- TotalRawReadTime(*): 0ns
- PeakMemoryUsage: 0.00
- RowsPushedCondFiltered: 0
- ScanRangesComplete: 0
- ScanTime: 177.910ms
- BitmapIndexFilterCount: 0
- BlockSeekTime: 2s348ms
- StartScanTime: 289.97us
- RawRowsRead: 193.022K (193022)
- ScannerThreadsVoluntaryContextSwitches: 0
- IndexLoadTime: 1.533ms
- ScannerThreadsTotalWallClockTime: 0ns
- MaterializeTupleTime(*): 0ns
- ScannerThreadsUserTime: 0ns
- ScannerThreadsSysTime: 0ns
- BlockReadDecompressTime: 0ns
- BlockVerifyChecksumTime: 0ns
- RowsReturnedRate: 17.16K /sec
- UncompressedBytesRead: 6.06 MB
- RowsBloomFilterFiltered: 0
- RowsReturned: 115.739K (115739)
- RowsetNum: 0
- ScannerThreadsInvoluntaryContextSwitches: 0
- BlockPutCacheTime: 0ns
- BlockSeekPageIterTime: 0ns
- PerReadThreadRawHdfsThroughput: 0.0 /sec
- BlockFetchTime: 2s740ms
- ShowHintsTime: 76.128us
- TotalReadThroughput: 0.0 /sec
- IOTimer: 2.52ms
- BlockReadPageTime: 0ns
- BlockConvertTime: 15.87ms
- SegmentNum: 0
- BlockSeekCount: 0
Buffer pool:
- PeakUnpinnedBytes: 0.00
- PeakUsedReservation: 34.00 MB
- ReservationLimit: 80.00 GB
- AllocTime: 19.110us
- CumulativeAllocationBytes: 34.00 MB
- CumulativeAllocations: 17
- PeakReservation: 34.00 MB
附录2 正常Instance Profile 情况
Instance 6f2c1b2e5bd5408b-9ca88adb9b7eb88f (host=TNetworkAddress(hostname:10.16.196.10, port:9060)):(Active: 988.455ms, % non-child: 0.02%)
- MemoryLimit: 2.00 GB
- FragmentCpuTime: 12.92ms
- PeakUsedReservation: 0.00
- PeakMemoryUsage: 36.89 MB
- RowsProduced: 91
- AverageThreadTokens: 4610184818551597600.00
- ExecuteDelayAfterRpc: 91.0ms
- PeakReservation: 34.00 MB
- GetLockTime: 462ns
- PrepareTime: 184.704ms
BlockMgr:
- BlocksCreated: 0
- BlockWritesOutstanding: 0
- BytesWritten: 0.00
- TotalEncryptionTime: 0ns
- BufferedPins: 0
- TotalReadBlockTime: 0ns
- TotalBufferWaitTime: 0ns
- BlocksRecycled: 0
- TotalIntegrityCheckTime: 0ns
- MaxBlockSize: 8.00 MB
DataStreamSender (dst_id=6):(Active: 69.690us, % non-child: 0.00%)
- OverallThroughput: 44.939969062805176 MB/sec
- PeakMemoryUsage: 66.64 KB
- SerializeBatchTime: 57.438us
- IgnoreRows: 0
- BytesSent: 3.21 KB
- NetworkThroughput(*): 0.0 /sec
- ThriftTransmitTime(*): 0ns
- UncompressedRowBatchSize: 5.37 KB
AGGREGATION_NODE (id=3):(Active: 986.636ms, % non-child: 0.05%)
ExecOption: Streaming Preaggregation
- ReductionFactorEstimate: 0.00
- PeakMemoryUsage: 36.30 MB
- LargestPartitionPercent: 0
- GetResultsTime: 80.590us
- RowsReturned: 91
- PartitionsCreated: 16
- RowsPassedThrough: 0
- ReductionFactorThresholdToExpand: 0.00
- RowsReturnedRate: 92
- HashBuckets: 32.768K (32768)
- HTResizeTime: 107.622us
- StreamingTime: 3.49ms
HASH_JOIN_NODE (id=2):(Active: 982.247ms, % non-child: 0.10%)
ExecOption: Hash Table Built Asynchronously
- ProbeTime: 6.751ms
- ProbeRows: 116.102K (116102)
- PeakMemoryUsage: 569.38 KB
- LoadFactor: 4604536993823588400.00
- BuildBuckets: 8.192K (8192)
- PushDownTime: 0ns
- BuildRows: 9.913K (9913)
- RowsReturned: 3.582K (3582)
- BuildTime: 1.836ms
- RowsReturnedRate: 3.646K /sec
- PushDownComputeTime: 0ns
EXCHANGE_NODE (id=5):(Active: 241.235ms, % non-child: 2.85%)
- BytesReceived: 90.38 KB
- DeserializeRowBatchTimer: 220.75us
- ConvertRowBatchTime: 154.816us
- PeakMemoryUsage: 32.00 KB
- SendersBlockedTotalTimer(*): 0ns
- RowsReturned: 9.913K (9913)
- FirstBatchArrivalWaitTime: 220.854ms
- DataArrivalWaitTime: 241.19ms
- RowsReturnedRate: 41.092K /sec
OLAP_SCAN_NODE (id=0):(Active: 732.300ms, % non-child: 8.64%)
- CompressedBytesRead: 3.57 MB
- MaxWaitScanTime: 207.670us
- ScannerBlockPutTimer: 32.941us
- ScanCpuTime: 154.120ms
- BitmapIndexFilterTimer: 0ns
- NumScanners: 1
- RowsStatsFiltered: 0
- OpenTime: 9.472us
- VectorPredEvalTime: 277.373us
- BlockLookupCacheTime: 0ns
- RowsDelFiltered: 0
- RowsetReaderInitTime: 1.872ms
- BlockSeekPosPageTime: 0ns
- NumDiskAccess: 0
- GetNextTime: 732.309ms
- TotalPagesNum: 0
- BlockLoadTime: 43.325ms
- CachedPagesNum: 0
- BlocksLoad: 212
- TabletCount : 1
- DecompressorTimer: 5.299ms
- RowsVectorPredFiltered: 0
- ReaderInitTime: 583.623ms
- RowsRead: 192.361K (192361)
- BytesRead: 11.12 MB
- ScannerMaxPendingTimer: 138.885us
- TotalRawReadTime(*): 0ns
- PeakMemoryUsage: 0.00
- RowsPushedCondFiltered: 0
- ScanRangesComplete: 0
- ScanTime: 154.136ms
- BitmapIndexFilterCount: 0
- BlockSeekTime: 576.145ms
- StartScanTime: 274.938us
- RawRowsRead: 194.34K (194340)
- ScannerThreadsVoluntaryContextSwitches: 0
- IndexLoadTime: 1.546ms
- ScannerThreadsTotalWallClockTime: 0ns
- MaterializeTupleTime(*): 0ns
- ScannerThreadsUserTime: 0ns
- ScannerThreadsSysTime: 0ns
- BlockReadDecompressTime: 0ns
- BlockVerifyChecksumTime: 0ns
- RowsReturnedRate: 158.544K /sec
- UncompressedBytesRead: 6.11 MB
- RowsBloomFilterFiltered: 0
- RowsReturned: 116.102K (116102)
- RowsetNum: 0
- ScannerThreadsInvoluntaryContextSwitches: 0
- BlockPutCacheTime: 0ns
- BlockSeekPageIterTime: 0ns
- PerReadThreadRawHdfsThroughput: 0.0 /sec
- BlockFetchTime: 641.70ms
- ShowHintsTime: 52.977us
- TotalReadThroughput: 0.0 /sec
- IOTimer: 1.857ms
- BlockReadPageTime: 0ns
- BlockConvertTime: 17.966ms
- SegmentNum: 0
- BlockSeekCount: 0
Buffer pool:
- PeakUnpinnedBytes: 0.00
- PeakUsedReservation: 34.00 MB
- ReservationLimit: 80.00 GB
- AllocTime: 23.533us
- CumulativeAllocationBytes: 34.00 MB
- CumulativeAllocations: 17
- PeakReservation: 34.00 MB
收藏
点赞
0
个赞
请登录后评论
TOP
切换版块
定位了问题 应该是storage v1 导致的异常,BlockSeekTime 都要两秒多,换到v2 后一切正常 感谢
好的 感谢
我也觉得很奇怪 明明应该是实时导入的 没有compaction 应该慢才对
先升级之后再测下试试
首先,0.12 这个版本有些古老了,Doris 后面也做过大量的这方便的优化。
reader init 耗时长有可能是 IO 资源问题,也有可能是因为 tablet 对应的数据版本过多的问题。
你这个场景比较奇怪,通常查询新导入的数据可能会更慢一点(因为部分版本还没有合并),而历史数据应该会更快(因为合并度更高。)
建议先进行版本升级,再做测试。