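Model Support Reference

Last updated: 2025-09-15

This document lists the models supported for fine-tuning. Before calling certain model fine-tuning V2 APIs, check the tables below to see which parameters and value ranges each model supports.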
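As a rough illustration of how one table row maps to a configuration, the sketch below builds a plain dictionary for ERNIE-Lite-8K-0308 with LoRA, using default values from its row. The top-level keys simply mirror the table column names; the actual request schema of the fine-tuning V2 API is not shown in this document, so the payload shape is an assumption.

```python
import json

# Hypothetical payload: the keys mirror the table column names in this
# document; the real request schema of the fine-tuning V2 API may differ.
task_config = {
    "model": "ERNIE-Lite-8K-0308",
    "trainMode": "SFT",
    "parameterScale": "LoRA",
    "hyperParameterConfig": {
        "epoch": 3,              # range [1,50], default 3
        "learningRate": 0.0003,  # LoRA range [0.000001,0.001], default 0.0003
        "maxSeqLen": 4096,       # one of 512, 1024, 2048, 4096, 8192
        "loraRank": 8,           # LoRA only: one of 2, 4, 8
    },
}

print(json.dumps(task_config, indent=2))
```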

Chat and text-continuation models

SFT

ERNIE series

model	trainMode	parameterScale	hyperParameterConfig
ERNIE-Lite-8K-0308	SFT	FullFineTuning, LoRA, LoRA-GA	· epoch: [1,50], default 3 · learningRate: FullFineTuning: [0.0000001,0.01], default 0.00003, step 0.000001; LoRA and LoRA-GA: [0.000001,0.001], default 0.0003, step 0.000001 · maxSeqLen: FullFineTuning and LoRA: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096; LoRA-GA: single choice, 4096 or 8192, default 4096 · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · globalBatchSize: FullFineTuning: [1,10000], default 16, step 4; LoRA: [1,10000], default 16, step 8 (when maxSeqLen=8192, recommended step 4); LoRA-GA: [1,10000], default 16, step 4 (when maxSeqLen=8192, recommended step 8) · pseudoSamplingProb: [0,1], default 0, step 0.1 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 2 or 4 or 8, default 8; loraAllLinear: single choice, True or False, default True · LoRA-GA only: loraRank: single choice, 8 or 64, default 64; loragaInitIters: [0,10000000], default 4; loragaStableGamma: [0,10000000], default 64; loragaGradientOffload: string, False or True, default False; loraAllLinear: single choice, True or False, default True
ERNIE-Lite-128K-0722	SFT	FullFineTuning, LoRA, LoRA-GA	· epoch: [1,50], default 3 · learningRate: FullFineTuning: [0.0000001,0.01], default 0.00003, step 0.000001; LoRA and LoRA-GA: [0.000001,0.001], default 0.0003, step 0.000001 · maxSeqLen: single choice, 8192 or 16384 or 32768 or 65536 or 131072, default 32768 · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · globalBatchSize: [1,10000], default 16, step 1 · pseudoSamplingProb: [0,1], default 0, step 0.1 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · LoRA-GA only: loragaInitIters: [0,10000000], default 4; loragaStableGamma: [0,10000000], default 64; loragaGradientOffload: string, False or True, default False
ERNIE-Speed-8K	SFT	FullFineTuning, LoRA, LoRA-GA	· epoch: [1,50], default 3 · learningRate: FullFineTuning: [0.0000001,0.01], default 0.00003, step 0.000001; LoRA and LoRA-GA: [0.000001,0.001], default 0.0003, step 0.000001 · maxSeqLen: FullFineTuning and LoRA: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096; LoRA-GA: single choice, 4096 or 8192, default 4096 · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · globalBatchSize: FullFineTuning: [1,10000], default 16, step 2 (when maxSeqLen=8192, recommended step 1); LoRA and LoRA-GA: [1,10000], default 16, step 4 (when maxSeqLen=8192, recommended step 2) · pseudoSamplingProb: [0,1], default 0, step 0.1 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 8 or 64, default 64; loraAllLinear: single choice, True or False, default True · LoRA-GA only: loraRank: single choice, 8 or 64, default 64; loragaInitIters: [0,10000000], default 4; loragaStableGamma: [0,10000000], default 64; loragaGradientOffload: string, False or True, default False; loraAllLinear: single choice, True or False, default True
ERNIE-Character-8K-0321	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: FullFineTuning: [0.0000001,0.01], default 0.00003, step 0.000001; LoRA: [0.000001,0.001], default 0.0003, step 0.000001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096 · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · globalBatchSize: FullFineTuning: [1,10000], default 16, step 1 (when maxSeqLen=4096, recommended step 4; when maxSeqLen=8192, recommended step 2); LoRA: [1,10000], default 16, step 2 (when maxSeqLen=4096, recommended step 2; when maxSeqLen=8192, recommended step 1) · pseudoSamplingProb: [0,1], default 0, step 0.1 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 2 or 4 or 8, default 8; loraAllLinear: single choice, True or False, default True
ERNIE-Tiny-8K	SFT	FullFineTuning, LoRA, LoRA-GA	· epoch: [1,50], default 3 · learningRate: FullFineTuning: [0.0000001,0.01], default 0.00003, step 0.000001; LoRA and LoRA-GA: [0.000001,0.001], default 0.0003, step 0.000001 · maxSeqLen: FullFineTuning and LoRA: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096; LoRA-GA: single choice, 4096 or 8192, default 4096 · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · globalBatchSize: FullFineTuning: [1,10000], default 32, step 16 (when maxSeqLen=8192, recommended step 8); LoRA and LoRA-GA: [1,10000], default 32, step 16 · pseudoSamplingProb: [0,1], default 0, step 0.1 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 2 or 4 or 8, default 8; loraAllLinear: single choice, True or False, default True · LoRA-GA only: loraRank: single choice, 8 or 64, default 64; loragaInitIters: [0,10000000], default 4; loragaStableGamma: [0,10000000], default 64; loragaGradientOffload: string, False or True, default False; loraAllLinear: single choice, True or False, default True
ERNIE-Speed-Pro-128K	SFT	FullFineTuning, LoRA, LoRA-GA	· epoch: [1,50], default 3 · learningRate: FullFineTuning: [0.0000001,0.01], default 0.00003, step 0.000001; LoRA and LoRA-GA: [0.000001,0.001], default 0.0003, step 0.000001 · maxSeqLen: FullFineTuning and LoRA: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096; LoRA-GA: single choice, 4096 or 8192, default 8192 · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · globalBatchSize: FullFineTuning: [1,10000], default 16, step 2 (when maxSeqLen=131072, recommended step 1); LoRA: [1,10000], default 16, step 4 (when maxSeqLen=131072, recommended step 1) · pseudoSamplingProb: [0,1], default 0, step 0.1 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 2 or 4 or 8, default 8; loraAllLinear: single choice, True or False, default True · LoRA-GA only: loraRank: single choice, 8 or 64, default 64; loragaInitIters: [0,10000000], default 4; loragaStableGamma: [0,10000000], default 64; loragaGradientOffload: string, False or True, default False; loraAllLinear: single choice, True or False, default True
ERNIE-Tiny-128K-0929	SFT	FullFineTuning, LoRA, LoRA-GA	· epoch: [1,50], default 3 · learningRate: FullFineTuning: [0.0000001,0.01], default 0.00003, step 0.000001; LoRA and LoRA-GA: [0.000001,0.001], default 0.0003, step 0.000001 · maxSeqLen: single choice, 8192 or 16384 or 32768 or 65536 or 131072, default 32768 · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · globalBatchSize: FullFineTuning and LoRA: [1,10000], default 16, step 4 (when maxSeqLen=65536, recommended step 2; when maxSeqLen=131072, recommended step 8); LoRA-GA: [1,10000], default 16, step 4 (when maxSeqLen=131072, recommended step 1) · pseudoSamplingProb: [0,1], default 0, step 0.1 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 2 or 4 or 8, default 8; loraAllLinear: single choice, True or False, default True · LoRA-GA only: loraRank: single choice, 8 or 64, default 64; loragaInitIters: [0,10000000], default 4; loragaStableGamma: [0,10000000], default 64; loragaGradientOffload: string, False or True, default False; loraAllLinear: single choice, True or False, default True
ERNIE-4.5-Turbo-128K	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: FullFineTuning: [0.0000001,0.01], default 0.00003, step 0.000001; LoRA: [0.0000001,0.001], default 0.0003, step 0.000001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096 · globalBatchSize: FullFineTuning: [1,10000], default 64, step 1; LoRA: [1,10000], default 64, step 1 · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · pseudoSamplingProb: [0,1], default 0, step 0.1 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42 · lrSchedulerType: FullFineTuning: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear; LoRA: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default constant · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 2 or 4 or 8 or 16 or 32 or 64, default 64; loraAllLinear: single choice, True or False, default True
ERNIE-Character-Fiction-8K-1028	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: FullFineTuning: [0.0000001,0.01], default 0.00003; LoRA: [0.000001,0.001], default 0.0003 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096 · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · globalBatchSize: FullFineTuning: [1,10000], default 2, step 1 (when maxSeqLen=4096, recommended step 2); LoRA: [1,10000], default 4, step 1 (when maxSeqLen=4096, recommended step 4; when maxSeqLen=8192, recommended step 2) · pseudoSamplingProb: [0,1], default 0, step 0.1 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 2 or 4 or 8, default 8; loraAllLinear: single choice, True or False, default True
ERNIE-Code-3-128K	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: FullFineTuning: [0.0000001,0.01], default 0.00003; LoRA: [0.000001,0.001], default 0.0003 · maxSeqLen: single choice, 8192 or 16384 or 32768 or 65536 or 131072, default 32768 · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · globalBatchSize: [1,10000], default 16, step 1 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · tensorParallelDegree: [1,8], default 8 · LoRA only: loraRank: single choice, 8 or 64, default 64; loraAllLinear: single choice, True or False, default True
Qianfan-Sug	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: FullFineTuning: [0.0000001,0.01], default 0.00003, step 0.000001; LoRA: [0.000001,0.001], default 0.0003, step 0.000001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096 · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · globalBatchSize: FullFineTuning: [1,10000], default 32, step 16 (when maxSeqLen=8192, recommended step 8); LoRA: [1,10000], default 32, step 16 · pseudoSamplingProb: [0,1], default 0, step 0.1 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 2 or 4 or 8, default 8; loraAllLinear: single choice, True or False, default True
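
The ranges and choices in the table above lend themselves to a client-side check before a task is submitted. The following is a minimal sketch, assuming a subset of the ERNIE-Lite-8K-0308 LoRA rules; the `Rule` class and `find_violations` helper are hypothetical names introduced only for illustration, and step sizes are deliberately not checked.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union

Number = Union[int, float]


@dataclass(frozen=True)
class Rule:
    """Either a closed numeric interval [low, high] or a fixed set of choices."""
    low: Optional[Number] = None
    high: Optional[Number] = None
    choices: Optional[Sequence] = None

    def accepts(self, value) -> bool:
        if self.choices is not None:
            return value in self.choices
        return self.low <= value <= self.high


# Rules copied from the ERNIE-Lite-8K-0308 row (LoRA values only, subset).
ERNIE_LITE_8K_LORA = {
    "epoch": Rule(1, 50),
    "learningRate": Rule(0.000001, 0.001),
    "maxSeqLen": Rule(choices=(512, 1024, 2048, 4096, 8192)),
    "warmupRatio": Rule(0.01, 0.5),
    "weightDecay": Rule(0.0001, 0.1),
    "globalBatchSize": Rule(1, 10000),
    "checkpointSaveStrategy": Rule(choices=("step", "epoch")),
    "loraRank": Rule(choices=(2, 4, 8)),
}


def find_violations(config: dict, rules: dict) -> list:
    """Return names of parameters that are unknown or outside their rule."""
    bad = []
    for name, value in config.items():
        rule = rules.get(name)
        if rule is None or not rule.accepts(value):
            bad.append(name)
    return bad


candidate = {"epoch": 3, "learningRate": 0.01, "loraRank": 16}
print(find_violations(candidate, ERNIE_LITE_8K_LORA))
```

Here `learningRate` exceeds the LoRA upper bound of 0.001 and `loraRank` is not one of 2, 4, 8, so both names are reported while `epoch` passes.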

Open-source series

model	trainMode	parameterScale	hyperParameterConfig
Meta-Llama-3.1-8B	SFT	FullFineTuning	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · validationStep: [0,1000000], default 16, step 1 · batchSize: [1,4], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: effective only when checkpointSaveStrategy=step; FullFineTuning: [64,4096], default 64; LoRA: [64,4096], default 256 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Meta-Llama-3-8B	SFT	FullFineTuning	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,2], default 1 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1
Meta-Llama-3.2-1B-128K	SFT	FullFineTuning	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · validationStep: [0,1000000], default 16, step 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 8192 or 16384 or 32768 or 65536 or 131072, default 8192 · batchSize: [1,N], default 1, where N depends on maxSeqLen as follows: maxSeqLen=131072, N=1; maxSeqLen=65536, N=2; maxSeqLen=32768, N=4; maxSeqLen=16384, N=8; maxSeqLen=8192, N=16 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 64, effective only when checkpointSaveStrategy=step
Qianfan-Chinese-Llama-2-1.3B	SFT	FullFineTuning	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,4], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1
Qianfan-Chinese-Llama-2-7B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,8], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qianfan-Chinese-Llama-2-7B-32K	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 4096 or 8192 or 16384 or 32768, default 32768 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 64 · validationStep: [0,1000000], default 16, step 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qianfan-Chinese-Llama-2-13B-v1	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,8], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qianfan-Chinese-Llama-2-13B-v2	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,8], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Mixtral-8x7B	SFT	FullFineTuning	· epoch: [1,20], default 3 · learningRate: [0.0000000001,0.0002], default 0.00001, step 0.000001 · batchSize: [1,4], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
SQLCoder-7B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,4], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
ChatGLM2-6B-32K	SFT	FullFineTuning	· epoch: [1,50], default 3 · maxSeqLen: single choice, 4096 or 8192 or 16384 or 32768, default 32768 · batchSize32k: 1, requires maxSeqLen=32768 · batchSize16k: [1,2], default 1, requires maxSeqLen=16384 · batchSize8k: [1,6], default 1, requires maxSeqLen=8192 · batchSize4k: [1,12], default 1, requires maxSeqLen=4096 · Packing: string, true or false or auto, default true · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · validationStep: [0,1000000], default 16, step 1 · saveStep: [64,4096], default 256
ChatGLM2-6B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,2], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
ChatGLM3-6B	SFT	FullFineTuning, LoRA	· epoch: FullFineTuning: [1,50], default 3; LoRA: [1,50], default 1 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: 16 or 32 or 64, default 16 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: FullFineTuning: single choice, 4096 or 8192, default 4096; LoRA: single choice, 512 or 1024 or 2048 or 4096, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Baichuan2-7B-Chat	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,4], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Baichuan2-13B-Chat	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,2], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
BLOOMZ-7B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,4], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
CodeLlama-7B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,4], default 1 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [64,4096], default 256, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1; loraTargetModules: multiple choice, self_attn.q_proj, self_attn.k_proj, self_attn.v_proj, self_attn.o_proj, mlp.gate_proj, mlp.up_proj, mlp.down_proj, default self_attn.q_proj + self_attn.v_proj
Custom-Model (custom model)	SFT	FullFineTuning	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001
Qwen2.5-1.5B-Instruct	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qwen2.5-3B-Instruct	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qwen2.5-7B-Instruct	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qwen2.5-32B-Instruct	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qwen2.5-72B-Instruct	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qwen2.5-14B-Instruct	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qwen3-0.6B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qwen3-1.7B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qwen3-8B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qwen3-14B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
Qwen3-32B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
QwQ-32B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · globalBatchSize: [8,100000], default 16, step 8 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpoint_save_strategy: step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
ChatGLM4-9B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,4], default 1 · Packing: string, true or false or auto, default false · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64 · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
DeepSeek-R1-Distill-Qwen-32B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · globalBatchSize: [8,100000], default 16, step 8 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1, step 1 · validationStep: [0,1000000], default 16, step 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
DeepSeek-R1-Distill-Qwen-7B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · globalBatchSize: [8,100000], default 16, step 8 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1, step 1 · validationStep: [0,1000000], default 16, step 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
DeepSeek-R1	SFT	LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · batchSize: [1,4], default 1 · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · loraRank: 8 or 16 or 32 or 64, default 32 · loraAlpha: 8 or 16 or 32 or 64, default 32 · loraDropout: [0.01,0.5], default 0.1, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096
DeepSeek-R1-Distill-Qwen-14B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · globalBatchSize: [8,100000], default 16, step 8 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1, step 1 · validationStep: [0,1000000], default 16, step 1 · saveStep: effective only when checkpointSaveStrategy=step; FullFineTuning: [64,4096], default 64; LoRA: [64,4096], default 256 · LoRA only: loraRank: 8 or 16 or 32 or 64, default 32; loraAlpha: 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
DeepSeek-R1-Distill-Qianfan-Llama-8B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · globalBatchSize: [8,100000], default 16, step 8 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1, step 1 · validationStep: [0,1000000], default 16, step 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step
DeepSeek-R1-Distill-Qwen-1.5B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · globalBatchSize: [8,100000], default 16, step 8 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1, step 1 · validationStep: [0,1000000], default 16, step 1 · saveStep: [64,4096], default 64, effective only when checkpointSaveStrategy=step · LoRA only: loraRank: 8 or 16 or 32 or 64, default 32; loraAlpha: 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
DeepSeek-R1-Distill-Qianfan-Llama-70B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · globalBatchSize: [8,100000], default 16, step 8 · Packing: string, true or false or auto, default true · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · checkpoint_save_strategy: step or epoch, default step · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192 or 16384 or 32768, default 4096 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, must be an integer multiple of validationStep, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64, default 32; loraAlpha: single choice, 8 or 16 or 32 or 64, default 32; loraDropout: [0.01,0.5], default 0.1, step 0.001
DeepSeek-V3-0324	SFT	LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · globalBatchSize: [8,100000], default 16, step 8 · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096 · loraRank: single choice, 8 or 16 or 32 or 64, default 8 · loraAlpha: single choice, 8 or 16 or 32 or 64, default 32 · loraDropout: [0.01,0.5], default 0.1, step 0.001
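
The ranges and options in the SFT table can be checked on the client side before a request is submitted. The following is a minimal sketch, not part of the official API: the dictionaries `SFT_SPEC` and `LORA_ONLY_SPEC` and the function `validate_sft_config` are hypothetical names, and the values are transcribed from the Qwen2.5-32B-Instruct row only. It assumes that LoRA-only parameters should be rejected for FullFineTuning, and it leaves out the string-valued Packing and checkpoint_save_strategy entries as well as the step increments.

```python
# Hypothetical client-side check; values transcribed from the
# Qwen2.5-32B-Instruct SFT row. Not an official validator.
SFT_SPEC = {
    "epoch": ("range", 1, 50),
    "learningRate": ("range", 0.0000000001, 0.0002),
    "warmupRatio": ("range", 0.01, 0.1),
    "weightDecay": ("range", 0.001, 1),
    "globalBatchSize": ("range", 8, 100000),
    "checkpointCount": ("range", 1, 10),
    "saveStep": ("range", 1, 50000),
    "validationStep": ("range", 0, 1000000),
    "maxSeqLen": ("choice", {512, 1024, 2048, 4096, 8192, 16384, 32768}),
    "schedulerName": ("choice", {"linear", "cosine", "polynomial",
                                 "constant", "constant_with_warmup"}),
}

# Parameters listed under "LoRA only" in the same row.
LORA_ONLY_SPEC = {
    "loraRank": ("choice", {8, 16, 32, 64}),
    "loraAlpha": ("choice", {8, 16, 32, 64}),
    "loraDropout": ("range", 0.01, 0.5),
}


def validate_sft_config(config, parameter_scale):
    """Return a list of problems; an empty list means every value passed."""
    spec = dict(SFT_SPEC)
    if parameter_scale == "LoRA":
        spec.update(LORA_ONLY_SPEC)
    problems = []
    for name, value in config.items():
        if name not in spec:
            problems.append(f"{name}: not supported for {parameter_scale}")
            continue
        rule = spec[name]
        if rule[0] == "range":
            low, high = rule[1], rule[2]
            if not (low <= value <= high):
                problems.append(f"{name}: {value} outside [{low}, {high}]")
        elif value not in rule[1]:
            problems.append(f"{name}: {value} is not an allowed option")
    return problems


print(validate_sft_config({"epoch": 3, "loraRank": 32}, "LoRA"))
print(validate_sft_config({"epoch": 51, "loraRank": 32}, "FullFineTuning"))
```

Other rows differ in ranges, defaults, and parameter names (for example batchSize instead of globalBatchSize), so a real check would need one spec per model.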

RFT

model	trainMode	parameterScale	hyperParameterConfig
DeepSeek-R1-Distill-Qwen-14B	RFT	FullFineTuning	· epoch: [1,50], default 1 · criticLearningRate: [0.0000001,0.00001], default 0.000009, step 0.0000001, effective only when RlMethod=PPO · actorLearningRate: [0.0000001,0.00001], default 0.0000005, step 0.0000001 · maxSeqLen: 4096 or 8192 or 16384, default 4096 · globalBatchSize: [1,10000], default 64 (recommended step 4 when maxSeqLen=4096; recommended step 1 when maxSeqLen=8192, 16384, or 32768) · rolloutBatchSize: [1,10000], default 64, step 4 (recommended step 1 when maxSeqLen=8192, 16384, or 32768) · numSamplesPerPrompt: [1,32], default 8, effective only when RlMethod=GRPO · maxPromptLen4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxPromptLen8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · maxPromptLen16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxPromptLen32k: [512,15360], default 1024, effective only when maxSeqLen=32768 · maxLength16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxLength32k: [512,30720], default 1024, effective only when maxSeqLen=32768 · maxLength4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxLength8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · loggingSteps: [1,1], default 1 · klCoeff: [0.00001,0.01], default 0.001 · checkpointSaveStrategy: string, default step · checkpointCount: [1,20], default 1, step 1, must not exceed the number of iteration rounds (epochs) · saveStep: 2 or 4 or 8 or 16 or 32 or 64 or 128 or 256, default 16, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42
DeepSeek-R1-Distill-Qwen-7B	RFT	FullFineTuning	· epoch: [1,50], default 1 · criticLearningRate: [0.0000001,0.00001], default 0.000009, step 0.0000001, effective only when RlMethod=PPO · actorLearningRate: [0.0000001,0.00001], default 0.0000005, step 0.0000001 · maxSeqLen: 4096 or 8192 or 16384 or 32768, default 4096 · globalBatchSize: [1,10000], default 64 (recommended step 4 when maxSeqLen=4096; recommended step 1 when maxSeqLen=8192, 16384, or 32768) · rolloutBatchSize: [1,10000], default 64, step 4 (recommended step 1 when maxSeqLen=8192, 16384, or 32768) · numSamplesPerPrompt: [1,32], default 8, effective only when RlMethod=GRPO · maxPromptLen4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxPromptLen8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · maxPromptLen16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxPromptLen32k: [512,15360], default 1024, effective only when maxSeqLen=32768 · maxLength16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxLength32k: [512,30720], default 1024, effective only when maxSeqLen=32768 · maxLength4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxLength8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · loggingSteps: [1,1], default 1 · klCoeff: [0.00001,0.01], default 0.001 · checkpointSaveStrategy: string, default step · checkpointCount: [2,20], default 2, step 1, must not exceed the number of iteration rounds (epochs) · saveStep: 2 or 4 or 8 or 16 or 32 or 64 or 128 or 256, default 16, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42
Qwen2.5-7B-Instruct	RFT	FullFineTuning	· epoch: [1,50], default 1 · criticLearningRate: [0.0000001,0.00001], default 0.000009, step 0.0000001, effective only when RlMethod=PPO · actorLearningRate: [0.0000001,0.00001], default 0.0000005, step 0.0000001 · maxSeqLen: 4096 or 8192 or 16384 or 32768, default 4096 · globalBatchSize: [1,10000], default 64 (recommended step 4 when maxSeqLen=4096; recommended step 1 when maxSeqLen=8192, 16384, or 32768) · rolloutBatchSize: [1,10000], default 64, step 4 (recommended step 1 when maxSeqLen=8192, 16384, or 32768) · numSamplesPerPrompt: [1,32], default 8, effective only when RlMethod=GRPO · maxPromptLen4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxPromptLen8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · maxPromptLen16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxPromptLen32k: [512,15360], default 1024, effective only when maxSeqLen=32768 · maxLength16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxLength32k: [512,30720], default 1024, effective only when maxSeqLen=32768 · maxLength4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxLength8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · loggingSteps: [1,1], default 1 · klCoeff: [0.00001,0.01], default 0.001 · checkpointSaveStrategy: string, default step · checkpointCount: [2,20], default 2, step 1, must not exceed the number of iteration rounds (epochs) · saveStep: 2 or 4 or 8 or 16 or 32 or 64 or 128 or 256, default 16, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42
QwQ-32B	RFT	FullFineTuning	· epoch: [1,50], default 1 · criticLearningRate: [0.0000001,0.00001], default 0.000009, step 0.0000001, effective only when RlMethod=PPO · actorLearningRate: [0.0000001,0.00001], default 0.0000005, step 0.0000001 · maxSeqLen: 4096 or 8192 or 16384, default 4096 · globalBatchSize: [1,10000], default 64 (recommended step 4 when maxSeqLen=4096; recommended step 1 when maxSeqLen=8192, 16384, or 32768) · rolloutBatchSize: [1,10000], default 64, step 4 (recommended step 1 when maxSeqLen=8192, 16384, or 32768) · numSamplesPerPrompt: [1,32], default 8, effective only when RlMethod=GRPO · maxPromptLen4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxPromptLen8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · maxPromptLen16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxPromptLen32k: [512,15360], default 1024, effective only when maxSeqLen=32768 · maxLength16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxLength32k: [512,30720], default 1024, effective only when maxSeqLen=32768 · maxLength4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxLength8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · loggingSteps: [1,1], default 1 · klCoeff: [0.00001,0.01], default 0.001 · checkpointSaveStrategy: string, default step · checkpointCount: [2,20], default 2, step 1, must not exceed the number of iteration rounds (epochs) · saveStep: 2 or 4 or 8 or 16 or 32 or 64 or 128 or 256, default 16, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42
Qwen2.5-14B-Instruct	RFT	FullFineTuning	· epoch: [1,50], default 1 · criticLearningRate: [0.0000001,0.00001], default 0.000009, step 0.0000001, effective only when RlMethod=PPO · actorLearningRate: [0.0000001,0.00001], default 0.0000005, step 0.0000001 · maxSeqLen: 4096 or 8192 or 16384, default 4096 · globalBatchSize: [1,10000], default 64 (recommended step 4 when maxSeqLen=4096; recommended step 1 when maxSeqLen=8192, 16384, or 32768) · rolloutBatchSize: [1,10000], default 64, step 4 (recommended step 1 when maxSeqLen=8192, 16384, or 32768) · numSamplesPerPrompt: [1,32], default 8, effective only when RlMethod=GRPO · maxPromptLen4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxPromptLen8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · maxPromptLen16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxPromptLen32k: [512,15360], default 1024, effective only when maxSeqLen=32768 · maxLength16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxLength32k: [512,30720], default 1024, effective only when maxSeqLen=32768 · maxLength4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxLength8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · loggingSteps: [1,1], default 1 · klCoeff: [0.00001,0.01], default 0.001 · checkpointSaveStrategy: string, default step · checkpointCount: [2,20], default 2, step 1, must not exceed the number of iteration rounds (epochs) · saveStep: 2 or 4 or 8 or 16 or 32 or 64 or 128 or 256, default 16, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42
Qwen2.5-32B-Instruct	RFT	FullFineTuning	· epoch: [1,50], default 1 · criticLearningRate: [0.0000001,0.00001], default 0.000009, step 0.0000001, effective only when RlMethod=PPO · actorLearningRate: [0.0000001,0.00001], default 0.0000005, step 0.0000001 · maxSeqLen: 4096 or 8192 or 16384, default 4096 · globalBatchSize: [1,10000], default 64 (recommended step 4 when maxSeqLen=4096; recommended step 1 when maxSeqLen=8192, 16384, or 32768) · rolloutBatchSize: [1,10000], default 64, step 4 (recommended step 1 when maxSeqLen=8192, 16384, or 32768) · numSamplesPerPrompt: [1,32], default 8, effective only when RlMethod=GRPO · maxPromptLen4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxPromptLen8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · maxPromptLen16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxPromptLen32k: [512,15360], default 1024, effective only when maxSeqLen=32768 · maxLength16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxLength32k: [512,30720], default 1024, effective only when maxSeqLen=32768 · maxLength4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxLength8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · loggingSteps: [1,1], default 1 · klCoeff: [0.00001,0.01], default 0.001 · checkpointSaveStrategy: string, default step · checkpointCount: [2,20], default 2, step 1, must not exceed the number of iteration rounds (epochs) · saveStep: 2 or 4 or 8 or 16 or 32 or 64 or 128 or 256, default 16, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42
Qwen3-32B	RFT	FullFineTuning	· epoch: [1,50], default 1 · criticLearningRate: [0.0000001,0.00001], default 0.000009, step 0.0000001, effective only when RlMethod=PPO · actorLearningRate: [0.0000001,0.00001], default 0.0000005, step 0.0000001 · maxSeqLen: 4096 or 8192 or 16384 or 32768, default 4096 · globalBatchSize: [1,10000], default 64 (recommended step 4 when maxSeqLen=4096; recommended step 1 when maxSeqLen=8192, 16384, or 32768) · rolloutBatchSize: [1,10000], default 64, step 4 (recommended step 1 when maxSeqLen=8192, 16384, or 32768) · numSamplesPerPrompt: [1,32], default 8, effective only when RlMethod=GRPO · maxPromptLen4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxPromptLen8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · maxPromptLen16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxPromptLen32k: [512,15360], default 1024, effective only when maxSeqLen=32768 · maxLength16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxLength32k: [512,30720], default 1024, effective only when maxSeqLen=32768 · maxLength4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxLength8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · loggingSteps: [1,1], default 1 · klCoeff: [0.00001,0.01], default 0.001 · checkpointSaveStrategy: string, default step · checkpointCount: [2,20], default 2, step 1, must not exceed the number of iteration rounds (epochs) · saveStep: 2 or 4 or 8 or 16 or 32 or 64 or 128 or 256, default 16, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42
DeepSeek-R1-Distill-Qwen-32B	RFT	FullFineTuning	· epoch: [1,50], default 1 · criticLearningRate: [0.0000001,0.00001], default 0.000009, step 0.0000001, effective only when RlMethod=PPO · actorLearningRate: [0.0000001,0.00001], default 0.0000005, step 0.0000001 · maxSeqLen: 4096 or 8192 or 16384 or 32768, default 4096 · globalBatchSize: [1,10000], default 64 (recommended step 4 when maxSeqLen=4096; recommended step 1 when maxSeqLen=8192, 16384, or 32768) · rolloutBatchSize: [1,10000], default 64, step 4 (recommended step 1 when maxSeqLen=8192, 16384, or 32768) · numSamplesPerPrompt: [1,32], default 8, effective only when RlMethod=GRPO · maxPromptLen4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxPromptLen8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · maxPromptLen16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxPromptLen32k: [512,15360], default 1024, effective only when maxSeqLen=32768 · maxLength16k: [512,15360], default 1024, effective only when maxSeqLen=16384 · maxLength32k: [512,30720], default 1024, effective only when maxSeqLen=32768 · maxLength4k: [512,3072], default 1024, effective only when maxSeqLen=4096 · maxLength8k: [512,8092], default 1024, effective only when maxSeqLen=8192 · loggingSteps: [1,1], default 1 · klCoeff: [0.00001,0.01], default 0.001 · checkpointSaveStrategy: string, default step · checkpointCount: [2,20], default 2, step 1, must not exceed the number of iteration rounds (epochs) · saveStep: 2 or 4 or 8 or 16 or 32 or 64 or 128 or 256, default 16, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42
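
Several RFT parameters take effect only for a particular maxSeqLen or RlMethod. The sketch below shows one way to work out which conditional parameters apply to a given job. The helper `effective_rft_params` is a hypothetical name, and the mapping assumes that the 4k, 8k, 16k, and 32k suffixes correspond to maxSeqLen values of 4096, 8192, 16384, and 32768.

```python
# Assumed mapping from maxSeqLen to the parameter-name suffix.
SUFFIX_BY_MAX_SEQ_LEN = {4096: "4k", 8192: "8k", 16384: "16k", 32768: "32k"}


def effective_rft_params(max_seq_len, rl_method):
    """List the conditional RFT parameters that apply to this job."""
    if max_seq_len not in SUFFIX_BY_MAX_SEQ_LEN:
        raise ValueError(f"unsupported maxSeqLen: {max_seq_len}")
    suffix = SUFFIX_BY_MAX_SEQ_LEN[max_seq_len]
    names = [f"maxPromptLen{suffix}", f"maxLength{suffix}"]
    if rl_method == "PPO":
        # criticLearningRate is marked as effective only for PPO.
        names.append("criticLearningRate")
    elif rl_method == "GRPO":
        # numSamplesPerPrompt is marked as effective only for GRPO.
        names.append("numSamplesPerPrompt")
    return names


print(effective_rft_params(8192, "GRPO"))
```

Not every model in the table accepts maxSeqLen=32768, so the maxSeqLen options of the chosen row still need to be checked separately.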

PostPretrain

model	trainMode	parameterScale	hyperParameterConfig
ERNIE-Speed-8K	PostPretrain	-	· epoch: [1,50], default 1 · learningRate: [0.0000001,0.01], default 0.00003, step 0.000001 · maxSeqLen: single choice, 4096 or 8192, default 4096 · globalBatchSize: [1,10000], default 32, step 1 (recommended step 2 when maxSeqLen=4096) · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, must be an integer multiple of validationStep, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.00000001 · power: [1,3], default 1 · validationStep: [0,1000000], default 16, step 1, saveStep must be an integer multiple of validationStep · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True
ERNIE-Tiny-8K	PostPretrain	-	· epoch: [1,50], default 1 · learningRate: [0.0000001,0.01], default 0.00003, step 0.000001 · maxSeqLen: single choice, 4096 or 8192, default 4096 · globalBatchSize: [1,10000], default 32, step 8 (recommended step 16 when maxSeqLen=4096) · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, must be an integer multiple of validationStep, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.00000001 · power: [1,3], default 1 · validationStep: [0,1000000], default 16, step 1, saveStep must be an integer multiple of validationStep · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True
Qianfan-Chinese-Llama-2-13B-v1	PostPretrain	-	· epoch: 1 · learningRate: [0.0000002,0.0002], default 0.00002, step 0.000001 · batchSize: [48,960], default 192, step 48 · weightDecay: [0.0001,0.05], default 0.01, step 0.001 · checkpointCount: [1,10], default 1 · saveStep: [64,8192], default 64 · validationStep: [0,1000000], default 16, step 1
ERNIE-Speed-Pro-128K	PostPretrain	-	· epoch: [1,50], default 1 · learningRate: [0.0000001,0.01], default 0.00003, step 0.000001 · maxSeqLen: single choice, 8192 or 16384 or 32768 or 65536 or 131072, default 32768 · globalBatchSize: [1,10000], default 16, step 1 (recommended step 2 when maxSeqLen=16384, 32768, or 65536) · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, must be an integer multiple of validationStep, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.00000001 · power: [1,3], default 1 · validationStep: [0,1000000], default 16, step 1, saveStep must be an integer multiple of validationStep · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True
ERNIE-Lite-128K-0722	PostPretrain	-	· epoch: [1,50], default 1 · learningRate: [0.0000001,0.01], default 0.00003, step 0.000001 · maxSeqLen: single choice, 8192 or 16384 or 32768 or 65536 or 131072, default 32768 · globalBatchSize: [1,10000], default 16, step 2 (recommended step 4 when maxSeqLen=16384, 32768, or 65536) · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, must be an integer multiple of validationStep, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.00000001 · power: [1,3], default 1 · validationStep: [0,1000000], default 16, step 1, saveStep must be an integer multiple of validationStep · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · tensorParallelDegree: [1,8], default 4 · shardingParallelDegree: [1,64], default 2 · sharding: stage1 or stage2 or stage3, default stage2 · recompute: 0 or 1, default 1
ERNIE-Character-Fiction-8K	PostPretrain	-	· epoch: [1,50], default 1 · learningRate: [0.0000001,0.01], default 0.00003, step 0.000001 · maxSeqLen: single choice, 4096 or 8192, default 4096 · globalBatchSize: [1,10000], default 32, step 2 (recommended step 1 when maxSeqLen=8192) · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · validationStep: [0,1000000], default 16, step 1 · Early-stopping parameters: earlyStopping: single choice, False or True, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: single choice, validationLoss, default validationLoss; earlyStoppingThreshold: [0,5], default 0.01, step 0.01; earlyStoppingPatience: [1,50], default 3, step 1
Qwen2.5-32B	PostPretrain	-	· epoch: [1,10], default 1 · maxSeqLen: single choice, 4096 or 8192, default 8192 · batchSize: [32,1024], default 256, step 32 · learningRate: [0.0000002,0.0002], default 0.00002, step 0.000001 · weightDecay: [0.0001,0.05], default 0.01, step 0.001 · saveStep: [16,8192], default 256 · checkpointCount: [1,10], default 1
DeepSeek-R1-Distill-Qwen-32B	PostPretrain	-	· epoch: [1,10], default 1 · maxSeqLen: single choice, 4096 or 8192 or 16384 or 32768, default 8192 · batchSize: [32,1024], default 256, step 32 · learningRate: [0.0000002,0.0002], default 0.00002, step 0.000001 · weightDecay: [0.0001,0.05], default 0.01, step 0.001 · saveStep: [16,8192], default 256 · checkpointCount: [1,10], default 1
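
Several ERNIE rows in the PostPretrain table state that saveStep must be an integer multiple of validationStep and that saveStep only takes effect when checkpointSaveStrategy=step. A minimal sketch of that check follows. The function name `check_save_step` is hypothetical, the [1,50000] bounds come from the ERNIE rows only, and treating validationStep=0 as "no validation, so no multiple constraint" is an assumption that the table does not confirm.

```python
def check_save_step(save_step, validation_step, checkpoint_save_strategy="step"):
    """Return True if saveStep is acceptable under the rules stated in the table."""
    if checkpoint_save_strategy != "step":
        # saveStep is documented as effective only for the step strategy.
        return True
    if not 1 <= save_step <= 50000:
        return False
    if validation_step == 0:
        # Assumption: 0 disables validation, so there is nothing to align with.
        return True
    return save_step % validation_step == 0


print(check_save_step(64, 16))           # defaults from the table
print(check_save_step(64, 24))           # 64 is not a multiple of 24
print(check_save_step(64, 24, "epoch"))  # saveStep is ignored here
```

Rows such as Qianfan-Chinese-Llama-2-13B-v1 and Qwen2.5-32B list different saveStep ranges, so the bounds would need adjusting per model.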

DPO

model	trainMode	parameterScale	hyperParameterConfig
ERNIE-Lite-128K-0722	DPO	FullFineTuning	· epoch: [1,50], default 3 · learningRate: [0.0000001,0.01], default 0.000001, step 0.0000001 · maxSeqLen: single choice, 8192 or 16384 or 32768 or 65536 or 131072, default 32768 · globalBatchSize: [1,10000], default 16, step 1 (recommended step 2 when maxSeqLen=16384, 32768, or 65536) · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · dpoBeta: [0.01,1], default 0.1, step 0.001 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, must be an integer multiple of validationStep, effective only when checkpointSaveStrategy=step · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · validationStep: [0,1000000], default 16, step 1 · lossType: sigmoid or ipo or kto_pair, default sigmoid · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, effective only when earlyStopping is True
ERNIE-Speed-8K	DPO	FullFineTuning	· epoch: [1,50], default 3 · learningRate: [0.0000001,0.01], default 0.000001, step 0.0000001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096 · globalBatchSize: [1,10000], default 16, step 1 (recommended step 2 when maxSeqLen=4096) · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · dpoBeta: [0.01,1], default 0.1, step 0.001 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, must be an integer multiple of validationStep, effective only when checkpointSaveStrategy=step · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · validationStep: [0,1000000], default 16, step 1 · lossType: sigmoid or ipo or kto_pair, default sigmoid · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, effective only when earlyStopping is True
ERNIE-Tiny-8K	DPO	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000001,0.01], default 0.000001, step 0.0000001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096 · globalBatchSize: FullFineTuning: [1,10000], default 32, step 8 (recommended step 16 when maxSeqLen=4096); LoRA: [1,10000], default 32, step 8 (recommended step 16 when maxSeqLen=4096 or 8192) · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · dpoBeta: [0.01,1], default 0.1, step 0.001 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, must be an integer multiple of validationStep, effective only when checkpointSaveStrategy=step · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · validationStep: [0,1000000], default 16, step 1 · lossType: sigmoid or ipo or kto_pair, default sigmoid · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 2 or 4 or 8, default 8
ERNIE-Speed-Pro-128K	DPO	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000001,0.01], default 0.000001, step 0.0000001 · maxSeqLen: single choice, 8192 or 16384 or 32768 or 65536 or 131072, default 32768 · globalBatchSize: FullFineTuning: [1,10000], default 16, step 1; LoRA: [1,10000], default 16, step 1 (recommended step 4 when maxSeqLen=16384, 32768, or 65536) · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · dpoBeta: [0.01,1], default 0.1, step 0.001 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, must be an integer multiple of validationStep, effective only when checkpointSaveStrategy=step · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · validationStep: [0,1000000], default 16, step 1 · lossType: sigmoid or ipo or kto_pair, default sigmoid · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 8 or 64, default 64
ERNIE-Tiny-128K-0929	DPO	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000001,0.01], default 0.000001, step 0.0000001 · maxSeqLen: single choice, 8192 or 16384 or 32768 or 65536 or 131072, default 32768 · globalBatchSize: [1,10000], default 16, step 2 (recommended step 4 when maxSeqLen=16384 or 32768; recommended step 8 when maxSeqLen=131072) · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · dpoBeta: [0.01,1], default 0.1, step 0.001 · checkpointSaveStrategy: single choice, step or epoch, default step · seed: [1,2147483647], default 42 · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, must be an integer multiple of validationStep, effective only when checkpointSaveStrategy=step · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · power: [1,3], default 1 · validationStep: [0,1000000], default 16, step 1 · lossType: sigmoid or ipo or kto_pair, default sigmoid · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 2 or 4 or 8, default 8
ERNIE-Character-8K-0321	DPO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate： FullFineTuning：[0.0000001,0.01]，默认值0.000001，步长0.0000001 LoRA：[0.000001,0.001]，默认值0.0003，步长0.000001 · maxSeqLen：单选，512 或 1024 或 2048 或 4096 或 8192，默认值4096 · globalBatchSize： FullFineTuning：[1,10000]，默认值16，步长1 LoRA：[1,10000]，默认值16，步长2 · loggingSteps：1 · warmupRatio：[0.01,0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001,0.1]，默认值0.01，步长0.0001 · dpoBeta： FullFineTuning：[0.01,1]，默认值0.1，步长0.001 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · seed：[1,2147483647]，默认值42 · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，saveStep需要是validationStep的整数倍，当参数checkpointSaveStrategy=step时，此参数有效 · lrSchedulerType：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值linear · numCycles：[0.1,0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001,0.000001]，默认值0.0000001，步长0.00000001 · power：[1,3]，默认值1 · validationStep：[0, 1000000]，默认值16，步长1 · lossType： FullFineTuning：sigmoid 或 ipo 或 kto_pair，默认值sigmoid · 早停策略相关参数： earlyStopping：True 或 False，默认False，当参数checkpointSaveStrategy=step时，此参数有效 earlyStopMetric：ValidationLoss，当参数earlyStopping为True时，此参数有效 earlyStoppingThreshold：[0,5] ，默认值 0.01，步长0.01，当参数earlyStopping为True时，此参数有效 earlyStoppingPatience：[1,50]，默认值 3，当参数earlyStopping为True时，此参数有效 · 仅LoRA支持： pseudoSamplingProb：[0,0.9]，默认值0，步长0.1 loraAllLinear：True 或 False，默认值True loraRank：2 或 4 或 8，默认值8
Meta-Llama-3.1-8B	DPO	FullFineTuning, LoRA	· epoch: [1,50], default 1 · learningRate: FullFineTuning: [0.0000000001,0.0002], default 0.000001, step 0.000001; LoRA: [0.000001,0.001], default 0.0003, step 0.000001 · maxSeqLen: single choice, 1024 or 2048 or 4096 or 8192, default 4096 · globalBatchSize: [1,10000], default 16, step 1 · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · dpoBeta: [0.01,1], default 0.1, step 0.001 · seed: [1,2147483647], default 42 · checkpointCount: [1,10], default 1 · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · validationStep: [0,1000000], default 16, step 1 · checkpoint_save_strategy: step or epoch, default step · saveStep: [1,50000], default 64; must be an integer multiple of validationStep; effective only when checkpointSaveStrategy=step · lossType: sigmoid or ipo or kto_pair, default sigmoid · perDeviceTrainBatchSize: [1,8], default 1 · LoRA only: loraRank: [8,64], default 64
ERNIE-4.5-Turbo-128K	DPO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate：[0.0000001,0.01]，默认值0.000001，步长0.0000001 · maxSeqLen：单选，4096 或 8192，默认值8192 · maxPromptLen：[1,131062]，默认值2048 · maxSteps：[0,10000000]，默认值0 · recompute： FullFineTuning：0或1，默认值1 LoRA：0或1，默认值0 · globalBatchSize：[1,10000]，默认值16，步长1 · loggingSteps：1 · warmupRatio：[0.01,0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001,0.1]，默认值0.01，步长0.0001 · dpoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · seed：[1,2147483647]，默认值42 · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，saveStep需要是validationStep的整数倍，当参数checkpointSaveStrategy=step时，此参数有效 · lrSchedulerType：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值linear · numCycles：[0.1,0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001,0.000001]，默认值0.0000001，步长0.00000001 · power：[1,3]，默认值1 · validationStep：[0, 1000000]，默认值16，步长1 · lossType：sigmoid 或 ipo 或 kto_pair，默认值sigmoid · 早停策略相关参数： earlyStopping：True 或 False，默认False，当参数checkpointSaveStrategy=step时，此参数有效 earlyStopMetric：ValidationLoss，当参数earlyStopping为True时，此参数有效 earlyStoppingThreshold：[0,5] ，默认值 0.01，步长0.01，当参数earlyStopping为True时，此参数有效 earlyStoppingPatience：[1,50]，默认值 3，当参数earlyStopping为True时，此参数有效 · 仅LoRA支持： loraRank：[2,4,8], 默认值8
Baichuan2-7B-Chat	DPO	FullFineTuning, LoRA	· epoch: [1,50], default 1 · learningRate: FullFineTuning: [0.0000000001,0.0002], default 0.000001, step 0.000001; LoRA: [0.000001,0.001], default 0.0003, step 0.000001 · maxSeqLen: single choice, 1024 or 2048 or 4096 or 8192, default 4096 · globalBatchSize: [1,10000], default 16, step 1 · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · dpoBeta: [0.01,1], default 0.1, step 0.001 · seed: [1,2147483647], default 42 · checkpointCount: [1,10], default 1 · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · validationStep: [0,1000000], default 16, step 1 · checkpoint_save_strategy: step or epoch, default step · saveStep: [1,50000], default 64; must be an integer multiple of validationStep; effective only when checkpointSaveStrategy=step · lossType: sigmoid or ipo or kto_pair, default sigmoid · perDeviceTrainBatchSize: [1,8], default 1 · LoRA only: loraRank: [8,64], default 64
Qianfan-Sug	DPO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate：[0.0000001,0.01]，默认值0.000001，步长0.000001 · maxSeqLen：单选，512 或 1024 或 2048 或 4096 或 8192，默认值4096 · globalBatchSize： FullFineTuning：[1,10000]，默认值32，步长16（当maxSeqLen=4096时，步长为8） LoRA：[1,10000]，默认值32，步长16 · loggingSteps：1 · warmupRatio：[0.01,0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001,0.1]，默认值0.01，步长0.0001 · dpoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointSaveStrategy：单选，step 或 epoch，默认step · checkpointCount：[1,10]，默认值1，步长1 · saveStep：[1,50000]，步长64 · seed：[1,2147483647]，默认值42 · lrSchedulerType：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值linear · numCycles：[0.1,0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001,0.000001]，默认值0.0000001，步长0.00000001 · power：[1,3],默认值1 · validationStep：[0,1000000]，默认值16，步长1 · lossType：字符串 sigmoid 或 ipo 或 kto_pair，默认值sigmoid · 早停策略相关参数： earlyStopping：True 或 False，默认False，当参数checkpointSaveStrategy=step时，此参数有效 earlyStopMetric：ValidationLoss，当参数checkpointSaveStrategy=step时，此参数有效 earlyStoppingThreshold：[0,5] ，默认值0.01，步长0.01，当参数earlyStopping为True时，此参数有效 earlyStoppingPatience：[1,50]，默认值3，步长1，当参数earlyStopping为True时，此参数有效 · 仅LoRA支持： loraRank：2 或 4 或 8，默认值8
DeepSeek-R1-Distill-Qwen-14B	DPO	FullFineTuning、LoRA	· epoch：[1,50]，默认值1 · learningRate：[0.0000000001,0.0002]，默认值0.000001,步长0.000001 · maxSeqLen：单选，512 或 1024 或 2048 或 4096 或 8192 或 16384 或 32768，默认值4096 · globalBatchSize：[8,100000]，默认值16，步长8 · warmupRatio：[0.01,0.1]，默认值0.03，步长0.001 · weightDecay：[0.001,1]，默认值0.01，步长0.001 · dpoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointCount：[1,10]，默认值1，步长1 · saveStep：[1,50000]，默认值64，当checkpointSaveStrategy = step，此参数有效 · seed：[1,2147483647]，默认值42 · schedulerName：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值cosine · validationStep：[0,1000000]，默认值16，步长1 · lossType：sigmoid 或 ipo 或 kto_pair，默认值sigmoid · checkpointSaveStrategy：step 或 epoch，默认值：step · 仅LoRA支持： loraRank：8 或 64，默认值64
DeepSeek-R1	DPO	LoRA	· epoch: [1,50], default 1 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096 · globalBatchSize: [8,100000], default 16, step 8 · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · dpoBeta: [0.01,1], default 0.1, step 0.001 · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · lossType: sigmoid or ipo or kto_pair, default sigmoid · loraRank: single choice, 8 or 16 or 32 or 64, default 8 · loraAlpha: single choice, 8 or 16 or 32 or 64, default 32 · loraDropout: [0.01,0.5], default 0.1, step 0.001
Qwen2.5-1.5B-Instruct	DPO	FullFineTuning, LoRA	· epoch: [1,50], default 1 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192, default 4096 · globalBatchSize: [8,100000], default 16, step 8 · Packing: string, false or true or auto, default auto · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · dpoBeta: [0.01,1], default 0.1, step 0.001 · checkpointCount: [1,10], default 1, step 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=steps · seed: [1,2147483647], default 42 · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · checkpointSaveStrategy: steps or epoch, default steps · validationStep: [0,1000000], default 16, step 1 · lossType: sigmoid or ipo or kto_pair, default sigmoid · LoRA only: loraRank: 8 or 64, default 64
Qwen2.5-14B-Instruct	DPO	FullFineTuning、LoRA	· epoch：[1,50]，默认值1 · learningRate：[0.0000000001,0.0002]，默认值0.000001，步长0.000001 · maxSeqLen：512 或 1024 或 2048 或 4096 或 8192 或 16384 或 32768，默认值4096 · globalBatchSize：[8,100000]，默认值16，步长8 · Packing：字符串，false 或 true 或 auto，默认：auto · warmupRatio：[0.01,0.1]，默认值0.03，步长0.001 · weightDecay：[0.001,1]，默认值0.01，步长0.001 · dpoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointCount：[1,10]，默认值1，步长1 · saveStep：[1,50000]，默认值64，当checkpointSaveStrategy = steps，此参数有效 · seed：[1,2147483647]，默认值42 · schedulerName：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值cosine · checkpointSaveStrategy：steps 或 epoch，默认值：steps · validationStep：[0,1000000]，默认值16，步长1 · lossType：sigmoid 或 ipo 或 kto_pair，默认值sigmoid · 仅LoRA支持： loraRank：8 或 64，默认值64
Qwen2.5-32B-Instruct	DPO	FullFineTuning, LoRA	· epoch: [1,50], default 1 · learningRate: [0.0000000001,0.0002], default 0.000001, step 0.000001 · maxSeqLen: 512 or 1024 or 2048 or 4096 or 8192, default 4096 · globalBatchSize: [8,100000], default 16, step 8 · Packing: string, false or true or auto, default auto · warmupRatio: [0.01,0.1], default 0.03, step 0.001 · weightDecay: [0.001,1], default 0.01, step 0.001 · dpoBeta: [0.01,1], default 0.1, step 0.001 · checkpointCount: [1,10], default 1, step 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=steps · seed: [1,2147483647], default 42 · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · checkpointSaveStrategy: steps or epoch, default steps · validationStep: [0,1000000], default 16, step 1 · lossType: sigmoid or ipo or kto_pair, default sigmoid · LoRA only: loraRank: 8 or 64, default 64
Qwen3-4B	DPO	FullFineTuning、LoRA	· epoch：[1,50]，默认值1 · learningRate：[0.0000000001,0.0002]，默认值0.000001，步长0.000001 · maxSeqLen：512 或 1024 或 2048 或 4096 或 8192 或 16384 或 32768，默认值4096 · globalBatchSize：[8,100000]，默认值16，步长8 · Packing：字符串，false 或 true 或 auto，默认：auto · warmupRatio：[0.01,0.1]，默认值0.03，步长0.001 · weightDecay：[0.001,1]，默认值0.01，步长0.001 · dpoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointCount：[1,10]，默认值1，步长1 · saveStep：[1,50000]，默认值64，当checkpointSaveStrategy = steps，此参数有效 · seed：[1,2147483647]，默认值42 · schedulerName：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值cosine · checkpointSaveStrategy：steps 或 epoch，默认值：steps · validationStep：[0,1000000]，默认值16，步长1 · lossType：sigmoid 或 ipo 或 kto_pair，默认值sigmoid · 仅LoRA支持： loraRank：8 或 64，默认值64
Qwen3-8B	DPO	FullFineTuning、LoRA	· epoch：[1,50]，默认值1 · learningRate：[0.0000000001,0.0002]，默认值0.000001，步长0.000001 · maxSeqLen：512 或 1024 或 2048 或 4096 或 8192，默认值4096 · globalBatchSize：[8,100000]，默认值16，步长8 · Packing：字符串，false 或 true 或 auto，默认：auto · warmupRatio：[0.01,0.1]，默认值0.03，步长0.001 · weightDecay：[0.001,1]，默认值0.01，步长0.001 · dpoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointCount：[1,10]，默认值1，步长1 · saveStep：[1,50000]，默认值64，当checkpointSaveStrategy = steps，此参数有效 · seed：[1,2147483647]，默认值42 · schedulerName：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值cosine · checkpointSaveStrategy：steps 或 epoch，默认值：steps · validationStep：[0,1000000]，默认值16，步长1 · lossType：sigmoid 或 ipo 或 kto_pair，默认值sigmoid · 仅LoRA支持： loraRank：8 或 64，默认值64
Qwen3-32B	DPO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate：[0.0000000001,0.0002]，默认值0.000001，步长0.000001 · maxSeqLen：512 或 1024 或 2048 或 4096 或 8192，默认值4096 · globalBatchSize：[8,100000]，默认值16，步长8 · Packing：字符串，false 或 true 或 auto，默认：auto · warmupRatio：[0.01,0.1]，默认值0.03，步长0.001 · weightDecay：[0.001,1]，默认值0.01，步长0.001 · dpoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointCount：[1,10]，默认值1，步长1 · saveStep：[1,50000]，默认值64，当checkpointSaveStrategy = steps，此参数有效 · seed：[1,2147483647]，默认值42 · schedulerName：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值cosine · checkpointSaveStrategy：steps 或 epoch，默认值：steps · validationStep：[0,1000000]，默认值16，步长1 · lossType：sigmoid 或 ipo 或 kto_pair，默认值sigmoid · 仅LoRA支持： loraRank：8 或 64，默认值64
Qwen3-0.6B	DPO	FullFineTuning、LoRA	· epoch：[1,50]，默认值1 · learningRate：[0.0000000001,0.0002]，默认值0.000001，步长0.000001 · maxSeqLen：512 或 1024 或 2048 或 4096 或 8192 或 16384 或 32768，默认值4096 · globalBatchSize：[8,100000]，默认值16，步长8 · Packing：字符串，false 或 true 或 auto，默认：auto · warmupRatio：[0.01,0.1]，默认值0.03，步长0.001 · weightDecay：[0.001,1]，默认值0.01，步长0.001 · dpoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointCount：[1,10]，默认值1，步长1 · saveStep：[1,50000]，默认值64，当checkpointSaveStrategy = steps，此参数有效 · seed：[1,2147483647]，默认值42 · schedulerName：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值cosine · checkpointSaveStrategy：steps 或 epoch，默认值：steps · validationStep：[0,1000000]，默认值16，步长1 · lossType：sigmoid 或 ipo 或 kto_pair，默认值sigmoid · 仅LoRA支持： loraRank：8 或 64，默认值64
Qwen3-14B	DPO	FullFineTuning、LoRA	· epoch：[1,50]，默认值1 · learningRate：[0.0000000001,0.0002]，默认值0.000001，步长0.000001 · maxSeqLen：512 或 1024 或 2048 或 4096 或 8192 或 16384 或 32768，默认值4096 · globalBatchSize：[8,100000]，默认值16，步长8 · Packing：字符串，false 或 true 或 auto，默认：auto · warmupRatio：[0.01,0.1]，默认值0.03，步长0.001 · weightDecay：[0.001,1]，默认值0.01，步长0.001 · dpoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointCount：[1,10]，默认值1，步长1 · saveStep：[1,50000]，默认值64，当checkpointSaveStrategy = steps，此参数有效 · seed：[1,2147483647]，默认值42 · schedulerName：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值cosine · checkpointSaveStrategy：steps 或 epoch，默认值：steps · validationStep：[0,1000000]，默认值16，步长1 · lossType：sigmoid 或 ipo 或 kto_pair，默认值sigmoid · 仅LoRA支持： loraRank：8 或 64，默认值64
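
The DPO rows above combine closed ranges, single-choice lists, and one cross-parameter rule (saveStep must be an integer multiple of validationStep). A minimal sketch of a client-side check is shown below, using the values from the ERNIE-Speed-Pro-128K DPO row. The function name `check_dpo_config` and the dictionary layout are hypothetical and are not part of the fine-tuning API; skipping the multiple-of rule when validationStep is 0 is also an assumption.

```python
from typing import Any, Dict, List

# Closed numeric ranges copied from the ERNIE-Speed-Pro-128K DPO row.
# The dictionary layout itself is an assumption made for this sketch.
DPO_RANGES = {
    "epoch": (1, 50),
    "learningRate": (0.0000001, 0.01),
    "warmupRatio": (0.01, 0.5),
    "weightDecay": (0.0001, 0.1),
    "dpoBeta": (0.01, 1),
    "saveStep": (1, 50000),
    "validationStep": (0, 1000000),
}

# Single-choice parameters from the same row.
DPO_CHOICES = {
    "maxSeqLen": {8192, 16384, 32768, 65536, 131072},
    "lossType": {"sigmoid", "ipo", "kto_pair"},
    "checkpointSaveStrategy": {"step", "epoch"},
}


def check_dpo_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no violation was found."""
    problems = []
    for name, (low, high) in DPO_RANGES.items():
        if name in config and not low <= config[name] <= high:
            problems.append(f"{name}={config[name]} is outside [{low}, {high}]")
    for name, allowed in DPO_CHOICES.items():
        if name in config and config[name] not in allowed:
            problems.append(f"{name}={config[name]!r} is not one of {sorted(allowed)}")

    # Cross-parameter rule: saveStep only applies when checkpointSaveStrategy=step
    # and must then be an integer multiple of validationStep.
    # Fallback values are the defaults listed in the row.
    strategy = config.get("checkpointSaveStrategy", "step")
    save_step = config.get("saveStep", 64)
    validation_step = config.get("validationStep", 16)
    # Assumption: the rule is skipped when validationStep is 0.
    if strategy == "step" and validation_step > 0 and save_step % validation_step != 0:
        problems.append("saveStep must be an integer multiple of validationStep")
    return problems


if __name__ == "__main__":
    print(check_dpo_config({"dpoBeta": 0.1, "saveStep": 100, "validationStep": 16}))
```

Other models use different ranges and, in some rows, different identifiers (for example schedulerName instead of lrSchedulerType), so the tables would need to be swapped per model.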

KTO

model	trainMode	parameterScale	hyperParameterConfig
ERNIE-Speed-8K	KTO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate：[0.0000001,0.01]，默认值0.000001，步长0.0000001 · maxSeqLen：单选，512 或 1024 或 2048 或 4096 或 8192，默认值4096 · globalBatchSize： FullFineTuning：[1,10000]，默认值16，步长1（当maxSeqLen=4096时，推荐步长2） LoRA：[1,10000]，默认值16，步长2（当maxSeqLen=4096时，推荐步长4） · loggingSteps：1 · warmupRatio：[0.01,0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001,0.1]，默认值0.01，步长0.0001 · ktoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当参数checkpointSaveStrategy=step时，此参数有效 · seed：[1,2147483647]，默认值42 · lrSchedulerType：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值linear · numCycles：[0.1,0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001,0.000001]，默认值0.0000001，步长0.00000001 · validationStep：[0,1000000]，默认值16，步长1 · power：[1,3]，默认值1 · 早停策略相关参数： earlyStopping：True 或 False，默认False，当参数checkpointSaveStrategy=step时，此参数有效 earlyStopMetric：ValidationLoss，当参数earlyStopping为True时，此参数有效 earlyStoppingThreshold：[0,5] ，默认值 0.01，步长0.01，当参数earlyStopping为True时，此参数有效 earlyStoppingPatience：[1,50]，默认值 3，步长1，当参数earlyStopping为True时，此参数有效 · 仅LoRA支持： loraRank：单选，8 或 64 ，默认值64
ERNIE-Lite-128K-0419	KTO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate：[0.0000001,0.01]，默认值0.000001，步长0.0000001 · maxSeqLen：单选，8192 或 16384 或 32768 或 65536 或 131072，默认值32768 · globalBatchSize： FullFineTuning：[1,10000]，默认值16，步长1（当maxSeqLen=16384时，推荐步长2；当maxSeqLen=32768时，推荐步长2；当maxSeqLen=65536时，推荐步长2） LoRA：[1,10000]，默认值16，步长1（当maxSeqLen=16384时，推荐步长8；当maxSeqLen=32768时，推荐步长8；当maxSeqLen=65536时，推荐步长4；当maxSeqLen=131072时，推荐步长2） · loggingSteps：1 · warmupRatio：[0.01,0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001,0.1]，默认值0.01，步长0.0001 · ktoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当参数checkpointSaveStrategy=step时，此参数有效 · seed：[1,2147483647]，默认值42 · lrSchedulerType：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值linear · numCycles：[0.1,0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001,0.000001]，默认值0.0000001，步长0.00000001 · validationStep：[0,1000000]，默认值16，步长1 · power：[1,3]，默认值1 · 早停策略相关参数： earlyStopping：True 或 False，默认False，当参数checkpointSaveStrategy=step时，此参数有效 earlyStopMetric：ValidationLoss，当参数earlyStopping为True时，此参数有效 earlyStoppingThreshold：[0,5] ，默认值 0.01，步长0.01，当参数earlyStopping为True时，此参数有效 earlyStoppingPatience：[1,50]，默认值 3，步长1，当参数earlyStopping为True时，此参数有效 · 仅LoRA支持： loraRank：单选，2 或 4 或 8 ，默认值8
ERNIE-Lite-8K-0308	KTO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate：[0.0000001,0.01]，默认值0.000001，步长0.0000001 · maxSeqLen：单选，512 或 1024 或 2048 或 4096 或 8192，默认值4096 · globalBatchSize： FullFineTuning：[1,10000]，默认值16，步长4（当maxSeqLen=8192时，推荐步长8） LoRA：[1,10000]，默认值16，步长4（当maxSeqLen=4096时，推荐步长8） · loggingSteps：1 · warmupRatio：[0.01,0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001,0.1]，默认值0.01，步长0.0001 · ktoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当参数checkpointSaveStrategy=step时，此参数有效 · seed：[1,2147483647]，默认值42 · lrSchedulerType：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值linear · numCycles：[0.1,0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001,0.000001]，默认值0.0000001，步长0.00000001 · validationStep：[0,1000000]，默认值16，步长1 · power：[1,3]，默认值1 · 早停策略相关参数： earlyStopping：True 或 False，默认False，当参数checkpointSaveStrategy=step时，此参数有效 earlyStopMetric：ValidationLoss，当参数earlyStopping为True时，此参数有效 earlyStoppingThreshold：[0,5] ，默认值 0.01，步长0.01，当参数earlyStopping为True时，此参数有效 earlyStoppingPatience：[1,50]，默认值 3，步长1，当参数earlyStopping为True时，此参数有效 · 仅LoRA支持： loraRank：单选，2 或 4 或 8 ，默认值8
ERNIE-Character-Fiction-8K	KTO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate：[0.0000001,0.01]，默认值0.000001，步长0.0000001 · maxSeqLen：单选，512 或 1024 或 2048 或 4096 或 8192，默认值4096 ·globalBatchSize： FullFineTuning：[1,10000]，默认值16，步长1（当maxSeqLen=4096时，推荐步长2） LoRA：[1,10000]，默认值16，步长2（当maxSeqLen=4096时，推荐步长4） · loggingSteps：1 · warmupRatio：[0.01,0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001,0.1]，默认值0.01，步长0.0001 · ktoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当参数checkpointSaveStrategy=step时，此参数有效 · seed：[1,2147483647]，默认值42 · lrSchedulerType：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值linear · numCycles：[0.1,0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001,0.000001]，默认值0.0000001，步长0.00000001 · validationStep：[0,1000000]，默认值16，步长1 · power：[1,3]，默认值1 · 早停策略相关参数： earlyStopping：True 或 False，默认False，当参数checkpointSaveStrategy=step时，此参数有效 earlyStopMetric：ValidationLoss，当参数earlyStopping为True时，此参数有效 earlyStoppingThreshold：[0,5] ，默认值 0.01，步长0.01，当参数earlyStopping为True时，此参数有效 earlyStoppingPatience：[1,50]，默认值 3，步长1，当参数earlyStopping为True时，此参数有效 · 仅LoRA支持： loraRank：单选，2 或 4 或 8 ，默认值8
ERNIE-Character-8K-0321	KTO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate：[0.0000001,0.01]，默认值0.000001，步长0.0000001 · maxSeqLen：单选，512 或 1024 或 2048 或 4096 或 8192，默认值4096 · globalBatchSize： FullFineTuning：[1,10000]，默认值16，步长1（当maxSeqLen=4096时，推荐步长2） LoRA：[1,10000]，默认值16，步长2（当maxSeqLen=4096时，推荐步长4） · loggingSteps：1 · warmupRatio：[0.01,0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001,0.1]，默认值0.01，步长0.0001 · ktoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当参数checkpointSaveStrategy=step时，此参数有效 · seed：[1,2147483647]，默认值42 · lrSchedulerType：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值linear · numCycles：[0.1,0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001,0.000001]，默认值0.0000001，步长0.00000001 · validationStep：[0,1000000]，默认值16，步长1 · power：[1,3]，默认值1 · 早停策略相关参数： earlyStopping：True 或 False，默认False，当参数checkpointSaveStrategy=step时，此参数有效 earlyStopMetric：ValidationLoss，当参数earlyStopping为True时，此参数有效 earlyStoppingThreshold：[0,5] ，默认值 0.01，步长0.01，当参数earlyStopping为True时，此参数有效 earlyStoppingPatience：[1,50]，默认值 3，步长1，当参数earlyStopping为True时，此参数有效 · 仅LoRA支持： loraRank：单选，2 或 4 或 8 ，默认值8
ERNIE-Tiny-8K	KTO	FullFineTuning, LoRA	· epoch: [1,50], default 3 · learningRate: [0.0000001,0.01], default 0.000001, step 0.0000001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 4096 · globalBatchSize: FullFineTuning: [1,10000], default 32, step 8 (recommended step 16 when maxSeqLen=4096); LoRA: [1,10000], default 32, step 8 (recommended step 16 when maxSeqLen=4096 or maxSeqLen=8192) · loggingSteps: 1 · warmupRatio: [0.01,0.5], default 0.1, step 0.01 · weightDecay: [0.0001,0.1], default 0.01, step 0.0001 · ktoBeta: [0.01,1], default 0.1, step 0.001 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · seed: [1,2147483647], default 42 · lrSchedulerType: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default linear · numCycles: [0.1,0.5], default 0.5, step 0.1 · lrEnd: [0.00000001,0.000001], default 0.0000001, step 0.00000001 · validationStep: [0,1000000], default 16, step 1 · power: [1,3], default 1 · Early-stopping parameters: earlyStopping: True or False, default False, effective only when checkpointSaveStrategy=step; earlyStopMetric: ValidationLoss, effective only when earlyStopping is True; earlyStoppingThreshold: [0,5], default 0.01, step 0.01, effective only when earlyStopping is True; earlyStoppingPatience: [1,50], default 3, step 1, effective only when earlyStopping is True · LoRA only: loraRank: single choice, 2 or 4 or 8, default 8
ERNIE-Tiny-128K-0929	KTO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate：[0.0000001,0.01]，默认值0.000001，步长0.0000001 · maxSeqLen：单选，8192 或 16384 或 32768 或 65536 或 131072，默认值32768 · globalBatchSize：[1,10000]，默认值16，步长2（当maxSeqLen=16384时，推荐步长4；当maxSeqLen=32768时，推荐步长4；当maxSeqLen=131072时，推荐步长8） · loggingSteps：1 · warmupRatio：[0.01,0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001,0.1]，默认值0.01，步长0.0001 · ktoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当参数checkpointSaveStrategy=step时，此参数有效 · seed：[1,2147483647]，默认值42 · lrSchedulerType：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值linear · numCycles：[0.1,0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001,0.000001]，默认值0.0000001，步长0.00000001 · validationStep：[0,1000000]，默认值16，步长1 · power：[1,3]，默认值1 · 早停策略相关参数： earlyStopping：True 或 False，默认False，当参数checkpointSaveStrategy=step时，此参数有效 earlyStopMetric：ValidationLoss，当参数earlyStopping为True时，此参数有效 earlyStoppingThreshold：[0,5] ，默认值 0.01，步长0.01，当参数earlyStopping为True时，此参数有效 earlyStoppingPatience：[1,50]，默认值 3，步长1，当参数earlyStopping为True时，此参数有效 · 仅LoRA支持： loraRank：单选，2 或 4 或 8 ，默认值8
ERNIE-Speed-Pro-128K	KTO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate：[0.0000001,0.01]，默认值0.000001，步长0.0000001 · maxSeqLen：单选，8192 或 16384 或 32768 或 65536 或 131072，默认值32768 · globalBatchSize： FullFineTuning：[1,10000]，默认值16，步长1 LoRA：[1,10000]，默认值16，步长1（当maxSeqLen=16384时，推荐步长4；当maxSeqLen=32768时，推荐步长4；当maxSeqLen=65536时，推荐步长4） · loggingSteps：1 · warmupRatio：[0.01,0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001,0.1]，默认值0.01，步长0.0001 · ktoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当参数checkpointSaveStrategy=step时，此参数有效 · seed：[1,2147483647]，默认值42 · lrSchedulerType：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值linear · numCycles：[0.1,0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001,0.000001]，默认值0.0000001，步长0.00000001 · validationStep：[0,1000000]，默认值16，步长1 · power：[1,3]，默认值1 · 早停策略相关参数： earlyStopping：True 或 False，默认False，当参数checkpointSaveStrategy=step时，此参数有效 earlyStopMetric：ValidationLoss，当参数earlyStopping为True时，此参数有效 earlyStoppingThreshold：[0,5] ，默认值 0.01，步长0.01，当参数earlyStopping为True时，此参数有效 earlyStoppingPatience：[1,50]，默认值 3，步长1，当参数earlyStopping为True时，此参数有效 · 仅LoRA支持： loraRank：单选，8 或 64 ，默认值64
ERNIE-4.5-Turbo-128K	KTO	FullFineTuning、LoRA	· epoch：[1,50]，默认值3 · learningRate：[0.0000001,0.01]，默认值0.000001，步长0.0000001 · maxSeqLen：单选，512 或 1024 或 2048 或 4096 或 8192，默认值4096 · globalBatchSize： FullFineTuning：[1,10000]，默认值16，步长1（当maxSeqLen=4096时，推荐步长2） LoRA：[1,10000]，默认值16，步长2（当maxSeqLen=4096时，推荐步长4） · loggingSteps：1 · warmupRatio：[0.01,0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001,0.1]，默认值0.01，步长0.0001 · ktoBeta：[0.01,1]，默认值0.1，步长0.001 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当参数checkpointSaveStrategy=step时，此参数有效 · seed：[1,2147483647]，默认值42 · lrSchedulerType：单选，linear 或 cosine 或 polynomial 或 constant 或 constant_with_warmup，默认值linear · numCycles：[0.1,0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001,0.000001]，默认值0.0000001，步长0.00000001 · validationStep：[0,1000000]，默认值16，步长1 · power：[1,3]，默认值1 · 早停策略相关参数： earlyStopping：True 或 False，默认False，当参数checkpointSaveStrategy=step时，此参数有效 earlyStopMetric：ValidationLoss，当参数earlyStopping为True时，此参数有效 earlyStoppingThreshold：[0,5] ，默认值 0.01，步长0.01，当参数earlyStopping为True时，此参数有效 earlyStoppingPatience：[1,50]，默认值 3，步长1，当参数earlyStopping为True时，此参数有效 · 仅LoRA支持： loraRank：单选，8 或 64 ，默认值64
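
Several KTO rows give a base globalBatchSize step plus a recommended step that depends on maxSeqLen. The lookup below is a small sketch of that rule, using the ERNIE-Lite-128K-0419 KTO row as data. The helper names are hypothetical, and treating "step" as "the value should be a multiple of the step" is an assumption, since the table does not define it.

```python
# Recommended globalBatchSize steps for ERNIE-Lite-128K-0419 (KTO),
# copied from its table row. Sequence lengths that are not listed
# fall back to the base step of 1.
RECOMMENDED_STEP = {
    "FullFineTuning": {16384: 2, 32768: 2, 65536: 2},
    "LoRA": {16384: 8, 32768: 8, 65536: 4, 131072: 2},
}
BASE_STEP = 1
BATCH_RANGE = (1, 10000)


def recommended_batch_step(parameter_scale: str, max_seq_len: int) -> int:
    """Return the recommended step, or the base step when none is listed."""
    return RECOMMENDED_STEP[parameter_scale].get(max_seq_len, BASE_STEP)


def align_batch_size(requested: int, step: int) -> int:
    """Round down to a multiple of step, staying inside the allowed range.

    Assumption: a valid value is a multiple of the step.
    """
    low, high = BATCH_RANGE
    aligned = (requested // step) * step
    aligned = max(aligned, step, low)  # never drop below one full step
    return min(aligned, high)


if __name__ == "__main__":
    step = recommended_batch_step("LoRA", 65536)
    print(step, align_batch_size(18, step))
```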

RLHF

model	trainMode	parameterScale	hyperParameterConfig
ERNIE-Lite-8K-0308	RM	FullFineTuning	· epoch：[1, 50]，默认值3 · learningRate：[0.00000010, 0.01]，默认值0.0000010，步长0.00000010 · maxSeqLen：单选，可选项4096、8192，默认值4096 · globalBatchSize：[1, 10000]，默认值16，步长2（当maxSeqLen=8192时，推荐步长4） · useCls：单选，可选项true、false，默认值true · warmupRatio：[0.01, 0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001, 0.1]，默认值0.01，步长0.0001 · pseudoSamplingProb：[0, 0.9]，默认值0，步长0.1 · seed：[1, 2147483647]，默认值42 · lrSchedulerType：单选，可选项linear、cosine、polynomial、constant、constant_with_warmup，默认值linear · numCycles：[0.1, 0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001, 0.0000010]，默认值0.00000010，步长0.00000001 · validationStep：[0, 1000000]，默认值16，步长1 · power：[1, 3]，默认值1
ERNIE-Lite-8K-0308	PPO	FullFineTuning	· epoch：[1, 50]，默认值3 · critic_learning_rate：[0.00000010, 0.00001]，默认值0.000002，步长0.00000010 · learningRate：[0.00000010, 0.00001]，默认值0.0000010，步长0.00000010 · maxSeqLen：单选，可选项4096、8192，默认值4096 · globalBatchSize：[1, 10000]，默认值16，步长1（当maxSeqLen=4096时，推荐步长4） · clip_range_score：[5, 50]，默认值10 · clip_range_value：[5, 50]，默认值5 · clip_range_ratio：[0.01, 0.3]，默认值0.2 · loggingSteps：[1, 1]，默认值1 · warmupRatio：[0.01, 0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001, 0.1]，默认值0.01，步长0.0001 · top_p：[0, 1]，默认值0.9 · validationStep：[0, 1000000]，默认值16，步长1 · repetition_penalty：[1, 2]，默认值1 · temperature：[0, 1]，默认值1 · kl_coeff：[0.001, 0.1]，默认值0.02 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · checkpointCount：[1, 10]，默认值1 · saveStep：单选，可选项64、128、256、512、1024、2048、4096，默认值256，当参数checkpointSaveStrategy=step时，此参数有效 · seed：[1, 2147483647]，默认值42 · lrSchedulerType：单选，可选项linear、cosine、polynomial、constant、constant_with_warmup，默认值linear · numCycles：[0.1, 0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001, 0.0000010]，默认值0.00000010，步长0.00000001 · power：[1, 3]，默认值1
ERNIE-Tiny-8K	RM	FullFineTuning	· epoch：[1, 50]，默认值3 · learningRate：[0.00000010, 0.01]，默认值0.0000010，步长0.00000010 · maxSeqLen：单选，可选项4096、8192，默认值4096 · globalBatchSize：[1, 10000]，默认值16，步长2（当maxSeqLen=8192时，推荐步长4） · useCls：单选，可选项true、false，默认值true · warmupRatio：[0.01, 0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001, 0.1]，默认值0.01，步长0.0001 · pseudoSamplingProb：[0, 0.9]，默认值0，步长0.1 · seed：[1, 2147483647]，默认值42 · lrSchedulerType：单选，可选项linear、cosine、polynomial、constant、constant_with_warmup，默认值linear · numCycles：[0.1, 0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001, 0.0000010]，默认值0.00000010，步长0.00000001 · validationStep：[0, 1000000]，默认值16，步长1 · power：[1, 3]，默认值1
ERNIE-Tiny-8K	PPO	FullFineTuning	· epoch：[1, 50]，默认值3 · critic_learning_rate：[0.00000010, 0.00001]，默认值0.000002，步长0.00000010 · learningRate：[0.00000010, 0.00001]，默认值0.0000010，步长0.00000010 · maxSeqLen：单选，可选项4096、8192，默认值4096 · globalBatchSize：[1, 10000]，默认值16，步长1（当maxSeqLen=4096时，推荐步长4） · clip_range_score：[5, 50]，默认值10 · clip_range_value：[5, 50]，默认值5 · clip_range_ratio：[0.01, 0.3]，默认值0.2 · loggingSteps：[1, 1]，默认值1 · warmupRatio：[0.01, 0.5]，默认值0.1，步长0.01 · weightDecay：[0.0001, 0.1]，默认值0.01，步长0.0001 · top_p：[0, 1]，默认值0.9 · validationStep：[0, 1000000]，默认值16，步长1 · repetition_penalty：[1, 2]，默认值1 · temperature：[0, 1]，默认值1 · kl_coeff：[0.001, 0.1]，默认值0.02 · checkpointSaveStrategy: 单选，step 或 epoch，默认step · checkpointCount：[1, 10]，默认值1 · saveStep：单选，可选项64、128、256、512、1024、2048、4096，默认值256，当参数checkpointSaveStrategy=step时，此参数有效 · seed：[1, 2147483647]，默认值42 · lrSchedulerType：单选，可选项linear、cosine、polynomial、constant、constant_with_warmup，默认值linear · numCycles：[0.1, 0.5]，默认值0.5，步长0.1 · lrEnd：[0.00000001, 0.0000010]，默认值0.00000010，步长0.00000001 · power：[1, 3]，默认值1
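
For the PPO rows, saveStep is a single-choice list rather than a range, and several sampling parameters have narrow bounds. The sketch below merges user overrides into a subset of the defaults from the ERNIE-Lite-8K-0308 PPO row and rejects values outside the listed limits. `build_ppo_config` is a hypothetical helper and not part of the API; only some of the row's parameters are included to keep the example short.

```python
from typing import Any, Dict, Optional

# Subset of defaults copied from the ERNIE-Lite-8K-0308 PPO row.
PPO_DEFAULTS = {
    "epoch": 3,
    "critic_learning_rate": 0.000002,
    "learningRate": 0.000001,
    "maxSeqLen": 4096,
    "globalBatchSize": 16,
    "clip_range_ratio": 0.2,
    "top_p": 0.9,
    "repetition_penalty": 1,
    "temperature": 1,
    "kl_coeff": 0.02,
    "checkpointSaveStrategy": "step",
    "saveStep": 256,
}

# Closed ranges from the same row.
PPO_RANGES = {
    "epoch": (1, 50),
    "critic_learning_rate": (0.0000001, 0.00001),
    "learningRate": (0.0000001, 0.00001),
    "globalBatchSize": (1, 10000),
    "clip_range_ratio": (0.01, 0.3),
    "top_p": (0, 1),
    "repetition_penalty": (1, 2),
    "temperature": (0, 1),
    "kl_coeff": (0.001, 0.1),
}

# Single-choice parameters from the same row.
PPO_CHOICES = {
    "maxSeqLen": (4096, 8192),
    "checkpointSaveStrategy": ("step", "epoch"),
    "saveStep": (64, 128, 256, 512, 1024, 2048, 4096),
}


def build_ppo_config(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge overrides into the defaults and raise ValueError on bad values."""
    config = dict(PPO_DEFAULTS)
    for name, value in (overrides or {}).items():
        if name not in PPO_DEFAULTS:
            raise ValueError(f"unknown parameter: {name}")
        if name in PPO_RANGES:
            low, high = PPO_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        if name in PPO_CHOICES and value not in PPO_CHOICES[name]:
            raise ValueError(f"{name}={value!r} is not one of {PPO_CHOICES[name]}")
        config[name] = value
    return config


if __name__ == "__main__":
    print(build_ppo_config({"saveStep": 512, "kl_coeff": 0.05}))
```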

Image understanding

model	trainMode	parameterScale	hyperParameterConfig
LLAVA-V1.6-13B	SFT	FullFineTuning、LoRA	· epoch：[1,50]，默认值1 · learningRate： FullFineTuning：[0.0000000001,0.001]，默认值0.00001，递增步长0.000001 LoRA：[0.0000000001,0.001]，默认值0.0001，递增步长0.00004 · validationStep：[0,1000000]，默认值16，递增步长1 · batchSize：默认值1，步长1，取值范围如下：当maxSeqLen为4096时，取值范围为[1,8] 当maxSeqLen为2048、1024、512时，取值范围为[1,16] · checkpointSaveStrategy：单选，step 或 epoch，默认step · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64 · schedulerName：单选：linear、cosine、polynomial、constant、constant_with_warmup，默认值cosine · warmupRatio：[0.01,0.1]，默认值0.05，递增步长0.001 · weightDecay：[0.001,1]，默认值0.1，递增步长0.001 · maxSeqLen：单选：512、1024、2048、4096，默认值2048 · freezeViT：布尔值，True 或 False，默认False · 仅LoRA支持： loraRank：单选，8 或 16 或 32 或 64 或 128 或 256，默认值8
InternVL2-2B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 1 · learningRate: FullFineTuning: [0.0000000001,0.001], default 0.00001, increment step 0.000001; LoRA: [0.0000000001,0.001], default 0.0001, increment step 0.00004 · validationStep: [0,1000000], default 16, increment step 1 · batchSize: default 1, step 1, with a range that depends on maxSeqLen: [1,4] when maxSeqLen is 8192; [1,8] when maxSeqLen is 4096; [1,16] when maxSeqLen is 2048, 1024, or 512 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.05, increment step 0.001 · weightDecay: [0.001,1], default 0.1, increment step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 2048 · freezeViT: Boolean, True or False, default False · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64 or 128 or 256, default 8
Qwen2-VL-7B	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 1 · learningRate: FullFineTuning: [0.0000000001,0.001], default 0.00001, increment step 0.000001; LoRA: [0.0000000001,0.001], default 0.0001, increment step 0.00004 · batchSize: default 1, step 1, with a range that depends on maxSeqLen: [1,4] when maxSeqLen is 8192; [1,8] when maxSeqLen is 4096; [1,16] when maxSeqLen is 2048, 1024, or 512 · checkpointSaveStrategy: single choice, step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64, effective only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, increment step 1 · schedulerName: single choice, linear or cosine or polynomial or constant or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.05, increment step 0.001 · weightDecay: [0.001,1], default 0.1, increment step 0.001 · maxSeqLen: single choice, 512 or 1024 or 2048 or 4096 or 8192, default 2048 · freezeViT: Boolean, true or false, default false · LoRA only: loraRank: single choice, 8 or 16 or 32 or 64 or 128 or 256, default 8
InternVL2-8B	SFT	FullFineTuning、LoRA	· epoch：[1,50]，默认值1 · learningRate： FullFineTuning：[0.0000000001,0.001]，默认值0.00001，递增步长0.000001 LoRA： [0.0000000001,0.001]，默认值0.0001，步长0.00004 · validationStep：[0, 1000000], 默认值16，递增步长1 · batchSize：默认值1，步长1，取值范围如下：当maxSeqLen为8192时，取值范围为[1,4] 当maxSeqLen为4096时，取值范围为[1,8] 当maxSeqLen为2048、1024、512时，取值范围为[1,16] · checkpointSaveStrategy：单选，step 或 epoch，默认step · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当参数checkpointSaveStrategy=step时，此参数有效 · schedulerName：单选：linear、cosine、polynomial、constant、constant_with_warmup，默认值cosine · warmupRatio：[0.01, 0.1]，默认值0.05，递增步长0.001 · weightDecay：[0.001, 1]，默认值0.1，递增步长0.001 · maxSeqLen：单选：512、1024、2048、4096、8192，默认值2048 · freezeViT：布尔值，True 或 False，默认False · 仅LoRA支持： loraRank：单选，8 或 16 或 32 或 64 或 128 或 256，默认值8
InternLM-XComposer2.5	SFT	FullFineTuning、LoRA	· epoch：[1,50]，默认值1 · learningRate：[0.0000000001,0.001]，默认值0.00001，步长0.00001 · batchSize：默认值1，步长1，取值范围如下：当maxSeqLen为8192时，取值范围为[1,4] 当maxSeqLen为4096时，取值范围为[1,8] 当maxSeqLen为2048、1024、512时，取值范围为[1,16] · checkpointSaveStrategy：单选，step 或 epoch，默认step · validationStep：[0, 1000000]，默认值16，递增步长1 · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当参数checkpointSaveStrategy=step时，此参数有效 · schedulerName：单选：linear、cosine、polynomial、constant、constant_with_warmup，默认值cosine · warmupRatio：[0.01,0.1]，默认值0.05，递增步长0.001 · weightDecay：[0.001, 1]，默认值0.1，递增步长0.001 · maxSeqLen：单选：512、1024、2048、4096、8192，默认值2048 · freezeViT：布尔值，True 或 False，默认False · 仅LoRA支持： loraRank：单选，8 或 16 或 32 或 64 或 128 或 256，默认值8
Qwen2-VL-2B	SFT	FullFineTuning、LoRA	· epoch：[1,50]，默认值1 · learningRate： FullFineTuning：[0.0000000001,0.001]，默认值0.00001，递增步长0.00004 LoRA：[0.0000000001,0.001]，默认值0.0001，递增步长0.00004 · batchSize：默认值1，步长1，取值范围如下：当maxSeqLen为8192时，取值范围为[1,4] 当maxSeqLen为4096时，取值范围为[1,8] 当maxSeqLen为2048、1024、512时，取值范围为[1,16] · checkpointSaveStrategy：step 或 epoch，默认值step · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当checkpointSaveStrategy=step时，此参数有效 · validationStep：[0, 1000000]，默认值16，递增步长1 · schedulerName：单选：linear、cosine、polynomial、constant、constant_with_warmup，默认值cosine · warmupRatio：[0.01, 0.1]，默认值0.05，递增步长0.001 · weightDecay：[0.001, 1]，默认值0.1，递增步长0.001 · maxSeqLen：单选：512、1024、2048、4096、8192，默认值2048 · freezeViT：布尔值，true 或 false，默认false · 仅LoRA： loraRank：单选，8 或 16 或 32 或 64 或 128 或 256，默认值8
InternVL2.5-8B	SFT	FullFineTuning、LoRA	· epoch：[1,50]，默认值1 · learningRate： FullFineTuning：[0.0000000001,0.001]，默认值0.00001，递增步长0.00004 LoRA：[0.0000000001,0.001]，默认值0.0001，递增步长0.00004 · batchSize：默认值1，步长1，取值范围如下：当maxSeqLen为8192时，取值范围为[1,4] 当maxSeqLen为4096时，取值范围为[1,8] 当maxSeqLen为2048、1024、512时，取值范围为[1,16] · checkpointSaveStrategy：单选，step 或 epoch，默认step · validationStep：[0, 1000000]，默认值16，递增步长1 · checkpointCount：[1,10]，默认值1 · saveStep：[1,50000]，默认值64，当参数checkpointSaveStrategy=step时，此参数有效 · schedulerName：单选：linear、cosine、polynomial、constant、constant_with_warmup，默认值cosine · warmupRatio：[0.01,0.1]，默认值0.05，递增步长0.001 · weightDecay：[0.001, 1]，默认值0.1，递增步长0.001 · maxSeqLen：单选：512、1024、2048、4096、8192，默认值2048 · freezeViT：布尔值，True 或 False，默认False · 仅LoRA支持： loraRank：单选，8 或 16 或 32 或 64 或 128 或 256，默认值8
Qwen2.5-VL-7B-Instruct	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 1 · learningRate: FullFineTuning: [0.0000000001,0.001], default 0.00001, step 0.00004; LoRA: [0.0000000001,0.001], default 0.0001, step 0.00004 · batchSize: default 1, step 1; the allowed range depends on maxSeqLen: [1,4] when maxSeqLen is 8192, [1,8] when maxSeqLen is 4096, [1,16] when maxSeqLen is 2048, 1024, or 512 · checkpointSaveStrategy: one of step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64; takes effect only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · schedulerName: one of linear, cosine, polynomial, constant, or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.05, step 0.001 · weightDecay: [0.001,1], default 0.1, step 0.001 · maxSeqLen: one of 512, 1024, 2048, 4096, or 8192, default 2048 · freezeViT: boolean, true or false, default false · LoRA only: loraRank: one of 8, 16, 32, 64, 128, or 256, default 8
Qwen2.5-VL-32B-Instruct	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 1 · learningRate: FullFineTuning: [0.0000000001,0.001], default 0.00001, step 0.00004; LoRA: [0.0000000001,0.001], default 0.0001, step 0.00004 · batchSize: default 1, step 1; the allowed range depends on maxSeqLen: [1,4] when maxSeqLen is 8192, [1,8] when maxSeqLen is 4096, [1,16] when maxSeqLen is 2048, 1024, or 512 · checkpointSaveStrategy: one of step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64; takes effect only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · schedulerName: one of linear, cosine, polynomial, constant, or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.05, step 0.001 · weightDecay: [0.001,1], default 0.1, step 0.001 · maxSeqLen: one of 512, 1024, 2048, 4096, or 8192, default 2048 · freezeViT: boolean, true or false, default false · LoRA only: loraRank: one of 8, 16, 32, 64, 128, or 256, default 8
Qwen2.5-VL-3B-Instruct	SFT	FullFineTuning, LoRA	· epoch: [1,50], default 1 · learningRate: FullFineTuning: [0.0000000001,0.001], default 0.00001, step 0.00004; LoRA: [0.0000000001,0.001], default 0.0001, step 0.00004 · batchSize: default 1, step 1; the allowed range depends on maxSeqLen: [1,4] when maxSeqLen is 8192, [1,8] when maxSeqLen is 4096, [1,16] when maxSeqLen is 2048, 1024, or 512 · checkpointSaveStrategy: one of step or epoch, default step · checkpointCount: [1,10], default 1 · saveStep: [1,50000], default 64; takes effect only when checkpointSaveStrategy=step · validationStep: [0,1000000], default 16, step 1 · schedulerName: one of linear, cosine, polynomial, constant, or constant_with_warmup, default cosine · warmupRatio: [0.01,0.1], default 0.05, step 0.001 · weightDecay: [0.001,1], default 0.1, step 0.001 · maxSeqLen: one of 512, 1024, 2048, 4096, or 8192, default 2048 · freezeViT: boolean, true or false, default false · LoRA only: loraRank: one of 8, 16, 32, 64, 128, or 256, default 8
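
The rows above couple several parameters: the batchSize ceiling depends on maxSeqLen, loraRank applies only to LoRA, and saveStep takes effect only when checkpointSaveStrategy=step. As a rough illustration, these rules could be checked on the client side before a request is submitted. The following is a minimal sketch, not part of the platform API: the function name `check_vl_sft_config`, the constant names, and the plain-dict config shape are assumptions made for this example; only the numeric limits come from the table.

```python
# Hypothetical client-side pre-check; the limits are copied from the table rows above.
# Maps each allowed maxSeqLen to the largest batchSize the table permits.
BATCH_SIZE_MAX_BY_SEQ_LEN = {512: 16, 1024: 16, 2048: 16, 4096: 8, 8192: 4}
LORA_RANKS = {8, 16, 32, 64, 128, 256}


def check_vl_sft_config(parameter_scale, config):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    max_seq_len = config.get("maxSeqLen", 2048)  # default listed in the table
    batch_size = config.get("batchSize", 1)      # default listed in the table

    if max_seq_len not in BATCH_SIZE_MAX_BY_SEQ_LEN:
        allowed = sorted(BATCH_SIZE_MAX_BY_SEQ_LEN)
        problems.append(f"maxSeqLen must be one of {allowed}")
    else:
        upper = BATCH_SIZE_MAX_BY_SEQ_LEN[max_seq_len]
        if not 1 <= batch_size <= upper:
            problems.append(
                f"batchSize must be in [1,{upper}] when maxSeqLen={max_seq_len}"
            )

    if "loraRank" in config:
        if parameter_scale != "LoRA":
            problems.append("loraRank is only supported with LoRA")
        elif config["loraRank"] not in LORA_RANKS:
            problems.append(f"loraRank must be one of {sorted(LORA_RANKS)}")

    # saveStep is documented as effective only with the step strategy.
    strategy = config.get("checkpointSaveStrategy", "step")
    if "saveStep" in config and strategy != "step":
        problems.append("saveStep only takes effect when checkpointSaveStrategy=step")

    return problems


if __name__ == "__main__":
    # batchSize 8 exceeds the [1,4] range that applies at maxSeqLen=8192.
    for problem in check_vl_sft_config("LoRA", {"maxSeqLen": 8192, "batchSize": 8}):
        print(problem)
```

The check covers only the cross-parameter rules; range and step checks for the remaining parameters would follow the same pattern.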
