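5. Model
Last updated: 2022-07-27
Introduction
In Wenxin, the basic operations that a deep learning task performs on a neural network are wrapped in one unified abstraction called Model. The operations defined in a Model instance include the network structure definition (structure), the forward network (forward), the optimization strategy (set_optimizer), and metric evaluation (get_metrics). Each of these parts can use Wenxin's preset methods directly or be replaced with a custom implementation.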
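As a rough illustration of this contract, the sketch below uses plain Python with no PaddlePaddle dependency. BaseModelSketch and ToyClassifier are hypothetical names, the phase strings and dict keys are placeholders rather than Wenxin's InstanceName constants, and the call order at the bottom is only an assumption about how a trainer might drive a Model.

```python
from collections import OrderedDict


class BaseModelSketch:
    """Hypothetical stand-in for Wenxin's BaseModel (not the real class)."""

    def __init__(self, model_params):
        self.model_params = model_params
        # Assumption: the layers are built once, at construction time.
        self.structure()

    def structure(self):
        raise NotImplementedError

    def forward(self, fields_dict, phase):
        raise NotImplementedError

    def set_optimizer(self):
        raise NotImplementedError

    def get_metrics(self, forward_return_dict, meta_info, phase):
        raise NotImplementedError


class ToyClassifier(BaseModelSketch):
    """Toy subclass: one weight per label, no real neural network."""

    def structure(self):
        # "Layers" here are just a list of per-label weights.
        self.weights = self.model_params["weights"]

    def forward(self, fields_dict, phase):
        x = fields_dict["x"]
        scores = [w * x for w in self.weights]
        result = {"predict_result": scores}
        if phase == "training":
            label = fields_dict["label"]
            result["label"] = label
            # Toy loss: gap between the best score and the labelled score.
            result["loss"] = max(scores) - scores[label]
        return result

    def set_optimizer(self):
        # A plain dict stands in for a real optimizer object.
        self.optimizer = {
            "name": "sgd",
            "learning_rate": self.model_params.get("learning_rate", 2e-5),
        }
        return self.optimizer

    def get_metrics(self, forward_return_dict, meta_info, phase):
        scores = forward_return_dict["predict_result"]
        predicted = scores.index(max(scores))
        metrics = OrderedDict()
        metrics["acc"] = float(predicted == forward_return_dict["label"])
        return metrics


# Assumed call order: build, pick an optimizer, run forward, then score.
model = ToyClassifier({"weights": [0.5, 2.0, -1.0]})
optimizer = model.set_optimizer()
outputs = model.forward({"x": 3.0, "label": 1}, phase="training")
print(model.get_metrics(outputs, meta_info={"step": 1}, phase="training"))
```

The base class raises NotImplementedError from every hook, so a subclass that forgets one of the four parts fails at its first use rather than silently doing nothing.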
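Basic structure
Every Model instance must implement four basic functions: defining the network structure, building the forward network, setting the optimization strategy, and choosing how metrics are evaluated. All Wenxin Models must inherit from BaseModel. The following ERNIE-based fully connected (FC) classification task serves as an example:
- Define the network structure: the core task is to initialize the OPs (operators) that each layer of the network needs.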
def structure(self):
    """Organize the network structure.
    :return:
    """
    # Read the ERNIE configuration path from the "embedding" section of the model params.
    emb_params = self.model_params.get("embedding")
    config_path = emb_params.get("config_path")
    self.cfg_dict = ErnieConfig(config_path)
    self.ernie_model = ErnieModel(self.cfg_dict, name='')
    initializer = nn.initializer.TruncatedNormal(std=0.02)
    self.dropout = nn.Dropout(p=0.1)
    # Classification head: maps the hidden size to the number of labels.
    self.fc_prediction = nn.Linear(in_features=self.hid_dim, out_features=self.num_labels,
                                   weight_attr=paddle.ParamAttr(name='cls.w_0', initializer=initializer),
                                   bias_attr='cls.b_0')
    # use_softmax=False because forward() passes probabilities that already went through softmax.
    self.loss = paddle.nn.CrossEntropyLoss(use_softmax=False)
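- Build the forward network: the core content is the network assembly for the model's forward computation (built with PaddlePaddle APIs) and the loss computation. The output is the result of applying these transformations to the input data.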
def forward(self, fields_dict, phase):
    """Forward computation.
    :param fields_dict:
    :param phase:
    :return:
    """
    # Parse the tensors built by the Reader out of the input dict.
    fields_dict = self.fields_process(fields_dict, phase)
    instance_text_a = fields_dict["text_a"]
    record_id_text_a = instance_text_a[InstanceName.RECORD_ID]
    text_a_src = record_id_text_a[InstanceName.SRC_IDS]
    text_a_sent = record_id_text_a[InstanceName.SENTENCE_IDS]
    # Use ERNIE to build embedding (semantic representation) vectors for the samples;
    # cls_embedding is the sentence-level vector.
    cls_embedding, tokens_embedding = self.ernie_model(src_ids=text_a_src, sent_ids=text_a_sent)
    cls_embedding = self.dropout(cls_embedding)
    # Feed into the fully connected layer, projecting down to as many dimensions as there
    # are labels in the training set, then map to probabilities with softmax.
    prediction = self.fc_prediction(cls_embedding)
    probs = nn.functional.softmax(prediction)
    if phase == InstanceName.TRAINING or phase == InstanceName.EVALUATE or phase == InstanceName.TEST:
        # train, evaluate, test
        instance_label = fields_dict["label"]
        record_id_label = instance_label[InstanceName.RECORD_ID]
        label = record_id_label[InstanceName.SRC_IDS]
        # The loss is computed with cross entropy.
        cost = self.loss(probs, label)
        # Tip: in training mode, the loss must always be returned.
        forward_return_dict = {
            InstanceName.PREDICT_RESULT: probs,
            InstanceName.LABEL: label,
            InstanceName.LOSS: cost
        }
        return forward_return_dict
    elif phase == InstanceName.INFERENCE:
        # infer data with dynamic graph
        forward_return_dict = {
            InstanceName.PREDICT_RESULT: probs
        }
        return forward_return_dict
    elif phase == InstanceName.SAVE_INFERENCE:
        # save inference model with jit
        target_predict_list = [probs]
        target_feed_list = [text_a_src, text_a_sent]
        # Stored as JSON in the model's meta file and used for offline prediction;
        # the format is field_name#field_tensor_name.
        target_feed_name_list = ["text_a#src_ids", "text_a#sent_ids"]
        wrap_save(target_feed_list, self.ernie_model)
        forward_return_dict = {
            InstanceName.TARGET_FEED: target_feed_list,
            InstanceName.TARGET_PREDICTS: target_predict_list,
            InstanceName.TARGET_FEED_NAMES: target_feed_name_list
        }
        return forward_return_dict
- Choose the optimization strategy: set the optimizer, such as Adam, Adagrad, or SGD. See [Optimizer] for details on optimizers.
def set_optimizer(self):
    """
    :return: optimizer
    """
    # Learning-rate decay and weight decay are configured in the optimizer;
    # loss scaling is configured in amp (set in each trainer).
    opt_param = self.model_params.get('optimization', None)
    self.lr = opt_param.get("learning_rate", 2e-5)
    weight_decay = opt_param.get("weight_decay", 0.01)
    use_lr_decay = opt_param.get("use_lr_decay", False)
    epsilon = opt_param.get("epsilon", 1e-6)
    g_clip = paddle.nn.ClipGradByGlobalNorm(1.0)
    # Parameters whose names match this pattern are excluded from weight decay.
    param_name_to_exclude_from_weight_decay = re.compile(r'.*layer_norm_scale|.*layer_norm_bias|.*b_0')
    if use_lr_decay:
        max_train_steps = opt_param.get("max_train_steps", 0)
        warmup_steps = opt_param.get("warmup_steps", 0)
        self.lr_scheduler = LinearWarmupDecay(base_lr=self.lr, end_lr=0.0, warmup_steps=warmup_steps,
                                              decay_steps=max_train_steps, num_train_steps=max_train_steps)
        self.optimizer = paddle.optimizer.AdamW(learning_rate=self.lr_scheduler,
                                                parameters=self.parameters(),
                                                weight_decay=weight_decay,
                                                apply_decay_param_fun=lambda n: not param_name_to_exclude_from_weight_decay.match(n),
                                                epsilon=epsilon,
                                                grad_clip=g_clip)
    else:
        self.optimizer = paddle.optimizer.AdamW(self.lr,
                                                parameters=self.parameters(),
                                                weight_decay=weight_decay,
                                                apply_decay_param_fun=lambda n: not param_name_to_exclude_from_weight_decay.match(n),
                                                epsilon=epsilon,
                                                grad_clip=g_clip)
    return self.optimizer
- Choose how metrics are evaluated: compute and print the model's evaluation metrics dynamically at a given point during training.
def get_metrics(self, forward_return_dict, meta_info, phase):
    """
    :param forward_return_dict: the results produced by the forward computation
    :param meta_info: common meta information, such as step, used_time, gpu_id
    :param phase: the phase of the current call, covering training and evaluation
    :return:
    """
    predictions = forward_return_dict[InstanceName.PREDICT_RESULT]
    label = forward_return_dict[InstanceName.LABEL]
    # paddle_acc = forward_return_dict["acc"]
    if self.is_dygraph:
        # In dynamic-graph mode, convert tensors to NumPy arrays before computing metrics.
        if isinstance(predictions, list):
            predictions = [item.numpy() for item in predictions]
        else:
            predictions = predictions.numpy()
        if isinstance(label, list):
            label = [item.numpy() for item in label]
        else:
            label = label.numpy()
    metrics_acc = metrics.Acc()
    acc = metrics_acc.eval([predictions, label])
    metrics_pres = metrics.Precision()
    precision = metrics_pres.eval([predictions, label])
    if phase == InstanceName.TRAINING:
        step = meta_info[InstanceName.STEP]
        time_cost = meta_info[InstanceName.TIME_COST]
        loss = forward_return_dict[InstanceName.LOSS]
        if isinstance(loss, paddle.Tensor):
            loss_np = loss.numpy()
            mean_loss = np.mean(loss_np)
        else:
            mean_loss = np.mean(loss)
        logging.info("phase = {0} loss = {1} acc = {2} precision = {3} step = {4} time_cost = {5}".format(
            phase, mean_loss, acc, precision, step, round(time_cost, 4)))
    if phase == InstanceName.EVALUATE or phase == InstanceName.TEST:
        time_cost = meta_info[InstanceName.TIME_COST]
        step = meta_info[InstanceName.STEP]
        logging.info("phase = {0} acc = {1} precision = {2} time_cost = {3} step = {4}".format(
            phase, acc, precision, round(time_cost, 4), step))
    metrics_return_dict = collections.OrderedDict()
    metrics_return_dict["acc"] = acc
    metrics_return_dict["precision"] = precision
    return metrics_return_dict
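The get_metrics example above relies on metrics.Acc and metrics.Precision, whose internals are not shown here. As a rough illustration of what such metrics typically compute, the sketch below derives accuracy and precision from rows of class probabilities in plain Python. The helper names, the argmax decision rule, and the choice of class 1 as the positive class are assumptions for illustration, not Wenxin's implementation.

```python
def argmax(row):
    """Index of the largest value in a list of class probabilities."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    predicted = [argmax(row) for row in probs]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)


def precision(probs, labels, positive_class=1):
    """TP / (TP + FP) for one class; 0.0 when nothing is predicted positive."""
    predicted = [argmax(row) for row in probs]
    true_pos = sum(
        1 for p, y in zip(predicted, labels)
        if p == positive_class and y == positive_class
    )
    pred_pos = sum(1 for p in predicted if p == positive_class)
    return true_pos / pred_pos if pred_pos else 0.0


# Four samples, two classes; each row is the softmax output for one sample.
probs = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]]
labels = [0, 1, 0, 1]
print(accuracy(probs, labels), precision(probs, labels))
```

Guarding the division in precision avoids a ZeroDivisionError on batches where the model never predicts the positive class, which can happen early in training.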
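Preset Models in Wenxin
Wenxin ships with a rich set of preset Models that cover classic NLP tasks, including text classification, text matching, sequence labeling, and information extraction. The preset Model files are located in the model directory of the corresponding task directory under tasks. Some of them are listed below: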
.
├── base_cls.py ## base Model class for classification tasks
├── bow_classification.py ## BOW classification network
├── ernie_classification.py ## ERNIE-based classification network
├── ....
├── term_rank_ernie.py ## ERNIE-based term importance network
├── base_matching.py ## base Model class for matching tasks
├── bow_matching_pairwise.py ## BOW pairwise matching network
├── ernie_matching_siamese_pairwise.py ## ERNIE-based pairwise matching network
├── ....
├── ernie_fc_ie.py ## ERNIE-based information extraction network
├── ....
├── ernie_fc_sequence_label.py ## ERNIE-based sequence labeling network
└── ....
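Advanced usage
Wenxin provides classic networks that are widely used in NLP. To customize or optimize a network for your own business scenario, refer to the detailed interface design and to the guide on customizing the core Model interface.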