5. Model

Last updated: 2022-07-27

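Introduction

In Wenxin, the basic operations on a neural network in a deep learning task are wrapped in a unified abstraction called Model. A Model instance defines the network structure (structure), the forward pass (forward), the optimization strategy (set_optimizer), and metric evaluation (get_metrics). Each of these parts can use Wenxin's preset methods directly or be customized.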
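
As a rough illustration of this four-method contract, the sketch below uses a hypothetical stand-in for BaseModel and a toy subclass with no deep learning in it. The stand-in class, the ToyModel name, the "threshold" parameter, the plain-string dictionary keys, and the call to structure() from the constructor are all assumptions made for the example; Wenxin's real BaseModel is not reproduced here.

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Hypothetical stand-in for Wenxin's BaseModel, for illustration only."""

    def __init__(self, model_params):
        self.model_params = model_params
        # Assumption: the layers are created once, at construction time.
        self.structure()

    @abstractmethod
    def structure(self):
        """Create the layers (OPs) the network needs."""

    @abstractmethod
    def forward(self, fields_dict, phase):
        """Run the forward pass and return a dict of results."""

    @abstractmethod
    def set_optimizer(self):
        """Build and return the optimizer."""

    @abstractmethod
    def get_metrics(self, forward_return_dict, meta_info, phase):
        """Compute and return the evaluation metrics."""


class ToyModel(BaseModel):
    """Toy subclass: labels a number 1 if it exceeds a threshold, else 0."""

    def structure(self):
        self.threshold = self.model_params.get("threshold", 0.0)

    def forward(self, fields_dict, phase):
        preds = [1 if x > self.threshold else 0 for x in fields_dict["x"]]
        return {"predict_result": preds, "label": fields_dict.get("label")}

    def set_optimizer(self):
        return None  # nothing to train in this toy example

    def get_metrics(self, forward_return_dict, meta_info, phase):
        preds = forward_return_dict["predict_result"]
        labels = forward_return_dict["label"]
        correct = sum(int(p == y) for p, y in zip(preds, labels))
        return {"acc": correct / len(labels)}


model = ToyModel({"threshold": 0.0})
out = model.forward({"x": [-1.0, 2.0, 3.0], "label": [0, 1, 0]}, phase="evaluate")
print(model.get_metrics(out, meta_info={}, phase="evaluate"))
```

Because the four methods are declared abstract in this sketch, a subclass that forgets one of them cannot be instantiated, which mirrors the requirement that every Model implement all four.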

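Basic structure

Every Model instance must implement four basic functions: define the network structure, build the forward pass, set the optimization strategy, and decide how metrics are evaluated. All Wenxin Models must inherit from BaseModel. The following uses an ERNIE-based fully connected (FC) classification task as an example:

Define the network structure: the core task is to initialize the OPs that each layer of the network needs.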

def structure(self):
      """Organize the network structure.
      :return
      """
      emb_params = self.model_params.get("embedding")
      config_path = emb_params.get("config_path")
      self.cfg_dict = ErnieConfig(config_path)
      self.ernie_model = ErnieModel(self.cfg_dict, name='')
      initializer = nn.initializer.TruncatedNormal(std=0.02)
      self.dropout = nn.Dropout(p=0.1)
      self.fc_prediction = nn.Linear(in_features=self.hid_dim, out_features=self.num_labels,
                                     weight_attr=paddle.ParamAttr(name='cls.w_0', initializer=initializer),
                                     bias_attr='cls.b_0')
      self.loss = paddle.nn.CrossEntropyLoss(use_softmax=False)
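
The method above reads its settings from self.model_params and would fail with an AttributeError if the "embedding" section were missing. The sketch below shows one way to read the same keys defensively. Only the key names "embedding" and "config_path" come from the snippet; the dictionary layout, the placeholder path, and the read_config_path helper are hypothetical.

```python
# Hypothetical configuration: the key names follow the snippet above,
# the path value is only a placeholder.
model_params = {
    "embedding": {"config_path": "path/to/ernie_config.json"},
}


def read_config_path(params):
    """Return embedding.config_path, failing with a clear message if absent."""
    emb_params = params.get("embedding")
    if emb_params is None:
        raise ValueError("model_params has no 'embedding' section")
    config_path = emb_params.get("config_path")
    if not config_path:
        raise ValueError("'embedding' section has no 'config_path'")
    return config_path


print(read_config_path(model_params))
```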

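Build the forward pass: the core content is the network assembly for the model's forward computation (built with PaddlePaddle APIs) and the computation of the loss function. The output is the result of applying these transformations to the input data.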

def forward(self, fields_dict, phase):
      """Forward computation.
      :param fields_dict:
      :param phase:
      :return
      """
      # Parse the tensors built by the Reader out of the input dict.
      fields_dict = self.fields_process(fields_dict, phase)
      instance_text_a = fields_dict["text_a"]
      record_id_text_a = instance_text_a[InstanceName.RECORD_ID]
      text_a_src = record_id_text_a[InstanceName.SRC_IDS]
      text_a_sent = record_id_text_a[InstanceName.SENTENCE_IDS]
      # Use ERNIE to build embedding (semantic representation) vectors for the samples; cls_embedding is the sentence-level vector.
      cls_embedding, tokens_embedding = self.ernie_model(src_ids=text_a_src, sent_ids=text_a_sent)
      cls_embedding = self.dropout(cls_embedding)
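      # Add a fully connected layer that projects down to as many dimensions as there are labels in the training set, then apply softmax to map the outputs to probabilities.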
      prediction = self.fc_prediction(cls_embedding)
      probs = nn.functional.softmax(prediction)
      if phase == InstanceName.TRAINING or phase == InstanceName.EVALUATE or phase == InstanceName.TEST:
          "train, evaluate, test"
          instance_label = fields_dict["label"]
          record_id_label = instance_label[InstanceName.RECORD_ID]
          label = record_id_label[InstanceName.SRC_IDS]
          # Compute the loss with cross entropy.
          cost = self.loss(probs, label)
          # Tip: in training mode, the loss must always be returned.
          forward_return_dict = {
              InstanceName.PREDICT_RESULT: probs,
              InstanceName.LABEL: label,
              InstanceName.LOSS: cost
          }
          return forward_return_dict
      elif phase == InstanceName.INFERENCE:
          "infer data with dynamic graph"
          forward_return_dict = {
              InstanceName.PREDICT_RESULT: probs
          }
          return forward_return_dict
      elif phase == InstanceName.SAVE_INFERENCE:
          "save inference model with jit"
          target_predict_list = [probs]
          target_feed_list = [text_a_src, text_a_sent]
          # Stored as JSON in the model's meta file and used for offline prediction; format: field_name#field_tensor_name
          target_feed_name_list = ["text_a#src_ids", "text_a#sent_ids"]
          wrap_save(target_feed_list, self.ernie_model)
          forward_return_dict = {
              InstanceName.TARGET_FEED: target_feed_list,
              InstanceName.TARGET_PREDICTS: target_predict_list,
              InstanceName.TARGET_FEED_NAMES: target_feed_name_list
          }
          return forward_return_dict
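
In the example above, forward applies softmax itself and the loss is created with use_softmax=False, so the loss function receives probabilities rather than raw logits. The pure-Python sketch below works through that arithmetic for a single sample. The helper names and the numbers are made up for illustration; this is not Paddle's implementation.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_probs(probs, label):
    """Cross entropy when the input is already a probability vector."""
    return -math.log(probs[label])


logits = [2.0, 0.5, -1.0]   # hypothetical output of the FC layer
probs = softmax(logits)     # what forward() calls `probs`
loss = cross_entropy_from_probs(probs, label=0)
print(probs, loss)
```

Passing probabilities to a loss that applies softmax again would compress the differences between classes a second time, which is why the two settings need to agree.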

Choose the optimization strategy: set up the optimizer, such as Adam, Adagrad, or SGD. See [Optimizer] for details on optimizers.

def set_optimizer(self):
    """
    :return optimizer
    """
    # Learning-rate decay and weight decay are configured in the optimizer; loss scaling is configured in amp (set in each trainer).
    # Fall back to an empty dict so the defaults below apply when the section is missing.
    opt_param = self.model_params.get('optimization') or {}
    self.lr = opt_param.get("learning_rate", 2e-5)
    weight_decay = opt_param.get("weight_decay", 0.01)
    use_lr_decay = opt_param.get("use_lr_decay", False)
    epsilon = opt_param.get("epsilon", 1e-6)
    g_clip = paddle.nn.ClipGradByGlobalNorm(1.0)
    param_name_to_exclue_from_weight_decay = re.compile(r'.*layer_norm_scale|.*layer_norm_bias|.*b_0')
    if use_lr_decay:
        max_train_steps = opt_param.get("max_train_steps", 0)
        warmup_steps = opt_param.get("warmup_steps", 0)
        self.lr_scheduler = LinearWarmupDecay(base_lr=self.lr, end_lr=0.0, warmup_steps=warmup_steps,
                                              decay_steps=max_train_steps, num_train_steps=max_train_steps)
        self.optimizer = paddle.optimizer.AdamW(learning_rate=self.lr_scheduler,
                                                parameters=self.parameters(),
                                                weight_decay=weight_decay,
                                                apply_decay_param_fun=lambda
                                                    n: not param_name_to_exclue_from_weight_decay.match(n),
                                                epsilon=epsilon,
                                                grad_clip=g_clip)
    else:
        self.optimizer = paddle.optimizer.AdamW(self.lr,
                                                parameters=self.parameters(),
                                                weight_decay=weight_decay,
                                                apply_decay_param_fun=lambda
                                                    n: not param_name_to_exclue_from_weight_decay.match(n),
                                                epsilon=epsilon,
                                                grad_clip=g_clip)
    return self.optimizer
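
Two details of the optimizer setup can be tried out without Paddle: which parameters are excluded from weight decay, and what a linear warmup followed by linear decay looks like. The sketch below reuses the regular expression from the code above. The linear_warmup_decay function is a common formulation written for this example and is only assumed to resemble Wenxin's LinearWarmupDecay, whose implementation is not shown here; the parameter names passed in are illustrative.

```python
import re

# Same pattern as in set_optimizer above.
NO_DECAY = re.compile(r'.*layer_norm_scale|.*layer_norm_bias|.*b_0')


def uses_weight_decay(param_name):
    """True if weight decay should be applied to this parameter."""
    return NO_DECAY.match(param_name) is None


def linear_warmup_decay(step, base_lr, warmup_steps, decay_steps, end_lr=0.0):
    """Assumed schedule: ramp up linearly, then decay linearly to end_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    if decay_steps <= 0:
        return base_lr  # no decay horizon configured
    progress = min(step, decay_steps) / decay_steps
    return (base_lr - end_lr) * (1.0 - progress) + end_lr


for name in ["cls.w_0", "cls.b_0", "encoder_layer_norm_scale"]:
    print(name, uses_weight_decay(name))

for step in [0, 50, 100, 500, 1000]:
    print(step, linear_warmup_decay(step, 2e-5, warmup_steps=100, decay_steps=1000))
```

Excluding biases and layer-norm parameters from weight decay is a common choice for Transformer models, since shrinking them toward zero rarely helps generalization.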

Decide how metrics are evaluated: dynamically compute and print the model's evaluation metrics at a given point during training.

def get_metrics(self, forward_return_dict, meta_info, phase):
    """
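    :param forward_return_dict: results produced by the forward computation
    :param meta_info: common meta information, such as step, used_time, and gpu_id
    :param phase: the phase of the current call, including training and evaluation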
    :return
    """
    predictions = forward_return_dict[InstanceName.PREDICT_RESULT]
    label = forward_return_dict[InstanceName.LABEL]
    # paddle_acc = forward_return_dict["acc"]
    if self.is_dygraph:
        if isinstance(predictions, list):
            predictions = [item.numpy() for item in predictions]
        else:
            predictions = predictions.numpy()
        if isinstance(label, list):
            label = [item.numpy() for item in label]
        else:
            label = label.numpy()
    metrics_acc = metrics.Acc()
    acc = metrics_acc.eval([predictions, label])
    metrics_pres = metrics.Precision()
    precision = metrics_pres.eval([predictions, label])
    if phase == InstanceName.TRAINING:
        step = meta_info[InstanceName.STEP]
        time_cost = meta_info[InstanceName.TIME_COST]
        loss = forward_return_dict[InstanceName.LOSS]
        if isinstance(loss, paddle.Tensor):
            loss_np = loss.numpy()
            mean_loss = np.mean(loss_np)
        else:
            mean_loss = np.mean(loss)
        logging.info("phase = {0} loss = {1} acc = {2} precision = {3} step = {4} time_cost = {5}".format(
            phase, mean_loss, acc, precision, step, round(time_cost, 4)))
    if phase == InstanceName.EVALUATE or phase == InstanceName.TEST:
        time_cost = meta_info[InstanceName.TIME_COST]
        step = meta_info[InstanceName.STEP]
        logging.info("phase = {0} acc = {1} precision = {2} time_cost = {3} step = {4}".format(
            phase, acc, precision, round(time_cost, 4), step))
    metrics_return_dict = collections.OrderedDict()
    metrics_return_dict["acc"] = acc
    metrics_return_dict["precision"] = precision
    return metrics_return_dict
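
The metrics.Acc and metrics.Precision classes used above belong to Wenxin and their internals are not shown here. As a stand-in, the sketch below computes accuracy and precision by hand, assuming that the predictions are per-class probabilities and that precision is measured for a single positive class (class 1). Both assumptions, along with the function names and sample data, are made for the example.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(probs, labels):
    """Fraction of samples whose most likely class equals the label."""
    preds = [argmax(r) for r in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def precision(probs, labels, positive=1):
    """True positives divided by all samples predicted as positive."""
    preds = [argmax(r) for r in probs]
    predicted_pos = sum(1 for p in preds if p == positive)
    true_pos = sum(1 for p, y in zip(preds, labels) if p == positive and y == positive)
    return true_pos / predicted_pos if predicted_pos else 0.0


# Hypothetical batch: four samples, two classes.
probs = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]]
labels = [0, 1, 0, 1]
print(accuracy(probs, labels), precision(probs, labels))
```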

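Preset Models in Wenxin

Wenxin provides a rich set of preset Models that support classic NLP tasks, including text classification, text matching, sequence labeling, and information extraction. The preset Model files are located in the model directory under the corresponding task directory in tasks. Some of the Models are listed below: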

.
├── base_cls.py                          ## base Model class for classification tasks
├── bow_classification.py                ## BOW classification network
├── ernie_classification.py              ## ERNIE-based classification network
├── ....
├── term_rank_ernie.py                   ## ERNIE-based term importance network
├── base_matching.py                     ## base Model class for matching tasks
├── bow_matching_pairwise.py             ## BOW pairwise matching network
├── ernie_matching_siamese_pairwise.py   ## ERNIE-based pairwise matching network
├── ....
├── ernie_fc_ie.py                       ## ERNIE-based information extraction network
├── ....
├── ernie_fc_sequence_label.py           ## ERNIE-based sequence labeling network
└── ....

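Advanced usage

Wenxin provides classic networks that are broadly applicable in NLP. To customize and optimize them for your own business scenario, see the detailed interface design and customization documentation for the core interface Model.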
