
5.Model

Introduction

In Wenxin, the basic operations performed on neural networks in deep learning tasks are wrapped in a unified abstraction called Model. The operations defined in a Model instance include the network structure definition (structure), the forward propagation network (forward), the optimization strategy setup (set_optimizer), and metrics evaluation (get_metrics). Each of these parts can either use the methods preset by Wenxin directly or be fully customized.
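
As a quick orientation, the following is a minimal sketch of what such a Model subclass looks like. The import path of BaseModel is an assumption here (adjust it to your installation), and the method bodies are placeholders that the rest of this section fills in with a concrete ERNIE-based example.

# Minimal sketch of a Model subclass (illustrative only).
from erniekit.model.model import BaseModel   # assumed import path; adjust to your installation


class MyClassificationModel(BaseModel):
    """Bundles the four customizable parts described above."""

    def structure(self):
        # Initialize every layer (op) the network will use.
        ...

    def forward(self, fields_dict, phase):
        # Build the forward computation and, during training, the loss.
        ...

    def set_optimizer(self):
        # Create and return the optimizer used for training.
        ...

    def get_metrics(self, forward_return_dict, meta_info, phase):
        # Compute and log evaluation metrics from the forward outputs.
        ...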

Basic Structure

Every Model instance needs to implement four basic functions: defining the network structure, building the forward propagation network, setting the optimization strategy, and determining how metrics are evaluated. All Models in Wenxin must inherit from BaseModel. The following uses an ERNIE-based FC classification task as an example:

  • Defining the network structure: the core task is to initialize the OPs used by each layer of the network.
def structure(self):
    """Organize the network structure.
    :return:
    """
    emb_params = self.model_params.get("embedding")
    config_path = emb_params.get("config_path")
    self.cfg_dict = ErnieConfig(config_path)
    self.ernie_model = ErnieModel(self.cfg_dict, name='')
    initializer = nn.initializer.TruncatedNormal(std=0.02)
    self.dropout = nn.Dropout(p=0.1)
    # Fully connected layer that projects the ERNIE representation onto the label space.
    self.fc_prediction = nn.Linear(in_features=self.hid_dim, out_features=self.num_labels,
                                   weight_attr=paddle.ParamAttr(name='cls.w_0', initializer=initializer),
                                   bias_attr='cls.b_0')
    # use_softmax=False because softmax is applied explicitly in forward().
    self.loss = paddle.nn.CrossEntropyLoss(use_softmax=False)
  • Building the forward propagation network: the core consists of the model's forward computation graph (assembled with PaddlePaddle APIs) and the loss calculation. The output is the result of applying these transformations to the input data.
def forward(self, fields_dict, phase):
    """Forward computation.
    :param fields_dict:
    :param phase:
    :return:
    """
    # Parse the tensors constructed by the Reader out of the input dict.
    fields_dict = self.fields_process(fields_dict, phase)
    instance_text_a = fields_dict["text_a"]
    record_id_text_a = instance_text_a[InstanceName.RECORD_ID]
    text_a_src = record_id_text_a[InstanceName.SRC_IDS]
    text_a_sent = record_id_text_a[InstanceName.SENTENCE_IDS]
    # Run ERNIE to build the embedding (semantic representation) of the sample;
    # cls_embedding is the sentence-level ([CLS]) representation.
    cls_embedding, tokens_embedding = self.ernie_model(src_ids=text_a_src, sent_ids=text_a_sent)
    cls_embedding = self.dropout(cls_embedding)
    # Feed the representation into the fully connected layer, projecting it down to the number
    # of labels in the training set, then map to probabilities with softmax.
    prediction = self.fc_prediction(cls_embedding)
    probs = nn.functional.softmax(prediction)
    if phase == InstanceName.TRAINING or phase == InstanceName.EVALUATE or phase == InstanceName.TEST:
        # train, evaluate, test
        instance_label = fields_dict["label"]
        record_id_label = instance_label[InstanceName.RECORD_ID]
        label = record_id_label[InstanceName.SRC_IDS]
        # The loss is computed with cross entropy.
        cost = self.loss(probs, label)
        # tip: in training mode the loss must be returned
        forward_return_dict = {
            InstanceName.PREDICT_RESULT: probs,
            InstanceName.LABEL: label,
            InstanceName.LOSS: cost
        }
        return forward_return_dict
    elif phase == InstanceName.INFERENCE:
        # infer data with dynamic graph
        forward_return_dict = {
            InstanceName.PREDICT_RESULT: probs
        }
        return forward_return_dict
    elif phase == InstanceName.SAVE_INFERENCE:
        # save inference model with jit
        target_predict_list = [probs]
        target_feed_list = [text_a_src, text_a_sent]
        # Written as JSON into the model's meta file and used for offline inference; format: field_name#field_tensor_name
        target_feed_name_list = ["text_a#src_ids", "text_a#sent_ids"]
        wrap_save(target_feed_list, self.ernie_model)
        forward_return_dict = {
            InstanceName.TARGET_FEED: target_feed_list,
            InstanceName.TARGET_PREDICTS: target_predict_list,
            InstanceName.TARGET_FEED_NAMES: target_feed_name_list
        }
        return forward_return_dict
  • Selecting the optimization strategy: set up the optimizer, such as Adam, Adagrad, or SGD; see [Optimizer] for details. A sketch of the corresponding configuration block follows after this list.

    def set_optimizer(self):
        """
        :return optimizer
        """
        # Learning rate and weight decay are configured in the optimizer; loss scaling is configured in AMP (set in each trainer).
        opt_param = self.model_params.get('optimization', None)
        self.lr = opt_param.get("learning_rate", 2e-5)
        weight_decay = opt_param.get("weight_decay", 0.01)
        use_lr_decay = opt_param.get("use_lr_decay", False)
        epsilon = opt_param.get("epsilon", 1e-6)
        g_clip = paddle.nn.ClipGradByGlobalNorm(1.0)
        param_name_to_exclue_from_weight_decay = re.compile(r'.*layer_norm_scale|.*layer_norm_bias|.*b_0')
        if use_lr_decay:
            max_train_steps = opt_param.get("max_train_steps", 0)
            warmup_steps = opt_param.get("warmup_steps", 0)
            self.lr_scheduler = LinearWarmupDecay(base_lr=self.lr, end_lr=0.0, warmup_steps=warmup_steps,
                                                  decay_steps=max_train_steps, num_train_steps=max_train_steps)
            self.optimizer = paddle.optimizer.AdamW(learning_rate=self.lr_scheduler,
                                                    parameters=self.parameters(),
                                                    weight_decay=weight_decay,
                                                    apply_decay_param_fun=lambda
                                                        n: not param_name_to_exclue_from_weight_decay.match(n),
                                                    epsilon=epsilon,
                                                    grad_clip=g_clip)
        else:
            self.optimizer = paddle.optimizer.AdamW(self.lr,
                                                    parameters=self.parameters(),
                                                    weight_decay=weight_decay,
                                                    apply_decay_param_fun=lambda
                                                        n: not param_name_to_exclue_from_weight_decay.match(n),
                                                    epsilon=epsilon,
                                                    grad_clip=g_clip)
        return self.optimizer
  • Determining how metrics are evaluated: dynamically computes and logs the model's evaluation metrics at any given point during training.

    def get_metrics(self, forward_return_dict, meta_info, phase):
        """
        :param forward_return_dict: results produced by the forward computation
        :param meta_info: common meta information, e.g. step, used_time, gpu_id
        :param phase: the current phase, i.e. training or evaluation
        :return:
        """
        predictions = forward_return_dict[InstanceName.PREDICT_RESULT]
        label = forward_return_dict[InstanceName.LABEL]
        # paddle_acc = forward_return_dict["acc"]
        if self.is_dygraph:
            if isinstance(predictions, list):
                predictions = [item.numpy() for item in predictions]
            else:
                predictions = predictions.numpy()
            if isinstance(label, list):
                label = [item.numpy() for item in label]
            else:
                label = label.numpy()
        metrics_acc = metrics.Acc()
        acc = metrics_acc.eval([predictions, label])
        metrics_pres = metrics.Precision()
        precision = metrics_pres.eval([predictions, label])
        if phase == InstanceName.TRAINING:
            step = meta_info[InstanceName.STEP]
            time_cost = meta_info[InstanceName.TIME_COST]
            loss = forward_return_dict[InstanceName.LOSS]
            if isinstance(loss, paddle.Tensor):
                loss_np = loss.numpy()
                mean_loss = np.mean(loss_np)
            else:
                mean_loss = np.mean(loss)
            logging.info("phase = {0} loss = {1} acc = {2} precision = {3} step = {4} time_cost = {5}".format(
                phase, mean_loss, acc, precision, step, round(time_cost, 4)))
        if phase == InstanceName.EVALUATE or phase == InstanceName.TEST:
            time_cost = meta_info[InstanceName.TIME_COST]
            step = meta_info[InstanceName.STEP]
            logging.info("phase = {0} acc = {1} precision = {2} time_cost = {3} step = {4}".format(
                phase, acc, precision, round(time_cost, 4), step))
        metrics_return_dict = collections.OrderedDict()
        metrics_return_dict["acc"] = acc
        metrics_return_dict["precision"] = precision
        return metrics_return_dict
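
To connect the code above with the job configuration, the following is a sketch of the model parameter dict that structure() and set_optimizer() read. The key names are taken from the .get() calls in the example; the nesting, the example values, and the config path are illustrative assumptions, not a definitive configuration.

# Hypothetical illustration of the model_params dict read by structure() and set_optimizer().
# Key names come from the .get() calls above; values and the config path are examples only.
model_params = {
    "embedding": {
        "config_path": "./model_files/config/ernie_config.json"   # assumed path, adjust to your setup
    },
    "optimization": {
        "learning_rate": 2e-5,      # default used by set_optimizer() when absent
        "weight_decay": 0.01,
        "use_lr_decay": True,       # when True, LinearWarmupDecay + AdamW is used
        "warmup_steps": 1000,       # example value
        "max_train_steps": 10000,   # example value
        "epsilon": 1e-6
    }
}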

Preset Models in Wenxin

Wenxin provides a rich set of preset Models covering classic NLP tasks, including text classification, text matching, sequence labeling, and information extraction. The preset Model files live in the model directory of the corresponding task under tasks. Some of them are listed below:

.
├── base_cls.py                          ## base Model class for classification tasks
├── bow_classification.py                ## BOW classification network
├── ernie_classification.py              ## ERNIE-based classification network
├── ....
├── term_rank_ernie.py                   ## ERNIE-based term importance network
├── base_matching.py                     ## base Model class for matching tasks
├── bow_matching_pairwise.py             ## BOW pairwise matching network
├── ernie_matching_siamese_pairwise.py   ## ERNIE-based pairwise matching network
├── ....
├── ernie_fc_ie.py                       ## ERNIE-based information extraction network
├── ....
├── ernie_fc_sequence_label.py           ## ERNIE-based sequence labeling network
└── ....

Advanced Usage

Wenxin provides classic networks that are broadly applicable in NLP. If you need to customize or optimize them for your own business scenario, please refer to the detailed interface design and customization documentation for the core Model interface; a brief sketch of such a customization follows.
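
As an illustration only, such a customization might subclass the ERNIE classification Model shown above and extend its structure() with an additional hidden layer before the classifier. The parent class name ErnieFcClassification and the added layer are assumptions made for this sketch, not names guaranteed by the framework.

import paddle.nn as nn


class MyErnieFcClassification(ErnieFcClassification):   # parent class name is an assumption for this sketch
    """Hypothetical customization: an extra non-linear hidden layer before the classifier."""

    def structure(self):
        # Reuse the ERNIE backbone, dropout, FC head and loss defined by the parent class.
        super().structure()
        # Additional hidden layer; self.hid_dim comes from the parent's setup.
        self.fc_hidden = nn.Linear(in_features=self.hid_dim, out_features=self.hid_dim)
        self.act = nn.Tanh()

    # forward() would also be overridden so that cls_embedding passes through
    # self.fc_hidden and self.act before self.fc_prediction; it is omitted here.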
