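Test Paper Analysis and Recognition

Last updated: 2025-06-11

API Description

Analyzes document layout and returns the positions of figures, tables, titles, and text, together with OCR results for each layout block. Supports Chinese and English, mixed handwritten and printed text in a variety of scenarios, formula recognition, and recognition of handwritten vertical (column) arithmetic.

Online Debugging

You can debug this API in the Sample Code Center. It supports signature verification, shows the request and response of each online call, and generates sample code automatically.

Request Description

Request Example

HTTP method: POST

Request URL: https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis

URL parameters:

Parameter	Value
access_token	The access_token obtained with your API Key and Secret Key; see "Access Token Retrieval"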
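An access_token has an expiry time, and the Java sample in the code examples recommends caching it instead of fetching a new one for every request. The sketch below is a minimal cache under stated assumptions: `TokenCache` and the caller-supplied `fetch` function are hypothetical names, `fetch` is assumed to return the token and its lifetime in seconds (for example, taken from the JSON returned by the token endpoint), and the 60-second early-refresh margin is an arbitrary choice. None of this is part of the Baidu SDK.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical fetcher: returns (access_token, lifetime_in_seconds).
Fetcher = Callable[[], Tuple[str, int]]


class TokenCache:
    """Caches an access_token and refreshes it shortly before it expires."""

    def __init__(self, fetch: Fetcher, margin: int = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # how to obtain a fresh token (assumption)
        self._margin = margin    # refresh this many seconds before expiry
        self._clock = clock      # injectable clock, useful for testing
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Fetch when there is no token yet or the current one is about to expire.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = self._clock() + lifetime
        return self._token
```

Each request then appends `"?access_token=" + cache.get()` to the request URL, and the token endpoint is only contacted again once the cached token nears expiry.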

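Headers:

Parameter	Value
Content-Type	application/x-www-form-urlencoded

Place the request parameters in the body, as described below:

Request Parameters

Parameter	Required	Type	Allowed values	Description
image	One of image / url / pdf_file	string	-	Image data, base64-encoded and then URL-encoded. After base64 encoding and URL encoding it must not exceed 10M. The shortest side must be at least 15px and the longest side at most 8192px. Supported formats: jpg/jpeg/png/bmp. Priority: image > url > pdf_file; when image is present, url and pdf_file are ignored.
url	One of image / url / pdf_file	string	-	Full URL of the image, no longer than 1024 bytes. The image must not exceed 10M after base64 encoding, its shortest side must be at least 15px, and its longest side at most 8192px. Supported formats: jpg/jpeg/png/bmp. Priority: image > url > pdf_file; when image is present, url is ignored. Make sure hotlink protection is disabled for the URL.
pdf_file	One of image / url / pdf_file	string	-	PDF file, base64-encoded and then URL-encoded. After base64 encoding and URL encoding it must not exceed 10M; the shortest side must be at least 15px and the longest side at most 8192px. Priority: image > url > pdf_file; when image or url is present, pdf_file is ignored.
pdf_file_num	No	string	-	Page number of the PDF to recognize. When pdf_file is in effect, the page with this number is recognized; if omitted, page 1 is recognized.
language_type	No	string	CHN_ENG/ENG	Recognition language, default CHN_ENG. Values: CHN_ENG: Chinese and English; ENG: English
result_type	No	string	big/small	Whether results are returned per line or per character, default big. big: return line results; small: return line results plus per-character results
detect_direction	No	string	true/false	Whether to detect image orientation, default false (no detection). Orientation means whether the input image is upright or rotated 90/180/270 degrees counterclockwise; the result is returned in img_direction: 0: upright; 1: rotated 90 degrees counterclockwise; 2: rotated 180 degrees counterclockwise; 3: rotated 270 degrees counterclockwise
line_probability	No	string	true/false	Whether to return the confidence of each recognized line. Default false
disp_line_poly	No	string	true/false	Whether to return the four corner points of each line. Default false
words_type	No	string	handwring_only/handprint_mix	Text type. Default: printed-text recognition. handwring_only: handwritten-text recognition; handprint_mix: mixed handwritten and printed recognition
layout_analysis	No	string	true/false	Whether to analyze the document layout, outputting layout results (figure, table, title, paragraph, table of contents) and attribute results (section, header, footer, page number, footnote)
recg_formula	No	string	true/false	Whether to detect and recognize formulas, default false. Formulas are returned as LaTeX text. true: detect and recognize formulas; false: do not detect or recognize formulas
recg_long_division	No	string	true/false	Whether to detect and recognize handwritten vertical arithmetic, default false. true: detect and recognize handwritten vertical arithmetic; false: do not detect it
disp_underline_analysis	No	string	true/false	Whether to enable underline detection. true: enabled; underline information is returned in the underline field. false (default): disabled; no underline information is returned
recg_alter	No	string	true/false	Whether to return results for crossed-out (altered) text. true: enabled; crossed-out portions are returned uniformly as "☰". false (default): disabled; no results for crossed-out text are returned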
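Every parameter above travels as a string field in an `application/x-www-form-urlencoded` body, including the true/false switches. The sketch below shows one way to assemble that body under stated assumptions: `build_form_body` is a hypothetical helper that takes raw file bytes, applies the documented priority (image > url > pdf_file), and checks the 10M limit after base64 encoding and URL encoding. Reading "10M" as 10 × 1024 × 1024 bytes, and raising errors locally rather than letting the server reject the request, are assumptions.

```python
import base64
from typing import Dict, Optional
from urllib.parse import quote_plus, urlencode

# "10M" read as 10 * 1024 * 1024 bytes; the exact unit is an assumption.
MAX_ENCODED_BYTES = 10 * 1024 * 1024


def build_form_body(image: Optional[bytes] = None,
                    url: Optional[str] = None,
                    pdf_file: Optional[bytes] = None,
                    **options: object) -> str:
    """Build an x-www-form-urlencoded body for doc_analysis (hypothetical helper)."""
    fields: Dict[str, str] = {}
    # Documented priority is image > url > pdf_file; only the winning
    # source is sent, because the server ignores the others anyway.
    if image is not None:
        fields["image"] = base64.b64encode(image).decode("ascii")
    elif url is not None:
        if len(url.encode("utf-8")) > 1024:
            raise ValueError("url must not exceed 1024 bytes")
        fields["url"] = url
    elif pdf_file is not None:
        fields["pdf_file"] = base64.b64encode(pdf_file).decode("ascii")
    else:
        raise ValueError("one of image, url or pdf_file is required")

    # The 10M limit applies after base64 encoding *and* URL encoding.
    for key in ("image", "pdf_file"):
        if key in fields and len(quote_plus(fields[key])) > MAX_ENCODED_BYTES:
            raise ValueError(f"{key} exceeds 10M after base64 + urlencode")

    # Every optional parameter is a string; map Python bools to "true"/"false".
    for key, value in options.items():
        fields[key] = ("true" if value else "false") if isinstance(value, bool) else str(value)

    # urlencode() applies quote_plus, escaping the '+', '/' and '=' of base64.
    return urlencode(fields)
```

The returned string can be passed as `data=` to `requests.post` together with the Content-Type header listed above.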

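Request Code Examples

Tip 1: Before using the sample code, replace the sample token, image path, or Base64 data with your own.

Tip 2: Some languages depend on extra classes or libraries; see the code comments for their download links.

# Test paper analysis and recognition
curl -i -k 'https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis?access_token=[access token from the authentication API]' --data 'language_type=CHN_ENG&result_type=big&image=[image Base64, URL-encoded]' -H 'Content-Type:application/x-www-form-urlencoded'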

# encoding:utf-8

import requests
import base64

'''
Test paper analysis and recognition
'''

request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis"
# Open the image file in binary mode and base64-encode it; "with" closes the file
with open('[local file]', 'rb') as f:
    img = base64.b64encode(f.read())

# requests URL-encodes the form fields, so the base64 data needs no extra escaping
params = {"image": img, "language_type": "CHN_ENG", "result_type": "big"}
access_token = '[access token from the authentication API]'
request_url = request_url + "?access_token=" + access_token
headers = {'content-type': 'application/x-www-form-urlencoded'}
response = requests.post(request_url, data=params, headers=headers)
if response:
    print(response.json())

package com.baidu.ai.aip;

import com.baidu.ai.aip.utils.Base64Util;
import com.baidu.ai.aip.utils.FileUtil;
import com.baidu.ai.aip.utils.HttpUtil;

import java.net.URLEncoder;

/**
* Document layout analysis and recognition
*/
public class DocAnalysis {

    /**
    * Important: the utility classes FileUtil, Base64Util, HttpUtil and GsonUtils
    * required by this code can be downloaded from
    * https://ai.baidu.com/file/658A35ABAB2D404FBF903F64D47C1F72
    * https://ai.baidu.com/file/C8D81F3301E24D2892968F09AE1AD6E2
    * https://ai.baidu.com/file/544D677F5D4E4F17B4122FBD60DB82B3
    * https://ai.baidu.com/file/470B3ACCA3FE43788B5A963BF0B625F3
    */
    public static String docAnalysis() {
        // Request URL
        String url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis";
        try {
            // Local file path
            String filePath = "[local file path]";
            byte[] imgData = FileUtil.readFileByBytes(filePath);
            String imgStr = Base64Util.encode(imgData);
            String imgParam = URLEncoder.encode(imgStr, "UTF-8");

            String param = "language_type=" + "CHN_ENG" + "&result_type=" + "big" + "&image=" + imgParam;

            // Note: to keep the code simple, an access_token is obtained for every request here.
            // In production the access_token expires, so the client should cache it and fetch a new one once it expires.
            String accessToken = "[access token from the authentication API]";

            String result = HttpUtil.post(url, accessToken, param);
            System.out.println(result);
            return result;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void main(String[] args) {
        DocAnalysis.docAnalysis();
    }
}

#include <cstdio>
#include <iostream>
#include <string>
#include <curl/curl.h>

// libcurl download: https://curl.haxx.se/download.html
// jsoncpp download: https://github.com/open-source-parsers/jsoncpp/
const static std::string request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis";
static std::string docAnalysis_result;
/**
 * Write callback that curl invokes while receiving the HTTP response. libcurl may
 * deliver the body in several chunks, so each chunk is appended to the global
 * static string; the JSON body is stored as-is, not parsed.
 * @param see the libcurl documentation for parameter definitions
 * @return see the libcurl documentation for the return value
 */
static size_t callback(void *ptr, size_t size, size_t nmemb, void *stream) {
    // The received chunk is in ptr; append it to the result string
    docAnalysis_result.append((char *) ptr, size * nmemb);
    return size * nmemb;
}
/**
 * Document layout analysis and recognition
 * @return 0 on success, another error code on failure
 */
int docAnalysis(std::string &json_result, const std::string &access_token) {
    std::string url = request_url + "?access_token=" + access_token;
    CURL *curl = NULL;
    CURLcode result_code;
    int is_success;
    curl = curl_easy_init();
    if (curl) {
        docAnalysis_result.clear();  // discard the body of any previous call
        curl_easy_setopt(curl, CURLOPT_URL, url.data());
        curl_easy_setopt(curl, CURLOPT_POST, 1L);
        // Build an application/x-www-form-urlencoded body, as the Header table requires;
        // the base64 image data must be URL-encoded
        const std::string base64_img = "[base64_img]";
        char *escaped = curl_easy_escape(curl, base64_img.c_str(), (int) base64_img.size());
        if (escaped == NULL) {
            fprintf(stderr, "curl_easy_escape() failed.\n");
            curl_easy_cleanup(curl);
            return 1;
        }
        std::string body = std::string("language_type=CHN_ENG&result_type=big&image=") + escaped;
        curl_free(escaped);

        // POSTFIELDS does not copy the data, so body must outlive curl_easy_perform()
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, callback);
        result_code = curl_easy_perform(curl);
        curl_easy_cleanup(curl);  // release the handle on both success and failure
        if (result_code != CURLE_OK) {
            fprintf(stderr, "curl_easy_perform() failed: %s\n",
                    curl_easy_strerror(result_code));
            is_success = 1;
            return is_success;
        }
        json_result = docAnalysis_result;
        is_success = 0;
    } else {
        fprintf(stderr, "curl_easy_init() failed.\n");
        is_success = 1;
    }
    return is_success;
}

<?php
/**
 * Send an HTTP POST request (REST API) and return the response body
 * @param string $url
 * @param string $param
 * @return - http response body if succeeds, else false.
 */
function request_post($url = '', $param = '')
{
    if (empty($url) || empty($param)) {
        return false;
    }

    $postUrl = $url;
    $curlPost = $param;
    // Initialize curl
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $postUrl);
    curl_setopt($curl, CURLOPT_HEADER, 0);
    // Return the response as a string instead of printing it
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    // Submit with POST
    curl_setopt($curl, CURLOPT_POST, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $curlPost);
    // Execute the request
    $data = curl_exec($curl);
    curl_close($curl);

    return $data;
}

$token = '[access token from the authentication API]';
$url = 'https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis?access_token=' . $token;
$img = file_get_contents('[local file path]');
$img = base64_encode($img);
$bodys = array(
    'language_type' => "CHN_ENG",
    'result_type' => "big",
    'image' => $img
);
// http_build_query() URL-encodes the fields; passing the array itself would send multipart/form-data
$res = request_post($url, http_build_query($bodys));

var_dump($res);

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Web;

namespace com.baidu.ai
{
    public class DocAnalysis
    {
        // Document layout analysis and recognition
        public static string docAnalysis()
        {
            string token = "[access token from the authentication API]";
            string host = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis?access_token=" + token;
            Encoding encoding = Encoding.UTF8;
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(host);
            request.Method = "POST";
            // The API expects a URL-encoded form body
            request.ContentType = "application/x-www-form-urlencoded";
            request.KeepAlive = true;
            // Base64 encoding of the image
            string base64 = getFileBase64("[local image file]");
            String str = "language_type=" + "CHN_ENG" + "&result_type=" + "big" + "&image=" + HttpUtility.UrlEncode(base64);
            byte[] buffer = encoding.GetBytes(str);
            request.ContentLength = buffer.Length;
            using (Stream requestStream = request.GetRequestStream())
            {
                requestStream.Write(buffer, 0, buffer.Length);
            }
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
            {
                string result = reader.ReadToEnd();
                Console.WriteLine("Document layout analysis and recognition:");
                Console.WriteLine(result);
                return result;
            }
        }

        public static String getFileBase64(String fileName) {
            // Read the whole file and return its base64 encoding
            byte[] arr = File.ReadAllBytes(fileName);
            return Convert.ToBase64String(arr);
        }
    }
}

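Response Description

Response Parameters

Field	Required	Type	Description
log_id	Yes	uint64	Unique log ID, used for troubleshooting
img_direction	No	int32	Returned when detect_direction=true. Detected image orientation: 0: upright; 1: rotated 90 degrees counterclockwise; 2: rotated 180 degrees counterclockwise; 3: rotated 270 degrees counterclockwise
results_num	Yes	uint32	Number of recognition results, i.e. the number of elements in results
results	Yes	array[]	Array of recognition results
+ words_type	Yes	string	Text attribute: handwriting for handwritten text, print for printed text
+ words	Yes	array[]	Recognition result for a whole line
++ line_probability	No	array[]	Returned when line_probability=true. Confidence of each line, containing average (mean confidence of the line) and min (minimum confidence in the line)
+++ average	No	float	Average confidence of the line
+++ min	No	float	Lowest per-character confidence in the line
++ word	Yes	string	Recognized text of the line
++ poly_location	No	array[]	Four corner points of the line; returned when disp_line_poly=true
++ words_location	Yes	array[]	Bounding rectangle of the line (origin at the top-left corner)
+++ left	Yes	uint32	Horizontal coordinate of the rectangle's top-left vertex
+++ top	Yes	uint32	Vertical coordinate of the rectangle's top-left vertex
+++ width	Yes	uint32	Width of the rectangle
+++ height	Yes	uint32	Height of the rectangle
+ chars	No	array[]	Returned when result_type=small. Array of per-character results
++ char	No	string	Returned when result_type=small. Content of each character
++ chars_location	No	object	Bounding rectangle of each character (origin at the top-left corner)
+++ left	No	uint32	Horizontal coordinate of the rectangle's top-left vertex
+++ top	No	uint32	Vertical coordinate of the rectangle's top-left vertex
+++ width	No	uint32	Width of the rectangle
+++ height	No	uint32	Height of the rectangle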
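formula_result	No	array[]	Array of formulas in the recognition result, including each formula's position and content; returned when recg_formula=true
+ form_location	No	array[]	Bounding rectangle of the formula (origin at the top-left corner)
+ form_words	No	string	Recognized content of the formula
words_result	No	array[]	Recognition results with plain text and formulas merged; returned when recg_formula=true
+ location	No	array[]	Bounding rectangle of the line (origin at the top-left corner)
+ words	No	string	Content of the line
+ chars	No	array[]	Array of per-character results, where each formula counts as a single character; returned when result_type=small
++ char	No	string	Content of each character
++ chars_location	No	object	Bounding rectangle of each character (origin at the top-left corner)
layouts_num	No	uint32	Number of layout analysis results, i.e. the number of elements in layouts
layouts	No	array[]	Array of layout blocks inside each section. There are five block types: table, figure, paragraph text, title, and table of contents. Each entry gives the block's position and, for paragraph text and tables, the IDs of the text lines it contains
+ layout	No	string	Layout label: table, figure, text, title, contents (table of contents)
+ layout_location	No	array[]	Position of the layout block as four vertices: top-left, top-right, bottom-right, bottom-left
++ x	No	uint32	Horizontal coordinate (origin at the top-left corner)
++ y	No	uint32	Vertical coordinate (origin at the top-left corner)
+ layout_idx	No	array[]	Positions of the block's text in results: if a line ID of the block is n, that text is the (n+1)-th entry of results
sec_rows	No	uint32	All sections in the layout are represented as an M x N grid; sec_rows = M
sec_cols	No	uint32	All sections in the layout are represented as an M x N grid; sec_cols = N
sections	No	array[]	The five layout attributes in an image: section, header, footer, page number, and footnote. Each entry gives the attribute label, its position, and the IDs of the text it contains. A section contains the five block types (table, figure, paragraph text, title, table of contents), which are output in layouts
+ attribute	No	string	Layout attribute label: section, header, footer, number (page number), footnote
+ attri_location	No	array[]	Position of the attribute as four vertices: top-left, top-right, bottom-right, bottom-left
++ x	No	uint32	Horizontal coordinate (origin at the top-left corner)
++ y	No	uint32	Vertical coordinate (origin at the top-left corner)
+ sec_idx	No	string	Index identifiers of the content contained in each of the five layout attributes
++ idx	No	string	IDs of the text lines contained in the attribute
++ para_idx	No	string	Returned only when attribute=section. Order IDs of the blocks (table, figure, paragraph text, title, table of contents) contained in the section, i.e. each block's position in layouts
++ row_idx	No	string	Returned only when attribute=section. With all sections represented as an M x N grid, the row ID of the grid cell the section occupies
++ col_idx	No	string	Returned only when attribute=section. With all sections represented as an M x N grid, the column ID of the grid cell the section occupies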
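long_division	No	array[]	Handwritten vertical arithmetic results; returned when recg_long_division=true
+ location	No	object	Bounding rectangle of the vertical arithmetic (origin at the top-left corner)
+ words	No	object	Text inside the vertical arithmetic, output line by line
++ word	No	string	Content of each line
++ words_location	No	object	Bounding rectangle of each line (origin at the top-left corner)
long_division_num	No	uint32	Number of handwritten vertical arithmetic results, i.e. the number of elements in long_division; returned when recg_long_division=true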
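underline	No	array[]	Detected underlines; returned when disp_underline_analysis=true
+ points	No	object	Underline coordinates
++ start_x	No	uint32	x coordinate of the underline's start point
++ start_y	No	uint32	y coordinate of the underline's start point
++ end_x	No	uint32	x coordinate of the underline's end point
++ end_y	No	uint32	y coordinate of the underline's end point
+ prob	No	float	Underline confidence, in the range [0, 1]
pdf_file_size	No	string	Total number of pages in the input PDF; returned when pdf_file is in effect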
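layout_idx links each layout block back to results: a line ID of n points at the (n+1)-th entry, which is results[n] with 0-based indexing. The sketch below rebuilds the text of each layout block from an already-parsed response (for example, the output of `json.loads`), assuming layout_analysis=true was sent. `layout_texts` is a hypothetical helper, not part of the API; it tolerates IDs sent as numeric strings, skips out-of-range IDs, and yields an empty string for blocks such as figures that carry no text.

```python
from typing import Any, Dict, List, Tuple


def layout_texts(resp: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (layout label, joined text) for every entry in resp["layouts"]."""
    results = resp.get("results", [])
    blocks: List[Tuple[str, str]] = []
    for layout in resp.get("layouts", []):
        lines = []
        for n in layout.get("layout_idx", []):
            idx = int(n)  # accept ints or numeric strings
            # ID n refers to the (n+1)-th entry of results, i.e. results[n].
            if 0 <= idx < len(results):
                lines.append(results[idx]["words"]["word"])
        blocks.append((layout.get("layout", ""), "\n".join(lines)))
    return blocks
```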

Response Example

{
	"results_num": 6,
	"log_id": "4488766695474114139",
	"img_direction": 0,
	"layouts_num": 0,
	"results": [
		{
			"words_type": "print",
			"words": {
				"words_location": {
					"top": 124,
					"left": 136,
					"width": 418,
					"height": 65
				},
				"word": "五默写(4分)"
			}
		},
		{
			"words_type": "print",
			"words": {
				"words_location": {
					"top": 246,
					"left": 136,
					"width": 37,
					"height": 45
				},
				"word": "1"
			}
		},
		{
			"words_type": "handwriting",
			"words": {
				"words_location": {
					"top": 195,
					"left": 237,
					"width": 469,
					"height": 104
				},
				"word": "采菊东篱下"
			}
		},
		{
			"words_type": "print",
			"words": {
				"words_location": {
					"top": 241,
					"left": 889,
					"width": 287,
					"height": 52
				},
				"word": "悠然见南山?"
			}
		},
		{
			"words_type": "print",
			"words": {
				"words_location": {
					"top": 415,
					"left": 134,
					"width": 472,
					"height": 52
				},
				"word": "2.商女不知亡国恨"
			}
		},
		{
			"words_type": "handwriting",
			"words": {
				"words_location": {
					"top": 377,
					"left": 607,
					"width": 556,
					"height": 93
				},
				"word": "隔江犹唱后庭花。"
			}
		}
	],
  "formula_result": [
        {
            "form_location": {
                "top": 0,
                "left": 97,
                "width": 151,
                "height": 77
            },
            "form_words": " x = \\frac { 1 } { n - 1 } - 1 1 \\frac { \\frac { 5 } { 2 } } { 5 }"
        },
        {
            "form_location": {
                "top": 119,
                "left": 118,
                "width": 115,
                "height": 80
            },
            "form_words": " = \\sqrt { \\frac { x } { 2 } ( x - 1 ) ^ { 2 } }"
        },
        {
            "form_location": {
                "top": 196,
                "left": 78,
                "width": 17,
                "height": 24
            },
            "form_words": " x ^ { 2 }"
        },
        {
            "form_location": {
                "top": 244,
                "left": 79,
                "width": 103,
                "height": 70
            },
            "form_words": " s = \\frac { \\sum _ { i = 0 } { m } \\cdot i v } { - 1 }"
        }
    ],
    "words_result": [
        {
            "location": {
                "top": 164,
                "left": 255,
                "width": 111,
                "height": 16
            },
            "words": "其中m表示考生"
        },
        {
            "location": {
                "top": 198,
                "left": 24,
                "width": 341,
                "height": 18
            },
            "words": "的人数  x ^ { 2 } 表示的是滴个考上的第i题等分，"
        }
    ]
}
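In the response above, each entry of results carries words_type, so handwritten answers can be separated from the printed question text. The sketch below assumes the raw response body is available as a string; `handwritten_lines` is a hypothetical helper, and sorting by top and then left is a simple reading-order heuristic, not something the API guarantees. It can misorder side-by-side lines whose tops differ slightly.

```python
import json
from typing import List


def handwritten_lines(raw: str) -> List[str]:
    """Return handwritten lines, ordered top-to-bottom, then left-to-right."""
    data = json.loads(raw)
    hand = [r["words"] for r in data.get("results", [])
            if r.get("words_type") == "handwriting"]
    # Heuristic reading order based on each line's bounding rectangle.
    hand.sort(key=lambda w: (w["words_location"]["top"], w["words_location"]["left"]))
    return [w["word"] for w in hand]
```

Applied to the example response, this returns the two handwritten answers, "采菊东篱下" followed by "隔江犹唱后庭花。".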
