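Native Multimodal
Last updated: 2025-04-11
Native multimodal foundation models jointly model multiple modalities so that they are optimized together, which gives them strong multimodal understanding. Their language abilities are also more refined: understanding, generation, logic, and memory are improved across the board, with notable gains in hallucination reduction, logical reasoning, and coding.
Supported models
Model | Model version | Value of the model parameter | Allowed range of max_completion_tokens |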
---|---|---|---|
ERNIE 4.5 | ERNIE-4.5-8K-Preview | ernie-4.5-8k-preview | [2,8192] |
Llama-4-Maverick | Llama-4-Maverick-17B-128E-Instruct | llama-4-maverick-17b-128e-instruct | [2,8192] |
Llama-4-Scout | Llama-4-Scout-17B-16E-Instruct | llama-4-scout-17b-16e-instruct | [2,8192] |
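The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch: the dictionary and the function name check_max_completion_tokens are hypothetical helpers, not part of any Qianfan SDK, and the ranges are copied from the table.

```python
# Allowed max_completion_tokens range per model value, copied from the table.
MAX_COMPLETION_TOKENS_RANGE = {
    "ernie-4.5-8k-preview": (2, 8192),
    "llama-4-maverick-17b-128e-instruct": (2, 8192),
    "llama-4-scout-17b-16e-instruct": (2, 8192),
}


def check_max_completion_tokens(model: str, value: int) -> int:
    """Return value unchanged if it lies in the documented range, else raise ValueError."""
    if model not in MAX_COMPLETION_TOKENS_RANGE:
        raise ValueError(f"unknown model: {model}")
    low, high = MAX_COMPLETION_TOKENS_RANGE[model]
    if not low <= value <= high:
        raise ValueError(
            f"max_completion_tokens for {model} must be in [{low}, {high}], got {value}"
        )
    return value
```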
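Usage
Input
Supported input modalities
The following input combinations are supported: text only, image only, and image + text (see the corresponding examples under Quick start).
How to pass multimodal data
Input images and files can be provided as Base64-encoded data or as a public URL. The examples below all pass a public URL; to pass Base64-encoded data instead, see Base64-encoded image input.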
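For Base64 input, the image bytes go into the url field as a data URI of the form data:image/<format>;base64,<data>, as described under the image parameter notes. A minimal sketch, assuming the raw image bytes are already in memory; image_part_from_bytes is a hypothetical helper name.

```python
import base64


def image_part_from_bytes(data: bytes, image_format: str) -> dict:
    """Wrap raw image bytes in an image_url content part using a Base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/{image_format};base64,{encoded}"},
    }


# Typical use with a local file:
# with open("image1.jpg", "rb") as f:
#     part = image_part_from_bytes(f.read(), "jpeg")
```

A public URL needs no conversion: it goes into the url field unchanged.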
Output
Multimodal models can currently be called in either streaming or non-streaming mode.
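This page does not show the shape of a streaming response. Assuming the endpoint streams OpenAI-compatible server-sent events (lines of the form data: {json} carrying choices[0].delta.content, ending with data: [DONE]), which matches what the OpenAI-client example later on this page reads, the pieces of a streamed reply could be joined as in this sketch. collect_stream_text is a hypothetical helper.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta.content from OpenAI-style SSE lines (assumed format)."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        # Some chunks (for example a final usage chunk) may carry no choices.
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

With "stream": false, the full reply instead arrives in a single JSON body under choices[0].message.content.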
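Supported output modalities
Important
Only text output is currently supported. Llama 4 supports text output only; audio output for ERNIE 4.5 is planned for a later release.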
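Quick start
You can pass images to the model in two ways: as an image URL or as Base64-encoded data. As with text, image content must be sent in a user-role message, that is, "role": "user". Below are simple examples of calling the vision understanding model.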
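Every example below places text and images in the content array of a single user-role message. A minimal sketch that builds such a message from a prompt and a list of image URLs; user_message is a hypothetical helper, and putting the text first simply mirrors the image + text example.

```python
def user_message(text=None, image_urls=()):
    """Build a user-role message whose content mixes text and image_url parts."""
    content = []
    if text is not None:
        content.append({"type": "text", "text": text})
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    if not content:
        raise ValueError("a message needs text, at least one image, or both")
    return {"role": "user", "content": content}
```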
Prerequisites
The APIs in this document use API Key authentication. For the full call flow with API Key authentication, see Authentication.
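In the curl examples, the API key is sent as a Bearer token in the Authorization header. A minimal sketch that builds the same headers in Python; the environment variable name QIANFAN_API_KEY and the helper auth_headers are hypothetical choices for illustration.

```python
import os


def auth_headers(api_key=None):
    """Build JSON + Bearer-token headers, falling back to a (hypothetical) env var."""
    key = api_key or os.environ.get("QIANFAN_API_KEY")
    if not key:
        raise RuntimeError("no API key: pass api_key or set QIANFAN_API_KEY")
    return {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}
```

Reading the key from the environment keeps it out of source code and shell history.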
Text input
The model accepts plain text as input.
curl --location 'https://qianfan.bj.baidubce.com/v2/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your-api-key' \
--data '{
"model": "ernie-4.5-8k-preview",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Introduce a few famous attractions in Shanghai"
}
]
}
],
"stream":false
}'
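The same text-only request can be sent from Python with only the standard library. This sketch mirrors the curl call above (endpoint, headers, and body fields are taken from it); build_chat_request and send are hypothetical helper names, and any extra keyword arguments are merged into the JSON body unchanged.

```python
import json
import urllib.request

API_URL = "https://qianfan.bj.baidubce.com/v2/chat/completions"


def build_chat_request(api_key, model, messages, **extra):
    """Build a non-streaming POST request equivalent to the curl example."""
    body = {"model": model, "messages": messages, "stream": False, **extra}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires network access and a valid key):
# reply = send(build_chat_request(
#     "your-api-key", "ernie-4.5-8k-preview",
#     [{"role": "user", "content": [{"type": "text", "text": "Hello"}]}]))
```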
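Image input
The model accepts images alone as input.
Multiple images can be passed in one request. Input images must meet the following requirements:
- Each image file must not exceed 10 MB.
- The number of images is limited by the model's total text-and-image token limit (its maximum input): the combined tokens of all images must be less than the model's maximum input.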
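The per-image size limit can be checked locally before upload; the token limit cannot, since it depends on the model's own image tokenization. A minimal sketch, assuming "10 MB" means 10 × 1024 × 1024 bytes (the page does not say which megabyte is meant); oversized_images is a hypothetical helper.

```python
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def oversized_images(images):
    """Return the names of images whose raw size exceeds the per-image limit.

    images maps a name to raw image bytes: the file content after download,
    or after Base64 decoding, which is the size the limit applies to.
    """
    return [name for name, data in images.items() if len(data) > MAX_IMAGE_BYTES]
```

Note that Base64 encoding inflates data by about one third, so the check should run on the decoded bytes, not on the encoded string.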
curl --location 'https://qianfan.bj.baidubce.com/v2/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your-api-key' \
--data '{
"model": "ernie-4.5-8k-preview",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://testimage.bj.bcebos.com/image1.jpg"
}
}
]
}
],
"stream": false
}'
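Image + text input
The model accepts image + text input.
Multiple images can be passed in one request. Input images must meet the following requirements:
- Each image file must not exceed 10 MB.
- The number of images is limited by the model's total text-and-image token limit (its maximum input): the combined tokens of all images must be less than the model's maximum input.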
curl --location 'https://qianfan.bj.baidubce.com/v2/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your-api-key' \
--data '{
"model": "ernie-4.5-8k-preview",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What are in these images? Is there any difference between them?"
},
{
"type": "image_url",
"image_url": {
"url": "https://testimage.bj.bcebos.com/image1.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "https://testimage.bj.bcebos.com/image2.png"
}
}
]
}
],
"stream": false
}'
Search augmentation
Note: only ERNIE 4.5 supports search augmentation.
curl --location 'https://qianfan.bj.baidubce.com/v2/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your-api-key' \
--data '{
"model": "ernie-4.5-8k-preview",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Who won the 2024 Olympic table tennis singles title for men?"
}
]
}
],
"web_search": {
"enable": true,
"enable_trace": true
},
"stream": false,
"max_tokens": 512
}'
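The search-augmented request differs from a plain one only by the web_search object. A minimal sketch that builds the same body as the curl example and refuses non-ERNIE 4.5 models, per the note above; search_payload is a hypothetical helper, and the prefix check on the model value is a simplification.

```python
def search_payload(question, model="ernie-4.5-8k-preview", max_tokens=512, enable_trace=True):
    """Build a chat request body with web_search enabled (ERNIE 4.5 only)."""
    if not model.startswith("ernie-4.5"):
        raise ValueError("search augmentation is only supported by ERNIE 4.5")
    return {
        "model": model,
        "messages": [{"role": "user", "content": [{"type": "text", "text": question}]}],
        "web_search": {"enable": True, "enable_trace": enable_trace},
        "stream": False,
        "max_tokens": max_tokens,
    }
```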
Multi-turn conversation
Round 1
Request
curl --location 'https://qianfan.bj.baidubce.com/v2/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your-api-key' \
--data '{
"model": "ernie-4.5-8k-preview",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What are in these images? Is there any difference between them?"
},
{
"type": "image_url",
"image_url": {
"url": "https://testimage.bj.bcebos.com/image1.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "https://testimage.bj.bcebos.com/image2.png"
}
}
]
}
],
"stream": false
}'
Response
{
"id": "as-t4ntazzewk",
"object": "chat.completion",
"created": 1742198333,
"model": "ernie-4.5-8k-preview",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The first image depicts a serene natural landscape with a wooden boardwalk leading through a lush green field under a blue sky with scattered clouds. The second image shows a close-up of a hand interacting with a small, intricately designed figurine that resembles a turtle with detailed patterns and textures, surrounded by green foliage. The primary difference between the two images is that the first is a wide-angle photograph of an outdoor scene, while the second is a close-up photograph of a hand and a decorative object."
},
"finish_reason": "stop",
"flag": 0
}
],
"usage": {
"prompt_tokens": 3019,
"completion_tokens": 102,
"total_tokens": 3121
}
}
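The reply text sits in choices[0].message.content and token accounting in usage, as the response above shows. A minimal sketch that pulls those fields out of a raw response body; parse_completion is a hypothetical helper.

```python
import json


def parse_completion(body: str) -> dict:
    """Extract the reply, finish reason, and token usage from a non-streaming response."""
    data = json.loads(body)
    choice = data["choices"][0]
    usage = data.get("usage", {})
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

In the response above, total_tokens (3121) equals prompt_tokens (3019) plus completion_tokens (102); the image inputs account for most of the prompt tokens.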
Round 2
Request
curl --location 'https://qianfan.bj.baidubce.com/v2/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your-api-key' \
--data '{
"model": "ernie-4.5-8k-preview",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What are in these images? Is there any difference between them?"
},
{
"type": "image_url",
"image_url": {
"url": "https://testimage.bj.bcebos.com/image1.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "https://testimage.bj.bcebos.com/image2.png"
}
}
]
},
{
"role": "assistant",
"content": "The first image depicts a serene natural landscape with a wooden boardwalk leading through a lush green field under a blue sky with scattered clouds. The second image shows a close-up of a hand interacting with a small, intricately designed figurine that resembles a turtle with detailed patterns and textures, surrounded by green foliage. The primary difference between the two images is that the first is a wide-angle photograph of an outdoor scene, while the second is a close-up photograph of a hand and a decorative object."
},
{
"role": "user",
"content": "That is not detailed enough. Please analyze the images in more detail."
}
],
"stream": false
}'
Response
{
"id": "as-14hvmej1ts",
"object": "chat.completion",
"created": 1742198435,
"model": "ernie-4.5-8k-preview",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
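"content": "Of course. Let's analyze the differences between these two images in more detail:\n\n1. **Subject and content**:\n - **First image**: This is a landscape photo of a natural setting. A wooden boardwalk runs through a green meadow, with a blue sky and some trees in the background. The overall mood is tranquil and the colors are vivid, giving a sense of openness and relaxation.\n - **Second image**: This is a close-up photo of a hand interacting with a decorative object. The object looks like a turtle with intricate patterns and textures, surrounded by some green plant decorations. The overall mood is more delicate and handcrafted, with rich colors and fine detail.\n\n2. **Perspective and composition**:\n - **First image**: It uses a wide-angle view that captures a large expanse of natural scenery. The boardwalk extends from the foreground into the background, leading the viewer's eye into the scene. The sky and ground are well balanced, creating a sense of space and depth.\n - **Second image**: It uses a close-up view focused on the details of the hand and the turtle. The textures of the fingers and the turtle are clearly visible, and the blurred background makes the subject stand out. The composition is tight and detailed, drawing attention to the fine craftsmanship.\n\n3. **Color and lighting**:\n - **First image**: Greens and blues dominate, and the light is natural, probably daytime. Sunlight falls on the grass and the boardwalk, creating contrast between light and shadow that adds a three-dimensional feel.\n - **Second image**: The palette is more varied, with the turtle's greens, blues, and yellows standing out. The light is soft, probably indoors, and concentrated on the hand and the turtle to highlight detail and texture.\n\n4. **Emotion and atmosphere**:\n - **First image**: It conveys calm, nature, and relaxation, suitable for showcasing natural scenery or outdoor activities.\n - **Second image**: It conveys delicacy, craftsmanship, and focus, suitable for showcasing handicrafts or creative work.\n\nOverall, the two images differ significantly in subject, perspective, color, and emotion: one shows the beauty of a natural landscape, the other the beauty of a handcrafted object."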
},
"finish_reason": "stop",
"flag": 0
}
],
"usage": {
"prompt_tokens": 3130,
"completion_tokens": 394,
"total_tokens": 3524
}
}
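Because each call is stateless, the second-round request above is simply the first-round messages, plus the assistant's reply, plus the new user turn. A minimal sketch of a client-side history holder; the Conversation class is hypothetical, not part of any SDK.

```python
class Conversation:
    """Client-side message history; the full list is resent on every call."""

    def __init__(self, model="ernie-4.5-8k-preview"):
        self.model = model
        self.messages = []

    def add_user(self, content):
        # content may be a plain string or a list of text/image_url parts
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content):
        # store the model's reply so the next request carries the context
        self.messages.append({"role": "assistant", "content": content})

    def payload(self):
        """Request body for the next round, including the whole history."""
        return {"model": self.model, "messages": list(self.messages), "stream": False}
```

Images sent in earlier turns stay in the history, so they count toward the prompt tokens of every later round (3130 prompt tokens in round 2 above).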
Base64-encoded image input
To upload a local image, convert it to a Base64 string and pass that in. The following example uses the OpenAI-compatible interface.
from openai import OpenAI
import os
import base64

client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),  # holds your Qianfan API key
    base_url="https://qianfan.baidubce.com/v2",
)

# Read a local image file and return its Base64-encoded string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image
image_path = "image1.jpg"

# Get the Base64 string
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="ernie-4.5-8k-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        # The format in the data URI must match the image (JPEG here)
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    },
                },
            ],
        }
    ],
    stream=True,
)

# Print the streamed reply as it arrives; skip chunks that carry no choices
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
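Image parameter notes
- Each model call is stateless, so you must manage the information passed to the model yourself. If the model needs to look at the same image more than once, include that image in every request.
- Both single and multiple images are supported. Each image must not exceed 10 MB, and the total tokens of all input images must not exceed the model's context length; for ERNIE-4.5, for example, image input must stay within 8K tokens.
- Image formats:
a. Base64 image: JPG, JPEG, PNG, and BMP are supported, passed in the format:
data:image/<image format>;base64,<Base64 data>
b. Public image URL: JPG, JPEG, PNG, BMP, and WEBP are supported.
Detailed parameter format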
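Parameter | Type | Required | Description | Notes |
---|---|---|---|---|
type | string | Yes | Only one value: image_url | |
image_url | object | Yes | Input image information | 1. Multiple images are supported with no limit on their number, but input exceeding 8K tokens returns a token-limit error. 2. Each image must not exceed 10 MB (the size after downloading from the URL, or of the image decoded from Base64). |
+url | string | Yes | Public URL or Base64 data of the image | 1. Supported formats: Base64: JPG, JPEG, PNG, BMP, etc.; URL: JPG, JPEG, PNG, BMP, WEBP, etc. 2. For Base64, the value must have the format data:image/<image format>;base64,<Base64 data> |
+detail | string | No | Image resolution quality | low: low resolution; high: high resolution |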
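The table can be turned into a client-side check of an image_url part before it is sent. A minimal sketch: validate_image_part is a hypothetical helper, and because the table lists formats with "etc.", the allowed sets below are only the formats it names explicitly, so the check may be stricter than the service. URLs without a file extension are not checked, since the format cannot be read from them.

```python
import re
from urllib.parse import urlparse

BASE64_FORMATS = {"jpg", "jpeg", "png", "bmp"}
URL_FORMATS = BASE64_FORMATS | {"webp"}
DETAIL_VALUES = {"low", "high"}


def validate_image_part(part):
    """Return a list of problems in an image_url content part (empty list = OK)."""
    problems = []
    if part.get("type") != "image_url":
        problems.append("type must be 'image_url'")
    image = part.get("image_url")
    if not isinstance(image, dict) or not isinstance(image.get("url"), str):
        problems.append("image_url.url is required and must be a string")
        return problems
    detail = image.get("detail")
    if detail is not None and detail not in DETAIL_VALUES:
        problems.append("detail must be 'low' or 'high'")
    url = image["url"]
    if url.startswith("data:"):
        match = re.match(r"data:image/([A-Za-z]+);base64,", url)
        if match is None:
            problems.append("Base64 input must look like data:image/<format>;base64,<data>")
        elif match.group(1).lower() not in BASE64_FORMATS:
            problems.append(f"unsupported Base64 image format: {match.group(1)}")
    else:
        # Only check the extension when the last path segment has one.
        path = urlparse(url).path
        if "." in path.rsplit("/", 1)[-1]:
            ext = path.rsplit(".", 1)[-1].lower()
            if ext not in URL_FORMATS:
                problems.append(f"unsupported image URL format: {ext}")
    return problems
```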
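Error codes
If a request fails, the JSON returned by the server contains the following parameters.
Parameter | Description |
---|---|
code | Error code |
msg | Error message, to help you understand and resolve the error |
type | Error type |
For more error codes, see the Inference Service V2 error code reference.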
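A client can surface these fields as an exception instead of treating an error body as a normal reply. A minimal sketch: APIError and raise_for_error are hypothetical names, and because this page lists the fields but not their exact nesting, the sketch accepts them either at the top level or inside an "error" object (an assumption).

```python
import json


class APIError(Exception):
    """Hypothetical exception carrying the documented code/msg/type fields."""

    def __init__(self, code, msg, error_type):
        super().__init__(f"[{code}] {error_type}: {msg}")
        self.code = code
        self.msg = msg
        self.error_type = error_type


def raise_for_error(body):
    """Parse a response body; raise APIError if it is an error response."""
    data = json.loads(body)
    # Assumption: the fields may sit at the top level or under an "error" object.
    err = data["error"] if isinstance(data.get("error"), dict) else data
    if "code" in err:
        raise APIError(err["code"], err.get("msg"), err.get("type"))
    return data
```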