LLM API

The Baidu AI Studio LLM API is a set of foundational LLM services for developers. It is backed by the Baidu AI Cloud Qianfan Platform and exposes large models such as the ERNIE family. The service is compatible with the openai-python SDK, so developers can call ERNIE and other LLM services directly with the native openai-python SDK.

Join the official free tutorial course, "LLM API Service: From Service Calls to Application Practice", to get started in about 2 minutes.

1. Preparation

1.1 Access Token

An access token authenticates an AI Studio user. It authorizes the operations covered by its scope (for example, LLM API calls or read access to repositories) on AI Studio. You can view your personal access token on the Access Token page in your personal center.

1.2 Tokens

Tokens are the basic unit of measurement for calling large model SDKs or using large model applications on Baidu AI Studio. AI Studio gives each developer a free quota of 1 million tokens. Different models consume tokens at different rates. You can view usage details in Token Management. If your tokens run out, you can buy more under Buy Tokens before continuing.

1.3 Service Domain

The domain address for the Baidu AI Studio LLM API service is: https://aistudio.baidu.com/llm/lmapi/v3

When using openai-python to call the Baidu AI Studio LLM API service, you need to set api_key to your AI Studio access token and base_url to the service domain above.


2. Model List and Query

2.1 Text-to-Text Model List

| Model Name | model Parameter Value | Context Length (tokens) | Max Input (tokens) | Max Output (tokens) |
| --- | --- | --- | --- | --- |
| (Open Sourced on 6/30) ERNIE-4.5-VL-424B-A47B | ernie-4.5-turbo-vl | 128k | 123k | [2, 12288], default 2k |
| (Open Sourced on 6/30) ERNIE-4.5-300B-A47B | ernie-4.5-turbo-128k-preview | 128k | 123k | [2, 12288], default 2k |
| (Open Sourced on 6/30) ERNIE-4.5-VL-28B-A3B | ernie-4.5-vl-28b-a3b | 128k | 123k | [2, 12288], default 2k |
| (Open Sourced on 6/30) ERNIE-4.5-21B-A3B | ernie-4.5-21b-a3b | 128k | 120k | [2, 12288], default 2k |
| (Open Sourced on 6/30) ERNIE-4.5-0.3B | ernie-4.5-0.3b | 128k | 120k | [2, 12288], default 2k |
| DeepSeek-Chat | deepseek-v3 | 128k | 128k | [2, 12288], default 2k |
| ERNIE 4.0 | ernie-4.0-8k | 8k | 5k | [2, 2048], default 2k |
| ERNIE 4.0 Turbo | ernie-4.0-turbo-128k | 128k | 124k | [2, 4096], default 4k |
| ERNIE 4.0 Turbo | ernie-4.0-turbo-8k | 8k | 5k | [2, 2048], default 2k |
| ERNIE 3.5 | ernie-3.5-8k | 8k | 5k | [2, 2048], default 2k |
| ERNIE Character | ernie-char-8k | 8k | 7k | [2, 2048], default 1k |
| ERNIE Speed | ernie-speed-8k | 8k | 6k | [2, 2048], default 1k |
| ERNIE Speed | ernie-speed-128k | 128k | 124k | [2, 4096], default 4k |
| ERNIE Tiny | ernie-tiny-8k | 8k | 6k | [2, 2048], default 1k |
| ERNIE Lite | ernie-lite-8k | 8k | 6k | [2, 2048], default 1k |
| Kimi-K2 | kimi-k2-instruct | 128k | 128k | [1, 32768], default 4k |
| Qwen3-Coder | qwen3-coder-30b-a3b-instruct | 128k | 128k | [1, 32768], default 4k |
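
The Max Output ranges in the table above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the API: the MODEL_OUTPUT_RANGES dictionary copies a few rows from the table, and the helper name clamp_max_completion_tokens is hypothetical.

```python
# Max Output ranges (tokens) copied from a few rows of the model table.
# This dictionary is illustrative and not exhaustive.
MODEL_OUTPUT_RANGES = {
    "ernie-3.5-8k": (2, 2048),
    "ernie-4.0-turbo-128k": (2, 4096),
    "ernie-4.5-21b-a3b": (2, 12288),
    "kimi-k2-instruct": (1, 32768),
}


def clamp_max_completion_tokens(model: str, requested: int) -> int:
    """Clamp a requested output length into the model's documented range."""
    if model not in MODEL_OUTPUT_RANGES:
        raise ValueError(f"no documented output range for model {model!r}")
    low, high = MODEL_OUTPUT_RANGES[model]
    return max(low, min(requested, high))


print(clamp_max_completion_tokens("ernie-3.5-8k", 5000))  # prints 2048
```

The clamped value can then be passed as max_completion_tokens in the request.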

2.2 Thinking Model List

| Model Name | model Parameter Value | Context Length (tokens) | Max Input (tokens) | Max Output (tokens) | Chain of Thought Length (tokens) |
| --- | --- | --- | --- | --- | --- |
| (Open Sourced on 6/30) ERNIE-4.5-VL-424B-A47B | ernie-4.5-turbo-vl | 128k | 123k | [2, 12288], default 2k | 16k |
| (Open Sourced on 6/30) ERNIE-4.5-VL-28B-A3B | ernie-4.5-vl-28b-a3b | 128k | 123k | [2, 12288], default 2k | 16k |
| ERNIE X1 Turbo | ernie-x1-turbo-32k | 32k | 24k | [2, 16384], default 2k | 16k |
| DeepSeek-Reasoner | deepseek-r1-250528 | 96k | 64k | 16k, default 4k | 32k |
| DeepSeek-Reasoner | deepseek-r1 | 96k | 64k | 16k, default 4k | 32k |

2.3 Multimodal Model List

For multimodal model usage, please see section 5.8 of this document. (Added 2025/6/30: Video understanding call example, section 5.8.6)

| Model Name | model Parameter Value | Supported Modalities | Context Length (tokens) | Max Input (tokens) | Max Output (tokens) |
| --- | --- | --- | --- | --- | --- |
| (Open Sourced on 6/30) ERNIE-4.5-VL-424B-A47B | ernie-4.5-turbo-vl | Text, Image, Video | 128k | 123k | [2, 12288] |
| (Open Sourced on 6/30) ERNIE-4.5-VL-28B-A3B | ernie-4.5-vl-28b-a3b | Text, Image, Video | 128k | 123k | [2, 12288] |
| ERNIE 4.5 Turbo VL | ernie-4.5-turbo-vl-32k | Text, Image | 32k | 30k | [1, 8192], default 4k |

2.4 Embedding Model List

| Model Name | model Parameter Value | Max Input Text Count | Context Length per Text (tokens) |
| --- | --- | --- | --- |
| Embedding-V1 | embedding-v1 | 1 | 384 |
| bge-large-zh | bge-large-zh | 16 | 512 |

2.5 Text-to-Image Model

| Model Name | Type |
| --- | --- |
| Stable-Diffusion-XL | Text-to-Image Model |

2.6 Feature Support

2025/6/30 Open Source Model List:

| Model Name | model Parameter Value | Supported Capabilities | Supported Modalities |
| --- | --- | --- | --- |
| (Open Sourced on 6/30) ERNIE-4.5-VL-424B-A47B | ernie-4.5-turbo-vl | Chat Model, Thinking (Coming Soon) | Text, Image, Video |
| (Open Sourced on 6/30) ERNIE-4.5-300B-A47B | ernie-4.5-turbo-128k-preview | Chat Model | Text |
| (Open Sourced on 6/30) ERNIE-4.5-VL-28B-A3B | ernie-4.5-vl-28b-a3b | Chat Model, Thinking | Text, Image, Video (Coming Soon) |
| (Open Sourced on 6/30) ERNIE-4.5-21B-A3B | ernie-4.5-21b-a3b | Chat Model | Text |
| (Open Sourced on 6/30) ERNIE-4.5-0.3B | ernie-4.5-0.3b | Chat Model | Text |

Web Search (Search Enhancement):

  • ernie-4.5
  • ernie-4.5-turbo
  • ernie-4.0
  • ernie-4.0-turbo
  • ernie-3.5
  • deepseek-r1
  • deepseek-v3

Function Call:

  • ernie-x1-turbo-32k
  • deepseek-r1
  • deepseek-v3

Structured Output:

  • ernie-4.5
  • ernie-4.0-turbo
  • ernie-3.5

2.7 Query Model List

# Query the list of supported models

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

models = client.models.list()
for model in models.data:
    print(model.id)

3. Install Dependencies

# install from PyPI
pip install openai

4. Basic Model Capability Usage

4.1 Text-to-Text

4.1.1 Model Use

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

chat_completion = client.chat.completions.create(
    messages=[
        {'role': 'system', 'content': 'You are a developer assistant for the AI Studio training platform. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.'},
        {'role': 'user', 'content': 'Hello, please introduce AI Studio'}
    ],
    model="ernie-3.5-8k",
)

print(chat_completion.choices[0].message.content)

To avoid hard-coding the api_key, you can use python-dotenv and add AI_STUDIO_API_KEY="YOUR_ACCESS_TOKEN" to your .env file. You can also pass the token directly via api_key="YOUR_ACCESS_TOKEN", but this exposes it in the source code.
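
A small helper can make a missing token fail early with a clear message instead of an authentication error at request time. This is a minimal sketch: the function name load_access_token is hypothetical, and the commented-out load_dotenv call assumes the optional python-dotenv package is installed.

```python
import os

# Optional: populate os.environ from a .env file (requires python-dotenv).
# from dotenv import load_dotenv
# load_dotenv()


def load_access_token(env_var: str = "AI_STUDIO_API_KEY") -> str:
    """Return the access token from the environment, or fail with a clear message."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it or add it to your .env file"
        )
    return token
```

The returned value can be passed as api_key when constructing the OpenAI client.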

4.1.2 Request Parameter Description

body Description

Name Type Required Description Natively supported by openai-python
model string Yes Model ID, available values can be obtained from client.models.list() Yes
messages List Yes Chat context information. Description:
(1) messages members cannot be empty. 1 member means a single-turn dialogue, multiple members mean a multi-turn dialogue, for example:
1 member example, "messages": [ {"role": "user","content": "Hello"}]
3 members example, "messages": [ {"role": "user","content": "Hello"},{"role":"assistant","content":"What help do you need"},{"role":"user","content":"Introduce yourself"}]
(2) The last message is the current request information, the previous messages are historical dialogue information
(3) Description of roles in messages:
① The role of the first message must be user or system
② The role of the last message must be user or tool. If it is ERNIE 4.5 or ERNIE-X1-32K-Preview, the role of the last message must be user
③ If function call is not used:
· When the role of the first message is user, the roles must follow the order user -> assistant -> user ..., i.e., odd-numbered messages must have the role user or function, and even-numbered messages must have the role assistant. For example, with five messages the roles are user, assistant, user, assistant, user: the 1st, 3rd, and 5th messages are user, and the 2nd and 4th messages are assistant
· When the role of the first message is system, the role values need to be in the order system -> user/function -> assistant -> user/function ...
(4) The total length of content in messages cannot exceed the input character limit and input token limit of the corresponding model, please check Context Length Description for Each Model
(5) If it is ERNIE 4.5, please refer to the following: consecutive user/assistant and starting message as assistant are not supported. The specific rules are as follows:
· messages members cannot be empty, 1 member means single-turn dialogue, multiple members mean multi-turn dialogue;
· The role of the first message must be user or system
· The role of the last message must be user
· After removing the first system role, the roles need to be in the order user -> assistant -> user ...
Yes
stream bool No Whether to return data in the form of a streaming interface, Description:
(1) Can only be false for beam search models
(2) Default is false
Yes
temperature float No Description:
(1) Higher values make the output more random, while lower values make it more focused and deterministic
(2) Default 0.95, range (0, 1.0], cannot be 0
(3) Not supported by the following models:
· deepseek-v3
· deepseek-r1
· ernie-x1-32k-preview
Yes
top_p float No Description:
(1) Affects the diversity of the output text. The larger the value, the stronger the diversity of the generated text
(2) Default 0.7, value range [0, 1.0]
(3) Not supported by the following models:
· deepseek-v3
· deepseek-r1
· ernie-x1-32k-preview
Yes
penalty_score float No Reduces the phenomenon of repeated generation by adding penalties to already generated tokens. Description:
(1) The larger the value, the greater the penalty
(2) Default 1.0, value range: [1.0, 2.0]
(3) Not supported by the following models:
· deepseek-v3
· deepseek-r1
· ernie-x1-32k-preview
No
max_completion_tokens int No Specify the maximum number of output tokens for the model, Description:
(1) Value range [2, 2048], please check the supported model list for specific model support
Yes
response_format string No Specify the format of the response content, Description:
(1) Optional values:
· json_object: return in json format, may not meet expectations
· text: return in text format
(2) If the parameter response_format is not filled, the default is text
(3) Not supported by the following models: ernie-x1-32k-preview
Yes
seed int No Description:
(1) Value range: (0, 2147483647), will be randomly generated by the model, default is empty
(2) If specified, the system will make a best effort for deterministic sampling, so that repeated requests with the same seed and parameters return the same result
(3) Not supported by the following models: ernie-x1-32k-preview
Yes
stop List No Generation stop identifier. When the model's generated result ends with an element in stop, text generation stops. Description:
(1) Each element's length should not exceed 20 characters
(2) At most 4 elements
(3) Not supported by the following models: ernie-x1-32k-preview
Yes
frequency_penalty float No Description:
(1) Positive values penalize new tokens based on their existing frequency in the text so far, reducing the likelihood of the model repeating the same line verbatim
(2) Value range: [-2.0, 2.0]
(3) Supported by the following models: ernie-speed-8k, ernie-speed-128k, ernie-tiny-8k, ernie-char-8k, ernie-lite-8k
Yes
presence_penalty float No Description:
(1) Positive values penalize new tokens based on whether they appear in the text so far, increasing the likelihood of the model talking about new topics
(2) Value range: [-2.0, 2.0]
(3) Supported by the following models: ernie-speed-8k, ernie-speed-128k, ernie-tiny-8k, ernie-char-8k, ernie-lite-8k
Yes
tools List(Tool) No A list of descriptions of functions that can be triggered. For supported models, please refer to the supported model list in this document - whether function call is supported Yes
tool_choice string / tool_choice No Description:
(1) For supported models, please refer to the supported model list in this document - whether function call is supported
(2) string type, optional values are as follows:
· none: The model is not expected to call any function, only generate user-facing text messages
· auto: The model will automatically decide whether to call functions and which functions to call based on the input content
· required: The model is expected to always call one or more functions
(3) When it is of type tool_choice, it means prompting the large model to select a specified function in a function call scenario. The specified function name must exist in tools
Yes
parallel_tool_calls bool No Description:
(1) For supported models, please refer to the supported model list in this document - whether function call is supported
(2) Optional values:
· true: means enable parallel function calling, enabled by default
· false: means disable parallel function calling
Yes
web_search web_search No Search enhancement options, Description:
(1) Default is off (not passed)
(2) For supported models, please see the model list description above
No
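
The message ordering rules in the messages row can be checked before a request is sent. The sketch below implements only the stricter ERNIE 4.5 rules listed under item (5). It does not handle the function or tool roles used in function call scenarios, and the function name validate_messages is hypothetical.

```python
def validate_messages(messages: list) -> None:
    """Raise ValueError if messages break the ERNIE 4.5 ordering rules."""
    if not messages:
        raise ValueError("messages must not be empty")
    if messages[0]["role"] not in ("user", "system"):
        raise ValueError("the first message must have role user or system")
    if messages[-1]["role"] != "user":
        raise ValueError("the last message must have role user")
    # Drop one leading system message, then require user -> assistant -> user ...
    turns = messages[1:] if messages[0]["role"] == "system" else messages
    for position, message in enumerate(turns):
        expected = "user" if position % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(
                f"turn {position + 1} should have role {expected}, "
                f"got {message['role']}"
            )


validate_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "What help do you need"},
    {"role": "user", "content": "Introduce yourself"},
])  # passes silently
```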

message Description

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Yes | Currently supports the following: user: represents the user; assistant: represents the dialogue assistant; system: represents the persona |
| name | string | No | message name |
| content | string | Yes | Dialogue content. (1) Cannot be empty (2) The content of the last message cannot consist only of blank characters, such as spaces, "\n", "\r", "\f", etc. |

Tool's function Description

The function description in Tool is as follows

Name Type Required Description
name string Yes Function name
description string No Function description
parameters object No Function request parameters, in JSON Schema format, refer to JSON Schema Description

tool_choice Description

Name Type Required Description
type string Yes Specify the tool type, fixed value function
function function Yes Specify the function to use

tool_choice's function Description

Name Type Required Description
name string Yes Specify the name of the function to use

web_search Description

Name Type Description
enable bool Whether to enable the real-time search function, Description: (1) If real-time search is disabled, superscript and traceability information will not be returned (2) Optional values:· true: enable · false: disable, default false
enable_citation bool Whether to enable superscript return, Description: (1) Takes effect when enable is true (2) Optional values: · true: enable; if enabled, in scenarios where search enhancement is triggered, the response content will include superscripts and the corresponding search traceability information for the superscripts · false: not enabled, default false (3) If the retrieved content includes non-public webpages, superscripts will not be effective
enable_trace bool Whether to return search traceability information, Description: (1) Takes effect when enable is true. (2) Optional values: · true: return; if true, in scenarios where search enhancement is triggered, search traceability information search_results will be returned · false: do not return, default false (3) If the retrieved content is a non-public webpage, traceability information will not be returned even if search is triggered

4.1.3 Response Parameter Description

Name Type Description
id string Unique identifier for this request, can be used for troubleshooting
object string Packet type chat.completion: multi-turn dialogue return
created int Timestamp
model string Model ID
choices object Description: The returned content differs when the request parameter stream value is different
usage usage Token statistics, Description: (1) Returned by default for synchronous requests (2) The actual content will be returned in the last chunk, other chunks return null

choices Description

| Name | Type | Description |
| --- | --- | --- |
| index | int | Sequence number in the choice list |
| message | message | Response information, returned when stream=false |
| delta | delta | Response information, returned when stream=true |
| finish_reason | string | Output content identifier. normal: the output was generated completely by the large model, without truncation or replacement; stop: the output was truncated after hitting an entry in the stop request parameter; length: the maximum number of tokens was reached; content_filter: the output was truncated, defaulted, replaced with **, etc.; function_call: the function call feature was invoked |
| flag | int | Security subdivision type. (1) When stream=false, the flag values mean: 0 or not returned: safe; 1: low-risk unsafe scenario, the conversation can continue; 2: chat prohibited, the conversation cannot continue but the content can be displayed; 3: display prohibited, the conversation cannot continue and the content cannot be displayed on screen; 4: screen retraction. (2) When stream=true, a returned flag indicates that security was triggered |
| ban_round | int | When flag is not 0, indicates which round of the dialogue contains sensitive information; if it is the current question, ban_round = -1 |
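
The finish_reason and flag values above can be turned into simple decisions in client code. This is a rough sketch under stated assumptions: the function name interpret_choice and the returned keys are hypothetical, and treating flag 4 (screen retraction) as "do not display" is an interpretation rather than something the table states.

```python
from typing import Optional


def interpret_choice(finish_reason: str, flag: Optional[int] = None) -> dict:
    """Summarize a non-streaming choice using the documented value meanings."""
    return {
        "finished_normally": finish_reason == "normal",
        "hit_stop_sequence": finish_reason == "stop",
        "hit_token_limit": finish_reason == "length",
        "filtered": finish_reason == "content_filter",
        "needs_function_call": finish_reason == "function_call",
        # flag 0 or missing is safe; 1 is low risk and may continue
        "conversation_may_continue": flag in (None, 0, 1),
        # flag 2 still allows display; 3 and 4 are treated as not displayable
        "content_may_be_displayed": flag in (None, 0, 1, 2),
    }


print(interpret_choice("length", flag=2))
```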

choices's message Description

| Name | Type | Description |
| --- | --- | --- |
| role | string | Currently supports the following: user: represents the user; assistant: represents the dialogue assistant; system: represents the persona |
| name | string | message name |
| content | string | Dialogue content |
| tool_calls | List[ToolCall] | Function call, returned in the first round of dialogue in a function call scenario, and passed back as historical information in the message in the second round |
| tool_call_id | string | (1) Required when role=tool (2) The function call id generated by the model, corresponding to tool_calls[].id in tool_calls (3) The caller should pass the real id generated by the model, otherwise the effect will be compromised |
| reasoning_content | string | Chain of thought content. Note: only valid when the model is DeepSeek-R1 |

delta Description

Name Type Description
content string Streaming response content
tool_calls List[ToolCall] Function calls generated by the model, including function name and call parameters

ToolCall Description

Name Type Description
id string Unique identifier for the function call, generated by the model
type string Fixed value function
function function Specific content of the function call

ToolCall's function Description

Name Type Description
name string Function name
arguments string Function arguments

search_results Description

Name Type Description
index int Sequence number
url string Search result URL
title string Search result title

usage Description

Name Type Description
prompt_tokens int Number of question tokens (including historical Q&A)
completion_tokens int Number of answer tokens
total_tokens int Total number of tokens

4.2 Text-to-Image

4.2.1 Model Use

import os
from openai import OpenAI
import base64

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

# Generated image returned as a URL
images_url = client.images.generate(prompt="A white cat, red hat", model="Stable-Diffusion-XL", response_format="url")
print(images_url.data[0].url)

# Generated image returned as base64
images_base64 = client.images.generate(prompt="A black cat, blue hat", model="Stable-Diffusion-XL", response_format="b64_json")
# Save the generated images
for i, image in enumerate(images_base64.data):
    with open("image_{}.png".format(i), "wb") as f:
        f.write(base64.b64decode(image.b64_json))

4.2.2 Request Parameter Description

Name Type Required Description Natively supported by openai-python
model string Yes Model ID, available values can be obtained from client.models.list() Yes
prompt string Yes Prompt, i.e., the elements the user wants the image to contain. Description: Length limit 1024 characters, recommended total number of Chinese or English words not to exceed 150 Yes
negative_prompt string No Negative prompt, i.e., the elements the user does not want the image to contain. Description: Length limit 1024 characters, recommended total number of Chinese or English words not to exceed 150 No
response_format string No The format for the returned generated image. Must be one of url or b64_json. After image generation, the url is valid for 7 days. Yes
size string No Generated image width and height, Description: (1) Default value 1024x1024 (2) Value range as follows: Suitable for avatars: ["768x768", "1024x1024", "1536x1536", "2048x2048"] Suitable for article illustrations: ["1024x768", "2048x1536"] Suitable for posters/flyers: ["768x1024", "1536x2048", "576x1024", "1152x2048"] Suitable for computer wallpapers: ["1024x576", "2048x1152"] Yes
n int No Number of images to generate, Description: (1) Default value is 1 (2) Value range is 1-4 (3) Generating many images at once or frequent requests may lead to request timeout Yes
steps int No Number of iterations, Description: (1) Default value is 20 (2) Value range is [10, 50] No
style string No Generation style. Description: (1) Default value is Base (2) Optional values: Base: Basic style 3D Model: 3D Model Analog Film: Analog Film Anime: Anime Cinematic: Cinematic Comic Book: Comic Book Craft Clay: Craft Clay Digital Art: Digital Art Enhance: Enhance Fantasy Art: Fantasy Art Isometric: Isometric Line Art: Line Art Lowpoly: Lowpoly Neonpunk: Neonpunk Origami: Origami Photographic: Photographic Pixel Art: Pixel Art Texture: Texture Yes
sampler_index string No Sampling method, Description: (1) Default value: Euler a (2) Optional values as follows: Euler Euler a DPM++ 2M DPM++ 2M Karras LMS Karras DPM++ SDE DPM++ SDE Karras DPM2 a Karras Heun DPM++ 2M SDE DPM++ 2M SDE Karras DPM2 DPM2 Karras DPM2 a LMS No
retry_count int No Number of retries, default 1 No
request_timeout float No Request timeout, default 60 seconds No
backoff_factor float No Request retry parameter, used to specify the retry strategy, default is 0 No
seed integer No Random seed, Description: (1) If not set, a random number is generated automatically (2) Value range [0, 4294967295] No
cfg_scale float No Prompt relevance, Description: Default value is 5, value range 0-30 No
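
The documented limits can be validated locally before calling client.images.generate. The sketch below is illustrative: the helper name build_image_request is hypothetical, and passing parameters that openai-python does not support natively (steps, negative_prompt) through extra_body is an assumption modeled on how web_search is passed elsewhere in this document.

```python
# Sizes copied from the size row of the request parameter table.
ALLOWED_SIZES = {
    "768x768", "1024x1024", "1536x1536", "2048x2048",  # avatars
    "1024x768", "2048x1536",                           # article illustrations
    "768x1024", "1536x2048", "576x1024", "1152x2048",  # posters / flyers
    "1024x576", "2048x1152",                           # computer wallpapers
}


def build_image_request(prompt, size="1024x1024", n=1, steps=20,
                        negative_prompt=None) -> dict:
    """Validate documented limits and return keyword arguments for images.generate."""
    if not prompt or len(prompt) > 1024:
        raise ValueError("prompt must be 1 to 1024 characters long")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    if not 10 <= steps <= 50:
        raise ValueError("steps must be between 10 and 50")
    extra_body = {"steps": steps}  # assumption: non-native parameters go here
    if negative_prompt is not None:
        extra_body["negative_prompt"] = negative_prompt
    return {
        "model": "Stable-Diffusion-XL",
        "prompt": prompt,
        "size": size,
        "n": n,
        "extra_body": extra_body,
    }
```

With a configured client, the result could be used as client.images.generate(**build_image_request("A white cat, red hat")).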

4.2.3 Model Response Description

Name Type Description
created int Timestamp
data list(image) Generated image result

image Description

Name Type Description
b64_json string Image base64 encoded content, if and only if response_format=b64_json
url string Image URL, if and only if response_format=url
index int Sequence number

4.3 Embeddings

4.3.1 Model Use

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

embeddings = client.embeddings.create(
        model="embedding-v1",
        input=[
            "Recommend some food",
            "Tell me a story"
        ]
    )
print(embeddings)

4.3.2 Request Parameter Description

Name Type Required Description
model str No Model ID, available values can be obtained from client.models.list()
input List[str] Yes Input text, Description: (1) Cannot be an empty List, and no member of the List can be an empty string (2) Number of texts cannot exceed 16 (3) Per-model limits: embedding-v1: at most 16 texts, each with at most 384 tokens and at most 1000 characters; bge-large-zh: at most 16 texts, each with at most 512 tokens and at most 2000 characters
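
When there are more texts than a single request accepts, they can be split into batches first. This is a minimal sketch: the helper name batch_texts is hypothetical, and the default batch size of 16 follows the limit stated in the input row above.

```python
def batch_texts(texts, batch_size=16):
    """Split texts into lists of at most batch_size items, rejecting empty strings."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if any(not text for text in texts):
        raise ValueError("input texts must not be empty strings")
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


batches = batch_texts([f"document {i}" for i in range(40)])
print([len(batch) for batch in batches])  # prints [16, 16, 8]
```

Each batch can then be passed as the input argument of client.embeddings.create.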

4.3.3 Return Parameter Description

Name Type Description
object str Packet type, fixed value "embedding_list"
data List[EmbeddingData] embedding information, number of data members matches the number of texts
usage Usage token statistics, token count = number of Chinese characters + number of words*1.3 (estimation logic only)

EmbeddingData Description

Name Type Description
object str Fixed value "embedding"
embedding List[float] embedding content
index int Sequence number

Usage Description

Name Type Description
prompt_tokens int Question tokens count
total_tokens int Total tokens count
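
The estimation rule quoted for usage (number of Chinese characters plus number of words times 1.3) can be approximated locally to get a rough idea of whether a text fits a model's per-text limit. This is a sketch only: the function name estimate_tokens is hypothetical, and the choice to count runs of ASCII letters and digits as "words" and the CJK range \u4e00-\u9fff as "Chinese characters" is an assumption. The count reported by the service is authoritative.

```python
import re


def estimate_tokens(text: str) -> float:
    """Rough token estimate: Chinese characters + 1.3 * words."""
    chinese_chars = len(re.findall(r"[\u4e00-\u9fff]", text))
    words = len(re.findall(r"[A-Za-z0-9]+", text))
    return chinese_chars + words * 1.3


print(estimate_tokens("Recommend some food"))
print(estimate_tokens("你好 world"))
```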

5. Model Extension Capability Usage

5.1 Multi-Turn Dialogue

import os
from openai import OpenAI

def get_response(messages):
    client = OpenAI(
        api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
        base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
    )
    completion = client.chat.completions.create(model="ernie-3.5-8k", messages=messages)
    return completion

messages = [
    {
        "role": "system",
        "content": "You are an AI Studio developer assistant. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.",
    }
]

assistant_output = "Hello, I am the AI Studio developer assistant. How can I help you?"
print('Enter "End" to end the conversation\n')
print(f"Model output: {assistant_output}\n")
while True:
    user_input = input("Please enter: ")
    # Stop before sending the exit keyword to the model
    if "End" in user_input:
        break
    # Add the user's question to the messages list
    messages.append({"role": "user", "content": user_input})
    assistant_output = get_response(messages).choices[0].message.content
    # Add the model's reply to the messages list
    messages.append({"role": "assistant", "content": assistant_output})
    print(f"Model output: {assistant_output}\n")

5.2 Streaming Output

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

completion = client.chat.completions.create(
    model="ernie-3.5-8k",
    messages=[
        {'role': 'system', 'content': 'You are a developer assistant for the AI Studio training platform. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.'},
        {'role': 'user', 'content': 'Hello, please introduce AI Studio'}
    ],
    stream=True,
)

for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
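
If the full reply is needed after streaming (for example, to append it to the history of a multi-turn dialogue), the deltas can be collected instead of only printed. The sketch below uses simulated chunks built with SimpleNamespace so that it runs without a network call. The helper names collect_stream_text and fake_chunk are hypothetical, and the chunk shape mirrors the choices[0].delta.content access used in the example above.

```python
from types import SimpleNamespace


def collect_stream_text(chunks) -> str:
    """Join the delta.content pieces of a stream, skipping empty chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        content = chunk.choices[0].delta.content
        if content:  # content can be None on some chunks
            parts.append(content)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in object shaped like a streaming chunk (for this demo only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hello"), fake_chunk(None),
        SimpleNamespace(choices=[]), fake_chunk(", AI Studio")]
print(collect_stream_text(demo))  # prints Hello, AI Studio
```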

5.3 Asynchronous Use

import os
from openai import AsyncOpenAI
import asyncio

client = AsyncOpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {'role': 'system', 'content': 'You are a developer assistant for the AI Studio training platform. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.'},
            {'role': 'user', 'content': 'Hello, please introduce AI Studio'}
        ],
        model="ernie-3.5-8k",
    )
    print(chat_completion.choices[0].message.content)
    

asyncio.run(main())

5.4 Search Enhancement

Usage Scenarios

Search enhancement suits scenarios that need real-time information or the latest data, such as news event queries, literature retrieval, and tracking policy changes. With web search, the model can retrieve up-to-date data and answer such questions more accurately.

How to Use

Add the following web_search parameters to the request body to enable web search. The parameter descriptions are as follows:

| Parameter Name | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| enable | boolean | No | false | Whether to enable the web search feature |
| enable_trace | boolean | No | false | Whether to return traceability information |
| enable_status | boolean | No | false | Whether to return a search trigger signal in the response. If search is triggered, the first packet returns 'Searching', and delta_tag:search_status indicates that this packet is a signal packet |
| enable_citation | boolean | No | false | Whether to include citation source superscripts in the response. Single superscript format example: ^[1]^, multiple superscript format example: ^[1][2]^ |
| search_num | integer | No | 10 | Number of documents to retrieve, range [1, 28] |
| reference_num | integer | No | 10 | Number of documents used for the large model's summary, range [1, 28] (must be ≤ search_num) |

Parameter Example:

{
    "web_search": {
        "enable": true,
        "enable_citation": true,
        "enable_trace": true,
        "enable_status": true,
        "search_num": 10,
        "reference_num": 5
    }
}
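
The options can also be assembled and range-checked in Python before being passed through extra_body. This is a minimal sketch: the helper name build_web_search is hypothetical, and the key names search_num and reference_num follow the parameter example above.

```python
def build_web_search(enable=True, enable_citation=False, enable_trace=False,
                     enable_status=False, search_num=10, reference_num=10) -> dict:
    """Return an extra_body dictionary after checking the documented ranges."""
    if not 1 <= search_num <= 28:
        raise ValueError("search_num must be between 1 and 28")
    if not 1 <= reference_num <= 28:
        raise ValueError("reference_num must be between 1 and 28")
    if reference_num > search_num:
        raise ValueError("reference_num must not exceed search_num")
    return {
        "web_search": {
            "enable": enable,
            "enable_citation": enable_citation,
            "enable_trace": enable_trace,
            "enable_status": enable_status,
            "search_num": search_num,
            "reference_num": reference_num,
        }
    }


extra_body = build_web_search(enable_trace=True, reference_num=5)
print(extra_body["web_search"]["reference_num"])  # prints 5
```

The result can be supplied as extra_body=build_web_search(...) in client.chat.completions.create.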

Supported Models

  • ernie-4.5
  • ernie-4.5-turbo
  • ernie-4.0
  • ernie-4.0-turbo
  • ernie-3.5
  • deepseek-r1
  • deepseek-v3

Code Example

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

completion = client.chat.completions.create(
    model="ernie-4.0-turbo-8k",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Who is the men's singles table tennis champion of the 2024 Olympics"
                }
            ]
        }
    ],
    extra_body={
        "web_search": {
            "enable": True,
            "enable_trace": True
        }
    },
    stream=True,
)

search_result = []
for chunk in completion:
    if len(chunk.choices) > 0:
        # search_results is an extension field, so check for it before reading
        if hasattr(chunk, 'search_results'):
            search_result.extend(chunk.search_results)
        # delta.content can be None on some chunks; print an empty string instead
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    
unique_dict = {}
for item in search_result:
    unique_dict[item["index"]] = item
print("\nReferences:\n")
for result in list(unique_dict.values()):
    print(str(result["index"]) + ". " + result["title"] + ". " + result["url"] + "\n")

5.5 Structured Output

Introduction

JSON is one of the most widely used formats in the world for exchanging data between applications.

Structured output is a feature that ensures the model always generates a response that conforms to the JSON schema you provide, so users don't have to worry about the model omitting required keys or producing invalid enum values.

Some benefits of structured output include:

  • Reliable type safety: No need to validate or retry improperly formatted responses
  • Clear rejection: Model rejections based on safety can now be detected programmatically
  • Simpler prompting: No need to use strongly-worded prompts to achieve consistent formatting

How to Enable

Control the generation of response content through the response_format field.

Field | Data Type | Description
type | string | Format of the response content. Optional values: text (default), returns plain text; json_object, returns JSON, but the result may not meet expectations; json_schema, returns content in the format specified by the json_schema field
json_schema | object | The JSON Schema definition; please refer to the JSON Schema description. Required when type is json_schema
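
As a rough illustration of the json_schema option, the sketch below builds a request body in Python. The wrapper keys name and schema inside json_schema follow the OpenAI convention and are an assumption here, as are the example schema and the helper name make_json_schema_format; check the JSON Schema description for the exact fields the service accepts.

```python
import json


def make_json_schema_format(name, schema):
    """Build a response_format value of type json_schema.

    Assumption: the service accepts the OpenAI-style wrapper with
    "name" and "schema" keys inside "json_schema".
    """
    return {"type": "json_schema", "json_schema": {"name": name, "schema": schema}}


# Hypothetical schema, used only for illustration.
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "condition": {
            "type": "string",
            "enum": ["sunny", "cloudy", "rainy", "unknown"],
        },
    },
    "required": ["city", "condition"],
}

request_body = {
    "model": "ernie-3.5-8k",
    "messages": [{"role": "user", "content": "Shanghai weather today"}],
    "response_format": make_json_schema_format("weather_report", weather_schema),
}

print(json.dumps(request_body, indent=2))
```

With the openai-python SDK, the same dictionary can be passed as the response_format argument of client.chat.completions.create.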

Supported Models

  • ernie-4.5
  • ernie-4.0-turbo
  • ernie-3.5

Code Example

{
  "model": "ernie-3.5-8k",
  "messages": [
    {
      "role": "user",
      "content": "Shanghai weather today"
    }
  ],
  "response_format": {
    "type": "text"
  }
}

In this request, type is set to text; it can be replaced with json_object or json_schema.

The format of the returned content changes with this setting:

  • response_format not enabled
  Since weather information is updated in real-time, I cannot directly provide the precise weather conditions for Shanghai today.\n\nTo get the latest Shanghai weather information, I recommend you check a weather forecast application, visit the official website of the meteorological bureau, or use other reliable weather information sources. These platforms usually provide detailed real-time weather data such as temperature, humidity, wind speed, precipitation probability, etc., as well as weather forecasts for the next few days.\n\nHope these suggestions are helpful to you!
  • response_format enabled
  "{\n    \"Shanghai today's weather\": \"Since I cannot obtain real-time weather information, I am unable to provide the exact weather conditions for Shanghai today.\"\n}\n\nTo get real-time weather for Shanghai today, I recommend you check the weather app on your phone, visit the official website of the meteorological bureau, or use other reliable weather information sources. These channels usually provide the latest weather conditions, temperature, humidity, wind speed, and other detailed information."
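
As the second sample shows, a response may contain a JSON object followed by extra prose. One defensive way to read such output, sketched here with only the Python standard library, is to decode the first JSON object and ignore whatever trails it. The helper name extract_first_json_object and the sample string are illustrative and not part of the API.

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first JSON object embedded in text, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and tolerates any text that follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


sample = '{\n    "weather": "unknown"\n}\n\nTo get real-time weather, check a weather app.'
print(extract_first_json_object(sample))
```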

5.6 Function Calling

Capability Introduction

Function calling connects large models with external tools or code. It can improve model results in scenarios that depend on real-time data or computation, and it lets the model trigger external operations such as information retrieval, database operations, and graph search and processing.

tools is an optional parameter in the model service API used to provide function definitions to the model. With this parameter, the model can generate function parameters that conform to the specifications provided by the user. Please note that the model service API does not actually execute any function calls. It only returns whether to call a function, the name of the function to be called, and the parameters required to call the function. Developers can use the parameters output by the model to further execute the function call in their system.

Supported Models

  • ernie-x1-turbo-32k
  • deepseek-r1
  • deepseek-v3

Calling Steps

  1. Define the function using JSON Schema format;
  2. Submit the defined function(s) to the model that supports function call via the tools parameter; multiple functions can be submitted at once;
  3. The model will decide which function to use, or not to use any function, based on the current chat context;
  4. If the model decides to use a function, it will return the parameters and information required to call the function in JSON format;
  5. Use the parameters output by the model to execute the corresponding function, and submit the execution result of this function to the model;
  6. The model will give the user a reply based on the function's execution result.

Example Code

import os
from openai import OpenAI

client = OpenAI(
     api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
     base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

tools = [
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
          },
          "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
      },
    }
  }
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]
completion = client.chat.completions.create(
  model="deepseek-v3",
  messages=messages,
  tools=tools,
  tool_choice="auto"
)

print(completion)
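
The example above covers steps 1 to 4. Steps 5 and 6 could be sketched as follows: run the requested function locally, then wrap its result in a message that is sent back to the model. The local get_current_weather implementation, its canned return value, and the helper run_tool_call are hypothetical. The "tool" role and tool_call_id field follow the OpenAI chat format, and this sketch assumes the service accepts them unchanged.

```python
import json


def get_current_weather(location, unit="celsius"):
    """Hypothetical local implementation that returns canned data."""
    return {"location": location, "temperature": 22, "unit": unit}


# Maps function names declared in `tools` to local Python callables.
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}


def run_tool_call(tool_call_id, name, arguments_json):
    """Execute one tool call and return the message to send back to the model."""
    func = AVAILABLE_FUNCTIONS.get(name)
    if func is None:
        result = {"error": f"unknown function: {name}"}
    else:
        try:
            args = json.loads(arguments_json or "{}")
            result = func(**args)
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or arguments that do not match the signature.
            result = {"error": str(exc)}
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result, ensure_ascii=False),
    }


# Usage with the objects from the example above (requires network access):
# message = completion.choices[0].message
# if message.tool_calls:
#     messages.append(message)
#     for call in message.tool_calls:
#         messages.append(
#             run_tool_call(call.id, call.function.name, call.function.arguments)
#         )
#     final = client.chat.completions.create(
#         model="deepseek-v3", messages=messages, tools=tools
#     )
#     print(final.choices[0].message.content)
```

Returning errors as tool output, rather than raising, lets the model see what went wrong and either retry with corrected arguments or explain the failure to the user.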

5.7 Print Chain of Thought (Thinking Model)

Non-streaming

import os
from openai import OpenAI

client = OpenAI(
     api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
     base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

chat_completion = client.chat.completions.create(
    messages=[
        {'role': 'system', 'content': 'You are a developer assistant for the AI Studio training platform. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.'},
        {'role': 'user', 'content': 'Hello, please introduce AI Studio'}
    ],
    model="deepseek-r1",
)

print(chat_completion.choices[0].message.reasoning_content)
print(chat_completion.choices[0].message.content)

Streaming

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

completion = client.chat.completions.create(
    model="deepseek-r1",
    messages=[
        {'role': 'system', 'content': 'You are a developer assistant for the AI Studio training platform. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.'},
        {'role': 'user', 'content': 'Hello, please introduce AI Studio'}
    ],
    stream=True,
)

for chunk in completion:
    if (len(chunk.choices) > 0):
        if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
            print(chunk.choices[0].delta.reasoning_content, end="", flush=True)
        else:
            # delta.content can be None on some chunks; print an empty string instead of "None"
            print(chunk.choices[0].delta.content or "", end="", flush=True)
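
When the reasoning and the final answer need to be stored separately instead of printed as they arrive, the loop above can be adapted to accumulate both. The sketch below is one possible shape; collect_stream and the fake chunks built with SimpleNamespace are illustrative stand-ins so that the logic can run without network access.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate reasoning text and answer text from streamed chunks."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # Either field may be missing or None on a given chunk.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def _fake_chunk(reasoning=None, content=None):
    """Build a stand-in object with the same attribute layout as a stream chunk."""
    delta = SimpleNamespace(reasoning_content=reasoning, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_chunks = [
    _fake_chunk(reasoning="The user wants an introduction. "),
    _fake_chunk(content="AI Studio is "),
    _fake_chunk(content="a development platform."),
    _fake_chunk(),  # final chunk with no text
]
reasoning_text, answer_text = collect_stream(demo_chunks)
print("Reasoning:", reasoning_text)
print("Answer:", answer_text)
```

With a real request, pass the completion object returned by client.chat.completions.create(..., stream=True) in place of demo_chunks.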

5.8 Multimodality

5.8.1 Multimodal - Text Input

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

completion = client.chat.completions.create(
        model="ernie-4.5-8k-preview",
        messages=[
            {
                'role': 'user', 'content': [
                    {
                        "type": "text",
                        "text": "Introduce a few famous attractions in Beijing"
                    }
                ]   
            }
        ]
    )

print(completion.choices[0].message.content or "")

5.8.2 Multimodal - Text Input - Streaming

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

completion = client.chat.completions.create(
        model="ernie-4.5-8k-preview",
        messages=[
            {
                'role': 'user', 'content': [
                    {
                        "type": "text",
                        "text": "Introduce a few famous attractions in Beijing"
                    }
                ]   
            }
        ],
        stream=True,
    )

for chunk in completion:
    if (len(chunk.choices) > 0):
        # delta.content can be None on some chunks; print an empty string instead of "None"
        print(chunk.choices[0].delta.content or "", end="", flush=True)

5.8.3 Multimodal - Image Input (url) - Streaming

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

completion = client.chat.completions.create(
        model="ernie-4.5-8k-preview",
        messages=[
            {
                'role': 'user', 'content': [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://bucket-demo-bj.bj.bcebos.com/pic/wuyuetian.png",
                            "detail": "high"
                        }
                    }
                ]   
            }
        ],
        stream=True,
    )

for chunk in completion:
    if (len(chunk.choices) > 0):
        # delta.content can be None on some chunks; print an empty string instead of "None"
        print(chunk.choices[0].delta.content or "", end="", flush=True)

5.8.4 Multimodal - Image Input (base64) - Streaming

import base64
import os
from openai import OpenAI


def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
        
# Path to your image
image_path = "/image_1.png"

# Getting the Base64 string
base64_image = encode_image(image_path)

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

completion = client.chat.completions.create(
        model="ernie-4.5-8k-preview",
        messages=[
            {
                'role': 'user', 'content': [
                    {
                        "type": "image_url",
                        "image_url": {
                            # The image type in the data URL should match the file (PNG here)
                            "url": f"data:image/png;base64,{base64_image}"
                        }
                    }
                ]   
            }
        ],
        stream=True,
    )

for chunk in completion:
    if (len(chunk.choices) > 0):
        # delta.content can be None on some chunks; print an empty string instead of "None"
        print(chunk.choices[0].delta.content or "", end="", flush=True)

5.8.5 Multimodal - Image + Text Input - Streaming

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

completion = client.chat.completions.create(
        model="ernie-4.5-8k-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Which band is in the picture"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://bucket-demo-bj.bj.bcebos.com/pic/wuyuetian.png",
                            "detail": "high"
                        }
                    }
                ]
            }
        ],
        stream=True,
    )

for chunk in completion:
    if (len(chunk.choices) > 0):
        # delta.content can be None on some chunks; print an empty string instead of "None"
        print(chunk.choices[0].delta.content or "", end="", flush=True)

5.8.6 Multimodal - Video Understanding - Streaming

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing AI Studio access token, https://aistudio.baidu.com/account/accessToken,
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # aistudio LLM api service domain
)

completion = client.chat.completions.create(
    model="default",
    temperature=0.6,
    messages= [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this video"
                },
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://bucket-demo-01.gz.bcebos.com/video/sea.mov", 
                        "fps": 1 
                    }
                }
            ]
        }
    ],
    stream=True
)

for chunk in completion:
    if (len(chunk.choices) > 0):
        # delta.content can be None on some chunks; print an empty string instead of "None"
        print(chunk.choices[0].delta.content or "", end="", flush=True)

Notes:

  • The large model is stateless across calls, so you need to manage the information passed to it yourself. If the model needs to understand the same image in several requests, pass the image in every request.
  • Single and multiple images are supported. Each image must not exceed 10 MB. The total tokens for all image inputs must not exceed the model's context length; for example, for the ERNIE-4.5 model, the image input should not exceed 8K tokens.
  • Image formats:
    a. Image base64: JPG, JPEG, PNG, and BMP types. The format passed must be: data:image/<image type>;base64,<Base64-encoded data>
    b. Public image url: Supports JPG, JPEG, PNG, BMP, and WEBP types
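
The base64 constraints in these notes can be checked before a request is sent. The sketch below is one way to do that; the helper name image_to_data_url is illustrative, and it assumes the 10 MB limit applies to the raw file size, which the notes do not state explicitly.

```python
import base64
import os

# Assumption: the 10 MB limit is measured on the raw file, before encoding.
MAX_IMAGE_BYTES = 10 * 1024 * 1024

# File extensions accepted for base64 input, mapped to the data URL image type.
BASE64_IMAGE_TYPES = {".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png", ".bmp": "bmp"}


def image_to_data_url(image_path):
    """Validate a local image and return it as a base64 data URL."""
    ext = os.path.splitext(image_path)[1].lower()
    if ext not in BASE64_IMAGE_TYPES:
        raise ValueError(f"unsupported image type for base64 input: {ext or '(none)'}")
    size = os.path.getsize(image_path)
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {size} bytes; the limit is {MAX_IMAGE_BYTES}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:image/{BASE64_IMAGE_TYPES[ext]};base64,{encoded}"
```

The returned string can be used as the "url" value of an image_url content part, in place of the hand-built f-string in section 5.8.4.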

6. API Error Codes

HTTP Status Code | Type | Error Code | Error Message
400 | invalid_request_error | malformed_json | Invalid JSON
400 | invalid_request_error | invalid_model | model is empty
400 | invalid_request_error | malformed_json | Invalid Argument
400 | invalid_request_error | malformed_json | (the specific error message returned)
400 | invalid_request_error | invalid_messages | (the specific error message returned)
400 | invalid_request_error | characters_too_long | the max input characters is xxx
400 | invalid_request_error | invalid_user_id | user_id can not be empty
400 | invalid_request_error | tokens_too_long | Prompt tokens too long
401 | access_denied | no_parameter_permission | (the specific error message returned)
401 | invalid_request_error | invalid_model | No permission to use the model
401 | invalid_request_error | invalid_appid | No permission to use the appid
401 | invalid_request_error | invalid_iam_token | IAM Certification failed
403 | unsafe_request | system_unsafe | the content of system field is invalid
403 | unsafe_request | user_setting_unsafe | the content of user field is invalid
403 | unsafe_request | functions_unsafe | the content of functions field is invalid
404 | invalid_request_error | no_such_model |
405 | invalid_request_error | method_not_supported | Only POST requests are accepted
429 | rate_limit_exceeded | rpm_rate_limit_exceeded | Rate limit reached for RPM
429 | rate_limit_exceeded | tpm_rate_limit_exceeded | Rate limit reached for TPM
429 | rate_limit_exceeded | preemptible_rate_limit_exceeded | Rate limit reached for preemptible resource
429 | rate_limit_exceeded | user_rate_limit_exceeded | qps request limit by APP ID reached
429 | rate_limit_exceeded | cluster_rate_limit_exceeded | request limit by resouce cluster reached
500 | Internal_error | internal_error | Internal error
500 | Internal_error | dispatch_internal_error | Internal error
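
Of the errors above, the 429 rate-limit responses and the 500 internal errors may succeed on a later attempt, while 400, 401, 403, 404, and 405 indicate a problem with the request itself. The sketch below retries only the former with exponential backoff. It relies on the exception exposing a status_code attribute, as the status errors raised by openai-python do; the helper name call_with_retry, the choice of retryable codes, and the delays are assumptions made for illustration.

```python
import time

# Assumption: only rate-limit and internal errors are worth retrying.
RETRYABLE_STATUS = {429, 500}


def call_with_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on retryable HTTP statuses."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


class FakeStatusError(Exception):
    """Stand-in for an SDK error that carries an HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
```

A real call would look like call_with_retry(lambda: client.chat.completions.create(model=..., messages=...)). The openai-python client also has its own max_retries setting, so it may be worth checking how the two interact before combining them.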

