LLM API
The AI Studio LLM API is a set of foundational LLM services for developers, backed by the Baidu AI Cloud Qianfan Platform, offering large language models such as the ERNIE series. The service is compatible with the openai-python SDK, so developers can call ERNIE and other LLM services directly with the native openai-python SDK.
Join the free official tutorial course, "LLM API Service: From Service Calls to Application Practice", to get started in 2 minutes and master large models.
1. Preparation
1.1 Access Token
An Access Token authenticates you as an AI Studio user. It lets you perform the operations granted by its authorization scope (such as calling the LLM API or reading repositories) against AI Studio. You can view your personal access token on the Access Token page in your personal center.
1.2 Tokens
Tokens are the basic billing unit for calling large model SDKs or using large model applications on Baidu AI Studio. AI Studio grants each developer a free quota of 1 million tokens; different models consume tokens at different rates. You can check usage details in Token Management. If your tokens are used up, you can buy more before continuing.
1.3 Service Domain
The domain address for the AI Studio LLM API service is: https://aistudio.baidu.com/llm/lmapi/v3
When calling the AI Studio LLM API service with openai-python, you need to set:
- Specify api_key = "Your Access Token"
- Specify base_url = "https://aistudio.baidu.com/llm/lmapi/v3"
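For example, a minimal client can be constructed as follows (assuming your access token is stored in the AI_STUDIO_API_KEY environment variable, as in the examples later in this document):
```python
import os
from openai import OpenAI

# Minimal client setup for the AI Studio LLM API service.
# Assumes the access token is exported as AI_STUDIO_API_KEY.
client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",
)
```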
2. Model List and Query
2.1 Text-to-Text Model List
| Model Name | model Parameter Value | Context Length (token) | Max Input (token) | Max Output (token) |
|---|---|---|---|---|
| (Open Sourced on 6/30) ERNIE-4.5-VL-424B-A47B | ernie-4.5-turbo-vl | 128k | 123k | [2, 12288] Default 2k |
| (Open Sourced on 6/30) ERNIE-4.5-300B-A47B | ernie-4.5-turbo-128k-preview | 128k | 123k | [2, 12288] Default 2k |
| (Open Sourced on 6/30) ERNIE-4.5-VL-28B-A3B | ernie-4.5-vl-28b-a3b | 128k | 123k | [2, 12288] Default 2k |
| (Open Sourced on 6/30) ERNIE-4.5-21B-A3B | ernie-4.5-21b-a3b | 128k | 120k | [2, 12288] Default 2k |
| (Open Sourced on 6/30) ERNIE-4.5-0.3B | ernie-4.5-0.3b | 128k | 120k | [2, 12288] Default 2k |
| DeepSeek-Chat | deepseek-v3 | 128k | 128k | [2, 12288] Default 2k |
| ERNIE 4.0 | ernie-4.0-8k | 8k | 5k | [2, 2048] Default 2k |
| ERNIE 4.0 Turbo | ernie-4.0-turbo-128k | 128k | 124k | [2, 4096] Default 4k |
| ERNIE 4.0 Turbo | ernie-4.0-turbo-8k | 8k | 5k | [2, 2048] Default 2k |
| ERNIE 3.5 | ernie-3.5-8k | 8k | 5k | [2, 2048] Default 2k |
| ERNIE Character | ernie-char-8k | 8k | 7k | [2, 2048] Default 1k |
| ERNIE Speed | ernie-speed-8k | 8k | 6k | [2, 2048] Default 1k |
| ERNIE Speed | ernie-speed-128k | 128k | 124k | [2, 4096] Default 4k |
| ERNIE Tiny | ernie-tiny-8k | 8k | 6k | [2, 2048] Default 1k |
| ERNIE Lite | ernie-lite-8k | 8k | 6k | [2, 2048] Default 1k |
| Kimi-K2 | kimi-k2-instruct | 128k | 128k | [1, 32768] Default 4k |
| Qwen3-Coder | qwen3-coder-30b-a3b-instruct | 128k | 128k | [1, 32768] Default 4k |
2.2 Thinking Model List
| Model Name | model Parameter Value | Context Length (token) | Max Input (token) | Max Output (token) | Chain of Thought Length (token) |
|---|---|---|---|---|---|
| (Open Sourced on 6/30) ERNIE-4.5-VL-424B-A47B | ernie-4.5-turbo-vl | 128k | 123k | [2, 12288] Default 2k | 16k |
| (Open Sourced on 6/30) ERNIE-4.5-VL-28B-A3B | ernie-4.5-vl-28b-a3b | 128k | 123k | [2, 12288] Default 2k | 16k |
| ERNIE X1 Turbo | ernie-x1-turbo-32k | 32k | 24k | [2, 16384] Default 2k | 16k |
| DeepSeek-Reasoner | deepseek-r1-250528 | 96k | 64k | 16k Default 4k | 32k |
| DeepSeek-Reasoner | deepseek-r1 | 96k | 64k | 16k Default 4k | 32k |
2.3 Multimodal Model List
For multimodal model usage, please see section 5.8 of this document. (Added 2025/6/30: Video understanding call example, section 5.8.6)
| Model Name | model Parameter Value | Supported Modalities | Context Length (token) | Max Input (token) | Max Output (token) |
|---|---|---|---|---|---|
| (Open Sourced on 6/30) ERNIE-4.5-VL-424B-A47B | ernie-4.5-turbo-vl | Text, Image, Video | 128k | 123k | [2, 12288] |
| (Open Sourced on 6/30) ERNIE-4.5-VL-28B-A3B | ernie-4.5-vl-28b-a3b | Text, Image, Video | 128k | 123k | [2, 12288] |
| ERNIE 4.5 Turbo VL | ernie-4.5-turbo-vl-32k | Text, Image | 32k | 30k | [1, 8192] Default 4k |
2.4 Embedding Model List
| Model Name | model Parameter | Max Input Text Count | Context Length per Text (token) |
|---|---|---|---|
| Embedding-V1 | embedding-v1 | 1 | 384 |
| bge-large-zh | bge-large-zh | 16 | 512 |
2.5 Text-to-Image Model
| Model Name | Type |
|---|---|
| Stable-Diffusion-XL | Text-to-Image Model |
2.6 Feature Support
2025/6/30 Open Source Model List:
| Model Name | model Parameter Value | Supported Capabilities | Supported Modalities |
|---|---|---|---|
| (Open Sourced on 6/30) ERNIE-4.5-VL-424B-A47B | ernie-4.5-turbo-vl | Chat Model, Thinking (Coming Soon) | Text, Image, Video |
| (Open Sourced on 6/30) ERNIE-4.5-300B-A47B | ernie-4.5-turbo-128k-preview | Chat Model | Text |
| (Open Sourced on 6/30) ERNIE-4.5-VL-28B-A3B | ernie-4.5-vl-28b-a3b | Chat Model, Thinking | Text, Image, Video (Coming Soon) |
| (Open Sourced on 6/30) ERNIE-4.5-21B-A3B | ernie-4.5-21b-a3b | Chat Model | Text |
| (Open Sourced on 6/30) ERNIE-4.5-0.3B | ernie-4.5-0.3b | Chat Model | Text |
Web Search (Search Enhancement):
- ernie-4.5
- ernie-4.5-turbo
- ernie-4.0
- ernie-4.0-turbo
- ernie-3.5
- deepseek-r1
- deepseek-v3
Function Call:
- ernie-x1-turbo-32k
- deepseek-r1
- deepseek-v3
Structured Output:
- ernie-4.5
- ernie-4.0-turbo
- ernie-3.5
2.7 Query Model List
```python
# Query the list of supported models
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

models = client.models.list()
for model in models.data:
    print(model.id)
```
3. Install Dependencies
```shell
# install from PyPI
pip install openai
```
4. Basic Model Capability Usage
4.1 Text-to-Text
4.1.1 Model Use
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

chat_completion = client.chat.completions.create(
    messages=[
        {'role': 'system', 'content': 'You are a developer assistant for the AI Studio training platform. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.'},
        {'role': 'user', 'content': 'Hello, please introduce AI Studio'}
    ],
    model="ernie-3.5-8k",
)
print(chat_completion.choices[0].message.content)
```
To avoid exposing the api_key in your code, you can use python-dotenv and add AI_STUDIO_API_KEY="YOUR_ACCESS_TOKEN" to a .env file. Alternatively, you can pass it directly via api_key="YOUR_ACCESS_TOKEN".
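A minimal sketch of the python-dotenv approach (assuming a .env file in the working directory):
```python
import os
from dotenv import load_dotenv  # pip install python-dotenv
from openai import OpenAI

load_dotenv()  # reads .env and populates os.environ

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",
)
```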
4.1.2 Request Parameter Description
body Description
| Name | Type | Required | Description | Natively supported by openai-python |
|---|---|---|---|---|
| model | string | Yes | Model ID, available values can be obtained from client.models.list() | Yes |
| messages | List | Yes | Chat context information. Description: (1) messages cannot be empty; one member means a single-turn dialogue, multiple members mean a multi-turn dialogue. Example with 1 member: "messages": [{"role": "user", "content": "Hello"}]. Example with 3 members: "messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "What help do you need"}, {"role": "user", "content": "Introduce yourself"}] (2) The last message is the current request; the preceding messages are dialogue history (3) Roles in messages: ① The role of the first message must be user or system ② The role of the last message must be user or tool; for ERNIE 4.5 or ERNIE-X1-32K-Preview, the role of the last message must be user ③ If function call is not used: when the first message's role is user, roles must alternate user -> assistant -> user ..., i.e., odd-numbered messages must have role user or function, and even-numbered messages must have role assistant; when the first message's role is system, roles must follow system -> user/function -> assistant -> user/function ... (4) The total length of content in messages cannot exceed the corresponding model's input character and token limits; see the context length description for each model (5) For ERNIE 4.5, consecutive user/assistant messages and a leading assistant message are not supported. Specifically: messages cannot be empty (1 member for single-turn, multiple for multi-turn); the role of the first message must be user or system; the role of the last message must be user; after removing a leading system message, roles must alternate user -> assistant -> user ... | Yes |
| stream | bool | No | Whether to return data as a streaming response. Description: (1) Must be false for beam search models (2) Default is false | Yes |
| temperature | float | No | Description: (1) Higher values make the output more random; lower values make it more focused and deterministic (2) Default 0.95, range (0, 1.0], cannot be 0 (3) Not supported by: deepseek-v3, deepseek-r1, ernie-x1-32k-preview | Yes |
| top_p | float | No | Description: (1) Affects the diversity of the output text; larger values produce more diverse text (2) Default 0.7, range [0, 1.0] (3) Not supported by: deepseek-v3, deepseek-r1, ernie-x1-32k-preview | Yes |
| penalty_score | float | No | Reduces repetition by penalizing already generated tokens. Description: (1) Larger values mean a greater penalty (2) Default 1.0, range [1.0, 2.0] (3) Not supported by: deepseek-v3, deepseek-r1, ernie-x1-32k-preview | No |
| max_completion_tokens | int | No | Maximum number of output tokens. Description: (1) Range [2, 2048]; see the supported model list for model-specific limits | Yes |
| response_format | string | No | Format of the response content. Description: (1) Optional values: json_object (JSON format, may not meet expectations), text (plain text) (2) Defaults to text if not set (3) Not supported by: ernie-x1-32k-preview | Yes |
| seed | int | No | Description: (1) Range (0, 2147483647); randomly generated by the model if unset, default empty (2) If set, the system makes a best effort at deterministic sampling so that repeated requests with the same seed and parameters return the same result (3) Not supported by: ernie-x1-32k-preview | Yes |
| stop | List | No | Generation stop identifiers: generation stops when the model's output ends with an element of stop. Description: (1) Each element may be at most 20 characters (2) At most 4 elements (3) Not supported by: ernie-x1-32k-preview | Yes |
| frequency_penalty | float | No | Description: (1) Positive values penalize new tokens based on their existing frequency in the text so far, reducing the likelihood of verbatim repetition (2) Range [-2.0, 2.0] (3) Supported by: ernie-speed-8k, ernie-speed-128k, ernie-tiny-8k, ernie-char-8k, ernie-lite-8k | Yes |
| presence_penalty | float | No | Description: (1) Positive values penalize new tokens based on whether they appear in the text so far, increasing the likelihood of the model talking about new topics (2) Range [-2.0, 2.0] (3) Supported by: ernie-speed-8k, ernie-speed-128k, ernie-tiny-8k, ernie-char-8k, ernie-lite-8k | Yes |
| tools | List(Tool) | No | A list of descriptions of functions that can be triggered. For supported models, see the supported model list in this document (function call support) | Yes |
| tool_choice | string / tool_choice | No | Description: (1) For supported models, see the supported model list in this document (function call support) (2) As a string, optional values: none (the model should not call any function, only generate user-facing text), auto (the model decides whether and which functions to call based on the input), required (the model should always call one or more functions) (3) As a tool_choice object, it prompts the model to call a specified function; the specified function name must exist in tools | Yes |
| parallel_tool_calls | bool | No | Description: (1) For supported models, see the supported model list in this document (function call support) (2) Optional values: true (enable parallel function calling; the default), false (disable parallel function calling) | Yes |
| web_search | web_search | No | Search enhancement options. Description: (1) Off by default (omit the parameter) (2) For supported models, see the model list description above | No |
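Parameters marked "No" above (such as penalty_score and web_search) are not fields of the native openai-python client, but the SDK's extra_body pass-through merges extra fields into the request body, as the search enhancement example in section 5.4 also does. A minimal sketch:
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",
)

# extra_body merges additional fields into the JSON request body,
# which is how non-native parameters like penalty_score reach the service.
chat_completion = client.chat.completions.create(
    model="ernie-3.5-8k",
    messages=[{"role": "user", "content": "Hello, please introduce AI Studio"}],
    extra_body={"penalty_score": 1.2},
)
print(chat_completion.choices[0].message.content)
```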
message Description
| Name | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | Currently supports the following: · user: represents the user · assistant: represents the dialogue assistant · system: represents the persona |
| name | string | No | message name |
| content | string | Yes | Dialogue content, Description: (1) Cannot be empty (2) The content corresponding to the last message cannot be blank characters, such as spaces, "\n", "\r", "\f", etc. |
Tool's function Description
The function description in Tool is as follows
| Name | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Function name |
| description | string | No | Function description |
| parameters | object | No | Function request parameters, in JSON Schema format, refer to JSON Schema Description |
tool_choice Description
| Name | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Specify the tool type, fixed value function |
| function | function | Yes | Specify the function to use |
tool_choice's function Description
| Name | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Specify the name of the function to use |
web_search Description
| Name | Type | Description |
|---|---|---|
| enable | bool | Whether to enable the real-time search function, Description: (1) If real-time search is disabled, superscript and traceability information will not be returned (2) Optional values:· true: enable · false: disable, default false |
| enable_citation | bool | Whether to enable superscript return, Description: (1) Takes effect when enable is true (2) Optional values: · true: enable; if enabled, in scenarios where search enhancement is triggered, the response content will include superscripts and the corresponding search traceability information for the superscripts · false: not enabled, default false (3) If the retrieved content includes non-public webpages, superscripts will not be effective |
| enable_trace | bool | Whether to return search traceability information, Description: (1) Takes effect when enable is true. (2) Optional values: · true: return; if true, in scenarios where search enhancement is triggered, search traceability information search_results will be returned · false: do not return, default false (3) If the retrieved content is a non-public webpage, traceability information will not be returned even if search is triggered |
4.1.3 Response Parameter Description
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier for this request, can be used for troubleshooting |
| object | string | Packet type; chat.completion indicates a dialogue completion response |
| created | int | Timestamp |
| model | string | Model ID |
| choices | object | Description: The returned content differs when the request parameter stream value is different |
| usage | usage | Token statistics, Description: (1) Returned by default for synchronous requests (2) The actual content will be returned in the last chunk, other chunks return null |
choices Description
| Name | Type | Description |
|---|---|---|
| index | int | Sequence number in the choice list |
| message | message | Response information, returned when stream=false |
| delta | delta | Response information, returned when stream=true |
| finish_reason | string | Output end identifier. Description: · normal: the output was fully generated by the model, without truncation or replacement · stop: the output was truncated after hitting a string from the stop request parameter · length: the maximum number of tokens was reached · content_filter: the output was truncated, defaulted, or replaced with **, etc. · function_call: the function call feature was invoked |
| flag | int | Security subdivision type. Description: (1) When stream=false, flag values mean: · 0 or not returned: safe · 1: low-risk unsafe scenario, the conversation can continue · 2: chat prohibited, the conversation cannot continue but the content can be displayed · 3: display prohibited, the conversation cannot continue and the content cannot be shown on screen · 4: screen retraction (2) When stream=true, any returned flag indicates that security was triggered |
| ban_round | int | When flag is not 0, indicates which round of dialogue contains sensitive information; -1 means the current question |
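As a sketch of how these response fields might be inspected from a non-streaming call (field names as described above):
```python
# Assumes `chat_completion` was returned by client.chat.completions.create(...)
# with stream=False, as in section 4.1.1.
choice = chat_completion.choices[0]
print("finish_reason:", choice.finish_reason)  # e.g. "normal", "stop", "length"
print("content:", choice.message.content)

usage = chat_completion.usage
print("prompt/completion/total tokens:",
      usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```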
choices's message Description
| Name | Type | Description |
|---|---|---|
| role | string | Currently supports the following: · user: represents the user · assistant: represents the dialogue assistant · system: represents the persona |
| name | string | message name |
| content | string | Dialogue content |
| tool_calls | List[ToolCall] | Function call, returned in the first round of dialogue in a function call scenario, passed as historical information in the message in the second round |
| tool_call_id | string | Description: (1) Required when role=tool (2) The function call id generated by the model, corresponding to tool_calls[].id in tool_calls (3) The caller should pass the real id generated by the model, otherwise the effect will be compromised |
| reasoning_content | string | Chain of thought content, Note: Only valid when the model is DeepSeek-R1 |
delta Description
| Name | Type | Description |
|---|---|---|
| content | string | Streaming response content |
| tool_calls | List[ToolCall] | Function calls generated by the model, including function name and call parameters |
ToolCall Description
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier for the function call, generated by the model |
| type | string | Fixed value function |
| function | function | Specific content of the function call |
ToolCall's function Description
| Name | Type | Description |
|---|---|---|
| name | string | Function name |
| arguments | string | Function arguments |
search_results Description
| Name | Type | Description |
|---|---|---|
| index | int | Sequence number |
| url | string | Search result URL |
| title | string | Search result title |
usage Description
| Name | Type | Description |
|---|---|---|
| prompt_tokens | int | Number of question tokens (including historical Q&A) |
| completion_tokens | int | Number of answer tokens |
| total_tokens | int | Total number of tokens |
4.2 Text-to-Image
4.2.1 Model Use
```python
import os
import base64
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

# Generated image returned as a URL
images_url = client.images.generate(prompt="A white cat, red hat", model="Stable-Diffusion-XL", response_format="url")
print(images_url.data[0].url)

# Generated image returned as base64
images_base64 = client.images.generate(prompt="A black cat, blue hat", model="Stable-Diffusion-XL", response_format="b64_json")

# Save the generated images
for i, image in enumerate(images_base64.data):
    with open("image_{}.png".format(i), "wb") as f:
        f.write(base64.b64decode(image.b64_json))
```
4.2.2 Request Parameter Description
| Name | Type | Required | Description | Natively supported by openai-python |
|---|---|---|---|---|
| model | string | Yes | Model ID, available values can be obtained from client.models.list() | Yes |
| prompt | string | Yes | Prompt, i.e., the elements the user wants the image to contain. Description: Length limit 1024 characters, recommended total number of Chinese or English words not to exceed 150 | Yes |
| negative_prompt | string | No | Negative prompt, i.e., the elements the user does not want the image to contain. Description: Length limit 1024 characters, recommended total number of Chinese or English words not to exceed 150 | No |
| response_format | string | No | The format for the returned generated image. Must be one of url or b64_json. A returned url is valid for 7 days. | Yes |
| size | string | No | Generated image width and height, Description: (1) Default value 1024x1024 (2) Value range as follows: Suitable for avatars: ["768x768", "1024x1024", "1536x1536", "2048x2048"] Suitable for article illustrations: ["1024x768", "2048x1536"] Suitable for posters/flyers: ["768x1024", "1536x2048", "576x1024", "1152x2048"] Suitable for computer wallpapers: ["1024x576", "2048x1152"] | Yes |
| n | int | No | Number of images to generate, Description: (1) Default value is 1 (2) Value range is 1-4 (3) Generating many images at once or frequent requests may lead to request timeout | Yes |
| steps | int | No | Number of iterations, Description: Default value is 20 Value range is [10-50] | No |
| style | string | No | Generation style. Description: (1) Default value is Base (2) Optional values: Base: Basic style 3D Model: 3D Model Analog Film: Analog Film Anime: Anime Cinematic: Cinematic Comic Book: Comic Book Craft Clay: Craft Clay Digital Art: Digital Art Enhance: Enhance Fantasy Art: Fantasy Art Isometric: Isometric Line Art: Line Art Lowpoly: Lowpoly Neonpunk: Neonpunk Origami: Origami Photographic: Photographic Pixel Art: Pixel Art Texture: Texture | Yes |
| sampler_index | string | No | Sampling method, Description: (1) Default value: Euler a (2) Optional values as follows: Euler Euler a DPM++ 2M DPM++ 2M Karras LMS Karras DPM++ SDE DPM++ SDE Karras DPM2 a Karras Heun DPM++ 2M SDE DPM++ 2M SDE Karras DPM2 DPM2 Karras DPM2 a LMS | No |
| retry_count | int | No | Number of retries, default 1 | No |
| request_timeout | float | No | Request timeout, default 60 seconds | No |
| backoff_factor | float | No | Request retry parameter, used to specify the retry strategy, default is 0 | No |
| seed | integer | No | Random seed, Description:If not set, a random number is automatically generated Value range [0, 4294967295] | No |
| cfg_scale | float | No | Prompt relevance, Description: Default value is 5, value range 0-30 | No |
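Parameters not natively supported by openai-python (for example negative_prompt, steps, or cfg_scale) can also be passed through the SDK's extra_body, as with chat completions. A minimal sketch, assuming the client from section 4.2.1:
```python
# Fields unknown to the SDK are merged into the JSON request body.
images = client.images.generate(
    prompt="A white cat, red hat",
    model="Stable-Diffusion-XL",
    response_format="url",
    size="1024x1024",
    extra_body={"negative_prompt": "blurry, low quality", "steps": 30, "cfg_scale": 7},
)
print(images.data[0].url)
```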
4.2.3 Model Response Description
| Name | Type | Description |
|---|---|---|
| created | int | Timestamp |
| data | list(image) | Generated image result |
image Description
| Name | Type | Description |
|---|---|---|
| b64_json | string | Image base64 encoded content, if and only if response_format=b64_json |
| url | string | Image URL, if and only if response_format=url |
| index | int | Sequence number |
4.3 Embeddings
4.3.1 Model Use
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

embeddings = client.embeddings.create(
    model="embedding-v1",
    input=[
        "Recommend some food",
        "Tell me a story"
    ]
)
print(embeddings)
```
4.3.2 Request Parameter Description
| Name | Type | Required | Description |
|---|---|---|---|
| model | str | No | Model ID, available values can be obtained from client.models.list() |
| input | List[str] | Yes | Input texts. Description: (1) Cannot be an empty list, and no member may be an empty string (2) At most 16 texts (3) Per-model limits: embedding-v1: at most 16 texts, each at most 384 tokens and 1000 characters; bge-large-zh: at most 16 texts, each at most 512 tokens and 2000 characters |
4.3.3 Return Parameter Description
| Name | Type | Description |
|---|---|---|
| object | str | Packet type, fixed value "embedding_list" |
| data | List[EmbeddingData] | embedding information, number of data members matches the number of texts |
| usage | Usage | token statistics, token count = number of Chinese characters + number of words*1.3 (estimation logic only) |
EmbeddingData Description
| Name | Type | Description |
|---|---|---|
| object | str | Fixed value "embedding" |
| embedding | List[float] | embedding content |
| index | int | Sequence number |
Usage Description
| Name | Type | Description |
|---|---|---|
| prompt_tokens | int | Question tokens count |
| total_tokens | int | Total tokens count |
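As a usage sketch, the returned vectors can be compared with cosine similarity (pure Python, no extra dependencies assumed):
```python
import math

# Assumes `embeddings` was returned by client.embeddings.create(...) as in 4.3.1.
a = embeddings.data[0].embedding
b = embeddings.data[1].embedding

dot = sum(x * y for x, y in zip(a, b))
cosine = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
print(f"cosine similarity: {cosine:.4f}")
```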
5. Model Extension Capability Usage
5.1 Multi-Turn Dialogue
```python
import os
from openai import OpenAI

def get_response(messages):
    client = OpenAI(
        api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
        base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
    )
    completion = client.chat.completions.create(model="ernie-3.5-8k", messages=messages)
    return completion

messages = [
    {
        "role": "system",
        "content": "You are an AI Studio developer assistant. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.",
    }
]
assistant_output = "Hello, I am the AI Studio developer assistant. How can I help you?"
print(f"""Input: "End" to end the conversation\n""")
print(f"Model output: {assistant_output}\n")
user_input = ""
while "End" not in user_input:
    user_input = input("Please enter: ")
    # Add the user's question to the messages list
    messages.append({"role": "user", "content": user_input})
    assistant_output = get_response(messages).choices[0].message.content
    # Add the model's reply to the messages list
    messages.append({"role": "assistant", "content": assistant_output})
    print(f"Model output: {assistant_output}")
    print("\n")
```
5.2 Streaming Output
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

completion = client.chat.completions.create(
    model="ernie-3.5-8k",
    messages=[
        {'role': 'system', 'content': 'You are a developer assistant for the AI Studio training platform. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.'},
        {'role': 'user', 'content': 'Hello, please introduce AI Studio'}
    ],
    stream=True,
)
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
```
5.3 Asynchronous Use
```python
import os
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {'role': 'system', 'content': 'You are a developer assistant for the AI Studio training platform. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.'},
            {'role': 'user', 'content': 'Hello, please introduce AI Studio'}
        ],
        model="ernie-3.5-8k",
    )
    print(chat_completion.choices[0].message.content)

asyncio.run(main())
```
5.4 Search Enhancement
Usage Scenarios
For scenarios requiring real-time information or the latest data, such as news event queries, literature retrieval, and tracking policy changes. Based on web search capabilities, the model can obtain real-time data and information to answer user questions more accurately in specific scenarios.
How to Use
Add the following web_search parameters to the request body to enable web search. The parameter descriptions are as follows:
| Parameter Name | Type | Required | Default Value | Description |
|---|---|---|---|---|
| enable | boolean | No | false | Whether to enable the web search feature |
| enable_trace | boolean | No | false | Whether to return traceability information |
| enable_status | boolean | No | false | Whether to return a search trigger signal in the response. If search is triggered, the first packet returns 'Searching', and delta_tag:search_status indicates this packet is a signal packet |
| enable_citation | boolean | No | false | Whether to include citation source superscripts in the response. Single superscript format example: ^[1]^, multiple superscript format example: ^[1][2]^ |
| search_number | integer | No | 10 | Number of documents to retrieve, range is [1~28] |
| reference_number | integer | No | 10 | Number of documents used for the large model's summary, range is [1~28] (must be ≤ search_num) |
Parameter Example:
```json
{
    "web_search": {
        "enable": true,
        "enable_citation": true,
        "enable_trace": true,
        "enable_status": true,
        "search_num": 10,
        "reference_num": 5
    }
}
```
Supported Models:
- ernie-4.5
- ernie-4.5-turbo
- ernie-4.0
- ernie-4.0-turbo
- ernie-3.5
- deepseek-r1
- deepseek-v3
Code Example:
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

completion = client.chat.completions.create(
    model="ernie-4.0-turbo-8k",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Who is the men's singles table tennis champion of the 2024 Olympics"
                }
            ]
        }
    ],
    extra_body={
        "web_search": {
            "enable": True,
            "enable_trace": True
        }
    },
    stream=True,
)

search_result = []
for chunk in completion:
    if len(chunk.choices) > 0:
        # Collect any search traceability information attached to this chunk
        if hasattr(chunk, 'search_results'):
            search_result.extend(chunk.search_results)
        print(chunk.choices[0].delta.content or "", end="", flush=True)

# Deduplicate collected references by index
unique_dict = {}
for item in search_result:
    unique_dict[item["index"]] = item

print("\nReferences:\n")
for result in unique_dict.values():
    print(str(result["index"]) + ". " + result["title"] + ". " + result["url"] + "\n")
```
5.5 Structured Output
Introduction
JSON is one of the most widely used formats for applications to exchange data in the world.
Structured output is a feature that ensures the model always generates a response that conforms to the JSON schema you provide, so users don't have to worry about the model omitting required keys or producing invalid enum values.
Some benefits of structured output include:
- Reliable type safety: No need to validate or retry improperly formatted responses
- Clear rejection: Model rejections based on safety can now be detected programmatically
- Simpler prompting: No need to use strongly-worded prompts to achieve consistent formatting
How to Enable
Control the generation of the response content through the response_format field.
| Field | Data Type | Description |
|---|---|---|
| type | string | Specifies the format of the response content. Optional values: json_object: returns in json format, may not meet expectations; text: returns in text format, default is text; json_schema: returns in the format specified by json_schema |
| json_schema | object | json_schema format, please refer to JSON Schema description; this parameter is required when type is json_schema |
Supported Models
- ernie-4.5
- ernie-4.0-turbo
- ernie-3.5
Code Example
```json
{
    "model": "ernie-3.5-8k",
    "messages": [
        {
            "role": "user",
            "content": "Shanghai weather today"
        }
    ],
    "response_format": {
        "type": "text" // can be replaced with json_object or json_schema
    }
}
```
We can see that when the format setting differs, the format of the returned content changes:
- response_format not enabled
```text
Since weather information is updated in real-time, I cannot directly provide the precise weather conditions for Shanghai today.\n\nTo get the latest Shanghai weather information, I recommend you check a weather forecast application, visit the official website of the meteorological bureau, or use other reliable weather information sources. These platforms usually provide detailed real-time weather data such as temperature, humidity, wind speed, precipitation probability, etc., as well as weather forecasts for the next few days.\n\nHope these suggestions are helpful to you!
```
- response_format enabled
```text
"{\n \"Shanghai today's weather\": \"Since I cannot obtain real-time weather information, I am unable to provide the exact weather conditions for Shanghai today.\"\n}\n\nTo get real-time weather for Shanghai today, I recommend you check the weather app on your phone, visit the official website of the meteorological bureau, or use other reliable weather information sources. These channels usually provide the latest weather conditions, temperature, humidity, wind speed, and other detailed information."
```
5.6 Function calling
Capability Introduction
Function call is a feature that can connect large models with external tools or code. This feature can be used to enhance the inference effect of large models in application scenarios such as real-time data and data computation, or to perform other external operations, including tool-calling scenarios like information retrieval, database operations, graph search and processing, etc.
tools is an optional parameter in the model service API used to provide function definitions to the model. With this parameter, the model can generate function parameters that conform to the specifications provided by the user. Please note that the model service API does not actually execute any function calls. It only returns whether to call a function, the name of the function to be called, and the parameters required to call the function. Developers can use the parameters output by the model to further execute the function call in their system.
Supported Models
- ernie-x1-turbo-32k
- deepseek-r1
- deepseek-v3
Call Step Description
- Define the function using JSON Schema format;
- Submit the defined function(s) to the model that supports function call via the tools parameter; multiple functions can be submitted at once;
- The model will decide which function to use, or not to use any function, based on the current chat context;
- If the model decides to use a function, it will return the parameters and information required to call the function in JSON format;
- Use the parameters output by the model to execute the corresponding function, and submit the execution result of this function to the model;
- The model will give the user a reply based on the function's execution result.
Example Code
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    }
]

messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]
completion = client.chat.completions.create(
    model="deepseek-v3",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
print(completion)
```
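The example above covers only the first round. A sketch of the second round from the call steps: execute the returned call locally and feed the result back to the model (the hard-coded result stands in for your real get_current_weather implementation):
```python
import json

message = completion.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Execute your real function here; this result is hard-coded for illustration.
    result = {"location": args.get("location"), "temperature": "18", "unit": "celsius"}

    # Append the assistant's tool call and the tool result, then ask the model again.
    messages.append(message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    })
    final = client.chat.completions.create(model="deepseek-v3", messages=messages)
    print(final.choices[0].message.content)
```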
5.7 Print Chain of Thought (Thinking Model)
Non-streaming
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

chat_completion = client.chat.completions.create(
    messages=[
        {'role': 'system', 'content': 'You are a developer assistant for the AI Studio training platform. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.'},
        {'role': 'user', 'content': 'Hello, please introduce AI Studio'}
    ],
    model="deepseek-r1",
)
print(chat_completion.choices[0].message.reasoning_content)
print(chat_completion.choices[0].message.content)
```
Streaming
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

completion = client.chat.completions.create(
    model="deepseek-r1",
    messages=[
        {'role': 'system', 'content': 'You are a developer assistant for the AI Studio training platform. You are proficient in development-related knowledge and responsible for providing developers with search-related help and suggestions.'},
        {'role': 'user', 'content': 'Hello, please introduce AI Studio'}
    ],
    stream=True,
)
for chunk in completion:
    if len(chunk.choices) > 0:
        if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
            print(chunk.choices[0].delta.reasoning_content, end="", flush=True)
        else:
            print(chunk.choices[0].delta.content, end="", flush=True)
```
5.8 Multimodality
5.8.1 Multimodal - Text Input
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

completion = client.chat.completions.create(
    model="ernie-4.5-8k-preview",
    messages=[
        {
            'role': 'user', 'content': [
                {
                    "type": "text",
                    "text": "Introduce a few famous attractions in Beijing"
                }
            ]
        }
    ]
)
print(completion.choices[0].message.content or "")
```
5.8.2 Multimodal - Text Input - Streaming
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

completion = client.chat.completions.create(
    model="ernie-4.5-8k-preview",
    messages=[
        {
            'role': 'user', 'content': [
                {
                    "type": "text",
                    "text": "Introduce a few famous attractions in Beijing"
                }
            ]
        }
    ],
    stream=True,  # required for iterating over chunks below
)
for chunk in completion:
    if len(chunk.choices) > 0:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
5.8.3 Multimodal - Image Input (url) - Streaming
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

completion = client.chat.completions.create(
    model="ernie-4.5-8k-preview",
    messages=[
        {
            'role': 'user', 'content': [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://bucket-demo-bj.bj.bcebos.com/pic/wuyuetian.png",
                        "detail": "high"
                    }
                }
            ]
        }
    ],
    stream=True,
)
for chunk in completion:
    if len(chunk.choices) > 0:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
5.8.4 Multimodal - Image Input (base64) - Streaming
```python
import os
import base64
from openai import OpenAI

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image
image_path = "/image_1.png"
# Getting the Base64 string
base64_image = encode_image(image_path)

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

completion = client.chat.completions.create(
    model="ernie-4.5-8k-preview",
    messages=[
        {
            'role': 'user', 'content': [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    stream=True,
)
for chunk in completion:
    if len(chunk.choices) > 0:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
5.8.5 Multimodal - Image + Text Input - Streaming
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

completion = client.chat.completions.create(
    model="ernie-4.5-8k-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Which band is in the picture"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://bucket-demo-bj.bj.bcebos.com/pic/wuyuetian.png",
                        "detail": "high"
                    }
                }
            ]
        }
    ],
    stream=True,
)
for chunk in completion:
    if len(chunk.choices) > 0:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
5.8.6 Multimodal - Video Understanding - Streaming
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),  # Environment variable containing your AI Studio access token; see https://aistudio.baidu.com/account/accessToken
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",  # AI Studio LLM API service domain
)

completion = client.chat.completions.create(
    model="default",
    temperature=0.6,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this video"
                },
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://bucket-demo-01.gz.bcebos.com/video/sea.mov",
                        "fps": 1
                    }
                }
            ]
        }
    ],
    stream=True
)
for chunk in completion:
    if len(chunk.choices) > 0:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Notes:
- The large model is stateless with each call. You need to manage the information passed to the model yourself. If you need the model to understand the same image multiple times, please pass the image in every request.
- Supports single and multiple images. Each image size should not exceed 10MB. The total tokens for multiple image inputs should not exceed the model's context length. For example, for the ERNIE-4.5 model, the image input should not exceed 8K tokens.
- Image formats:
  a. Image base64: JPG, JPEG, PNG, and BMP types. The value must be in the form data:image/<format>;base64,<base64-data> (see the example in section 5.8.4)
  b. Public image URL: supports JPG, JPEG, PNG, BMP, and WEBP types
6. API Code Error Codes
| HTTP Status Code | Type | Error Code | Error Message |
|---|---|---|---|
| 400 | invalid_request_error | malformed_json | Invalid JSON |
| 400 | invalid_request_error | invalid_model | model is empty |
| 400 | invalid_request_error | malformed_json | Invalid Argument |
| 400 | invalid_request_error | malformed_json | The specific error message returned |
| 400 | invalid_request_error | invalid_messages | The specific error message returned |
| 400 | invalid_request_error | characters_too_long | the max input characters is xxx |
| 400 | invalid_request_error | invalid_user_id | user_id can not be empty |
| 400 | invalid_request_error | tokens_too_long | Prompt tokens too long |
| 401 | access_denied | no_parameter_permission | The specific error message returned |
| 401 | invalid_request_error | invalid_model | No permission to use the model |
| 401 | invalid_request_error | invalid_appid | No permission to use the appid |
| 401 | invalid_request_error | invalid_iam_token | IAM Certification failed |
| 403 | unsafe_request | system_unsafe | the content of system field is invalid |
| 403 | unsafe_request | user_setting_unsafe | the content of user field is invalid |
| 403 | unsafe_request | functions_unsafe | the content of functions field is invalid |
| 404 | invalid_request_error | no_such_model | |
| 405 | invalid_request_error | method_not_supported | Only POST requests are accepted |
| 429 | rate_limit_exceeded | rpm_rate_limit_exceeded | Rate limit reached for RPM |
| 429 | rate_limit_exceeded | tpm_rate_limit_exceeded | Rate limit reached for TPM |
| 429 | rate_limit_exceeded | preemptible_rate_limit_exceeded | Rate limit reached for preemptible resource |
| 429 | rate_limit_exceeded | user_rate_limit_exceeded | qps request limit by APP ID reached |
| 429 | rate_limit_exceeded | cluster_rate_limit_exceeded | request limit by resource cluster reached |
| 500 | Internal_error | internal_error | Internal error |
| 500 | Internal_error | dispatch_internal_error | Internal error |
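A sketch of handling these errors with openai-python's standard exception types; the retry/backoff policy here is illustrative, not prescribed by the service:
```python
import os
import time
from openai import OpenAI, APIStatusError, RateLimitError

client = OpenAI(
    api_key=os.environ.get("AI_STUDIO_API_KEY"),
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",
)

def chat_with_retry(messages, retries=3):
    for attempt in range(retries):
        try:
            return client.chat.completions.create(model="ernie-3.5-8k", messages=messages)
        except RateLimitError:  # HTTP 429: the rpm/tpm/user rate limits above
            time.sleep(2 ** attempt)  # simple exponential backoff
        except APIStatusError as e:  # other non-2xx responses (400/401/403/...)
            print("status:", e.status_code, "body:", e.response.text)
            raise
    raise RuntimeError("rate limited after retries")

print(chat_with_retry([{"role": "user", "content": "Hello"}]).choices[0].message.content)
```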
