
End-to-End Speech Language Large Model iOS SDK

1. Introduction

1.1 About This Document

Document name: End-to-End Speech Language Large Model Integration Guide
Platform: iOS
Date: 2025-05-15
Overview: This document is the user guide for the Baidu Speech Open Platform iOS SDK. It describes how to use the interfaces related to the end-to-end speech language large model.
Download:
End-to-End Speech Language Large Model iOS SDK

1.2 Applying for Trial Access

This interface is in invited beta. To use it, first submit a partnership inquiry or a support ticket, providing your company name, Cloud ID, and application scenario. Staff will help enable the permission, after which you can use the service.

2. Development Preparation

2.1 Environment

Item Requirement
Speech recognition version 3.3.2.3
System support iOS 12.0 and above
Architecture support armv7, arm64
Development environment The project uses LTO and other optimization options; the latest version of Xcode is recommended
libBaiduSpeechSDK.a Static library required by the end-to-end model

2.2 SDK Directory Structure

(Figure: SDK directory structure)

2.3 SDK Installation

  • Download the package: End-to-End Speech Language Large Model iOS SDK
  • Add the BDSClientHeaders/ASR and BDSClientHeaders/TTS SDK header files, together with the libBaiduSpeechSDK.a static library, to your project
  • Framework dependencies and required system libraries:
Framework Description
libc++.tbd C/C++ feature support
libz.1.2.5.tbd gzip support
libsqlite3.0.tbd Local database support
AudioToolbox Recording and playback support
AVFoundation Recording and playback support
CFNetwork Network access support
CoreLocation Device geolocation support, used to improve recognition accuracy
CoreTelephony Mobile network type detection
SystemConfiguration Network status detection
GLKit Required by the built-in recognition UI control
BaiduBCEBasic Internal model dependencies under the BaiduBCE directory
BaiduBCEBOS
BaiduBCESTS
BaiduBCEVOD
ZipArchive Dependency of the TTS synthesis feature
CuidSDK
// Include the required headers; see the demo's ASRViewController.mm for reference.
// For example:
#import "ASRViewController.h"
#import "BDSASRDefines.h"
#import "BDSASRParameters.h"
#import "BDSWakeupDefines.h"
#import "BDSWakeupParameters.h"
#import "BDSUploaderDefines.h"
#import "BDSUploaderParameters.h"
#import "BDSEventManager.h"
#import "BDVRSettings.h"
#import "fcntl.h"
#import "AudioInputStream.h"
#import "BDSSpeechSynthesizer.h"
#import "TTDFileReader.h"
#import "BDSCacheReferenceManager.h"
#import "ASRPlayTestConfigViewController.h"
#import "BDSOTADefines.h"
#import "BDSOTAParameters.h"
#import "BDSCompressionManager.h"
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>
#import <mach/mach.h>
#import <AVFoundation/AVAudioSession.h>

2.4 Authentication

2.4.1 access_token authentication

  • Obtain the AK/SK

Follow the access_token authentication mechanism described in "General Reference - Authentication | Baidu AI Open Platform" to obtain the AK/SK, which gives you three credentials: AppID, API Key, and Secret Key.

  • Pass the credentials when initializing the SDK. See "Initializing the SDK" for the parameter details.
// ... other parameters omitted
[self.asrEventManager setParameter:@"appid obtained from the open platform" forKey:BDS_ASR_APP_ID];    // appid field
[self.asrEventManager setParameter:@"apikey obtained from the open platform" forKey:BDS_ASR_API_KEY]; // apikey field

2.4.2 API Key authentication

Note: during the invited beta, only access_token authentication is supported.

3. SDK Integration

3.1 Interfaces

The main classes and interfaces in the SDK are:

  • BDSEventManager: speech event manager, used to manage speech recognition, speech synthesis, and related events.
  • BDSSpeechSynthesizer: speech synthesis class, used to manage synthesis and playback.
  • BDSASRParameters.h, BDSSpeechSynthesizerParams.h: key constants for the speech recognition and synthesis parameters.
  • VoiceRecognitionClientWorkStatus (protocol method): implement this protocol method to handle the events produced during recognition and to respond when recognition completes.
  • BDSSpeechSynthesizerDelegate (protocol): adopt this protocol to handle the events produced during speech synthesis.

3.1.1 BDSEventManager: speech recognition class

  • Method list

    • SDKInitial: initialize the SDK

      • Description: call once after the app starts; do not call it repeatedly.
      • Input parameters

        • configVoiceRecognitionClient: initializes the SDK and configures its parameters

          • ASR parameters: set via the setParameter:forKey: method; the available parameters are listed below:
Parameter Type/Value Required Description
BDS_ASR_PRODUCT_ID String - Recognition environment ID; use 4144779 during the invited beta; contact technical support for a production ID
deviceID String - Unique device ID, obtained via the getDeviceId method (must not be set manually)
PID String - Recognition environment ID; use 4144779 during the invited beta; contact technical support for a production ID
APP_KEY String - Recognition environment key; default: com.baidu.app
URL String - Recognition environment URL; default: https://audiotest.baidu.com/v2_llm_test
APP_ID String - Credential assigned after creating an app on the open platform; fill it in for production use. See "2.4 Authentication"
APP_API_KEY String - Credential assigned after creating an app on the open platform; fill it in for production use. See "2.4 Authentication"
BDS_ASR_COMPRESSION_TYPE String Required Audio compression type; default: EVR_AUDIO_COMPRESSION_OPUS; optional: EVR_AUDIO_COMPRESSION_BV32; the default OPUS is recommended
TRIGGER_MODE int - 1: tap to recognize (iOS does not distinguish tap from long press); 2: recognize after wake-up; see "3.3 Development Scenarios" for configuration
ASR_ENABLE_MUTIPLY_SPEECH int Optional Whether to enable full duplex; duplex by default
BDS_ASR_MODEL_VAD_DAT_FILE String - VAD resource path, i.e. the path copied in "Initializing the Environment"
LOG_LEVEL int Optional Log level; possible values:
EVRDebugLogLevelOff = 0 (default),
EVRDebugLogLevelFatal = 1,
EVRDebugLogLevelError = 2,
EVRDebugLogLevelWarning = 3,
EVRDebugLogLevelInformation = 4,
EVRDebugLogLevelDebug = 5,
EVRDebugLogLevelTrace = 6 (full logging)
        • Recognition callback: VoiceRecognitionClientWorkStatus, the recognition protocol interface; see VoiceRecognitionClientWorkStatus for details
    • BDS_ASR_CMD_STOP: stop speech recognition

      • Description: stops the current recognition while keeping the results so far; audio that has already been sent is still processed normally.
      • Input parameters: none
      • Return value: none
    • BDS_ASR_CMD_CANCEL: cancel speech recognition

      • Description: cancels recognition and stops further processing; audio that has been sent but not yet processed is discarded.
      • Input parameters: none
      • Return value: none
    • BDS_ASR_CMD_PAUSE: pause speech recognition

      • Description: pauses recognition // TODO: details to be added
      • Input parameters: none
      • Return value: none
    • BDS_ASR_CMD_RESUME: resume speech recognition

      • Description: resumes recognition // TODO: details to be added
      • Input parameters: none
      • Return value: none
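
As a quick reference, here is a minimal sketch of issuing these control commands, assuming an event manager instance like the demo's self.asrEventManager and the sn handling from "3.2.7 Pausing Recognition":

// Stop: keep the results so far; audio already sent is still processed.
[self.asrEventManager sendCommand:BDS_ASR_CMD_STOP];
// Cancel: discard audio that has not been processed yet.
[self.asrEventManager sendCommand:BDS_ASR_CMD_CANCEL];
// Pause the current utterance by its sn, then resume.
NSMutableDictionary *pauseParams = [NSMutableDictionary dictionary];
[pauseParams setValue:_lastAsrsn forKey:BDS_ASR_SN]; // _lastAsrsn: sn captured in the recognition callback
[self.asrEventManager sendCommand:BDS_ASR_CMD_PAUSE withParameters:pauseParams];
[self.asrEventManager sendCommand:BDS_ASR_CMD_RESUME];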
    

3.1.2 VoiceRecognitionClientWorkStatus: speech recognition event listener

Implement this interface and pass it in when starting recognition.

  • Method list
  • - (void)VoiceRecognitionClientWorkStatus:(int)workStatus obj:(id)aObj;

    • workStatus: receives the callback events produced during recognition

      • Input parameters

        • workStatus: enum, recognition callback status; see the table below for the events
        • obj: id, the data object carried by the callback event (audio or result data); must be parsed.
      • Return value: none

Callback status Payload Type Description
EVoiceRecognitionClientWorkStatusStartWorkIng - enum Ready; the user can speak. Typically, notify the user through the UI when this event arrives
EVoiceRecognitionClientWorkStatusStart - enum Speech start detected
EVoiceRecognitionClientWorkStatusFirstPlaying - enum First TTS packet is playing
EVoiceRecognitionClientWorkStatusEndPlaying - enum TTS playback finished
EVoiceRecognitionClientWorkStatusCancel - enum Recognition cancelled by the user
EVoiceRecognitionClientWorkStatusFlushData JSON string with:
  • sn: utterance ID, e.g. "C2FF23BA-9894-4430-98E2-149C5FF61493_2"
  • results_recognition: recognition result
enum Intermediate recognition result
EVoiceRecognitionClientWorkStatusEnd - enum Local audio capture finished; waiting for the result and closing the recorder
EVoiceRecognitionClientWorkStatusNewRecordData - enum Recorded audio data callback (corresponds to the intermediate results)
EVoiceRecognitionClientWorkStatusChunkTTS JSON string with:
  • sn: utterance ID, e.g. "C2FF23BA-9894-4430-98E2-149C5FF61493_2"
  • origin_result: synthesis result, a JSON object containing:
    • tex: the text corresponding to the audio
  • audio_data: binary audio data
enum TTS data returned
EVoiceRecognitionClientWorkStatusFinish JSON string with:
  • sn: utterance ID, e.g. "C2FF23BA-9894-4430-98E2-149C5FF61493_2"
  • results_recognition: recognition result
enum Recognition finished; the server returned the result

3.1.3 BDSSpeechSynthesizer: speech synthesis class

  • Method list

    • [[BDSSpeechSynthesizer sharedInstance] setSynthesizerDelegate:self];: sets the synthesis delegate, which must implement the BDSSpeechSynthesizerDelegate protocol; see BDSSpeechSynthesizerDelegate for details

      • Description: sets the delegate that receives the synthesis callbacks
      • Return value: none
    • speakSentence:(NSString*)sentence withError:(NSError**)err: synthesize and play audio

      • Description: interface for synthesizing and playing audio
      • Input parameters:

        • sentence: NSString, the text to synthesize (the contents of a local txt file, or a plain string)
        • withError: error information returned on failure
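
A minimal synthesis sketch using the two interfaces above; the spoken text is only an example:

NSError *err = nil;
// Register this class (which adopts BDSSpeechSynthesizerDelegate) for callbacks.
[[BDSSpeechSynthesizer sharedInstance] setSynthesizerDelegate:self];
// Synthesize and play one sentence; err is populated on failure.
[[BDSSpeechSynthesizer sharedInstance] speakSentence:@"你好,百度" withError:&err];
if (err) {
    NSLog(@"speakSentence failed: %@", err);
}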

3.1.4 BDSSpeechSynthesizerDelegate: speech synthesis delegate

Adopt this protocol to listen for the audio synthesis and playback events.

  • Method list

@optional

  • - (void)synthesizerNewDataArrived:(NSData *)newData DataFormat:(BDSAudioFormat)fmt characterCount:(int)newLength sentenceNumber:(NSInteger)SynthesizeSentence;
  • Description: delivers newly synthesized audio data
  • Input parameters:
  • newData: NSData, the synthesized audio data.
  • fmt: BDSAudioFormat, format of the audio in the buffer
  • newLength: int, number of characters synthesized so far in the current sentence
  • SynthesizeSentence: NSInteger, sentence ID generated and returned by the SDK

See the BDSSpeechSynthesizerDelegate.h header for the remaining protocol methods.

3.2 Integration Steps

3.2.1 Initializing the Environment

This involves the following steps:

  • Copy the VAD and AEC model files to a directory on the device, then point the SDK at them:
[self configVADFile];
- (void)configVADFile {
    NSString *vad_filepath = [[NSBundle mainBundle] pathForResource:@"BDSClientSDK.bundle/EASRResources/chuangxin.ota.v1.vad" ofType:@"pkg"];
    [self.asrEventManager setParameter:vad_filepath forKey:BDS_ASR_MODEL_VAD_DAT_FILE];
}
[self configAECVADDebug];
- (void)configAECVADDebug {
    // Optionally save AEC/VAD/wake-up debug audio files.
    [self.asrEventManager setParameter:@(YES) forKey:BDS_MIC_SAVE_AEC_DEBUG_FILE];
    [self.asrEventManager setParameter:@(YES) forKey:BDS_MIC_SAVE_VAD_DEBUG_FILE];
    [self.asrEventManager setParameter:@(YES) forKey:BDS_MIC_SAVE_WAKEUP_DEBUG_FILE];
}
  • Request the required permissions (see the sketch after the table below)

    • Permission list:
Permission Description Required
Privacy - Microphone Usage Description Microphone access
Application supports indirect input events Hardware input device support

(Permissions reference: Info.plist screenshot)
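
Declaring the microphone usage description lets the system show the permission prompt; you can also trigger the prompt explicitly before starting recognition. A sketch using the system AVAudioSession API (not part of this SDK):

// Request microphone access up front so recognition does not fail on first use.
[[AVAudioSession sharedInstance] requestRecordPermission:^(BOOL granted) {
    if (!granted) {
        NSLog(@"Microphone permission denied; speech recognition cannot start.");
    }
}];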

3.2.2 Implementing the Event Callbacks

  • Speech recognition callback: implement the protocol method; sample code below. See VoiceRecognitionClientWorkStatus for the parameter details
- (void)VoiceRecognitionClientWorkStatus:(int)workStatus obj:(id)aObj {
    switch (workStatus) {
        case EVoiceRecognitionClientWorkStatusNewEncodeData:
        {
            if (self.encodeFileHandler == nil) {
                self.encodeFileHandler = [self createFileHandleWithName:@"audio_encode" isAppend:NO];
            }
            [self.encodeFileHandler writeData:(NSData *)aObj];
            break;
        }
            // Recorded audio data callback
        case EVoiceRecognitionClientWorkStatusNewRecordData: {
            if (!self.isPlayTesting) {
                [self.fileHandler writeData:(NSData *)aObj];
            }
            break;
        }
            // Recognition started; begin capturing and processing audio
        case EVoiceRecognitionClientWorkStatusStartWorkIng: {
            if (!self.isPlayTesting) {
                if (self.currentPlayTones & ASRVoiceRecognitionPlayTonesTypeStart) {
                    [self playTone:@"record_start.pcm"];
                }
                
                NSDictionary *logDic = [self parseLogToDic:aObj];
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: start vr, log: %@\n", logDic]];
                [self onStartWorking];
            }
            break;
        }
        // Detected that the user started speaking
        case EVoiceRecognitionClientWorkStatusStart: {
            if (!self.isPlayTesting) {
                NSLog(@"====EVoiceRecognitionClientWorkStatusStart: %@", aObj);
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: detect voice start point, sn: %@.\n", aObj]];
            }
            
            [self cancelTTSWithSN:cTTS_ASR_SN];
            break;
        }
        // Local audio capture finished; waiting for the result and closing the recorder
        case EVoiceRecognitionClientWorkStatusEnd: {
            if (!self.isPlayTesting) {
                if (self.currentPlayTones & ASRVoiceRecognitionPlayTonesTypeEnd) {
                    [self playTone:@"record_end.pcm"];
                }
                
                NSLog(@"====EVoiceRecognitionClientWorkStatusEnd: %@", aObj);
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: detect voice end point, sn: %@.\n", aObj]];
            }
            break;
        }
        // Streaming partial results to the screen
        case EVoiceRecognitionClientWorkStatusFlushData: {
            if (!self.isPlayTesting) {
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: partial result - %@.\n\n", [self getDescriptionForDic:aObj]]];
            }
            break;
        }
        // Recognition finished; the server returned the final result
        case EVoiceRecognitionClientWorkStatusFinish: {
            if (!self.isPlayTesting) {
                if (self.currentPlayTones & ASRVoiceRecognitionPlayTonesTypeSuccess) {
                    [self playTone:@"record_success.pcm"];
                }
                
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: final result - %@.\n\n", [self getDescriptionForDic:aObj]]];
                if (aObj) {
                    self.resultTextView.text = [self getDescriptionForDic:aObj];
                }
                if (!self.longSpeechFlag && !(self.enableMutiplySpeech || self.audioFileType == AudioFileType_MutiplySpeech_ASR || self.audioFileType == AudioFileType_MutiplySpeech_Wakeup)) {
                    [self onEnd];
                }
            } else {
                // Playback-test full-duplex scenario: do not stop when a final result arrives within 20s
                self.playTestShouldStop = NO;
                if (self.playTestSceneType == ASRPlayTestSceneType_BaseLine) { // Baseline playback test: speak one query on the final result
                    [self stopTTS];
                    [[BDSSpeechSynthesizer sharedInstance] speakSentence:@"北京限行尾号 1、6" withError:nil];
                }
            }
            break;
        }
        // Current volume level callback
        case EVoiceRecognitionClientWorkStatusMeterLevel: {
            if (!self.isPlayTesting) {
                //            [self printLogTextView:[NSString stringWithFormat:@"Current Level: %d\n", [aObj intValue]]];
            }
            break;
        }
        // Cancelled by the user
        case EVoiceRecognitionClientWorkStatusCancel: {
            if (!self.isPlayTesting) {
                if (self.currentPlayTones & ASRVoiceRecognitionPlayTonesTypeCancel) {
                    [self playTone:@"record_cancel.pcm"];
                }
                
                [self printLogTextView:@"CALLBACK: user press cancel.\n"];
                [self onEnd];
                if (self.shouldStartASR) {
                    self.shouldStartASR = NO;
                    [self startWakeup2ASR];
                }
                
                if (self.randomStress) {
                   [self wp_rec_randomStress];
                } else if (self.randomStressASR || self.randomStressMutiplyASR) {
                    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, NSEC_PER_SEC * 0.5), dispatch_get_main_queue(), ^{
                        [self voiceRecogButtonHelper:EVR_TRIGGER_MODE_CLICK];
                    });
                }
            } else {
                if (self.shouldStartASR) {
                    self.shouldStartASR = NO;
                    [self startWakeup2ASR];
                }
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusError: {
            if (!self.isPlayTesting) {
                if (self.currentPlayTones & ASRVoiceRecognitionPlayTonesTypeFail) {
                    [self playTone:@"record_fail.pcm"];
                }
                
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: encount error - %@.\n", (NSError *)aObj]];
                int errorcode = (int)[(NSError *)aObj code];
                // On an OTA engine-load error, re-run OTA initialization:
                // [self.otaEventManger sendCommand:BDS_OTA_INIT withParameters:otaDic];
                // See the confOTATest method for usage details.
                if (errorcode == EVRClientErrorCodeLoadEngineError) {
                    dispatch_async(dispatch_get_global_queue(0, 0), ^{
                        [self.asrEventManager sendCommand:BDS_ASR_CMD_STOP]; // Stop the ASR engine first
                        [self confOTATest]; // Then re-run OTA init (BDS_OTA_INIT)
                        [self.asrEventManager sendCommand:BDS_ASR_CMD_START]; // Then restart the ASR engine
                    });
                }
                // asr engine is busy.
                if (errorcode == EVRClientErrorCodeASRIsBusy) {
                    if (self.asrEventManager.isAsrRunning) {
                        [self onEnd];
                    } else {
                        [self onStartWorking];
                    }
                    return;
                }

                if (!(self.enableMutiplySpeech
                      || self.audioFileType == AudioFileType_MutiplySpeech_ASR
                      || self.audioFileType == AudioFileType_MutiplySpeech_Wakeup) ||
                    (self.enableMutiplySpeech && (errorcode == EVRClientErrorCodeDecoderNetworkUnavailable
                     || errorcode == EVRClientErrorCodeRecoderNoPermission
                     || errorcode == EVRClientErrorCodeRecoderException
                     || errorcode == EVRClientErrorCodeRecoderUnAvailable
                     || errorcode == EVRClientErrorCodeInterruption
                     || errorcode == EVRClientErrorCodeCommonPropertyListInvalid))) { // Not full duplex, or an internal cancel in full duplex
                    [self onEnd];
                    if (self.randomStressASR) {
                        dispatch_after(dispatch_time(DISPATCH_TIME_NOW, NSEC_PER_SEC * 0.5), dispatch_get_main_queue(), ^{
                            [self voiceRecogButtonHelper:EVR_TRIGGER_MODE_CLICK];
                        });
                    }
                }
                if (self.randomStress) {
                    [self wp_rec_randomStress];
                }
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusLoaded: {
            if (!self.isPlayTesting) {
                [self printLogTextView:@"CALLBACK: offline engine loaded.\n"];
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusUnLoaded: {
            if (!self.isPlayTesting) {
                [self printLogTextView:@"CALLBACK: offline engine unLoaded.\n"];
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusChunkThirdData: {
            if (!self.isPlayTesting) {
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: Chunk 3-party data length: %lu\n", (unsigned long)[(NSData *)aObj length]]];
                if (self.thirdTtsTest) {
                    if (self.isThirdFirst) {
                        self.isThirdFirst = NO;
                    } else {
                        self.isThirdFirst = YES;
                        [self stopTTS];
                        [self startThirdSynthesize];
                    }
                }
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusChunkNlu: {
            if (!self.isPlayTesting) {
                NSString *nlu = [[NSString alloc] initWithData:(NSData *)aObj encoding:NSUTF8StringEncoding];
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: Chunk NLU data: %@\n", nlu]];
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusChunkEnd: {
            if (!self.isPlayTesting) {
                NSLog(@"====EVoiceRecognitionClientWorkStatusChunkEnd");
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: Chunk end, sn: %@.\n", aObj]];
                if (!self.longSpeechFlag && !(self.enableMutiplySpeech || self.audioFileType == AudioFileType_MutiplySpeech_ASR || self.audioFileType == AudioFileType_MutiplySpeech_Wakeup)) {
                    [self onEnd];
                }
                if (self.randomStress) {
                   [self wp_rec_randomStress];
                } else if (self.randomStressASR) {
                    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, NSEC_PER_SEC * 0.5), dispatch_get_main_queue(), ^{
                        [self voiceRecogButtonHelper:EVR_TRIGGER_MODE_CLICK];
                    });
                }
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusFeedback: {
            if (!self.isPlayTesting) {
                NSDictionary *logDic = [self parseLogToDic:aObj];
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK Feedback: %@\n", logDic]];
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusRecorderEnd: {
            if (!self.isPlayTesting) {
                [self printLogTextView:@"CALLBACK: recorder closed.\n"];
                if (self.audioFileType == AudioFileType_MutiplySpeech_ASR || self.audioFileType == AudioFileType_ASR) {
                    [self asrGuanceRecorderClosed];
                }
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusLongSpeechEnd: {
            if (!self.isPlayTesting) {
                [self printLogTextView:@"CALLBACK: Long Speech end.\n"];
                [self onEnd];
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusStop: {
            if (!self.isPlayTesting) {
                [self printLogTextView:@"CALLBACK: user press stop.\n"];
                [self onEnd];
                
                if (self.randomStress) {
                   [self wp_rec_randomStress];
                } else if (self.randomStressASR || self.randomStressMutiplyASR) {
                    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, NSEC_PER_SEC * 0.5), dispatch_get_main_queue(), ^{
                        [self voiceRecogButtonHelper:EVR_TRIGGER_MODE_CLICK];
                    });
                }
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusChunkTTS: {
            if (!self.isPlayTesting) {
                NSDictionary *dict = (NSDictionary *)aObj;
                if (dict) {
                    NSDictionary *ttsResult = [dict objectForKey:@"origin_result"];
                    if (ttsResult && [ttsResult isKindOfClass:[NSDictionary class]]) {
                        NSString *txt = [ttsResult objectForKey:@"tex"];
                        if (txt.length) {
                            [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: ChunkTTS %@.\n", txt]];
                        }
                        
//                        [self playByAudioInfo:dict];
                    }
//                    NSData *ttsData = [dict objectForKey:@"audio_data"];
                    cTTS_ASR_SN = [dict objectForKey:BDS_ASR_CANCEL_TTS_SN_A];
                }
                NSLog(@"Chunk TTS data: %@", [dict description]);
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusRecorderPermission: {
            if (!self.isPlayTesting) {
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: recorder permisson - %@.\n", [aObj objectForKey:BDS_ASR_RESP_RECORDER_PERMISSION]]];
            }
            break;
        }
        default:
            break;
    }
}
  • Speech synthesis callback: implement the BDSSpeechSynthesizerDelegate protocol; sample code below. See BDSSpeechSynthesizerDelegate for the parameter details
#pragma mark - implement BDSSpeechSynthesizerDelegate
- (void)synthesizerStartWorkingSentence:(NSInteger)SynthesizeSentence{
    NSLog(@"Did start synth %ld", SynthesizeSentence);
    [self.CancelButton setEnabled:YES];
    [self.PauseOrResumeButton setEnabled:YES];
}

- (void)synthesizerFinishWorkingSentence:(NSInteger)SynthesizeSentence engineType:(BDSSynthesizerEngineType)type {
    NSLog(@"Did finish synth: engineType:, %ld, %d", SynthesizeSentence, type);
    if(!isSpeak){
        if(self.synthesisTexts.count > 0 &&
           SynthesizeSentence == [[[self.synthesisTexts objectAtIndex:0] objectForKey:@"ID"] integerValue]){
            [self.synthesisTexts removeObjectAtIndex:0];
            [self updateSynthProgress];
        }
        else{
            NSLog(@"Sentence ID mismatch??? received ID: %ld\nKnown sentences:", (long)SynthesizeSentence);
            for(NSDictionary* dict in self.synthesisTexts){
                NSLog(@"ID: %ld Text:\"%@\"", [[dict objectForKey:@"ID"] integerValue], [((NSAttributedString*)[dict objectForKey:@"TEXT"]) string]);
            }
        }
        if(self.synthesisTexts.count == 0){
//            [self.CancelButton setEnabled:NO];
//            [self.PauseOrResumeButton setEnabled:NO];
            [self.PauseOrResumeButton setTitle:[[NSBundle mainBundle] localizedStringForKey:@"pause" value:@"" table:@"Localizable"] forState:UIControlStateNormal];
        }
    }
}

- (void)synthesizerSpeechStartSentence:(NSInteger)SpeakSentence{
    NSLog(@"Did start speak %ld", SpeakSentence);
}

- (void)synthesizerSpeechEndSentence:(NSInteger)SpeakSentence{
    NSLog(@"Did end speak %ld", SpeakSentence);
    NSString *docDir = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) firstObject];
    NSString *path = [docDir stringByAppendingPathComponent:[NSString stringWithFormat:@"test_speed_9_onnn_%ld.pcm", (long)SpeakSentence]];
    [self.playerData writeToFile:path atomically:YES];
    
    self.playerData = [[NSMutableData alloc] init];
    
    if(self.synthesisTexts.count > 0 &&
       SpeakSentence == [[[self.synthesisTexts objectAtIndex:0] objectForKey:@"ID"] integerValue]){
        [self.synthesisTexts removeObjectAtIndex:0];
        [self updateSynthProgress];
    }
    else{
        NSLog(@"Sentence ID mismatch??? received ID: %ld\nKnown sentences:", (long)SpeakSentence);
        for(NSDictionary* dict in self.synthesisTexts){
            NSLog(@"ID: %ld Text:\"%@\"", [[dict objectForKey:@"ID"] integerValue], [((NSAttributedString*)[dict objectForKey:@"TEXT"]) string]);
        }
    }
    if(self.synthesisTexts.count == 0){
//        [self.CancelButton setEnabled:NO];
//        [self.PauseOrResumeButton setEnabled:NO];
        [self.PauseOrResumeButton setTitle:[[NSBundle mainBundle] localizedStringForKey:@"pause" value:@"" table:@"Localizable"] forState:UIControlStateNormal];
    }
    
    [self SynthesizeTapped:nil];
}

- (void)synthesizerNewDataArrived:(NSData *)newData
                       DataFormat:(BDSAudioFormat)fmt
                   characterCount:(int)newLength
                   sentenceNumber:(NSInteger)SynthesizeSentence{
    NSLog(@"NewDataArrived data len: %lu", (unsigned long)[self.playerData length]);
    [self.playerData appendData:newData];
    
    
    NSMutableDictionary* sentenceDict = nil;
    for(NSMutableDictionary *dict in self.synthesisTexts){
        if([[dict objectForKey:@"ID"] integerValue] == SynthesizeSentence){
            sentenceDict = dict;
            break;
        }
    }
    if(sentenceDict == nil){
        NSLog(@"Sentence ID mismatch??? received ID: %ld\nKnown sentences:", (long)SynthesizeSentence);
        for(NSDictionary* dict in self.synthesisTexts){
            NSLog(@"ID: %ld Text:\"%@\"", [[dict objectForKey:@"ID"] integerValue], [((NSAttributedString*)[dict objectForKey:@"TEXT"]) string]);
        }
        return;
    }
    [sentenceDict setObject:[NSNumber numberWithInteger:newLength] forKey:@"SYNTH_LEN"];
    [self refreshAfterProgressUpdate:sentenceDict];
}

- (void)synthesizerTextSpeakLengthChanged:(int)newLength
                           sentenceNumber:(NSInteger)SpeakSentence{
    NSLog(@"SpeakLen %ld, %d", SpeakSentence, newLength);
    NSMutableDictionary* sentenceDict = nil;
    for(NSMutableDictionary *dict in self.synthesisTexts){
        if([[dict objectForKey:@"ID"] integerValue] == SpeakSentence){
            sentenceDict = dict;
            break;
        }
    }
    if(sentenceDict == nil){
        NSLog(@"Sentence ID mismatch??? received ID: %ld\nKnown sentences:", (long)SpeakSentence);
        for(NSDictionary* dict in self.synthesisTexts){
            NSLog(@"ID: %ld Text:\"%@\"", [[dict objectForKey:@"ID"] integerValue], [((NSAttributedString*)[dict objectForKey:@"TEXT"]) string]);
        }
        return;
    }
    [sentenceDict setObject:[NSNumber numberWithInteger:newLength] forKey:@"SPEAK_LEN"];
    [self refreshAfterProgressUpdate:sentenceDict];
}

3.2.3 Initializing the SDK

  • Sample code
// Log level: 0 (default) prints nothing; 6 prints all logs
[self.asrEventManager setParameter:@(EVRDebugLogLevelTrace) forKey:BDS_ASR_DEBUG_LOG_LEVEL];
// Environment settings:
// PID: the invited-beta PID is 4144779, for trial use only. Contact technical support for a dedicated production PID.
[self.asrEventManager setParameter:@"4144779" forKey:BDS_ASR_PRODUCT_ID];
// Recognition environment key
[self.asrEventManager setParameter:@"com.baidu.app" forKey:BDS_ASR_CHUNK_KEY];
// Recognition environment URL
[self.asrEventManager setParameter:@"https://audiotest.baidu.com/v2_llm_test" forKey:BDS_ASR_SERVER_URL];

NSString *deviceId = [self.asrEventManager getDeviceId];
// Obtain a UUID
NSUUID *vendorUUID = [[UIDevice currentDevice] identifierForVendor];
NSString *uuidString = [vendorUUID UUIDString];
[self.asrEventManager setParameter:uuidString forKey:BDS_ASR_CUID];
NSDictionary *extraDic = @{
    @"params": @{
        @"sessionId": uuidString       // randomly generated session id
    }
};
// Set the bdvs protocol; extraDic carries the protocol payload
[self.asrEventManager setChunkPamWithBdvsID:@"6973568c17a9d84040fe2abe365eb5e3_refactor"
                                   deviceID:deviceId
                                   extraDic:extraDic
                                  queryText:nil];
NSString *pam = ...; // see the demo for the exact parameter payload to pass
[self.asrEventManager setParameter:pam ?: @"" forKey:BDS_ASR_CHUNK_PARAM];

3.2.4 Starting Recognition

// Parameter settings for the start phase.
// Configure VAD / trigger mode / duplex according to your business needs; see "3.3 Development Scenarios". The example below is the duplex recognition scenario.
// Enable VAD
[self.asrEventManager setParameter:@(YES) forKey:BDS_ASR_VAD_ENABLE_LONG_PRESS];
// Whether to enable duplex
[self.asrEventManager setParameter:@(0) forKey:BDS_ASR_ENABLE_MUTIPLY_SPEECH];
// Trigger mode - 1: tap to recognize; 2: recognize after wake-up
[self.asrEventManager setParameter:@(EVR_TRIGGER_MODE_WAKEUP) forKey:BDS_ASR_TRIGGE_MODE];
// Required: set the audio compression type
[self.asrEventManager setParameter:@(EVR_AUDIO_COMPRESSION_OPUS) forKey:BDS_ASR_COMPRESSION_TYPE];
// Whether to return intermediate recognition results
[self.asrEventManager setParameter:@(YES) forKey:BDS_ASR_COMPRESSION_TYPE];
// Call BDS_ASR_CMD_START to start recognition
[self.asrEventManager sendCommand:BDS_ASR_CMD_START];
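
For context, a minimal end-to-end start sketch, assuming this class adopts the recognition callback protocol and that the manager is created with the SDK's ASR name constant as in the demo:

// Create the ASR event manager once and register for recognition callbacks.
self.asrEventManager = [BDSEventManager createEventManagerWithName:BDS_ASR_NAME];
[self.asrEventManager setDelegate:self];
// Apply the authentication, environment, and start-phase parameters shown
// in 3.2.1 through 3.2.4, then start recognition.
[self.asrEventManager sendCommand:BDS_ASR_CMD_START];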

3.2.5 Stopping Recognition

Stops recognition while keeping the current results. Audio that has already been sent is recognized normally and the response audio is generated.

  • Sample code
[self.asrEventManager sendCommand:BDS_ASR_CMD_STOP];

3.2.6 Cancelling Recognition

Cancels recognition and stops further processing. Data that has been sent but not yet recognized or responded to is discarded.

  • Sample code
[self.asrEventManager sendCommand:BDS_ASR_CMD_CANCEL];

3.2.7 Pausing Recognition

  • Sample code
NSMutableDictionary *params = [NSMutableDictionary dictionaryWithCapacity:2];
// _lastAsrsn is the sn of the current recognition; obtain it in the VoiceRecognitionClientWorkStatus callback, as shown below.
[params setValue:_lastAsrsn forKey:BDS_ASR_SN];
[self.asrEventManager sendCommand:BDS_ASR_CMD_PAUSE withParameters:params];

// Obtaining the sn in the callback
- (void)VoiceRecognitionClientWorkStatus:(int)workStatus obj:(id)aObj {
    // Parse the callback object to get the sn
    NSDictionary *dict = (NSDictionary *)aObj;
    if (dict) {
        NSDictionary *ttsResult = [dict objectForKey:@"origin_result"];
        if (ttsResult && [ttsResult isKindOfClass:[NSDictionary class]]) {
            NSString *txt = [ttsResult objectForKey:@"tex"];
            if (txt.length) {
                [self printLogTextView:[NSString stringWithFormat:@"CALLBACK: ChunkTTS %@.\n", txt]];
            }
            // Obtain the sn here
            cTTS_ASR_SN = [dict objectForKey:BDS_ASR_CANCEL_TTS_SN_A];
        }
    }
}

3.2.8 Resuming Recognition

  • Sample code
[self.asrEventManager sendCommand:BDS_ASR_CMD_RESUME];

3.3 Development Scenarios

The SDK supports three development scenarios:

  • Duplex recognition: after starting, the user can hold multiple voice dialogues until they stop recognition explicitly
  • Tap-to-talk short speech recognition: after starting, a single short utterance of up to 60s is recognized. // TODO: detailed description to be added
  • Long-press recognition: recognition continues while the button is held down, without sentence segmentation

3.3.1 Duplex Recognition

  • Configuration
- (IBAction)voiceRecogButtonTouchDown:(id)sender {
    // Duplex recognition mode
    [self.asrEventManager setParameter:@(1) forKey:BDS_ASR_ENABLE_MUTIPLY_SPEECH];
    // Trigger mode (tap by default) - 1: tap; 2: wake-up
    [self.asrEventManager setParameter:@(EVR_TRIGGER_MODE_CLICK) forKey:BDS_ASR_TRIGGE_MODE];
}
  • Interaction flow

(Figure: duplex recognition interaction flow)

3.3.2 Tap-to-Talk Short Speech Recognition

  • Configuration
// .... other parameters omitted
    // Tap mode
    self.longPressFlag = NO;
    // Duplex recognition mode
    [self.asrEventManager setParameter:@(1) forKey:BDS_ASR_ENABLE_MUTIPLY_SPEECH];
    // Trigger mode (tap by default) - 1: tap; 2: wake-up
    [self.asrEventManager setParameter:@(EVR_TRIGGER_MODE_CLICK) forKey:BDS_ASR_TRIGGE_MODE];
  • Interaction flow

(Figure: tap-to-talk short speech recognition interaction flow)

3.3.3 Long-Press Recognition

  • Configuration
// .... other parameters omitted
    // Long-press mode
    self.longPressFlag = YES;
    // Duplex recognition mode
    [self.asrEventManager setParameter:@(1) forKey:BDS_ASR_ENABLE_MUTIPLY_SPEECH];
    // Trigger mode (tap by default) - 1: tap; 2: wake-up
    [self.asrEventManager setParameter:@(EVR_TRIGGER_MODE_CLICK) forKey:BDS_ASR_TRIGGE_MODE];
  • Interaction flow

(Figure: long-press recognition interaction flow)

3.4 SDK Error Codes

Error code Meaning
655361 Recording device error
655362 No recording permission
655363 Recording device unavailable
655364 Recording interrupted
655365 Large amount of silent audio
1310722 No speech detected
1966082 Network unavailable
1966084 Failed to parse URL
2031617 Request timed out
2031620 Local network connection error
2225211 Server interaction error
2225213 Timed out waiting for the upstream connection
2225218 Speech too long
2225219 Audio does not meet recognition requirements
2225221 No matching result found
2225223 Protocol parameter error
2625535 Recognition engine busy
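
A sketch of mapping a few of these codes to handling logic inside the error callback from 3.2.2 (the handler name and the chosen codes are illustrative):

// Called from the EVoiceRecognitionClientWorkStatusError branch with (NSError *)aObj.
- (void)handleASRError:(NSError *)error {
    switch (error.code) {
        case 655362:  // No recording permission
            NSLog(@"Please enable microphone access in Settings.");
            break;
        case 1966082: // Network unavailable
        case 2031620: // Local network connection error
            NSLog(@"Check the network connection and try again.");
            break;
        case 2031617: // Request timed out
            NSLog(@"Request timed out; please retry.");
            break;
        default:
            NSLog(@"ASR error: %@", error);
            break;
    }
}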