Real-time speech recognition (WebSocket) integration - Tencent Cloud

Date: 2022-06-24 07:32:08

Expected result: real-time speech-to-text recognition

Third-party service: Tencent Cloud speech recognition (ASR)

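Integration requirements: during recognition, the client continuously uploads binary messages to the backend whose payload is the raw audio stream. The recommended pace is one 40 ms packet every 40 ms (a 1:1 real-time rate); for PCM that is 640 bytes per packet at an 8 kHz sampling rate and 1280 bytes at 16 kHz. Sending audio faster than the 1:1 real-time rate, or leaving more than 6 seconds between packets, may cause an engine error, in which case the backend returns an error and closes the connection.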
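The packet sizes above follow directly from the sample rate if the PCM is 16-bit (2 bytes per sample) and mono; that sample format is an assumption here, not something the text states. A minimal Java sketch (the class name PcmChunker is hypothetical) that computes the packet size and splits a buffer into packets:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; assumes 16-bit (2-byte) mono PCM.
class PcmChunker {
    // Bytes of PCM covering chunkMs milliseconds at sampleRate Hz.
    static int bytesPerChunk(int sampleRate, int chunkMs) {
        return sampleRate * 2 * chunkMs / 1000;
    }

    // Splits a PCM buffer into consecutive packets of at most chunkSize bytes;
    // the last packet may be shorter.
    static List<byte[]> split(byte[] pcm, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < pcm.length; off += chunkSize) {
            chunks.add(Arrays.copyOfRange(pcm, off, Math.min(off + chunkSize, pcm.length)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerChunk(8000, 40));
        System.out.println(bytesPerChunk(16000, 40));
    }
}
```

For 16 kHz audio, `PcmChunker.split(pcm, PcmChunker.bytesPerChunk(16000, 40))` gives 1280-byte packets, to be sent one every 40 ms.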

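After the whole audio stream has been uploaded, the client must send a text message with the following content to tell the backend to end recognition: {"type": "end"}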

Building the request URL:

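Request parameters:

engine_model_type (engine model type): 16k_zh_dialect, the multi-dialect model

expired (expiry time of the signature, UNIX timestamp in seconds): System.currentTimeMillis() / 1000L + 86400L, i.e. 24 hours from now

needvad (voice activity detection, i.e. splitting speech into segments): 1 = enabled

nonce (random positive integer): RandomUtil.randomInt(1000, 99999)

timestamp (current UNIX timestamp in seconds)

secretid (the SecretId of the API key)

voice_format (audio encoding): 1 = PCM (the example URL below and the demo code use 8, together with an MP3 file)

voice_id (globally unique identifier of the audio stream): AsrUtils.getVoiceId(asrConfig.getAppId())

signature (request signature)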

Example: wss://asr./asr/v2/1256841545?engine_model_type=16k_zh_dialect&expired=1685527216&needvad=1&nonce=37769&secretid=AKIDkHZbOtm7qKYu1ktrY0D9k6E6hfPdFIkx&timestamp=1685440816&voice_format=8&voice_id=1256841545_1685440816950_byc53&signature=Qv1YDDYCP7skMsASStxFuAVMa0w=
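The parameters listed above can be assembled as in the following sketch. AsrParams is a hypothetical name, and the random UUID used for voice_id is an assumption standing in for AsrUtils.getVoiceId, whose format is not shown here. A TreeMap keeps the keys in lexicographic order, which the signing step relies on.

```java
import java.util.TreeMap;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper building every URL parameter except signature.
class AsrParams {
    static TreeMap<String, Object> build(String secretId, String engineModelType, int voiceFormat) {
        long now = System.currentTimeMillis() / 1000L;          // current UNIX time in seconds
        TreeMap<String, Object> p = new TreeMap<>();            // sorted keys for signing
        p.put("engine_model_type", engineModelType);
        p.put("expired", now + 86400L);                         // signature valid for 24 hours
        p.put("needvad", 1);                                    // 1 = VAD enabled
        p.put("nonce", ThreadLocalRandom.current().nextInt(1000, 99999)); // upper bound exclusive
        p.put("secretid", secretId);
        p.put("timestamp", now);
        p.put("voice_format", voiceFormat);                     // 1 = PCM
        p.put("voice_id", UUID.randomUUID().toString());        // assumption: any unique string
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build("YOUR_SECRET_ID", "16k_zh", 1).keySet());
    }
}
```

The difference expired - timestamp is 86400 in every example URL in this article, which is what this sketch reproduces.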

Generating the signature:

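1. Sort all parameters except signature in lexicographic order and join them into the request URL; this string is the signing source. Note that it omits the wss:// scheme. Taking AppId=125922*** and SecretId=*****Qq1zhZMN8dv0****** as an example, the signing source is: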

asr./asr/v2/125922***?engine_model_type=16k_zh&expired=1673494772&needvad=1&nonce=1673408372&secretid=*****Qq1zhZMN8dv0******&timestamp=1673408372&voice_format=1&voice_id=c64385ee-3e5c-4fc5-bbfd-7c71addb35b0

Implementation:

private TreeMap<String, Object> getRequestParamMap(AsrConfig asrConfig, AsrRequest request, AsrRequestContent content) {
    // TreeMap keeps the keys in lexicographic order, as the signature requires
    TreeMap<String, Object> treeMap = new TreeMap<>();
    treeMap.put(TencentContents.SECRET_ID, asrConfig.getSecretId());
    treeMap.put(TencentContents.ENGINE_MODEL_TYPE, request.getEngineModelType());
    treeMap.put(TencentContents.VOICE_ID, content.getVoiceId());
    treeMap.put(TencentContents.VOICE_FORMAT, request.getVoiceFormat());
    treeMap.put(TencentContents.TIMESTAMP, request.getTimestamp());
    treeMap.put(TencentContents.EXPIRED, request.getExpired());
    treeMap.put(TencentContents.NONCE, request.getNonce());
    treeMap.put(TencentContents.NEED_VAD, request.getNeedVad());
    return treeMap;
}

private TreeMap<String, Object> getWsParams(AsrConfig asrConfig, AsrRequest request, AsrRequestContent content) {
    TreeMap<String, Object> treeMap = this.getRequestParamMap(asrConfig, request, content);
    // Merge any extra parameters so that they are signed as well
    if (request.getExtendsParam() != null) {
        treeMap.putAll(request.getExtendsParam());
    }
    return treeMap;
}

public static String createUrl(Map<String, Object> paramMap) {
    StringBuilder sb = new StringBuilder("?");
    for (Map.Entry<String, Object> entry : paramMap.entrySet()) {
        // Skip null and empty values (compare with equals, not !=, which only checks references)
        if (entry.getValue() != null && !"".equals(entry.getValue())) {
            sb.append(entry.getKey()).append('=').append(entry.getValue()).append('&');
        }
    }
    // Drop the trailing '&' only if at least one pair was appended (otherwise the '?' would be removed)
    if (sb.length() > 1) {
        sb.setLength(sb.length() - 1);
    }
    return sb.toString();
}

// Signing source: sign prefix (URL without "wss://") + AppId + sorted query string
String signUrl = asrConfig.getWsSignUrl() + asrConfig.getAppId() + paramUrl;

public AsrConfig(String appId, String secretKey, String secretId, Long waitTime, String realAsrUrl,
                 String signUrl, String logUrl, String wsUrl, String token) {
    super(secretId, secretKey, Long.valueOf(appId), token);
    // Defaults are used when an argument is null
    this.realAsrUrl = Optional.ofNullable(realAsrUrl).orElse("https://asr./asr/v1/");
    this.signUrl = Optional.ofNullable(signUrl).orElse("asr./asr/v1/");
    this.logUrl = Optional.ofNullable(logUrl).orElse("/");
    this.wsUrl = Optional.ofNullable(wsUrl).orElse("wss://asr./asr/v2/");
    this.wsSignUrl = "asr./asr/v2/"; // fixed, even if wsUrl is overridden
    this.flashUrl = "https://asr./asr/flash/v1/";
    this.flashSignUrl = "asr./asr/flash/v1/";
    this.waitTime = Optional.ofNullable(waitTime).orElse(6000L);
}
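A self-contained sketch of step 1 using only the JDK (SignSource is a hypothetical name). Given the example values from step 1 and the sign prefix exactly as it appears in this article, it should rebuild the signing source shown there; the prefix is passed in, standing in for asrConfig.getWsSignUrl().

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper: signing source = signPrefix + appId + "?" + sorted k=v pairs.
class SignSource {
    static String build(String signPrefix, String appId, Map<String, ?> params) {
        StringJoiner query = new StringJoiner("&");
        // Copy into a TreeMap so keys are visited in lexicographic order.
        for (Map.Entry<String, Object> e : new TreeMap<String, Object>(params).entrySet()) {
            Object v = e.getValue();
            // Skip null or empty values, as createUrl does.
            if (v != null && !"".equals(v)) {
                query.add(e.getKey() + "=" + v);
            }
        }
        return signPrefix + appId + "?" + query;
    }
}
```

Because the map is copied into a TreeMap internally, callers can pass any Map (for example a HashMap) without worrying about order.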

2. Compute an HMAC-SHA1 of the signing source with the SecretKey, then Base64-encode the result. For the signing source from the previous step with SecretKey=*****SkqpeHgqmSz*****, that is:

Base64Encode(HmacSha1("asr./asr/v2/125922***?engine_model_type=16k_zh&expired=1673494772&needvad=1&nonce=1673408372&secretid=*****Qq1zhZMN8dv0******&timestamp=1673408372&voice_format=1&voice_id=c64385ee-3e5c-4fc5-bbfd-7c71addb35b0", "*****SkqpeHgqmSz*****"))

The resulting signature value is: G8jDQBRg1JfeBi/YnTjyjekxfDA=

Code:

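import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Base64; // commons-codec

public static String base64_hmac_sha1(String originalText, String secretKey) {
    try {
        Mac hmac = Mac.getInstance("HmacSHA1");
        hmac.init(new SecretKeySpec(secretKey.getBytes("UTF-8"), "HmacSHA1"));
        byte[] hash = hmac.doFinal(originalText.getBytes("UTF-8"));
        return Base64.encodeBase64String(hash);
    } catch (Exception e) {
        // Returning "" hides the failure; it only shows up later as an authentication error
        e.printStackTrace();
        return "";
    }
}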
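If you would rather not depend on commons-codec, the JDK's java.util.Base64 does the same job. A sketch (HmacSigner is a hypothetical name) that, unlike base64_hmac_sha1 above, propagates errors instead of returning an empty signature:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: Base64(HmacSHA1(source, secretKey)) using only the JDK.
class HmacSigner {
    static String sign(String source, String secretKey) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] digest = mac.doFinal(source.getBytes(StandardCharsets.UTF_8)); // 20-byte digest
        return Base64.getEncoder().encodeToString(digest);
    }
}
```

A 20-byte HMAC-SHA1 digest always Base64-encodes to 28 characters ending in a single =, which matches the shape of the signature in step 2.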

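3. URL-encode the signature value (this is mandatory: Base64 output can contain +, / and =, and without URL encoding authentication fails intermittently), then append it as the last parameter to get the final request URL: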

wss://asr./asr/v2/1259228442?engine_model_type=16k_zh&expired=1592380492&filter_dirty=1&filter_modal=1&filter_punc=1&needvad=1&nonce=1592294092123&secretid=AKIDoQq1zhZMN8dv0psmvud6OUKuGPO7pu0r&timestamp=1592294092&voice_format=1&voice_id=RnKu9FODFHK5FPpsrN&signature=HepdTRX6u155qIPKNKC%2B3U0j1N0%3D
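A sketch of step 3 (FinalUrl is a hypothetical name). It uses URLEncoder.encode(String, Charset), available since Java 10; the wss:// prefix and the sorted query string are passed in rather than hard-coded.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: final URL = wsPrefix + appId + "?" + query + "&signature=" + encoded signature.
class FinalUrl {
    // query is the sorted "k=v&..." string used for signing, without the leading '?'.
    static String build(String wsPrefix, String appId, String query, String signature) {
        // Escapes '+' as %2B, '/' as %2F and '=' as %3D.
        String encoded = URLEncoder.encode(signature, StandardCharsets.UTF_8);
        return wsPrefix + appId + "?" + query + "&signature=" + encoded;
    }
}
```

With the signature from the example above, HepdTRX6u155qIPKNKC+3U0j1N0= becomes HepdTRX6u155qIPKNKC%2B3U0j1N0%3D, matching the final URL.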

WebSocket invocation code:

/**
 * Runs a single recognition session.
 *
 * @param client SpeechClient
 */
public static void runOnce(final SpeechClient client) {
    // The demo reads a file to simulate a live audio stream; in real use, pass bytes to write() as they arrive.
    // try-with-resources closes the file even if recognition throws.
    try (FileInputStream fileInputStream = new FileInputStream(new File("E:\\CloudMusic\\电台节目\\365读书 - 钱钟书:谈教训.mp3"))) {
        // Alternative file: new File("E:\\Download\\珍惜-孙露.mp3")
        // HTTP: about 200 ms of audio per request is recommended; WebSocket: about 40 ms per packet.
        // 640 bytes every 20 ms is 1:1 for 16 kHz 16-bit PCM (32 bytes/ms); for MP3 it depends on the bitrate.
        List<byte[]> speechData = ByteUtils.subToSmallBytes(fileInputStream,
                SpeechRecognitionSysConfig.requestWay == AsrConstant.RequestWay.Http ? 6400 : 640);
        // Request parameters: initialize() gives defaults; the builder builds custom parameters
        SpeechRecognitionRequest request = SpeechRecognitionRequest.initialize();
        request.setEngineModelType("16k_zh_dialect"); // the engine model type is required, otherwise an exception is thrown
        request.setVoiceFormat(8); // audio format
        SpeechRecognizer speechWsRecognizer = client.newSpeechRecognizer(request, new MySpeechRecognitionListener());
        // Start recognition
        speechWsRecognizer.start();
        for (int i = 0; i < speechData.size(); i++) {
            // Simulate the interval between audio packets
            Thread.sleep(SpeechRecognitionSysConfig.requestWay == AsrConstant.RequestWay.Http ? 200 : 20);
            // Send the data
            speechWsRecognizer.write(speechData.get(i));
        }
        // End recognition
        speechWsRecognizer.stop();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
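The demo paces packets with a fixed Thread.sleep per iteration, so the time spent in write() adds to every interval and the stream drifts slower than real time. The following is a hypothetical helper (PacedSender, not part of the Tencent SDK) that schedules each packet relative to the start time and then sends the end message; the two Consumer parameters stand in for the WebSocket's binary and text send calls.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: fixed-rate packet sending followed by the end-of-stream text message.
class PacedSender {
    static final String END_MESSAGE = "{\"type\":\"end\"}";

    static void send(List<byte[]> chunks, long intervalMs,
                     Consumer<byte[]> binarySink, Consumer<String> textSink) throws InterruptedException {
        long next = System.nanoTime();              // due time of the next packet
        for (byte[] chunk : chunks) {
            long waitNanos = next - System.nanoTime();
            if (waitNanos > 0) {
                Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
            }
            binarySink.accept(chunk);
            next += intervalMs * 1_000_000L;        // scheduled from the start, so sleeps do not drift
        }
        textSink.accept(END_MESSAGE);               // tells the backend to finish recognition
    }
}
```

After a stall, this helper sends the delayed packets back to back, so it never gets ahead of real time overall but can burst briefly; if the backend's 1:1 limit is a concern for such bursts, the demo's plain sleep is the more conservative choice.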

public class SpeechWsRecognizer implements SpeechRecognizer {
    protected AsrConfig asrConfig;
    protected SpeechRecognitionRequest asrRequest;
    protected AsrRequestContent asrRequestContent;
    protected SpeechRecognitionListener listener;
    protected WebSocket webSocket;
    protected int reConnectMaxNum = 10;
    protected int connectNum = 0;
    protected volatile boolean isConnect = false;
    protected volatile AtomicBoolean endFlag = new AtomicBoolean(false);
    protected volatile AtomicBoolean startFlag = new AtomicBoolean(false);
    protected SpeechRecognitionSignService speechRecognitionSignService = new SpeechRecognitionSignService();
    private ReentrantLock lock = new ReentrantLock();
    private final CountDownLatch startLatch = new CountDownLatch(1);
    private final CountDownLatch closeLatch = new CountDownLatch(1);
    private boolean begin = false;
    private AtomicLong adder = new AtomicLong(0L);
    private TractionManager tractionManager;
    private WsClientService wsClientService;

    public SpeechWsRecognizer(WsClientService wsClientService, String streamId, AsrConfig config,
                              SpeechRecognitionRequest request, SpeechRecognitionListener listener) {
        this.wsClientService = wsClientService;
        this.asrConfig = config;
        this.asrRequest = request;
        if (StringUtils.isEmpty(request.getVoiceId())) {
            request.setVoiceId(AsrUtils.getVoiceId(config.getAppId()));
        }
        this.asrRequestContent = AsrRequestContent.builder().seq(0).end(0).streamId(streamId)
                .voiceId(request.getVoiceId()).build();
        this.listener = listener;
        this.tractionManager = new TractionManager(config.getAppId());
    }

    // Connects once (double-checked under the lock): refreshes timestamp/expired, signs the URL,
    // opens the socket, then waits for onOpen/onFailure to release startLatch.
    private Boolean createWebsocket() throws SdkRunException {
        if (this.isConnect && this.webSocket != null) {
            return true;
        }
        this.lock.lock();
        try {
            if (this.isConnect && this.webSocket != null) {
                return true;
            }
            ReportService.ifLogMessage(this.getId(), "create websocket", false);
            this.asrRequest.setTimestamp(System.currentTimeMillis() / 1000L);
            this.asrRequest.setExpired(System.currentTimeMillis() / 1000L + 86400L);
            String paramUrl = SignHelper.createUrl(
                    this.speechRecognitionSignService.getWsParams(this.asrConfig, this.asrRequest, this.asrRequestContent));
            String signUrl = this.asrConfig.getWsSignUrl() + this.asrConfig.getAppId() + paramUrl;
            String sign = SignBuilder.base64_hmac_sha1(signUrl, this.asrConfig.getSecretKey());
            String url = this.asrConfig.getWsUrl() + this.asrConfig.getAppId() + paramUrl;
            WebSocketListener webSocketListener = this.createWebSocketListener();
            this.webSocket = this.wsClientService.asrWebSocket(this.asrConfig.getToken(), url, sign, webSocketListener);
            this.isConnect = true;
            boolean countDown = this.startLatch.await((long) SpeechRecognitionSysConfig.wsStartMethodWait, TimeUnit.SECONDS);
            if (!countDown) {
                throw new SdkRunException(Code.CODE_10001);
            }
            return true;
        } catch (Exception e) {
            // Assuming SdkRunException is an Exception subclass, a start timeout also ends up here and returns false
            e.printStackTrace();
            return false;
        } finally {
            this.lock.unlock();
        }
    }

    public void start() throws SdkRunException {
        Boolean success = this.createWebsocket();
        if (success) {
            this.startFlag.set(true);
            this.tractionManager.beginTraction(this.asrRequestContent.getStreamId());
        }
    }

    // Sends one binary audio packet; retries up to retryRequestNum times if send() returns false.
    public void write(byte[] data) throws SdkRunException {
        if (!this.startFlag.get()) {
            ReportService.ifLogMessage(this.getId(), "method " + this.adder.get() + " package please call start method!!", false);
            throw new SdkRunException(Code.CODE_10002);
        } else if (this.endFlag.get()) {
            ReportService.ifLogMessage(this.getId(), "method " + this.adder.get() + " can`t write,because you call stop method or send message fail", false);
            throw new SdkRunException(Code.CODE_10003);
        } else if (!this.isConnect) {
            ReportService.ifLogMessage(this.getId(), "method " + this.adder.get() + " client is closing", false);
            throw new SdkRunException(Code.CODE_10004);
        }
        ReportService.ifLogMessage(this.getId(), "send " + this.adder.get() + " package", false);
        boolean success = this.webSocket.send(ByteString.of(data));
        ReportService.ifLogMessage(this.getId(), "send " + this.adder.get() + " package " + success, false);
        this.adder.incrementAndGet();
        if (!success) {
            for (int i = 0; i < SpeechRecognitionSysConfig.retryRequestNum; ++i) {
                success = this.webSocket.send(ByteString.of(data));
                if (success) {
                    break;
                }
            }
        }
    }

    // Sends a text frame (used for the end message)
    private void write(String data) {
        if (!this.endFlag.get()) {
            ReportService.ifLogMessage(this.getId(), "send " + this.adder.get() + " end package", false);
            this.adder.incrementAndGet();
            this.webSocket.send(data);
        }
    }

    // Sends {"type":"end"}, then waits up to wsStopMethodWait seconds for the final result or the close
    public Boolean stop() {
        if (this.endFlag.get()) {
            return true;
        }
        this.write(JsonUtil.toJson(MapUtil.builder().put("type", "end").build()));
        this.endFlag.set(true);
        try {
            this.closeLatch.await((long) SpeechRecognitionSysConfig.wsStopMethodWait, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            e.printStackTrace();
            ReportService.ifLogMessage(this.getId(), "stop_exception:" + e.getMessage(), false);
        }
        return true;
    }

    private WebSocketListener createWebSocketListener() {
        return new WebSocketListener() {
            @Override
            public void onClosed(WebSocket webSocket, int code, String reason) {
                super.onClosed(webSocket, code, reason);
                ReportService.ifLogMessage(getId(), "ws onClosed" + reason, false);
                isConnect = false;
                countDownStop("onClosed");
            }

            @Override
            public void onClosing(WebSocket webSocket, int code, String reason) {
                super.onClosing(webSocket, code, reason);
                ReportService.ifLogMessage(getId(), "ws onClosing", false);
                isConnect = false;
                countDownStop("onClosing");
            }

            @Override
            public void onFailure(WebSocket webSocket, Throwable t, Response response) {
                isConnect = false;
                countDownStart("onFailure");
                countDownStop("onFailure");
                String trace = Tutils.getStackTrace(t);
                // Ignore "Socket closed" and failures after stop(); otherwise report and notify onFail
                if (!StringUtils.contains(trace, "Socket closed") && !endFlag.get()) {
                    ReportService.ifLogMessage(getId(), "onFailure:" + trace, true);
                    SpeechRecognitionResponse rs = new SpeechRecognitionResponse();
                    rs.setCode(Code.EXCEPTION.getCode());
                    rs.setMessage(trace);
                    rs.setStreamId(asrRequestContent.getStreamId());
                    rs.setVoiceId(asrRequestContent.getVoiceId());
                    ReportService.ifLogMessage(getId(), "onFailure", false);
                    ReportService.report(false, String.valueOf(rs.getCode()), asrConfig, getId(), asrRequest, rs,
                            asrConfig.getWsUrl(), t.getMessage());
                    listener.onFail(rs);
                }
            }

            @Override
            public void onMessage(WebSocket webSocket, String text) {
                super.onMessage(webSocket, text);
                ReportService.ifLogMessage(getId(), "onMessage:" + text, false);
                SpeechRecognitionResponse response = (SpeechRecognitionResponse) JsonUtil.fromJson(text, SpeechRecognitionResponse.class);
                if (listener != null && response != null) {
                    listener.onMessage(response);
                    if (response.getCode() == 0) {
                        resultCallBack(response);
                        ReportService.report(true, String.valueOf(response.getCode()), asrConfig, getId(), asrRequest,
                                response, asrConfig.getWsUrl(), response.getMessage());
                    } else {
                        // Non-zero code: report it, mark the session as ended and notify onFail
                        ReportService.report(false, String.valueOf(response.getCode()), asrConfig, getId(), asrRequest,
                                response, asrConfig.getWsUrl(), response.getMessage());
                        response.setStreamId(asrRequestContent.getStreamId());
                        response.setVoiceId(asrRequestContent.getVoiceId());
                        endFlag.set(true);
                        listener.onFail(response);
                    }
                }
            }

            @Override
            public void onMessage(WebSocket webSocket, ByteString bytes) {
                super.onMessage(webSocket, bytes);
            }

            @Override
            public void onOpen(WebSocket webSocket, Response response) {
                super.onOpen(webSocket, response);
                ReportService.ifLogMessage(getId(), "onOpen:" + JsonUtil.toJson(response), false);
                // A successful WebSocket handshake answers with HTTP 101 (Switching Protocols)
                isConnect = response.code() == 101;
                if (!isConnect) {
                    ReportService.ifLogMessage(getId(), "onOpen: fail", false);
                    webSocket.close(1001, "onOpen");
                }
                countDownStart("onOpen");
                if (listener != null) {
                    SpeechRecognitionResponse recognitionResponse = new SpeechRecognitionResponse();
                    recognitionResponse.setCode(0);
                    recognitionResponse.setStreamId(asrRequestContent.getStreamId());
                    recognitionResponse.setFinalSpeech(0);
                    recognitionResponse.setVoiceId(asrRequestContent.getVoiceId());
                    recognitionResponse.setMessage("success");
                    listener.onRecognitionStart(recognitionResponse);
                }
            }
        };
    }

    // Dispatches a successful result by slice type: 0 = sentence begins, 2 = sentence ends,
    // anything else = intermediate result; finalSpeech == 1 completes recognition and cancels the socket.
    private void resultCallBack(SpeechRecognitionResponse response) {
        response.setStreamId(this.asrRequestContent.getStreamId());
        if (response.getFinalSpeech() == null) {
            response.setFinalSpeech(0);
        }
        if (response.getResult() != null && this.listener != null) {
            if (response.getResult().getSliceType() == 0) {
                this.begin = true;
                this.listener.onSentenceBegin(response);
            } else if (response.getResult().getSliceType() == 2) {
                if (!this.begin) {
                    // End without a begin: synthesize a begin event from a copy of this response
                    SpeechRecognitionResponse beginResp = (SpeechRecognitionResponse) JsonUtil.fromJson(
                            JsonUtil.toJson(response), SpeechRecognitionResponse.class);
                    beginResp.getResult().setSliceType(0);
                    this.listener.onSentenceBegin(beginResp);
                }
                this.begin = false;
                this.listener.onSentenceEnd(response);
            } else {
                this.listener.onRecognitionResultChange(response);
            }
        }
        if (response.getFinalSpeech() != null && response.getFinalSpeech() == 1) {
            if (this.listener != null) {
                SpeechRecognitionResponse completeResp = new SpeechRecognitionResponse();
                completeResp.setCode(0);
                completeResp.setVoiceId(this.asrRequestContent.getVoiceId());
                completeResp.setFinalSpeech(1);
                completeResp.setStreamId(this.asrRequestContent.getStreamId());
                completeResp.setMessage("success");
                completeResp.setMessageId(response.getMessageId());
                this.listener.onRecognitionComplete(completeResp);
            }
            this.countDownStop("final");
            this.webSocket.cancel();
        }
    }

    private String getId() {
        return this.asrRequestContent.getStreamId() + "_" + this.asrRequestContent.getVoiceId();
    }

    // Not called anywhere in this listing
    private void reconnect(byte[] data) {
        if (!this.endFlag.get() && this.connectNum <= this.reConnectMaxNum) {
            try {
                Thread.sleep(10L);
                this.write(data);
                ++this.connectNum;
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    // Releases stop()'s wait at most once
    private void countDownStop(String source) {
        try {
            if (this.closeLatch.getCount() > 0L) {
                this.closeLatch.countDown();
                ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + "_closeLatch_countDown", false);
            }
        } catch (Exception e) {
            ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + "_closeLatch_exception" + e.getMessage(), true);
        }
    }

    // Releases createWebsocket()'s wait at most once
    private void countDownStart(String source) {
        try {
            if (this.startLatch.getCount() > 0L) {
                this.startLatch.countDown();
                ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + "_startLatch_countDown", false);
            }
        } catch (Exception e) {
            ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + " _startLatch_countDown" + e.getMessage(), true);
        }
    }
}
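The slice-type handling in resultCallBack is easier to follow in isolation. A sketch with the SDK types stripped out (SliceTracker is a hypothetical name) that follows the code above: 0 starts a sentence, 2 ends one, any other value is an intermediate result, and an end that arrives without a preceding begin gets a synthesized begin event first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, SDK-free model of resultCallBack's sentence events.
class SliceTracker {
    private boolean inSentence = false;        // plays the role of the "begin" field
    final List<String> events = new ArrayList<>();

    void onSlice(int sliceType) {
        if (sliceType == 0) {
            inSentence = true;
            events.add("begin");
        } else if (sliceType == 2) {
            if (!inSentence) {
                events.add("begin");           // synthesized, like beginResp above
            }
            inSentence = false;
            events.add("end");
        } else {
            events.add("change");              // intermediate result
        }
    }
}
```

Feeding it the slice types 0, 1, 1, 2, 2 yields begin, change, change, end, then a synthesized begin followed by end for the second sentence.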

This is one of my mock tests; the complete code will be added later.
