This article provides code snippets to describe the concepts. Links to the complete samples for each use case are provided.

You provide candidate languages with the AutoDetectSourceLanguageConfig object. You expect that at least one of the candidates is in the audio. You can include up to four languages for at-start LID, or up to 10 languages for continuous LID. The Speech service returns one of the provided candidate languages even if those languages aren't in the audio. For example, if fr-FR (French) and en-US (English) are provided as candidates, but German is spoken, the service returns either fr-FR or en-US.

You must provide the full locale with a dash (-) separator, but language identification only uses one locale per base language. Don't include multiple locales for the same language (for example, en-US and en-GB).

var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });
auto autoDetectSourceLanguageConfig = 
    AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-CN" });
auto_detect_source_language_config = \
    speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-CN"])
AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.fromLanguages(Arrays.asList("en-US", "de-DE", "zh-CN"));
var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(["en-US", "de-DE", "zh-CN"]);
NSArray *languages = @[@"en-US", @"de-DE", @"zh-CN"];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
    [[SPXAutoDetectSourceLanguageConfiguration alloc]init:languages];

For more information, see Supported languages.

At-start and continuous language identification

Speech supports both at-start and continuous language identification (LID).

Continuous language identification is only supported with the Speech SDK in C#, C++, Java (for speech to text only), JavaScript (for speech to text only), and Python.

  • At-start LID identifies the language once within the first few seconds of audio. Use at-start LID if the language in the audio doesn't change. With at-start LID, a single language is detected and returned in less than 5 seconds.
  • Continuous LID can identify multiple languages during the audio. Use continuous LID if the language in the audio could change. Continuous LID doesn't support changing languages within the same sentence. For example, if you're primarily speaking Spanish and insert some English words, it doesn't detect the language change per word.
  • You implement at-start LID or continuous LID by calling methods for recognize once or continuous recognition. Continuous LID is only supported with continuous recognition.

    Recognize once or continuous recognition

    Language identification is completed with recognition objects and operations. You make a request to the Speech service to recognize audio.

    Don't confuse recognition with identification. Recognition can be used with or without language identification.

    Either call the recognize once method, or call the start and stop continuous recognition methods. You choose from these options:

  • Recognize once with at-start LID. Continuous LID isn't supported for recognize once.
  • Use continuous recognition with at-start LID.
  • Use continuous recognition with continuous LID.

    The SpeechServiceConnection_LanguageIdMode property is only required for continuous LID. Without it, the Speech service defaults to at-start LID. The supported values are AtStart for at-start LID or Continuous for continuous LID.

    // Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
    var result = await recognizer.RecognizeOnceAsync();
    // Start and stop continuous recognition with At-start LID
    await recognizer.StartContinuousRecognitionAsync();
    await recognizer.StopContinuousRecognitionAsync();
    // Start and stop continuous recognition with Continuous LID
    speechConfig.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
    await recognizer.StartContinuousRecognitionAsync();
    await recognizer.StopContinuousRecognitionAsync();
    
    // Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
    auto result = recognizer->RecognizeOnceAsync().get();
    // Start and stop continuous recognition with At-start LID
    recognizer->StartContinuousRecognitionAsync().get();
    recognizer->StopContinuousRecognitionAsync().get();
    // Start and stop continuous recognition with Continuous LID
    speechConfig->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");
    recognizer->StartContinuousRecognitionAsync().get();
    recognizer->StopContinuousRecognitionAsync().get();
    
    // Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
    SpeechRecognitionResult result = recognizer.recognizeOnceAsync().get();
    // Start and stop continuous recognition with At-start LID
    recognizer.startContinuousRecognitionAsync().get();
    recognizer.stopContinuousRecognitionAsync().get();
    // Start and stop continuous recognition with Continuous LID
    speechConfig.setProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
    recognizer.startContinuousRecognitionAsync().get();
    recognizer.stopContinuousRecognitionAsync().get();
    
    # Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
    result = recognizer.recognize_once()
    # Start and stop continuous recognition with At-start LID
    recognizer.start_continuous_recognition()
    recognizer.stop_continuous_recognition()
    # Start and stop continuous recognition with Continuous LID
    speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')
    recognizer.start_continuous_recognition()
    recognizer.stop_continuous_recognition()
    

    Use speech to text

    Use speech to text recognition when you need to identify the language in an audio source and then transcribe it to text. For more information, see the Speech to text overview.

    Speech to text recognition with at-start language identification is supported with the Speech SDK in C#, C++, Python, Java, JavaScript, and Objective-C. Speech to text recognition with continuous language identification is only supported with the Speech SDK in C#, C++, Java, JavaScript, and Python.

    Currently, for speech to text recognition with continuous language identification, you must create a SpeechConfig from an endpoint, as shown in the code examples.

    For more examples of speech to text recognition with language identification, see GitHub.

    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;

    var speechConfig = SpeechConfig.FromEndpoint(new Uri("YourSpeechEndpoint"), "YourSpeechKey");

    var autoDetectSourceLanguageConfig =
        AutoDetectSourceLanguageConfig.FromLanguages(
            new string[] { "en-US", "de-DE", "zh-CN" });

    using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
    using (var recognizer = new SpeechRecognizer(
        speechConfig,
        autoDetectSourceLanguageConfig,
        audioConfig))
    {
        var speechRecognitionResult = await recognizer.RecognizeOnceAsync();
        var autoDetectSourceLanguageResult =
            AutoDetectSourceLanguageResult.FromResult(speechRecognitionResult);
        var detectedLanguage = autoDetectSourceLanguageResult.Language;
    }

    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;

    var config = SpeechConfig.FromEndpoint(new Uri("YourSpeechEndpoint"), "YourSpeechKey");
    // Set the LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
    config.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
    var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });

    var stopRecognition = new TaskCompletionSource<int>();

    using (var audioInput = AudioConfig.FromWavFileInput(@"en-us_zh-cn.wav"))
    using (var recognizer = new SpeechRecognizer(config, autoDetectSourceLanguageConfig, audioInput))
    {
        // Subscribes to events.
        recognizer.Recognizing += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizingSpeech)
            {
                Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
                var autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.FromResult(e.Result);
                Console.WriteLine($"DETECTED: Language={autoDetectSourceLanguageResult.Language}");
            }
        };

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                var autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.FromResult(e.Result);
                Console.WriteLine($"DETECTED: Language={autoDetectSourceLanguageResult.Language}");
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            }
        };

        recognizer.Canceled += (s, e) =>
        {
            Console.WriteLine($"CANCELED: Reason={e.Reason}");
            if (e.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                Console.WriteLine($"CANCELED: Did you set the speech resource key and endpoint values?");
            }
            stopRecognition.TrySetResult(0);
        };

        recognizer.SessionStarted += (s, e) =>
        {
            Console.WriteLine("\n Session started event.");
        };

        recognizer.SessionStopped += (s, e) =>
        {
            Console.WriteLine("\n Session stopped event.");
            Console.WriteLine("\nStop recognition.");
            stopRecognition.TrySetResult(0);
        };

        // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
        await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

        // Waits for completion.
        // Use Task.WaitAny to keep the task rooted.
        Task.WaitAny(new[] { stopRecognition.Task });

        // Stops recognition.
        await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
    }
    using namespace std;
    using namespace Microsoft::CognitiveServices::Speech;
    using namespace Microsoft::CognitiveServices::Speech::Audio;
    auto speechConfig = SpeechConfig::FromEndpoint("YourServiceEndpoint", "YourSpeechResoureKey");
    auto autoDetectSourceLanguageConfig =
        AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-CN" });
    auto recognizer = SpeechRecognizer::FromConfig(
        speechConfig,
        autoDetectSourceLanguageConfig);

    auto speechRecognitionResult = recognizer->RecognizeOnceAsync().get();
    auto autoDetectSourceLanguageResult =
        AutoDetectSourceLanguageResult::FromResult(speechRecognitionResult);
    auto detectedLanguage = autoDetectSourceLanguageResult->Language;
    // Creates an instance of a speech config with specified subscription key and service region.
    // Note: For multi-lingual speech recognition with language id, it only works with speech v2 endpoint,
    // you must use FromEndpoint api in order to use the speech v2 endpoint.
    // Replace YourServiceRegion with your region, for example "westus", and
    // replace YourSubscriptionKey with your own speech key.
    string speechv2Endpoint = "wss://YourServiceRegion.stt.speech.microsoft.com/speech/universal/v2";
    auto speechConfig = SpeechConfig::FromEndpoint(speechv2Endpoint, "YourSubscriptionKey");
    // Set the mode of input language detection to either "AtStart" (the default) or "Continuous".
    // Please refer to the documentation of Language ID for more information.
    // https://aka.ms/speech/lid?pivots=programming-language-cpp
    speechConfig->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");
    // Define the set of languages to detect
    auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "zh-CN" });
    // Creates a speech recognizer using file as audio input.
    // Replace with your own audio file name.
    auto audioInput = AudioConfig::FromWavFileInput("en-us_zh-cn.wav");
    auto recognizer = SpeechRecognizer::FromConfig(speechConfig, autoDetectSourceLanguageConfig, audioInput);
    // promise for synchronization of recognition end.
    promise<void> recognitionEnd;
    // Subscribes to events.
    recognizer->Recognizing.Connect([](const SpeechRecognitionEventArgs& e)
        {
            auto lidResult = AutoDetectSourceLanguageResult::FromResult(e.Result);
            cout << "Recognizing in " << lidResult->Language << ": Text =" << e.Result->Text << std::endl;
        });

    recognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
        {
            if (e.Result->Reason == ResultReason::RecognizedSpeech)
            {
                auto lidResult = AutoDetectSourceLanguageResult::FromResult(e.Result);
                cout << "RECOGNIZED in " << lidResult->Language << ": Text=" << e.Result->Text << "\n"
                    << "  Offset=" << e.Result->Offset() << "\n"
                    << "  Duration=" << e.Result->Duration() << std::endl;
            }
            else if (e.Result->Reason == ResultReason::NoMatch)
            {
                cout << "NOMATCH: Speech could not be recognized." << std::endl;
            }
        });

    recognizer->Canceled.Connect([&recognitionEnd](const SpeechRecognitionCanceledEventArgs& e)
        {
            cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;
            if (e.Reason == CancellationReason::Error)
            {
                cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << "\n"
                    << "CANCELED: ErrorDetails=" << e.ErrorDetails << "\n"
                    << "CANCELED: Did you update the subscription info?" << std::endl;

                recognitionEnd.set_value(); // Notify to stop recognition.
            }
        });

    recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
        {
            cout << "Session stopped.";
            recognitionEnd.set_value(); // Notify to stop recognition.
        });
    // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
    recognizer->StartContinuousRecognitionAsync().get();
    // Waits for recognition end.
    recognitionEnd.get_future().get();
    // Stops recognition.
    recognizer->StopContinuousRecognitionAsync().get();
    
    AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
        AutoDetectSourceLanguageConfig.fromLanguages(Arrays.asList("en-US", "de-DE"));
    SpeechRecognizer recognizer = new SpeechRecognizer(
        speechConfig,
        autoDetectSourceLanguageConfig,
        audioConfig);
    Future<SpeechRecognitionResult> future = recognizer.recognizeOnceAsync();
    SpeechRecognitionResult result = future.get(30, TimeUnit.SECONDS);
    AutoDetectSourceLanguageResult autoDetectSourceLanguageResult =
        AutoDetectSourceLanguageResult.fromResult(result);
    String detectedLanguage = autoDetectSourceLanguageResult.getLanguage();
    recognizer.close();
    speechConfig.close();
    autoDetectSourceLanguageConfig.close();
    audioConfig.close();
    result.close();
    
    // Shows how to do continuous speech recognition on a multilingual audio file with continuous language detection. Here, we assume the
    // spoken language in the file can alternate between English (US), Spanish (Mexico) and German.
    // If specified, speech recognition will use the custom model associated with the detected language.
    public static void continuousRecognitionFromFileWithContinuousLanguageDetectionWithCustomModels() throws InterruptedException, ExecutionException, IOException, URISyntaxException {
        // Creates an instance of a speech config with specified
        // subscription key and endpoint URL. Replace with your own subscription key
        // and endpoint URL.
        SpeechConfig speechConfig = SpeechConfig.fromEndpoint(new URI("YourEndpointUrl"), "YourSubscriptionKey");
        // Change the default from at-start language detection to continuous language detection, since the spoken language in the audio
        // may change.
        speechConfig.setProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
        // Define a set of expected spoken languages in the audio, with an optional custom model endpoint ID associated with each.
        // Update the below with your own languages. Please see https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support
        // for all supported languages.
        // Update the below with your own custom model endpoint IDs, or omit it if you want to use the standard model.
        List<SourceLanguageConfig> sourceLanguageConfigs = new ArrayList<SourceLanguageConfig>();
        sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("en-US", "YourEnUsCustomModelID"));
        sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("es-MX", "YourEsMxCustomModelID"));
        sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("de-DE"));
        // Creates an instance of AutoDetectSourceLanguageConfig with the above 3 source language configurations.
        AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs(sourceLanguageConfigs);
        // We provide a WAV file with English and Spanish utterances as an example. Replace with your own multilingual audio file name.
        AudioConfig audioConfig = AudioConfig.fromWavFileInput( "es-mx_en-us.wav");
        // Creates a speech recognizer using file as audio input and the AutoDetectSourceLanguageConfig
        SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, autoDetectSourceLanguageConfig, audioConfig);
        // Semaphore used to signal the call to stop continuous recognition (following either a session ended or a cancelled event)
        final Semaphore doneSemaphone = new Semaphore(0);
        // Subscribes to events.
        /* Uncomment this to see intermediate recognition results. Since this is verbose and the WAV file is long, it is commented out by default in this sample.
        speechRecognizer.recognizing.addEventListener((s, e) -> {
            AutoDetectSourceLanguageResult autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.fromResult(e.getResult());
            String language = autoDetectSourceLanguageResult.getLanguage();
            System.out.println(" RECOGNIZING: Text = " + e.getResult().getText());
            System.out.println(" RECOGNIZING: Language = " + language);
        });
        */
        speechRecognizer.recognized.addEventListener((s, e) -> {
            AutoDetectSourceLanguageResult autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.fromResult(e.getResult());
            String language = autoDetectSourceLanguageResult.getLanguage();
            if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
                System.out.println(" RECOGNIZED: Text = " + e.getResult().getText());
                System.out.println(" RECOGNIZED: Language = " + language);
            }
            else if (e.getResult().getReason() == ResultReason.NoMatch) {
                if (language == null || language.isEmpty() || language.toLowerCase().equals("unknown")) {
                    System.out.println(" NOMATCH: Speech Language could not be detected.");
                }
                else {
                    System.out.println(" NOMATCH: Speech could not be recognized.");
                }
            }
        });
        speechRecognizer.canceled.addEventListener((s, e) -> {
            System.out.println(" CANCELED: Reason = " + e.getReason());
            if (e.getReason() == CancellationReason.Error) {
                System.out.println(" CANCELED: ErrorCode = " + e.getErrorCode());
                System.out.println(" CANCELED: ErrorDetails = " + e.getErrorDetails());
                System.out.println(" CANCELED: Did you update the subscription info?");
            }
            doneSemaphone.release();
        });
        speechRecognizer.sessionStarted.addEventListener((s, e) -> {
            System.out.println("\n Session started event.");
        });
        speechRecognizer.sessionStopped.addEventListener((s, e) -> {
            System.out.println("\n Session stopped event.");
            doneSemaphone.release();
        });
        // Starts continuous recognition and wait for processing to end
        System.out.println(" Recognizing from WAV file... please wait");
        speechRecognizer.startContinuousRecognitionAsync().get();
        doneSemaphone.tryAcquire(30, TimeUnit.SECONDS);
        // Stop continuous recognition
        speechRecognizer.stopContinuousRecognitionAsync().get();
        // These objects must be closed in order to dispose underlying native resources
        speechRecognizer.close();
        speechConfig.close();
        audioConfig.close();
        for (SourceLanguageConfig sourceLanguageConfig : sourceLanguageConfigs)
            sourceLanguageConfig.close();
        autoDetectSourceLanguageConfig.close();
    }
    
    auto_detect_source_language_config = \
            speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE"])
    speech_recognizer = speechsdk.SpeechRecognizer(
            speech_config=speech_config, 
            auto_detect_source_language_config=auto_detect_source_language_config, 
            audio_config=audio_config)
    result = speech_recognizer.recognize_once()
    auto_detect_source_language_result = speechsdk.AutoDetectSourceLanguageResult(result)
    detected_language = auto_detect_source_language_result.language
    speech_key, endpoint_string = "YourSpeechResoureKey","YourServiceEndpoint"
    weatherfilename="en-us_zh-cn.wav"
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, endpoint=endpoint_string)
    audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
    # Set the LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
    speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')
    auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
        languages=["en-US", "de-DE", "zh-CN"])
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, 
        auto_detect_source_language_config=auto_detect_source_language_config,
        audio_config=audio_config)
    done = False
    def stop_cb(evt):
        """callback that signals to stop continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        nonlocal done
        done = True
    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)
    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)
    speech_recognizer.stop_continuous_recognition()
    
    NSArray *languages = @[@"en-US", @"de-DE", @"zh-CN"];
    SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
            [[SPXAutoDetectSourceLanguageConfiguration alloc]init:languages];
    SPXSpeechRecognizer* speechRecognizer = \
            [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig
                               autoDetectSourceLanguageConfiguration:autoDetectSourceLanguageConfig
                                                  audioConfiguration:audioConfig];
    SPXSpeechRecognitionResult *result = [speechRecognizer recognizeOnce];
    SPXAutoDetectSourceLanguageResult *languageDetectionResult = [[SPXAutoDetectSourceLanguageResult alloc] init:result];
    NSString *detectedLanguage = [languageDetectionResult language];
    
    var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(["en-US", "de-DE"]);
    var speechRecognizer = SpeechSDK.SpeechRecognizer.FromConfig(speechConfig, autoDetectSourceLanguageConfig, audioConfig);
    speechRecognizer.recognizeOnceAsync((result: SpeechSDK.SpeechRecognitionResult) => {
            var languageDetectionResult = SpeechSDK.AutoDetectSourceLanguageResult.fromResult(result);
            var detectedLanguage = languageDetectionResult.language;
    });
    

    Speech to text custom models

    Language detection with custom models can only be used with real-time speech to text and speech translation. Batch transcription only supports language detection for default base models.

    This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.

    var sourceLanguageConfigs = new SourceLanguageConfig[]
    {
        SourceLanguageConfig.FromLanguage("en-US"),
        SourceLanguageConfig.FromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR")
    };
    var autoDetectSourceLanguageConfig =
        AutoDetectSourceLanguageConfig.FromSourceLanguageConfigs(
            sourceLanguageConfigs);
    

    This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.

    std::vector<std::shared_ptr<SourceLanguageConfig>> sourceLanguageConfigs;
    sourceLanguageConfigs.push_back(
        SourceLanguageConfig::FromLanguage("en-US"));
    sourceLanguageConfigs.push_back(
        SourceLanguageConfig::FromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR"));
    auto autoDetectSourceLanguageConfig =
        AutoDetectSourceLanguageConfig::FromSourceLanguageConfigs(
            sourceLanguageConfigs);
    

    This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.

    List<SourceLanguageConfig> sourceLanguageConfigs = new ArrayList<SourceLanguageConfig>();
    sourceLanguageConfigs.add(
        SourceLanguageConfig.fromLanguage("en-US"));
    sourceLanguageConfigs.add(
        SourceLanguageConfig.fromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR"));
    AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
        AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs(
            sourceLanguageConfigs);
    

    This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.

     en_language_config = speechsdk.languageconfig.SourceLanguageConfig("en-US")
     fr_language_config = speechsdk.languageconfig.SourceLanguageConfig("fr-FR", "The Endpoint Id for custom model of fr-FR")
     auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
            sourceLanguageConfigs=[en_language_config, fr_language_config])
    

    This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.

    SPXSourceLanguageConfiguration* enLanguageConfig = [[SPXSourceLanguageConfiguration alloc]init:@"en-US"];
    SPXSourceLanguageConfiguration* frLanguageConfig = \
            [[SPXSourceLanguageConfiguration alloc]initWithLanguage:@"fr-FR"
                                                         endpointId:@"The Endpoint Id for custom model of fr-FR"];
    NSArray *languageConfigs = @[enLanguageConfig, frLanguageConfig];
    SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
            [[SPXAutoDetectSourceLanguageConfiguration alloc]initWithSourceLanguageConfigurations:languageConfigs];
    
    var enLanguageConfig = SpeechSDK.SourceLanguageConfig.fromLanguage("en-US");
    var frLanguageConfig = SpeechSDK.SourceLanguageConfig.fromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR");
    var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs([enLanguageConfig, frLanguageConfig]);
    

    Run speech translation

    Use speech translation when you need to identify the language in an audio source and then translate it to another language. For more information, see the Speech translation overview.

    Speech translation with language identification is only supported with the Speech SDK in C#, C++, JavaScript, and Python.

    For more examples of speech translation with language identification, see GitHub.

    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Microsoft.CognitiveServices.Speech.Translation;
    public static async Task RecognizeOnceSpeechTranslationAsync()
    {
        var endpointUrl = new Uri("YourSpeechResoureEndpoint");
        var speechTranslationConfig = SpeechTranslationConfig.FromEndpoint(endpointUrl, "YourSpeechResoureKey");
        // Source language is required, but currently ignored. 
        string fromLanguage = "en-US";
        speechTranslationConfig.SpeechRecognitionLanguage = fromLanguage;
        speechTranslationConfig.AddTargetLanguage("de");
        speechTranslationConfig.AddTargetLanguage("fr");
        var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });
        using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        using (var recognizer = new TranslationRecognizer(
            speechTranslationConfig, 
            autoDetectSourceLanguageConfig,
            audioConfig))
        {
            Console.WriteLine("Say something or read from file...");
            var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);

            if (result.Reason == ResultReason.TranslatedSpeech)
            {
                var lidResult = result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);

                Console.WriteLine($"RECOGNIZED in '{lidResult}': Text={result.Text}");
                foreach (var element in result.Translations)
                {
                    Console.WriteLine($"    TRANSLATED into '{element.Key}': {element.Value}");
                }
            }
        }
    }
    
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Microsoft.CognitiveServices.Speech.Translation;
    public static async Task MultiLingualTranslation()
    {
        var endpointUrl = new Uri("YourSpeechResoureEndpoint");
        var config = SpeechTranslationConfig.FromEndpoint(endpointUrl, "YourSpeechResoureKey");
        // Source language is required, but currently ignored. 
        string fromLanguage = "en-US";
        config.SpeechRecognitionLanguage = fromLanguage;
        config.AddTargetLanguage("de");
        config.AddTargetLanguage("fr");
        // Set the LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
        config.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
        var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });
        var stopTranslation = new TaskCompletionSource<int>();
        using (var audioInput = AudioConfig.FromWavFileInput(@"en-us_zh-cn.wav"))
            using (var recognizer = new TranslationRecognizer(config, autoDetectSourceLanguageConfig, audioInput))
            {
                recognizer.Recognizing += (s, e) =>
                {
                    var lidResult = e.Result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);

                    Console.WriteLine($"RECOGNIZING in '{lidResult}': Text={e.Result.Text}");
                    foreach (var element in e.Result.Translations)
                    {
                        Console.WriteLine($"    TRANSLATING into '{element.Key}': {element.Value}");
                    }
                };
                recognizer.Recognized += (s, e) =>
                {
                    if (e.Result.Reason == ResultReason.TranslatedSpeech)
                    {
                        var lidResult = e.Result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);

                        Console.WriteLine($"RECOGNIZED in '{lidResult}': Text={e.Result.Text}");
                        foreach (var element in e.Result.Translations)
                        {
                            Console.WriteLine($"    TRANSLATED into '{element.Key}': {element.Value}");
                        }
                    }
                    else if (e.Result.Reason == ResultReason.RecognizedSpeech)
                    {
                        Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                        Console.WriteLine($"    Speech not translated.");
                    }
                    else if (e.Result.Reason == ResultReason.NoMatch)
                    {
                        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    }
                };
                recognizer.Canceled += (s, e) =>
                {
                    Console.WriteLine($"CANCELED: Reason={e.Reason}");

                    if (e.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you set the speech resource key and endpoint values?");
                    }

                    stopTranslation.TrySetResult(0);
                };
                recognizer.SpeechStartDetected += (s, e) =>
                {
                    Console.WriteLine("\nSpeech start detected event.");
                };

                recognizer.SpeechEndDetected += (s, e) =>
                {
                    Console.WriteLine("\nSpeech end detected event.");
                };

                recognizer.SessionStarted += (s, e) =>
                {
                    Console.WriteLine("\nSession started event.");
                };

                recognizer.SessionStopped += (s, e) =>
                {
                    Console.WriteLine("\nSession stopped event.");
                    Console.WriteLine($"\nStop translation.");
                    stopTranslation.TrySetResult(0);
                };
                // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
                Console.WriteLine("Start translation...");
                await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
                Task.WaitAny(new[] { stopTranslation.Task });
                await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
            }
    }
    
    auto endpointString = "YourSpeechResoureEndpoint";
    auto config = SpeechTranslationConfig::FromEndpoint(endpointString, "YourSpeechResoureKey");
    auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE" });
    // Sets source and target languages
    // The source language will be detected by the language detection feature. 
    // However, the SpeechRecognitionLanguage still need to set with a locale string, but it will not be used as the source language.
    // This will be fixed in a future version of Speech SDK.
    auto fromLanguage = "en-US";
    config->SetSpeechRecognitionLanguage(fromLanguage);
    config->AddTargetLanguage("de");
    config->AddTargetLanguage("fr");
    // Creates a translation recognizer using microphone as audio input.
    auto recognizer = TranslationRecognizer::FromConfig(config, autoDetectSourceLanguageConfig);
    cout << "Say something...\n";
    // Starts translation, and returns after a single utterance is recognized. The end of a
    // single utterance is determined by listening for silence at the end or until a maximum of 15
    // seconds of audio is processed. The task returns the recognized text as well as the translation.
    // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
    // shot recognition like command or query.
    // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
    auto result = recognizer->RecognizeOnceAsync().get();
    // Checks result.
    if (result->Reason == ResultReason::TranslatedSpeech)
    {
        cout << "RECOGNIZED: Text=" << result->Text << std::endl;
        for (const auto& it : result->Translations)
        {
            cout << "TRANSLATED into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
        }
    }
    else if (result->Reason == ResultReason::RecognizedSpeech)
    {
        cout << "RECOGNIZED: Text=" << result->Text << " (text could not be translated)" << std::endl;
    }
    else if (result->Reason == ResultReason::NoMatch)
    {
        cout << "NOMATCH: Speech could not be recognized." << std::endl;
    }
    else if (result->Reason == ResultReason::Canceled)
    {
        auto cancellation = CancellationDetails::FromResult(result);
        cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;

        if (cancellation->Reason == CancellationReason::Error)
        {
            cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
            cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
            cout << "CANCELED: Did you set the speech resource key and endpoint values?" << std::endl;
        }
    }
    
    using namespace std;
    using namespace Microsoft::CognitiveServices::Speech;
    using namespace Microsoft::CognitiveServices::Speech::Audio;
    using namespace Microsoft::CognitiveServices::Speech::Translation;
    void MultiLingualTranslation()
    {
        auto config = SpeechTranslationConfig::FromEndpoint("YourSpeechResoureEndpoint", "YourSpeechResoureKey");

        // Set the LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
        config->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");
        auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-CN" });
        promise<void> recognitionEnd;
        // Source language is required, but currently ignored. 
        auto fromLanguage = "en-US";
        config->SetSpeechRecognitionLanguage(fromLanguage);
        config->AddTargetLanguage("de");
        config->AddTargetLanguage("fr");
        auto audioInput = AudioConfig::FromWavFileInput("whatstheweatherlike.wav");
        auto recognizer = TranslationRecognizer::FromConfig(config, autoDetectSourceLanguageConfig, audioInput);
        recognizer->Recognizing.Connect([](const TranslationRecognitionEventArgs& e)
            {
                std::string lidResult = e.Result->Properties.GetProperty(PropertyId::SpeechServiceConnection_AutoDetectSourceLanguageResult);
                cout << "Recognizing in Language = "<< lidResult << ":" << e.Result->Text << std::endl;
                for (const auto& it : e.Result->Translations)
                {
                    cout << "  Translated into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
                }
            });

        recognizer->Recognized.Connect([](const TranslationRecognitionEventArgs& e)
            {
                if (e.Result->Reason == ResultReason::TranslatedSpeech)
                {
                    std::string lidResult = e.Result->Properties.GetProperty(PropertyId::SpeechServiceConnection_AutoDetectSourceLanguageResult);
                    cout << "RECOGNIZED in Language = " << lidResult << ": Text=" << e.Result->Text << std::endl;
                }
                else if (e.Result->Reason == ResultReason::RecognizedSpeech)
                {
                    cout << "RECOGNIZED: Text=" << e.Result->Text << " (text could not be translated)" << std::endl;
                }
                else if (e.Result->Reason == ResultReason::NoMatch)
                {
                    cout << "NOMATCH: Speech could not be recognized." << std::endl;
                }

                for (const auto& it : e.Result->Translations)
                {
                    cout << "  Translated into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
                }
            });

        recognizer->Canceled.Connect([&recognitionEnd](const TranslationRecognitionCanceledEventArgs& e)
            {
                cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;
                if (e.Reason == CancellationReason::Error)
                {
                    cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << std::endl;
                    cout << "CANCELED: ErrorDetails=" << e.ErrorDetails << std::endl;
                    cout << "CANCELED: Did you set the speech resource key and endpoint values?" << std::endl;

                    recognitionEnd.set_value();
                }
            });

        recognizer->Synthesizing.Connect([](const TranslationSynthesisEventArgs& e)
            {
                auto size = e.Result->Audio.size();
                cout << "Translation synthesis result: size of audio data: " << size
                    << (size == 0 ? "(END)" : "");
            });

        recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
            {
                cout << "Session stopped.";
                recognitionEnd.set_value();
            });
        // Starts continuous recognition. Use StopContinuousRecognitionAsync() to stop recognition.
        recognizer->StartContinuousRecognitionAsync().get();
        recognitionEnd.get_future().get();
        recognizer->StopContinuousRecognitionAsync().get();
    }
    speech_key, service_endpoint = "YourSpeechResoureKey","YourServiceEndpoint"
    weatherfilename="en-us_zh-cn.wav"
    # set up translation parameters: source language and target languages
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key,
        endpoint=service_endpoint,
        speech_recognition_language='en-US',
        target_languages=('de', 'fr'))
    audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
    # Specify the AutoDetectSourceLanguageConfig, which defines the number of possible languages
    auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-CN"])
    # Creates a translation recognizer using an audio file as input.
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config, 
        audio_config=audio_config,
        auto_detect_source_language_config=auto_detect_source_language_config)
    # Starts translation, and returns after a single utterance is recognized. The end of a
    # single utterance is determined by listening for silence at the end or until a maximum of 15
    # seconds of audio is processed. The task returns the recognition text as result.
    # Note: Since recognize_once() returns only a single utterance, it is suitable only for single
    # shot recognition like command or query.
    # For long-running multi-utterance recognition, use start_continuous_recognition() instead.
    result = recognizer.recognize_once()
    # Check the result
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        print("""Recognized: {}
        German translation: {}
        French translation: {}""".format(
            result.text, result.translations['de'], result.translations['fr']))
    elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(result.text))
        detectedSrcLang = result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]
        print("Detected Language: {}".format(detectedSrcLang))
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(result.no_match_details))
    elif result.reason == speechsdk.ResultReason.Canceled:
        print("Translation canceled: {}".format(result.cancellation_details.reason))
        if result.cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(result.cancellation_details.error_details))
    speech_key, service_endpoint = "YourSpeechResoureKey","YourServiceEndpoint"
    weatherfilename="en-us_zh-cn.wav"
    # Currently the v2 endpoint is required. In a future SDK release you won't need to set it. 
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key,
        endpoint=service_endpoint,
        speech_recognition_language='en-US',
        target_languages=('de', 'fr'))
    audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
    # Set the LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
    translation_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')
    # Specify the AutoDetectSourceLanguageConfig, which defines the number of possible languages
    auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-CN"])
    # Creates a translation recognizer using an audio file as input.
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config, 
        audio_config=audio_config,
        auto_detect_source_language_config=auto_detect_source_language_config)
    def result_callback(event_type, evt):
        """callback to display a translation result"""
        print("{}: {}\n\tTranslations: {}\n\tResult Json: {}".format(
            event_type, evt, evt.result.translations.items(), evt.result.json))
    done = False
    def stop_cb(evt):
        """callback that signals to stop continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        nonlocal done
        done = True
    # connect callback functions to the events fired by the recognizer
    recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    # event for intermediate results
    recognizer.recognizing.connect(lambda evt: result_callback('RECOGNIZING', evt))
    # event for final result
    recognizer.recognized.connect(lambda evt: result_callback('RECOGNIZED', evt))
    # cancellation event
    recognizer.canceled.connect(lambda evt: print('CANCELED: {} ({})'.format(evt, evt.reason)))
    # stop continuous recognition on either session stopped or canceled events
    recognizer.session_stopped.connect(stop_cb)
    recognizer.canceled.connect(stop_cb)
    def synthesis_callback(evt):
        """callback for the synthesis event"""
        print('SYNTHESIZING {}\n\treceived {} bytes of audio. Reason: {}'.format(
            evt, len(evt.result.audio), evt.result.reason))
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print("RECOGNIZED: {}".format(evt.result.properties))
            if evt.result.properties.get(speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult) == None:
                print("Unable to detect any language")
            else:
                detectedSrcLang = evt.result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]
                jsonResult = evt.result.properties[speechsdk.PropertyId.SpeechServiceResponse_JsonResult]
                detailResult = json.loads(jsonResult)
                startOffset = detailResult['Offset']
                duration = detailResult['Duration']
                if duration >= 0:
                    endOffset = duration + startOffset
                else:
                    endOffset = 0
                print("Detected language = " + detectedSrcLang + ", startOffset = " + str(startOffset) + " nanoseconds, endOffset = " + str(endOffset) + " nanoseconds, Duration = " + str(duration) + " nanoseconds.")
                global language_detected
                language_detected = True
    # connect callback to the synthesis event
    recognizer.synthesizing.connect(synthesis_callback)
    # start translation
    recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)
    recognizer.stop_continuous_recognition()
    

    Run and use a container

    Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and the Speech CLI. By default, the Speech SDK and the Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use the container host URL instead of the key and endpoint.

    When you run language ID in a container, use the SourceLanguageRecognizer object instead of SpeechRecognizer or TranslationRecognizer.
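
    The following is a minimal sketch of standalone language identification against a container host, not an official sample: it assumes C#, a container listening on ws://localhost:5003, and the default microphone as input. Adjust the host URL and audio source for your own deployment.

    using System;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;

    // Point the Speech SDK at the container host instead of a key and endpoint.
    var speechConfig = SpeechConfig.FromHost(new Uri("ws://localhost:5003"));

    var autoDetectSourceLanguageConfig =
        AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });
    using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();

    // SourceLanguageRecognizer performs language identification only (no transcription or translation).
    using var recognizer = new SourceLanguageRecognizer(speechConfig, autoDetectSourceLanguageConfig, audioConfig);

    var result = await recognizer.RecognizeOnceAsync();
    var detectedLanguage = AutoDetectSourceLanguageResult.FromResult(result).Language;
    Console.WriteLine($"Detected language: {detectedLanguage}");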

    For more information about containers, see the language identification speech containers how-to guide.

    Implement speech to text batch transcription

    To identify languages with the Batch transcription REST API, use the languageIdentification property in the body of the Transcriptions - Submit request.

    Batch transcription only supports language identification for default base models. If both language identification and a custom model are specified in the transcription request, the service falls back to use the base models for the specified candidate languages. This might result in unexpected recognition results.

    If your speech to text scenario requires both language identification and custom models, use real-time speech to text instead of batch transcription.
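
    For orientation only, here's a sketch of a complete Transcriptions - Submit call made with HttpClient. It assumes the v3.2 REST API path and placeholder values for the region, key, display name, and audio content URL; only the languageIdentification property itself comes from this article.

    using System;
    using System.Net.Http;
    using System.Text;

    var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "YourSpeechKey");

    // locale is still required; languageIdentification lists the candidate locales to detect.
    var body = """
    {
      "displayName": "Multilingual transcription",
      "locale": "en-US",
      "contentUrls": [ "https://example.com/audio/multilingual.wav" ],
      "properties": {
        "languageIdentification": {
          "candidateLocales": [ "en-US", "de-DE", "zh-CN" ]
        }
      }
    }
    """;

    var response = await httpClient.PostAsync(
        "https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions",
        new StringContent(body, Encoding.UTF8, "application/json"));

    Console.WriteLine($"Create transcription: {(int)response.StatusCode}");
    Console.WriteLine(await response.Content.ReadAsStringAsync());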

    The following example shows the usage of the languageIdentification property with four candidate languages. For more information about request properties, see Create a batch transcription.

    "properties": {
      "languageIdentification": {
        "candidateLocales": ["en-US", "ja-JP", "zh-CN", "hi-IN"]
      }
    }

    Related content

  • Try the speech to text quickstart
  • Improve recognition accuracy with custom speech
  • Use batch transcription