The Video Intelligence API can transcribe speech to text from supported video files. Two models are supported: "default" and "video".
Request speech transcription for a video
REST
Send the process request
The following shows how to send a POST request to the videos:annotate method.
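As a rough illustration of the request described above, the following Python sketch builds a POST request to the videos:annotate method but does not send it. It uses only the standard library. It assumes the v1 endpoint `https://videointelligence.googleapis.com/v1/videos:annotate` and a JSON body that mirrors the client-library samples on this page (`inputUri`, `features`, `videoContext.speechTranscriptionConfig`). The function name `build_annotate_request` is hypothetical. The access token is assumed to come from the Google Cloud CLI, for example from `gcloud auth print-access-token`.

```python
import json
import urllib.request

# Assumed v1 REST endpoint for the videos:annotate method.
ANNOTATE_URL = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_annotate_request(
    input_uri: str, access_token: str, language_code: str = "en-US"
) -> urllib.request.Request:
    """Build (but do not send) a videos:annotate POST request that asks for
    speech transcription of the video at input_uri (hypothetical helper)."""
    body = {
        "inputUri": input_uri,
        "features": ["SPEECH_TRANSCRIPTION"],
        "videoContext": {
            "speechTranscriptionConfig": {
                "languageCode": language_code,
                "enableAutomaticPunctuation": True,
            }
        },
    }
    return urllib.request.Request(
        ANNOTATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # OAuth 2.0 access token, e.g. from `gcloud auth print-access-token`.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The response describes a long-running operation, not the finished transcription.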
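This example uses an access token for a service account set up for the project with the Google Cloud CLI. For instructions on installing the Google Cloud CLI, setting up a project with a service account, and obtaining an access token, see the Video Intelligence quickstart.
Before using any of the request data, make the following replacements:
INPUT_URI: the Cloud Storage URI of the file you want to annotate, including the bucket and the file name. It must start with gs://. For example: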
"inputUri": "gs://cloud-videointelligence-demo/assistant.mp4",
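The API itself decides which URIs it accepts. Still, a quick client-side check can catch a malformed INPUT_URI before a request is sent. The Python sketch below is a hypothetical helper, `split_gcs_uri`, that enforces the two rules stated above: the value starts with gs://, and it names both a bucket and a file.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a gs://BUCKET/OBJECT URI into (bucket, object_name).

    Raises ValueError if the URI does not start with gs:// or does not
    name both a bucket and a file (hypothetical client-side check).
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"input URI must start with {prefix!r}: {uri!r}")
    # Everything before the first "/" is the bucket; the rest is the object path.
    bucket, sep, object_name = uri[len(prefix):].partition("/")
    if not bucket or not sep or not object_name:
        raise ValueError(f"input URI must name a bucket and a file: {uri!r}")
    return bucket, object_name
```

For the example value above, the helper returns `("cloud-videointelligence-demo", "assistant.mp4")`.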
Go
To authenticate to Video Intelligence, set up Application Default Credentials.
For more information, see Set up authentication for a local development environment.
// The package name is illustrative.
package transcribe

import (
	"context"
	"fmt"
	"io"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
)

// speechTranscriptionURI transcribes the speech in the video at the given
// Cloud Storage URI (gs://...) and writes the results to w.
func speechTranscriptionURI(w io.Writer, file string) error {
	ctx := context.Background()
	client, err := video.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		Features: []videopb.Feature{
			videopb.Feature_SPEECH_TRANSCRIPTION,
		},
		VideoContext: &videopb.VideoContext{
			SpeechTranscriptionConfig: &videopb.SpeechTranscriptionConfig{
				LanguageCode:               "en-US",
				EnableAutomaticPunctuation: true,
			},
		},
		InputUri: file,
	})
	if err != nil {
		return err
	}
	resp, err := op.Wait(ctx)
	if err != nil {
		return err
	}

	// A single video was processed. Get the first result.
	result := resp.AnnotationResults[0]

	for _, transcription := range result.SpeechTranscriptions {
		// The number of alternatives for each transcription is limited by
		// SpeechTranscriptionConfig.MaxAlternatives.
		// Each alternative is a different possible transcription
		// and has its own confidence score.
		for _, alternative := range transcription.GetAlternatives() {
			fmt.Fprintf(w, "Alternative level information:\n")
			fmt.Fprintf(w, "\tTranscript: %v\n", alternative.GetTranscript())
			fmt.Fprintf(w, "\tConfidence: %v\n", alternative.GetConfidence())

			fmt.Fprintf(w, "Word level information:\n")
			for _, wordInfo := range alternative.GetWords() {
				startTime := wordInfo.GetStartTime()
				endTime := wordInfo.GetEndTime()
				fmt.Fprintf(w, "\t%4.1f - %4.1f: %v (speaker %v)\n",
					float64(startTime.GetSeconds())+float64(startTime.GetNanos())*1e-9, // start as seconds
					float64(endTime.GetSeconds())+float64(endTime.GetNanos())*1e-9, // end as seconds
					wordInfo.GetWord(),
					wordInfo.GetSpeakerTag())
			}
		}
	}

	return nil
}
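The Go sample converts each word's start and end offsets from a protobuf Duration, which stores whole `seconds` plus `nanos`, into fractional seconds. When you call the REST method instead, the protobuf JSON format serializes the same Duration fields as strings such as `"1.500s"`. The Python sketch below accepts either shape. `duration_to_seconds` is a hypothetical helper name and is not part of any client library.

```python
from typing import Any, Mapping, Union


def duration_to_seconds(value: Union[str, Mapping[str, Any]]) -> float:
    """Convert a protobuf Duration to fractional seconds.

    Accepts the JSON string form used in REST responses (e.g. "1.500s") or a
    mapping with optional "seconds" and "nanos" keys, the same two fields the
    Go sample reads with GetSeconds() and GetNanos().
    """
    if isinstance(value, str):
        # JSON Durations are decimal seconds followed by a literal "s".
        if not value.endswith("s"):
            raise ValueError(f"not a JSON Duration: {value!r}")
        return float(value[:-1])
    # Missing fields default to zero, as in proto3.
    seconds = int(value.get("seconds", 0))
    nanos = int(value.get("nanos", 0))
    return seconds + nanos * 1e-9
```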
Java
To authenticate to Video Intelligence, set up Application Default Credentials.
For more information, see Set up authentication for a local development environment.
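import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.videointelligence.v1.AnnotateVideoProgress;
import com.google.cloud.videointelligence.v1.AnnotateVideoRequest;
import com.google.cloud.videointelligence.v1.AnnotateVideoResponse;
import com.google.cloud.videointelligence.v1.Feature;
import com.google.cloud.videointelligence.v1.SpeechRecognitionAlternative;
import com.google.cloud.videointelligence.v1.SpeechTranscription;
import com.google.cloud.videointelligence.v1.SpeechTranscriptionConfig;
import com.google.cloud.videointelligence.v1.VideoAnnotationResults;
import com.google.cloud.videointelligence.v1.VideoContext;
import com.google.cloud.videointelligence.v1.VideoIntelligenceServiceClient;
import com.google.cloud.videointelligence.v1.WordInfo;
import java.util.concurrent.TimeUnit;

// The class and method names are illustrative.
public class TranscribeVideoSpeech {

  /** Transcribes the speech in the video at gcsUri (a gs:// URI). */
  public static void speechTranscription(String gcsUri) throws Exception {
    // Instantiate a com.google.cloud.videointelligence.v1.VideoIntelligenceServiceClient
    try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
      // Set the language code
      SpeechTranscriptionConfig config =
          SpeechTranscriptionConfig.newBuilder()
              .setLanguageCode("en-US")
              .setEnableAutomaticPunctuation(true)
              .build();

      // Set the video context with the above configuration
      VideoContext context = VideoContext.newBuilder().setSpeechTranscriptionConfig(config).build();

      // Create the request
      AnnotateVideoRequest request =
          AnnotateVideoRequest.newBuilder()
              .setInputUri(gcsUri)
              .addFeatures(Feature.SPEECH_TRANSCRIPTION)
              .setVideoContext(context)
              .build();

      // Asynchronously perform speech transcription on the video
      OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> response =
          client.annotateVideoAsync(request);

      System.out.println("Waiting for operation to complete...");
      // Display the results (wait at most 600 seconds)
      for (VideoAnnotationResults results :
          response.get(600, TimeUnit.SECONDS).getAnnotationResultsList()) {
        for (SpeechTranscription speechTranscription : results.getSpeechTranscriptionsList()) {
          // Print the first (most probable) alternative, if there is one
          if (speechTranscription.getAlternativesCount() > 0) {
            SpeechRecognitionAlternative alternative = speechTranscription.getAlternatives(0);
            System.out.printf("Transcript: %s\n", alternative.getTranscript());
            System.out.printf("Confidence: %.2f\n", alternative.getConfidence());

            System.out.println("Word level information:");
            for (WordInfo wordInfo : alternative.getWordsList()) {
              double startTime =
                  wordInfo.getStartTime().getSeconds() + wordInfo.getStartTime().getNanos() / 1e9;
              double endTime =
                  wordInfo.getEndTime().getSeconds() + wordInfo.getEndTime().getNanos() / 1e9;
              System.out.printf("\t%4.2fs - %4.2fs: %s\n", startTime, endTime, wordInfo.getWord());
            }
          } else {
            System.out.println("No transcription found");
          }
        }
      }
    }
  }
}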
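The Java sample prints only the first alternative of each transcription, which the API ranks as the most probable. The Python sketch below applies the same idea to one entry of `annotationResults` as it might appear in a REST JSON response. That input shape is an assumption: the field names are the camelCase forms of those used in the samples. The sketch joins the first alternatives into a single transcript string. `full_transcript` is a hypothetical name, and the sketch assumes the transcriptions are listed in playback order.

```python
from typing import Any, Dict


def full_transcript(annotation_result: Dict[str, Any]) -> str:
    """Join the first (most probable) alternative of each speech transcription
    in a REST-style annotationResults entry into one string."""
    parts = []
    for transcription in annotation_result.get("speechTranscriptions", []):
        alternatives = transcription.get("alternatives", [])
        # Mirrors the Java getAlternativesCount() > 0 check: skip empty ones.
        if not alternatives:
            continue
        text = alternatives[0].get("transcript", "").strip()
        if text:
            parts.append(text)
    return " ".join(parts)
```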
Node.js
To authenticate to Video Intelligence, set up Application Default Credentials.
For more information, see Set up authentication for a local development environment.
// Imports the Google Cloud Video Intelligence library
const videoIntelligence = require('@google-cloud/video-intelligence');

// Creates a client
const client = new videoIntelligence.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of video to analyze, e.g. gs://my-bucket/my-video.mp4';

// Converts a protobuf Duration ({seconds, nanos}) to seconds. Either field can
// be missing when it is zero, and seconds is an int64 that may not arrive as a
// plain JavaScript number, so it is converted explicitly.
function toSeconds(duration) {
  if (!duration) return 0;
  return Number(duration.seconds || 0) + (duration.nanos || 0) * 1e-9;
}

async function analyzeVideoTranscript() {
  const videoContext = {
    speechTranscriptionConfig: {
      languageCode: 'en-US',
      enableAutomaticPunctuation: true,
    },
  };

  const request = {
    inputUri: gcsUri,
    features: ['SPEECH_TRANSCRIPTION'],
    videoContext: videoContext,
  };

  const [operation] = await client.annotateVideo(request);
  console.log('Waiting for operation to complete...');

  const [operationResult] = await operation.promise();
  // There is only one annotation_result since only
  // one video is processed.
  const annotationResults = operationResult.annotationResults[0];

  for (const speechTranscription of annotationResults.speechTranscriptions) {
    // The number of alternatives for each transcription is limited by
    // SpeechTranscriptionConfig.max_alternatives.
    // Each alternative is a different possible transcription
    // and has its own confidence score.
    for (const alternative of speechTranscription.alternatives) {
      console.log('Alternative level information:');
      console.log(`Transcript: ${alternative.transcript}`);
      console.log(`Confidence: ${alternative.confidence}`);

      console.log('Word level information:');
      for (const wordInfo of alternative.words) {
        const word = wordInfo.word;
        const startTime = toSeconds(wordInfo.startTime);
        const endTime = toSeconds(wordInfo.endTime);
        console.log(`\t${startTime}s - ${endTime}s: ${word}`);
      }
    }
  }
}

analyzeVideoTranscript().catch(console.error);
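The Node.js sample walks every word of every alternative and prints its time range. Those per-word offsets also make it possible to look up what was said at a given point in the video. The Python sketch below assumes a REST JSON response, with Durations serialized as strings like `"0.400s"`. It returns the words of each transcription's first alternative whose start time falls inside a window. `words_between` is a hypothetical helper, not a library function.

```python
from typing import Any, Dict, List, Tuple


def _json_duration(value: str) -> float:
    # REST responses encode protobuf Durations as strings like "1.500s".
    return float(value[:-1]) if value.endswith("s") else float(value or 0)


def words_between(
    annotation_result: Dict[str, Any], start: float, end: float
) -> List[Tuple[float, float, str]]:
    """Return (start_s, end_s, word) for words of each transcription's first
    alternative whose start time falls in the half-open window [start, end)."""
    hits = []
    for transcription in annotation_result.get("speechTranscriptions", []):
        alternatives = transcription.get("alternatives", [])
        if not alternatives:
            continue
        for info in alternatives[0].get("words", []):
            word_start = _json_duration(info.get("startTime", "0s"))
            word_end = _json_duration(info.get("endTime", "0s"))
            if start <= word_start < end:
                hits.append((word_start, word_end, info.get("word", "")))
    return hits
```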
Python
To authenticate to Video Intelligence, set up Application Default Credentials.
For more information, see Set up authentication for a local development environment.
from google.cloud import videointelligence


def speech_transcription(path: str) -> None:
    """Transcribe speech from a video stored on GCS."""
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [videointelligence.Feature.SPEECH_TRANSCRIPTION]

    config = videointelligence.SpeechTranscriptionConfig(
        language_code="en-US", enable_automatic_punctuation=True
    )
    video_context = videointelligence.VideoContext(speech_transcription_config=config)

    operation = video_client.annotate_video(
        request={
            "features": features,
            "input_uri": path,
            "video_context": video_context,
        }
    )

    print("\nProcessing video for speech transcription.")

    result = operation.result(timeout=600)

    # There is only one annotation_result since only
    # one video is processed.
    annotation_results = result.annotation_results[0]
    for speech_transcription in annotation_results.speech_transcriptions:
        # The number of alternatives for each transcription is limited by
        # SpeechTranscriptionConfig.max_alternatives.
        # Each alternative is a different possible transcription
        # and has its own confidence score.
        for alternative in speech_transcription.alternatives:
            print("Alternative level information:")
            print("Transcript: {}".format(alternative.transcript))
            print("Confidence: {}\n".format(alternative.confidence))

            print("Word level information:")
            for word_info in alternative.words:
                word = word_info.word
                # start_time and end_time are datetime.timedelta objects.
                start_time = word_info.start_time
                end_time = word_info.end_time
                print(
                    "\t{}s - {}s: {}".format(
                        start_time.total_seconds(),
                        end_time.total_seconds(),
                        word,
                    )
                )
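The sample comments note that `SpeechTranscriptionConfig.max_alternatives` limits the number of alternatives per transcription. The Go sample also prints a per-word speaker tag. The Python sketch below builds a REST-style `speechTranscriptionConfig` object that sets these options. It assumes the v1 REST field names `maxAlternatives`, `enableSpeakerDiarization`, and `diarizationSpeakerCount`; the Python client spells them `max_alternatives`, `enable_speaker_diarization`, and `diarization_speaker_count`. `transcription_config` is a hypothetical helper. The server may return fewer alternatives than requested, and speaker tags are only expected to be meaningful when diarization is enabled.

```python
from typing import Any, Dict, Optional


def transcription_config(
    language_code: str = "en-US",
    max_alternatives: Optional[int] = None,
    speaker_count: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a REST-style speechTranscriptionConfig object (hypothetical helper).

    max_alternatives caps how many alternatives each transcription may carry;
    speaker_count turns on speaker diarization with that many expected speakers.
    """
    config: Dict[str, Any] = {
        "languageCode": language_code,
        "enableAutomaticPunctuation": True,
    }
    if max_alternatives is not None:
        if max_alternatives < 0:
            raise ValueError("max_alternatives must be non-negative")
        config["maxAlternatives"] = max_alternatives
    if speaker_count is not None:
        if speaker_count < 1:
            raise ValueError("speaker_count must be at least 1")
        config["enableSpeakerDiarization"] = True
        config["diarizationSpeakerCount"] = speaker_count
    return config
```

The returned dict is meant to go under `videoContext.speechTranscriptionConfig` in a videos:annotate request body.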
Last updated 2024-11-18 (UTC).