In this article, we discuss how to use Google Cloud's UI to perform speech-to-text transcription and then translate the resulting transcript by calling a translation API from Python. The previous article, “Google Cloud Speech-to-Text and Translation: Live Translation”, mainly covered live translation; this one shows how to transcribe audio files in the Google Cloud UI. The speech file we transcribe here is an hour and a half long and exceeds Google Cloud’s 10 MB limit for online transcription, so we need to upload it to Cloud Storage first.
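The size check and upload described above can also be scripted. The following is a minimal sketch, not the method used in this article (which uploads through the UI): it assumes the `google-cloud-storage` client library is installed and credentials are configured, it treats the 10 MB limit as 10 × 1024 × 1024 bytes, and the helper names are hypothetical.

```python
# Online transcription limit mentioned in the article.
# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
ONLINE_LIMIT_BYTES = 10 * 1024 * 1024


def needs_cloud_storage(size_bytes, limit=ONLINE_LIMIT_BYTES):
    """Return True when the audio is too large to send inline."""
    return size_bytes > limit


def gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI that identifies an object in Cloud Storage."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_audio(local_path, bucket_name, blob_name):
    """Upload a local audio file to Cloud Storage and return its gs:// URI."""
    # Imported lazily so the helpers above work without the client library.
    from google.cloud import storage

    client = storage.Client()  # relies on Application Default Credentials
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)
```

With credentials in place, a call such as `upload_audio("speech.wav", "my-bucket", "audio/speech.wav")` would return the `gs://` URI to select in the UI; the file, bucket, and object names here are placeholders.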
Speech-to-Text UI
After the upload, open the UI and create a new audio configuration, entering the required speech information, as shown in Figure 1.
If you directly select the file uploaded to Cloud Storage, the system will automatically fill in the audio type, sampling rate, and other information, as shown in Figure 2.
In the second step, we specify the language spoken in the audio file. English is available in several regional variants, such as British, American, Hong Kong, and Singapore English, and we can also choose up to 3 alternative spoken languages. Further options can be enabled here as needed, such as replacing spoken emojis with the corresponding Unicode symbols or filtering out profane words and phrases, as shown in Figure 3.
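For readers who prefer to see these options as data, here is a rough sketch of how the same choices might look as a request configuration. The field names follow the Speech-to-Text REST `RecognitionConfig` as I understand it and should be checked against the API reference; the builder function itself is hypothetical and makes no API call.

```python
# The UI allows up to 3 alternative spoken languages.
MAX_ALTERNATIVE_LANGUAGES = 3


def build_recognition_config(
    language_code,
    alternative_language_codes=(),
    profanity_filter=False,
    enable_spoken_emojis=False,
):
    """Return a dict mirroring the language options chosen in the UI.

    Assumption: the keys match the REST RecognitionConfig field names.
    """
    alternatives = list(alternative_language_codes)
    if len(alternatives) > MAX_ALTERNATIVE_LANGUAGES:
        raise ValueError(
            f"at most {MAX_ALTERNATIVE_LANGUAGES} alternative languages are allowed"
        )
    return {
        "languageCode": language_code,
        "alternativeLanguageCodes": alternatives,
        "profanityFilter": profanity_filter,
        "enableSpokenEmojis": enable_spoken_emojis,
    }


# Example: American English audio that may also contain British or Singapore English.
config = build_recognition_config(
    "en-US",
    alternative_language_codes=["en-GB", "en-SG"],
    profanity_filter=True,
)
```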
In the third step, enable the model adaptation feature if you need it, and then click the “Submit” button to start processing the file, as shown in Figure 4.
When processing is complete, a green checkmark appears to the left of the file. Click the green checkmark to download the transcript; you can choose the download format, CSV or TXT, as shown in Figure 5.
So far, we have successfully converted the speech file into text. Next comes translation. The method used in this article is to download the JSON file directly from Cloud Storage, as shown in Figure 6, and then use Python to call a translation API.
We will be using the new_speech.json file, which is the speech-to-text output downloaded from Cloud Storage. Our code uses the API to translate the English text into the target language zh-TW with the Neural Machine Translation (NMT) model and saves the result as a txt file. Alternatively, if you download the generated transcript file directly, you can choose settings such as whether to include timestamps and which file type to use. Compared with downloading the JSON file and translating it through the API, downloading the transcript directly is a more straightforward way to get the desired content and file type, as shown in Figure 7.
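The translation step described above can be sketched roughly as follows. This is an illustrative reconstruction, not the article's original listing. It assumes the JSON file has a top-level `results` list whose items carry `alternatives[0].transcript`, that the `google-cloud-translate` library (v2 client) is installed, and that credentials are configured. All function names are hypothetical.

```python
import json


def extract_transcripts(data):
    """Collect the top alternative of each result, skipping empty entries.

    Assumption: `data` looks like {"results": [{"alternatives": [{"transcript": ...}]}]}.
    """
    lines = []
    for result in data.get("results", []):
        alternatives = result.get("alternatives") or []
        if alternatives:
            text = alternatives[0].get("transcript", "").strip()
            if text:
                lines.append(text)
    return lines


def translate_lines(lines, translate_fn):
    """Apply a text -> text translation function to every line."""
    return [translate_fn(line) for line in lines]


def make_google_translator(target="zh-TW", source="en", model="nmt"):
    """Return a function that translates one string with the Cloud Translation v2 client."""
    # Imported lazily so the parsing helpers work without the client library.
    from google.cloud import translate_v2 as translate

    client = translate.Client()

    def translate_one(text):
        result = client.translate(
            text,
            target_language=target,
            source_language=source,
            model=model,
            format_="text",  # plain text, so the output is not HTML-escaped
        )
        return result["translatedText"]

    return translate_one


def translate_file(json_path, output_path, translate_fn):
    """Read the transcript JSON, translate each line, and write a txt file."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    translated = translate_lines(extract_transcripts(data), translate_fn)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("\n".join(translated))
    return len(translated)
```

With credentials configured, `translate_file("new_speech.json", "new_speech_zh-TW.txt", make_google_translator())` would write one translated line per transcript segment; the output file name is a placeholder. Passing the translation function in as a parameter keeps the JSON parsing testable without network access.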
In summary, this article showed how to perform speech-to-text transcription and translation through Google Cloud's UI. Users first upload the file to Cloud Storage, then create a configuration in the UI, fill in the required speech information and language options, and run the transcription. They can then download the transcript as a CSV or TXT file. If translation is required, they can use Python to call a translation API on the transcribed text. If they only need the transcript in a particular format, they can instead download the transcript file directly. With these steps, users can transcribe and translate speech quickly and customize the settings to their needs.