[!NOTE] The Mixed Reality Academy tutorials were designed with HoloLens (1st gen) and Mixed Reality Immersive Headsets in mind. As such, we feel it is important to leave these tutorials in place for developers who are still looking for guidance in developing for those devices. These tutorials will not be updated with the latest toolsets or interactions being used for HoloLens 2. They will be maintained to continue working on the supported devices. There will be a new series of tutorials that will be posted in the future that will demonstrate how to develop for HoloLens 2. This notice will be updated with a link to those tutorials when they are posted.
In this course, you will learn how to add translation capabilities to a mixed reality application using Azure Cognitive Services, with the Translator Text API.
The Translator Text API is a translation Service which works in near real-time. The Service is cloud-based, and, using a REST API call, an app can make use of the neural machine translation technology to translate text to another language. For more information, visit the Azure Translator Text API page.
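To give a sense of the call shape before building the Unity app, below is a minimal console sketch (using HttpClient rather than Unity's networking; the key value and language codes are placeholders) of the two REST requests this course will make: exchanging your subscription key for an access token, then requesting a translation from the same endpoints used later in this course.

using System;
using System.Net.Http;
using System.Threading.Tasks;

class TranslatorRestSketch
{
    static async Task Main()
    {
        const string subscriptionKey = "-InsertYourAuthKeyHere-"; // placeholder

        using (var client = new HttpClient())
        {
            // 1. Exchange the subscription key for a short-lived access token.
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            HttpResponseMessage tokenResponse = await client.PostAsync(
                "https://api.cognitive.microsoft.com/sts/v1.0/issueToken", null);
            string token = await tokenResponse.Content.ReadAsStringAsync();

            // 2. Ask the Translate endpoint (v2, XML response) to translate "Hello" into Italian.
            string url = "https://api.microsofttranslator.com/v2/http.svc/Translate?" +
                         "text=" + Uri.EscapeDataString("Hello") + "&from=en&to=it";
            var request = new HttpRequestMessage(HttpMethod.Get, url);
            request.Headers.Add("Authorization", "Bearer " + token);
            HttpResponseMessage translation = await client.SendAsync(request);
            Console.WriteLine(await translation.Content.ReadAsStringAsync());
        }
    }
}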
Upon completion of this course, you will have a mixed reality application which will be able to do the following:
This course will teach you how to get the results from the Translator Service into a Unity-based sample application. It will be up to you to apply these concepts to a custom application you might be building.
Course | HoloLens | Immersive headsets |
---|---|---|
MR and Azure 301: Language translation | ✔️ | ✔️ |
[!NOTE] While this course primarily focuses on Windows Mixed Reality immersive (VR) headsets, you can also apply what you learn in this course to Microsoft HoloLens. As you follow along with the course, you will see notes on any changes you might need to employ to support HoloLens. When using HoloLens, you may notice some echo during voice capture.
[!NOTE] This tutorial is designed for developers who have basic experience with Unity and C#. Please also be aware that the prerequisites and written instructions within this document represent what has been tested and verified at the time of writing (May 2018). You are free to use the latest software, as listed within the install the tools article, though it should not be assumed that the information in this course will perfectly match what you’ll find in newer software than what’s listed below.
We recommend the following hardware and software for this course:
If you’re using a microphone and headphones connected to (or built-in to) your headset, make sure the option “When I wear my headset, switch to headset mic” is turned on in Settings > Mixed reality > Audio and speech.
[!WARNING] Be aware that if you are developing for an immersive headset for this lab, you may experience audio output device issues. This is due to an issue with Unity, which is fixed in later versions of Unity (Unity 2018.2). The issue prevents Unity from changing the default audio output device at run time. As a work around, ensure you have completed the above steps, and close and re-open the Editor, when this issue presents itself.
To use the Azure Translator Text API, you will need to configure an instance of the Service to be made available to your application.
Log in to the Azure Portal.
[!NOTE] If you do not already have an Azure account, you will need to create one. If you are following this tutorial in a classroom or lab situation, ask your instructor or one of the proctors for help setting up your new account.
Once you are logged in, click on New in the top left corner and search for “Translator Text API”, then press Enter.
[!NOTE] The word New may have been replaced with Create a resource, in newer portals.
The new page will provide a description of the Translator Text API Service. At the bottom left of this page, select the Create button, to create an association with this Service.
Once you have clicked on Create:
Choose a Resource Group or create a new one. A resource group provides a way to monitor, control access to, provision, and manage billing for a collection of Azure assets. It is recommended to keep all the Azure Services associated with a single project (such as these labs) under a common resource group.
If you wish to read more about Azure Resource Groups, please visit the resource group article.
Select Create.
A notification will appear in the portal once the Service instance is created.
Click the Go to resource button in the notification to explore your new Service instance. You will be taken to your new Translator Text API Service instance.
Set up and test your mixed reality immersive headset.
[!NOTE] You will not need motion controllers for this course. If you need support setting up an immersive headset, please follow these steps.
The following is a typical set up for developing with mixed reality and, as such, is a good template for other projects:
Open Unity and click New.
You will now need to provide a Unity Project name. Insert MR_Translation. Make sure the project type is set to 3D. Set the Location to somewhere appropriate for you (remember, closer to root directories is better). Then, click Create project.
With Unity open, it is worth checking the default Script Editor is set to Visual Studio. Go to Edit > Preferences and then from the new window, navigate to External Tools. Change External Script Editor to Visual Studio 2017. Close the Preferences window.
Next, go to File > Build Settings and switch the platform to Universal Windows Platform, by clicking on the Switch Platform button.
Go to File > Build Settings and make sure that:
Target Device is set to Any Device.
For Microsoft HoloLens, set Target Device to HoloLens.
Save the scene and add it to the build.
Do this by selecting Add Open Scenes. A save window will appear.
Create a new folder for this, and any future, scenes. Select the New folder button and name the new folder Scenes.
Open your newly created Scenes folder, and then in the File name: text field, type MR_TranslationScene, then press Save.
Be aware, you must save your Unity scenes within the Assets folder, as they must be associated with the Unity Project. Creating the scenes folder (and other similar folders) is a typical way of structuring a Unity project.
In the Build Settings window, click on the Player Settings button; this will open the related panel in the space where the Inspector is located.
In this panel, a few settings need to be verified:
In the Other Settings tab:
API Compatibility Level should be .NET 4.6
Within the Publishing Settings tab, under Capabilities, check:
InternetClient
Microphone
Further down the panel, in XR Settings (found below Publishing Settings), tick Virtual Reality Supported and make sure the Windows Mixed Reality SDK is added.
[!IMPORTANT] If you wish to skip the Unity Set up component of this course, and continue straight into code, feel free to download this .unitypackage, import it into your project as a Custom Package, and then continue from Chapter 5. You will still need to create a Unity Project.
To reset the Main Camera's Transform, select the Gear icon next to the Camera’s Transform component, and select Reset.
The Transform component should then have Position and Rotation set to 0, 0, 0, and Scale set to 1, 1, 1.
An Audio Source component will be added to the Main Camera, as demonstrated below.
[!NOTE] For Microsoft HoloLens, you will need to also change the following, which are part of the Camera component on your Main Camera:
- Clear Flags: Solid Color.
- Background ‘Black, Alpha 0’ – Hex color: #00000000.
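If you would rather apply these HoloLens camera settings from code than through the Inspector, a minimal sketch could look like the following (the script name and its use are illustrative assumptions, not part of this course):

using UnityEngine;

// Illustrative only: applies the HoloLens-friendly camera settings described above.
public class HoloLensCameraSetup : MonoBehaviour
{
    void Start()
    {
        Camera cam = Camera.main;
        cam.clearFlags = CameraClearFlags.SolidColor;
        // Black with Alpha 0 (hex #00000000), so the real world shows through on HoloLens.
        cam.backgroundColor = new Color(0f, 0f, 0f, 0f);
    }
}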
To show the input and output of the translation, a basic UI needs to be created. For this course, you will create a Canvas UI object, with several ‘Text’ objects to show the data.
Right-click in an empty area of the Hierarchy Panel and, under UI, add a Canvas.
Next, change the following parameters in the Inspector Panel’s Rect Transform:
Scale - X 0.13 Y 0.13 Z 0.13
For each Text Object, select it and use the below tables to set the parameters in the Inspector Panel.
For the Rect Transform component:
Name | Transform - Position | Width | Height |
---|---|---|---|
MicrophoneStatusLabel | X -80 Y 90 Z 0 | 300 | 30 |
AzureResponseLabel | X -80 Y 30 Z 0 | 300 | 30 |
DictationLabel | X -80 Y -30 Z 0 | 300 | 30 |
TranslationResultLabel | X -80 Y -90 Z 0 | 300 | 30 |
For the Text (Script) component:
Name | Text | Font Size |
---|---|---|
MicrophoneStatusLabel | Microphone Status: | 20 |
AzureResponseLabel | Azure Web Response | 20 |
DictationLabel | You just said: | 20 |
TranslationResultLabel | Translation: | 20 |
Also, make the Font Style Bold. This will make the text easier to read.
For each of these children, select it and use the below tables to set the parameters in the Inspector Panel.
For the Rect Transform component:
Name | Transform - Position | Width | Height |
---|---|---|---|
MicrophoneStatusText | X 0 Y -30 Z 0 | 300 | 30 |
AzureResponseText | X 0 Y -30 Z 0 | 300 | 30 |
DictationText | X 0 Y -30 Z 0 | 300 | 30 |
TranslationResultText | X 0 Y -30 Z 0 | 300 | 30 |
For the Text (Script) component:
Name | Text | Font Size |
---|---|---|
MicrophoneStatusText | ?? | 20 |
AzureResponseText | ?? | 20 |
DictationText | ?? | 20 |
TranslationResultText | ?? | 20 |
Next, select the ‘centre’ alignment option for each text component:
To ensure the child UI Text objects are easily readable, change their Color. Do this by clicking on the bar (currently ‘Black’) next to Color.
Then, in the new, little, Color window, change the Hex Color to: 0032EAFF
Your finished UI structure will be visible in the Hierarchy Panel, and the four labels will be visible in the Scene and Game Views.
The first script you need to create is the Results class, which is responsible for providing a way to see the results of translation. The class stores and displays the following: the microphone status, the Azure response code, the dictation result, and the translation result.
To create this class:
Right-click in the Project Panel, then Create > Folder. Name the folder Scripts.
With the Scripts folder created, double-click it to open it. Then within that folder, right-click, and select Create > C# Script. Name the script Results.
Insert the following namespaces:
using UnityEngine;
using UnityEngine.UI;
Inside the Class insert the following variables:
public static Results instance;
[HideInInspector]
public string azureResponseCode;
[HideInInspector]
public string translationResult;
[HideInInspector]
public string dictationResult;
[HideInInspector]
public string micStatus;
public Text microphoneStatusText;
public Text azureResponseText;
public Text dictationText;
public Text translationResultText;
Then add the Awake() method, which will be called when the class initializes.
private void Awake()
{
// Set this class to behave similar to singleton
instance = this;
}
Finally, add the methods which are responsible for outputting the various results information to the UI.
/// <summary>
/// Stores the Azure response value in the static instance of Result class.
/// </summary>
public void SetAzureResponse(string result)
{
azureResponseCode = result;
azureResponseText.text = azureResponseCode;
}
/// <summary>
/// Stores the translated result from dictation in the static instance of Result class.
/// </summary>
public void SetDictationResult(string result)
{
dictationResult = result;
dictationText.text = dictationResult;
}
/// <summary>
/// Stores the translated result from Azure Service in the static instance of Result class.
/// </summary>
public void SetTranslatedResult(string result)
{
translationResult = result;
translationResultText.text = translationResult;
}
/// <summary>
/// Stores the status of the Microphone in the static instance of Result class.
/// </summary>
public void SetMicrophoneStatus(string result)
{
micStatus = result;
microphoneStatusText.text = micStatus;
}
The second class you are going to create is the MicrophoneManager.
This class is responsible for: detecting the recording device attached to the headset, capturing the audio and using dictation to store it as a string, and, once the voice has paused, submitting the dictation to the Translator class.
To create this class:
Update the namespaces to be the same as the following, at the top of the MicrophoneManager class:
using UnityEngine;
using UnityEngine.Windows.Speech;
Then, add the following variables inside the MicrophoneManager class:
// Help to access instance of this object
public static MicrophoneManager instance;
// AudioSource component, provides access to mic
private AudioSource audioSource;
// Flag indicating mic detection
private bool microphoneDetected;
// Component converting speech to text
private DictationRecognizer dictationRecognizer;
Code for the Awake() and Start() methods now needs to be added. These will be called when the class initializes:
private void Awake()
{
// Set this class to behave similar to singleton
instance = this;
}
void Start()
{
//Use Unity Microphone class to detect devices and setup AudioSource
if(Microphone.devices.Length > 0)
{
Results.instance.SetMicrophoneStatus("Initialising...");
audioSource = GetComponent<AudioSource>();
microphoneDetected = true;
}
else
{
Results.instance.SetMicrophoneStatus("No Microphone detected");
}
}
Now you need the methods that the App uses to start and stop the voice capture, and pass it to the Translator class, that you will build soon. Copy the following code and paste it beneath the Start() method.
/// <summary>
/// Start microphone capture. Debugging message is delivered to the Results class.
/// </summary>
public void StartCapturingAudio()
{
if(microphoneDetected)
{
// Start dictation
dictationRecognizer = new DictationRecognizer();
dictationRecognizer.DictationResult += DictationRecognizer_DictationResult;
dictationRecognizer.Start();
// Update UI with mic status
Results.instance.SetMicrophoneStatus("Capturing...");
}
}
/// <summary>
/// Stop microphone capture. Debugging message is delivered to the Results class.
/// </summary>
public void StopCapturingAudio()
{
Results.instance.SetMicrophoneStatus("Mic sleeping");
Microphone.End(null);
dictationRecognizer.DictationResult -= DictationRecognizer_DictationResult;
dictationRecognizer.Dispose();
}
[!TIP] Though this application will not make use of it, the StopCapturingAudio() method has also been provided here, should you want to implement the ability to stop capturing audio in your application.
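As a sketch of how that might look, the snippet below (a hypothetical addition to the MicrophoneManager class, not part of this course) toggles capture with the Space key while testing in the Editor:

// Hypothetical example: a flag and an Update() method added to MicrophoneManager
// to toggle audio capture with the Space key.
private bool isCapturing = true;

void Update()
{
    if (Input.GetKeyDown(KeyCode.Space))
    {
        if (isCapturing)
        {
            StopCapturingAudio();
        }
        else
        {
            StartCapturingAudio();
        }
        isCapturing = !isCapturing;
    }
}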
You now need to add a Dictation Handler that will be invoked when the voice stops. This method will then pass the dictated text to the Translator class.
/// <summary>
/// This handler is called every time the Dictation detects a pause in the speech.
/// Debugging message is delivered to the Results class.
/// </summary>
private void DictationRecognizer_DictationResult(string text, ConfidenceLevel confidence)
{
// Update UI with dictation captured
Results.instance.SetDictationResult(text);
// Start the coroutine that processes the dictation through Azure
StartCoroutine(Translator.instance.TranslateWithUnityNetworking(text));
}
[!WARNING]
At this point you will notice an error appearing in the Unity Editor Console Panel (“The name ‘Translator’ does not exist…”). This is because the code references the Translator class, which you will create in the next chapter.
The last script you need to create is the Translator class.
This class is responsible for: verifying the app’s credentials with Azure by requesting an authorization Token, accepting the text to be translated, and delivering the translated result to the Results class.
To create this Class:
Add the following namespaces to the top of the file:
using System;
using System.Collections;
using System.Xml.Linq;
using UnityEngine;
using UnityEngine.Networking;
Then add the following variables inside the Translator class:
public static Translator instance;
private string translationTokenEndpoint = "https://api.cognitive.microsoft.com/sts/v1.0/issueToken";
private string translationTextEndpoint = "https://api.microsofttranslator.com/v2/http.svc/Translate?";
private const string ocpApimSubscriptionKeyHeader = "Ocp-Apim-Subscription-Key";
//Substitute the value of authorizationKey with your own Key
private const string authorizationKey = "-InsertYourAuthKeyHere-";
private string authorizationToken;
// languages set below are:
// English
// French
// Italian
// Japanese
// Korean
public enum Languages { en, fr, it, ja, ko };
public Languages from = Languages.en;
public Languages to = Languages.it;
[!NOTE]
- The languages inserted into the Languages enum are just examples. Feel free to add more if you wish (see the example after this note); the API supports over 60 languages (including Klingon)!
- There is a more interactive page covering available languages, though be aware the page only appears to work when the site language is set to English (the Microsoft site will likely redirect you to your native language). You can change the site language at the bottom of the page or by altering the URL.
- The authorizationKey value, in the above code snippet, must be the Key you received when you subscribed to the Azure Translator Text API. This was covered in Chapter 1.
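As an example, supporting German and Spanish as well would only require extending the enum with their ISO language codes (illustrative only; the rest of this course assumes the five languages above):

// Example only: the same enum extended with German (de) and Spanish (es).
public enum Languages { en, fr, it, ja, ko, de, es };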
Next, add the Awake() and Start() methods for the Translator class. In this case, the code will make a call to Azure using the authorization Key, to get a Token.
private void Awake()
{
// Set this class to behave similar to singleton
instance = this;
}
// Use this for initialization
void Start()
{
// When the application starts, request an auth token
StartCoroutine("GetTokenCoroutine", authorizationKey);
}
[!NOTE] The token will expire after 10 minutes. Depending on the scenario for your app, you might have to make the same coroutine call multiple times.
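As a sketch of what that could look like, the coroutine below (an illustrative assumption, not part of the course code) re-requests the token every nine minutes, just ahead of the ten-minute expiry:

/// <summary>
/// Illustrative sketch: periodically refresh the authorization token.
/// Note that, as written in this course, GetTokenCoroutine() also restarts audio
/// capture when it completes, so you would refactor that call out before using this.
/// </summary>
private IEnumerator RefreshTokenRoutine()
{
    while (true)
    {
        // Run the existing token request and wait for it to finish.
        yield return StartCoroutine("GetTokenCoroutine", authorizationKey);
        // Wait nine minutes before requesting a fresh token.
        yield return new WaitForSeconds(540);
    }
}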
The coroutine to obtain the Token is the following:
/// <summary>
/// Request a Token from Azure Translation Service by providing the access key.
/// Debugging result is delivered to the Results class.
/// </summary>
private IEnumerator GetTokenCoroutine(string key)
{
if (string.IsNullOrEmpty(key))
{
throw new InvalidOperationException("Authorization key not set.");
}
using (UnityWebRequest unityWebRequest = UnityWebRequest.Post(translationTokenEndpoint, string.Empty))
{
unityWebRequest.SetRequestHeader("Ocp-Apim-Subscription-Key", key);
yield return unityWebRequest.SendWebRequest();
long responseCode = unityWebRequest.responseCode;
// Update the UI with the response code
Results.instance.SetAzureResponse(responseCode.ToString());
if (unityWebRequest.isNetworkError || unityWebRequest.isHttpError)
{
Results.instance.azureResponseText.text = unityWebRequest.error;
yield return null;
}
else
{
authorizationToken = unityWebRequest.downloadHandler.text;
}
}
// After receiving the token, begin capturing Audio with the MicrophoneManager Class
MicrophoneManager.instance.StartCapturingAudio();
}
[!WARNING] If you edit the name of the IEnumerator method GetTokenCoroutine(), you need to update the StartCoroutine and StopCoroutine call string values in the above code. As per Unity documentation, to Stop a specific Coroutine, you need to use the string value method.
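For instance, to stop the coroutine started above by name, you would use the string overload:

// Stops the coroutine that was started with StartCoroutine("GetTokenCoroutine", authorizationKey).
StopCoroutine("GetTokenCoroutine");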
Next, add the coroutine (with a “support” stream method right below it) to obtain the translation of the text received by the MicrophoneManager class. This code creates a query string to send to the Azure Translator Text API, and then uses the internal Unity UnityWebRequest class to make a ‘Get’ call to the endpoint with the query string. The result is then used to set the translation in your Results object. The code below shows the implementation:
/// <summary>
/// Request a translation from Azure Translation Service by providing a string.
/// Debugging result is delivered to the Results class.
/// </summary>
public IEnumerator TranslateWithUnityNetworking(string text)
{
// This query string will contain the parameters for the translation
string queryString = string.Concat("text=", Uri.EscapeDataString(text), "&from=", from, "&to=", to);
using (UnityWebRequest unityWebRequest = UnityWebRequest.Get(translationTextEndpoint + queryString))
{
unityWebRequest.SetRequestHeader("Authorization", "Bearer " + authorizationToken);
unityWebRequest.SetRequestHeader("Accept", "application/xml");
yield return unityWebRequest.SendWebRequest();
if (unityWebRequest.isNetworkError || unityWebRequest.isHttpError)
{
Debug.Log(unityWebRequest.error);
yield return null;
}
// Parse out the response text from the returned Xml
string result = XElement.Parse(unityWebRequest.downloadHandler.text).Value;
Results.instance.SetTranslatedResult(result);
}
}
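For reference, the v2 Translate endpoint returns the translation wrapped in a single XML element, which is why XElement.Parse(...).Value is enough to extract the plain text. A response for “Hello” translated to Italian would look roughly like the sample below (the exact namespace attribute may differ):

// Roughly the shape of the XML returned by the v2 Translate endpoint.
string sampleResponse =
    "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">Ciao</string>";

// XElement.Parse(...).Value strips the element and returns just the translated text: "Ciao".
string translated = System.Xml.Linq.XElement.Parse(sampleResponse).Value;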
With the Results script attached to the Main Camera, drag the appropriate Text objects from the Hierarchy Panel into the four Text slots exposed by the script, as shown in the image below.
Lastly, click on the Main Camera and look at the Inspector Panel. You will notice that in the script you dragged on, there are two drop down boxes that will allow you to set the languages.
At this point you need to test that the Scene has been properly implemented.
Ensure that:
You can test the immersive headset by pressing the Play button in the Unity Editor. The App should be functioning through the attached immersive headset.
[!WARNING]
If you see an error in the Unity console about the default audio device changing, the scene may not function as expected. This is due to the way the mixed reality portal deals with built-in microphones for headsets that have them. If you see this error, simply stop the scene and start it again and things should work as expected.
Everything needed for the Unity section of this project has now been completed, so it is time to build it from Unity.
From the Build Settings window, click Build.
To deploy your application:
In the Solution Platform, select x86, Local Machine.
For the Microsoft HoloLens, you may find it easier to set this to Remote Machine, so that you are not tethered to your computer. However, you will also need to do the following:
- Know the IP Address of your HoloLens, which can be found within the Settings > Network & Internet > Wi-Fi > Advanced Options; the IPv4 is the address you should use.
- Ensure Developer Mode is On; found in Settings > Update & Security > For developers.
Congratulations, you built a mixed reality app that leverages the Azure Translator Text API to convert speech to translated text.
Can you add text-to-speech functionality to the app, so that the returned text is spoken?
Make it possible for the user to change the source and output languages (‘from’ and ‘to’) within the app itself, so the app does not need to be rebuilt every time you want to change languages.
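One possible starting point (a sketch only, using a hypothetical method name) is to add a small setter to the Translator class and call it from your own UI:

// Hypothetical helper added to the Translator class to switch languages at run time.
public void SetLanguages(Languages newFrom, Languages newTo)
{
    from = newFrom;
    to = newTo;
}

// Example call from your own UI code (for instance, a button handler):
// Translator.instance.SetLanguages(Translator.Languages.fr, Translator.Languages.en);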