Thursday, October 17, 2024

Google Cloud TTS App, Convert Hindi Text into Audio file

Google Text-to-Speech (TTS) service for the Hindi language


Objectives:

In this post I explain how to develop a Text-to-Speech application that converts Hindi text into an MP3 audio file. The app's limitation is that it can convert at most 5000 bytes of text per request. For longer text, you can split it into chunks of up to 5000 bytes each, so that the Google TTS API processes one chunk at a time. This first version of the application is a simple example that processes text up to 5000 bytes. To keep the example clear and concise, it neither checks the text length nor does any logging.
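The chunking idea mentioned above can be sketched as follows. This is only a minimal illustration in JavaScript (the language of the client page later in this post), not part of the application itself. The function name chunkText and the default budget of 4500 bytes are my assumptions: the budget is kept below 5000 on the assumption that the SSML markup the controller wraps around the text also counts toward the limit.

```javascript
// Minimal sketch: split text into chunks that each fit in a UTF-8 byte budget.
// chunkText and the 4500-byte default are hypothetical, chosen for illustration.
const encoder = new TextEncoder();
const byteLength = (s) => encoder.encode(s).length;

function chunkText(text, maxBytes = 4500) {
  const chunks = [];
  let current = '';
  // Split on whitespace so that words are not cut in half.
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? current + ' ' + word : word;
    if (byteLength(candidate) <= maxBytes) {
      current = candidate;
      continue;
    }
    // The word does not fit: close the current chunk and start a new one.
    if (current) chunks.push(current);
    current = '';
    // A single word larger than the budget is split by code point.
    for (const ch of word) {
      if (current && byteLength(current + ch) > maxBytes) {
        chunks.push(current);
        current = '';
      }
      current += ch;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each Devanagari letter takes 3 bytes in UTF-8, so the byte length of Hindi text is roughly three times its character count. That is why the sketch measures bytes with TextEncoder instead of using the string length.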

Steps for TTS App Development:

You can use Google Cloud Text-to-Speech API to convert text files into speech, including Hindi text. Here's a step-by-step guide on how to do this:

Steps to Convert Text to Speech (Hindi) using Google Cloud TTS:

  1. Set up Google Cloud Project:
    • Go to Google Cloud Console.
    • Sign in and create a new project or select an existing one.
    • Enable the Text-to-Speech API from the API library.
  2. Set up Billing:
    • Google Cloud Text-to-Speech requires billing to be enabled on your project, though you can use the free tier for limited use.
    • Go to the billing section in the console and add a billing account if you haven't already.
  3. Create Credentials:
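    • Go to the Credentials section, create a service account, and download its JSON key file. The code below reads the path of this key file from the GOOGLE_APPLICATION_CREDENTIALS environment variable to authenticate API requests.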
  4. Install Google Cloud Client Library for C#:
    • To interact with the Google Cloud Text-to-Speech API in C#, install the Google.Cloud.TextToSpeech.V1 package via NuGet.
  5. Write C# Code to Convert Hindi Text to Speech: Here's an example of a C# application that converts Hindi text into speech using the Google Cloud TTS API. The code is an ASP.NET Core API controller named TTSController. It works with .NET SDK version 6 and later.
  6. Add the Controller and Program Code:
    using System.Security;
    using Google.Cloud.TextToSpeech.V1;
    using Microsoft.AspNetCore.Mvc;

    namespace ShriWebTTS.Controllers
    {
        [Route("api/[controller]")]
        [ApiController]
        public class TTSController : ControllerBase
        {
            private const string GoogleAppCredentials = "GOOGLE_APPLICATION_CREDENTIALS";

            [HttpPost("convert-text-to-speech")]
            public async Task<IActionResult> ConvertTextToSpeechAsync([FromBody] string text)
            {
                // Escape XML special characters so the text cannot break the SSML markup
                text = SecurityElement.Escape(text) ?? string.Empty;

                // Replace custom symbols with break tags.
                // "||" must be handled before "|", otherwise it is consumed as two single pauses.
                text = text.Replace("||", "<break time='2s'/>"); // 2 seconds pause
                text = text.Replace("|", "<break time='1s'/>");  // 1 second pause
                text = text.Replace("\r\n", " "); // Replace Windows line breaks

                // Load the service account key file path from the environment variable
                string? serviceAccountPath = Environment.GetEnvironmentVariable(GoogleAppCredentials);

                if (string.IsNullOrEmpty(serviceAccountPath) || !System.IO.File.Exists(serviceAccountPath))
                {
                    return BadRequest("Service account key file path is not set or file does not exist.");
                }

                // Create TextToSpeechClient using the service account key file
                var textToSpeechClient = await new TextToSpeechClientBuilder
                {
                    CredentialsPath = serviceAccountPath
                }.BuildAsync();

                // Create SSML input with pitch, volume and rate adjustments
                var ssmlInput = $@"
                <speak xmlns='http://www.w3.org/2001/10/synthesis' version='1.0'>
                <prosody pitch='+5%' volume='+5dB' rate='slow'> {text}
                </prosody></speak>";

                var response = await textToSpeechClient.SynthesizeSpeechAsync(
                    new SynthesisInput { Ssml = ssmlInput },
                    new VoiceSelectionParams { LanguageCode = "hi-IN", SsmlGender = SsmlVoiceGender.Male },
                    new AudioConfig { AudioEncoding = AudioEncoding.Mp3 });

                var outputFilePath = "output.mp3";

                // Write the audio content to the file and return the same bytes to the caller
                byte[] audioBytes = response.AudioContent.ToByteArray();
                await System.IO.File.WriteAllBytesAsync(outputFilePath, audioBytes);

                return File(audioBytes, "audio/mpeg", "output.mp3");
            }
        }
    }

    Swagger is enabled in the application so that you can send text data without a separate client app. You could also test with Postman, but Swagger saved me from switching to another app for testing. The relevant code in the Program class is:

    
    namespace ShriWebTTS;
    public class Program
    {
        public static void Main(string[] args)
        {
            var builder = WebApplication.CreateBuilder(args);
    
            // Add services to the container.
    
            builder.Services.AddControllers();
            // Learn more about configuring Swagger/OpenAPI at https://aka.ms/aspnetcore/swashbuckle
            builder.Services.AddEndpointsApiExplorer();
            builder.Services.AddSwaggerGen();
    
            var app = builder.Build();
    
            // Configure the HTTP request pipeline.
            if (app.Environment.IsDevelopment())
            {
                app.UseSwagger();
                app.UseSwaggerUI();
            }
    
            app.UseHttpsRedirection();
    
            app.UseAuthorization();
    
    
            app.MapControllers();
    
            app.Run();
        }
    }
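  7. Run the Application
    • Run the application and open the Swagger UI (available at /swagger in the Development environment).
    • Send your Hindi text as a JSON string to the convert-text-to-speech endpoint. The application writes the synthesized speech to output.mp3 and also returns it in the response.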

Key Components in the Code:

  • TextToSpeechClient: This is the Google Cloud client that interacts with the Text-to-Speech API.
  • SynthesisInput: Contains the input that will be converted to speech (SSML markup in this example).
  • VoiceSelectionParams: Specifies the voice's language ("hi-IN" for Hindi) and gender.
  • AudioConfig: Specifies the output audio format (MP3 in this case).

Additional Notes:

  • Make sure the GOOGLE_APPLICATION_CREDENTIALS environment variable is set to the path of your service account key JSON file. You can create the key in the Google Cloud Console under Credentials.

You can also tune how the speech sounds through SSML prosody settings such as pitch, volume, and rate, as the controller does with the prosody element.
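To experiment with the SSML outside the controller, the same preparation steps can be sketched in JavaScript. This is only an illustration under assumptions: the function names escapeXml and buildSsml are hypothetical, and the pause markers ("|" for 1 second, "||" for 2 seconds) are taken from the comments in the controller code. Two details matter here: the text is escaped before the break tags are inserted, and "||" is replaced before "|".

```javascript
// Minimal sketch of the SSML preparation; escapeXml and buildSsml are hypothetical names.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function buildSsml(text, { pitch = '+5%', volume = '+5dB', rate = 'slow' } = {}) {
  const body = escapeXml(text)               // escape first, so the break tags stay intact
    .replace(/\r?\n/g, ' ')                  // line breaks become spaces
    .replace(/\|\|/g, '<break time="2s"/>')  // "||" first: 2 seconds pause
    .replace(/\|/g, '<break time="1s"/>');   // then "|": 1 second pause
  return `<speak><prosody pitch="${pitch}" volume="${volume}" rate="${rate}">${body}</prosody></speak>`;
}
```

If "|" were replaced first, every "||" would turn into two 1-second pauses and the 2-second rule would never match.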

Client side Interface to send text

You can create a client-side interface to send text data to the API application. The following code renders a form with a text area and a submit button, and plays the returned audio. For this to work from a browser, you will have to enable CORS in the TTS Web API app.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <form id="ttsForm">
        <label for="text">Enter Hindi Text:</label>
        <textarea id="text" name="text"></textarea>
        <button type="submit">Convert to Speech</button>
      </form>

      <audio id="audioPlayer" controls></audio>
      <script>
        document.getElementById('ttsForm').addEventListener('submit', async function (e) {
          e.preventDefault();
     
          const text = document.getElementById('text').value;
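          // The path follows the controller route; the action expects a bare JSON string in the body
          const response = await fetch('https://your-api-url/api/tts/convert-text-to-speech', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(text)
          });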

          const audioBlob = await response.blob();
          const audioUrl = URL.createObjectURL(audioBlob);
          document.getElementById('audioPlayer').src = audioUrl;
        });
      </script>
</body>
</html>
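The page above assumes that every request succeeds. A minimal sketch of a helper that also checks the response status could look like this. The names API_BASE, buildTtsRequest and requestSpeech are hypothetical, and the base URL is a placeholder. The body is a bare JSON string because the controller action binds a plain string from the request body, and the path follows the route attributes on the controller.

```javascript
// Hypothetical placeholder: replace with the address your API listens on.
const API_BASE = 'https://your-api-url';

// Builds the URL and fetch() options for the convert-text-to-speech action.
function buildTtsRequest(text) {
  return {
    url: `${API_BASE}/api/tts/convert-text-to-speech`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(text) // a JSON string, not an object
    }
  };
}

// Sends the text and returns the MP3 as a Blob, or throws on an error status.
// fetchImpl can be swapped out, for example in tests; it defaults to the global fetch.
async function requestSpeech(text, fetchImpl = fetch) {
  const { url, options } = buildTtsRequest(text);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  return response.blob();
}
```

In the submit handler you could then call requestSpeech(text) inside a try/catch block and show the error message to the user instead of loading a broken audio source.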

As noted above, if you're testing the API from a web front-end during development, you need to enable CORS to allow requests from the page's origin. In Startup.cs, enable CORS for your development environment. If your project uses the Program class shown earlier instead of Startup.cs, make the same calls on builder.Services and app there. The code will be as follows:


public void ConfigureServices(IServiceCollection services)
{
    services.AddCors(options =>
    {
        options.AddPolicy("AllowLocalhost",
            builder =>
            {
                builder.WithOrigins("http://localhost:3000") // Adjust based on your front-end URL
                       .AllowAnyHeader()
                       .AllowAnyMethod();
            });
    });

    services.AddControllers();
}

public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    if (env.IsDevelopment())
    {
        app.UseDeveloperExceptionPage();
    }

    app.UseRouting();

    // UseCors must be called after UseRouting and before UseEndpoints
    if (env.IsDevelopment())
    {
        app.UseCors("AllowLocalhost"); // Enable CORS for development
    }

    app.UseEndpoints(endpoints =>
    {
        endpoints.MapControllers();
    });
}

OAuth Setup Steps

The OAuth consent screen is the interface that Google users see when your app requests permission to access their data. You need to provide certain information about your application to users, such as the name of your app, the scope of data you want to access, and your privacy policy.

Steps to Configure the OAuth Consent Screen

  1. Go to the Google Cloud Console:
    • Navigate to the Google Cloud Console.
  2. Select or Create a Project:
    • If you haven't already done so, make sure you have selected the appropriate Google Cloud project or create a new one.
  3. Open the OAuth Consent Screen Configuration:
    • From the left-hand navigation menu, go to APIs & Services > OAuth consent screen.
  4. Choose the User Type:
    • You will be asked to choose the user type:
      • Internal: Only available to users within your Google Workspace organization.
      • External: Available to any Google account holder (used for public-facing apps). This is the most common option for most web and mobile apps.
      • Select the appropriate user type and click Create.
  5. Fill in the Consent Screen Details:
    • App Name: The name of your app that users will see.
    • User Support Email: An email that users can contact for support.
    • App Logo (optional): You can upload a logo for your app, which will appear on the consent screen.
    • App Domain (Optional): Provide your website or app domain (optional, but recommended if you have a public-facing app).
    • Authorized Domains: Add the domains that are associated with your app (e.g., yourapp.com).
    • Developer Contact Information: Provide the developer's contact information, such as your email.
  6. Scopes:
    • Scopes define what kind of access your app is requesting. By default, Google Cloud will include basic profile information, but you can add other specific API scopes if your application requires access to additional user data.
  7. Summary and Submit:
    • Review the consent screen summary. You can save it as a draft if you’re still working on it or submit it for verification if you're using sensitive or restricted scopes.
  8. Verification (Optional):
    • If your app requests sensitive or restricted scopes (for example, access to a user's Gmail or Drive data), Google might require a verification process to ensure your app is secure. If you're using only non-sensitive scopes, such as basic profile and email, you may not need this.
  9. Finish:
    • After configuring the consent screen, you'll be able to go back to the Credentials section and create your OAuth Client ID.

Now you can create an OAuth Client ID:

Once the consent screen is configured:

  1. Go back to the Credentials page in APIs & Services.
  2. Click on Create Credentials > OAuth client ID.
  3. Select the application type (Web Application, Desktop, etc.).
  4. Fill in the necessary fields (e.g., Authorized Redirect URIs).
  5. Click Create.

This will generate your OAuth Client ID and Client Secret.

Conclusion:

The OAuth consent screen setup is a mandatory step that ensures transparency for users when your app requests access to their data. Once completed, you can proceed to create the OAuth client ID and integrate it into your application.
