Introduction

While I'm impressed by what Large Language Models (LLMs) can accomplish, their limitations within Integrated Development Environment (IDE) assistants are somewhat disappointing. For example, they can't scan an entire project and comment out all the non-compiling code so that it builds, nor can they transform a repository into a monorepo/turborepo. They won't even let you provide your UI library's full documentation to reduce the adjustments you have to make to their output.

Fortunately, these limitations can be circumvented with code, leaving your imagination as the only limit.

Today, we'll examine a use case: auto-localizing a project whose strings are all hardcoded. The goal is to show that AI alone is often not enough, but that by using it smartly you can achieve great results in terms of productivity.

⚠️
This is a very technical post, so unless you are a developer, it could be challenging to follow.

Context

There is already a tool called the localizator, which I co-created with Celian Moutafis, designed to facilitate project localization. Available on GitHub, its main function is to download translations from a Google spreadsheet automatically. It downloads a CSV file, parses it, and creates the localization files for the target platform, making the project easily translatable into different languages and thus accessible to a global audience.

The tool works effectively with iOS and Android native code, as well as with React. This versatility makes it suitable for a wide range of projects, and it can be a critical asset in your localization efforts.

You can find it for free here.
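To give an idea of what the CSV-to-files step involves, here is a minimal sketch. It is not the localizator's actual code: the column layout (`key`, then one column per language) and the function names `splitCsvLine` and `csvToLocales` are assumptions made for illustration.

```javascript
// Hypothetical sketch, NOT the localizator's implementation.
// Assumed CSV layout: first column "key", then one column per language.

// Splits one CSV line, handling double-quoted fields (no multi-line fields).
const splitCsvLine = line => {
  const fields = []
  let current = ''
  let inQuotes = false
  for (let i = 0; i < line.length; i++) {
    const char = line[i]
    if (inQuotes) {
      if (char === '"' && line[i + 1] === '"') {
        current += '"' // a doubled quote is an escaped quote
        i++
      } else if (char === '"') {
        inQuotes = false
      } else {
        current += char
      }
    } else if (char === '"') {
      inQuotes = true
    } else if (char === ',') {
      fields.push(current)
      current = ''
    } else {
      current += char
    }
  }
  fields.push(current)
  return fields
}

// Groups the rows by language: {fr: {key: text}, de: {key: text}}
const csvToLocales = csv => {
  const [headerLine, ...lines] = csv.trim().split(/\r?\n/)
  const [, ...languages] = splitCsvLine(headerLine)
  const locales = Object.fromEntries(languages.map(language => [language, {}]))
  for (const line of lines) {
    const [key, ...values] = splitCsvLine(line)
    languages.forEach((language, index) => {
      locales[language][key] = values[index] ?? ''
    })
  }
  return locales
}

// Example usage with made-up content
console.log(csvToLocales('key,fr,de\nhello,"Bonjour, monde",Hallo Welt'))
```

Each entry of the returned object could then be written to its own `fr.ts` or `de.json` file, depending on what your platform expects.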

A common problem

A common problem when starting a web project is hardcoding strings directly into the code. This approach may seem simpler and quicker at first, but it creates significant problems down the line: hardcoded strings are difficult to extract and make adapting your website to different languages and regions cumbersome. Changing them also takes substantial effort, as you have to find and replace each instance manually. Even plugins like Ally i18n don't give perfect results, as they fail to recognize ternary expressions or toasts, for example. But what if you didn’t have to?

AI to the rescue

AI presents a new and exciting opportunity to address the problem. Previously, identifying and extracting hardcoded strings was a tedious and manual process. However, with advances in AI, we now have the capability to automate this process, which can save a lot of time and effort.

A program can scan through a codebase and let an LLM identify the hardcoded strings, and even replace them with meaningful keys directly in the code. This can be done in a fraction of the time it would take a human to perform the same task, allowing developers to focus on more complex and creative tasks.
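To make the target transformation concrete, here is a small before/after sketch of the ternary and toast cases mentioned earlier. The `k` object and the `$` helper below are simplified stand-ins for the real react-localization objects, and the key names are invented for the example.

```javascript
// Stand-ins for the real localization objects, for illustration only
const k = {
  statusOnline: 'En ligne',
  statusOffline: 'Hors ligne',
  profileSaved: 'Profil enregistré pour {0}',
}

// Simplified formatter: replaces {0}, {1}, ... with the given values
const $ = (template, ...values) =>
  template.replace(/\{(\d+)\}/g, (_, index) => String(values[Number(index)]))

// Before: strings hardcoded inside a ternary and a toast payload
const statusLabelBefore = isOnline => (isOnline ? 'En ligne' : 'Hors ligne')
const savedToastBefore = name => ({title: `Profil enregistré pour ${name}`})

// After: the same logic, reading the text from keys
const statusLabelAfter = isOnline => $(isOnline ? k.statusOnline : k.statusOffline)
const savedToastAfter = name => ({title: $(k.profileSaved, name)})

console.log(statusLabelAfter(true), savedToastAfter('Ana'))
```

The behavior stays identical; only the origin of the text changes, which is what makes the resulting git diff easy to review.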

The process

The process is actually pretty simple (with a script):

  • Open all the .tsx files in the project recursively.
  • Transform all the hardcoded texts using a prompt.
💡
Prompt:

Find hardcoded text in the following TSX file and replace it with keys for the react-localization library considering this good example of using the library and a key:
``` // Your localization library snippet, providing context ```

A hardcoded text candidate can be identified by the fact that it is ALWAYS in ${originalLanguage} for this project.

Your answer should IMPERATIVELY contain ONLY the transformed JSX in plain text (with no comment or markdown triple backquote surrounding) as it will be used to replace the original content in the file. The original file content is:
``` ${content} ```
  • Rewrite the entire file and save it to its original destination. This allows for easy tracking of changes in git diff, and can be reverted partially if necessary.
  • Create a patch and, using a prompt, extract all the key/value pairs from the patch changes into a JSON object. This JSON can be copied and pasted into the source language .ts or .json file.
💡
Prompt:

Considering this patch file, create a JSON file with all the keys and values for the replaced hardcoded strings in ${originalLanguage}
  • Convert the JSON into CSV, then copy and paste it into the reference Google Spreadsheet.
  • Add a formula =GOOGLETRANSLATE(source_cell; "fr"; "de") in the spreadsheet cells to translate all the values into the other languages' columns (depending on your spreadsheet locale, the argument separator is a comma instead of a semicolon).
  • Run the localize.js script, which will create the correct translation files for your other languages from the Google spreadsheet.
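The JSON-to-CSV step from the list above is easy to script as well. Here is a minimal sketch, assuming a flat key/value JSON and a two-column sheet (`key` plus the source language); the helper names `escapeCsvField` and `jsonToCsv` are mine, not part of the localizator.

```javascript
// Hypothetical helpers: the column names are assumptions,
// adapt them to the layout of your own spreadsheet.

// Quotes a field when it contains a comma, a quote or a line break,
// doubling any embedded quotes (RFC 4180 style).
const escapeCsvField = value => {
  const text = String(value)
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
}

// Turns a flat {key: text} object into CSV text with a header row
const jsonToCsv = (translations, languageColumn = 'fr') => {
  const header = ['key', languageColumn]
  const rows = Object.entries(translations)
  return [header, ...rows]
    .map(row => row.map(escapeCsvField).join(','))
    .join('\n')
}

// Example usage with made-up keys
console.log(
  jsonToCsv({
    welcomeMessage: 'Bienvenue sur notre site',
    saveError: 'Erreur : sauvegarde impossible, réessayez',
  })
)
```

Quoting matters here because French sentences often contain commas, which would otherwise shift the columns when pasted into the sheet.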

Here is an example script for the file-rewriting steps:

import {config} from 'dotenv'
import fs from 'fs'
import log4node from 'log4node'
import OpenAI from 'openai'
import path from 'path'

config({path: '../../.env'})

const log = new log4node.Log4Node({level: 'debug', file: 'log.md'})

const openai = new OpenAI({apiKey: process.env.OPENAI_API_KEY})

const originalLanguage = 'french'

const processFile = async filePath => {
  console.log(filePath)
  const content = await fs.promises.readFile(filePath, 'utf-8')

  const prompt = `Find hardcoded text in the following TSX file and replace it with keys for the react-localization library considering this good example of using the library and a key:
\`\`\`
import LocalizedStrings from 'react-localization'
import {en} from './en'
import {fr} from './fr'

export const k = new LocalizedStrings({
en: en,
fr: fr,
})

export const $ = (stringKey: string, ...rest: any) => k.formatString(stringKey, ...rest)

// Usage in the JSX:
import {$, k} from '@/i18n/localization'

<Text maxW='90%' textAlign='center'>
      {$(k.welcomeMessage)}
    </Text>
\`\`\`

A hardcoded text candidate can be identified by the fact that it is ALWAYS in ${originalLanguage} for this project.

Your answer should IMPERATIVELY contain ONLY the transformed JSX in plain text (with no comment or markdown triple backquote surrounding) as it will be used to replace the original content in the file. The original file content is: 
\`\`\`
${content}
\`\`\``

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{role: 'user', content: prompt}],
    temperature: 0,
  })

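  // The answer is expected to be the full transformed file content
  const result = response.choices[0].message?.content?.trim()

  // Only overwrite the file when the model actually returned something
  if (result) {
    log.debug(result)
    await fs.promises.writeFile(filePath, result, 'utf-8')
  }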
}

const processDirectory = async dirPath => {
  const entries = await fs.promises.readdir(dirPath, {withFileTypes: true})

  for (const entry of entries) {
    const fullPath = path.join(dirPath, entry.name)

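    if (entry.isDirectory()) {
      // Skip dependencies and hidden folders such as .git
      if (entry.name === 'node_modules' || entry.name.startsWith('.')) continue
      await processDirectory(fullPath)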
    } else if (entry.isFile() && entry.name.endsWith('.tsx')) {
      await processFile(fullPath)
    }
  }
}

const main = async () => {
  const subfolderPath = '../frontend'

  await processDirectory(subfolderPath)

  console.log(
    `Hardcoded strings replaced, now you can do a patch and use the following prompt: 
    Considering this patch file, create a JSON file with all the keys and values for 
    the replaced hardcoded strings in ${originalLanguage}`
  )
}

main().catch(error => {
  console.error('An error occurred:', error)
})
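One practical caveat: even with an explicit instruction, a model may still wrap its answer in a markdown code fence, which would then end up in your .tsx file. A defensive helper along these lines can guard against that. It is a sketch of my own (the name `stripCodeFence` is hypothetical), not something the OpenAI library provides.

```javascript
// Hypothetical helper, not part of the script above: removes a single
// markdown code fence wrapping the whole answer, if there is one.
const stripCodeFence = text => {
  const trimmed = text.trim()
  // Matches an opening fence with an optional language tag, the body,
  // and a closing fence at the very end of the answer
  const match = trimmed.match(/^```[\w-]*\r?\n([\s\S]*?)\r?\n?```$/)
  return match ? match[1] : trimmed
}

// Example usage with a fenced and an unfenced answer
console.log(stripCodeFence('```tsx\nconst a = 1\n```'))
console.log(stripCodeFence('const a = 1'))
```

In `processFile`, you could pass the model answer through `stripCodeFence` before writing it to disk.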

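Of course, this script works really well for French or any language other than English, but extracting English strings might require more effort tweaking the prompt, because the prompt relies on the language to spot hardcoded text and English is also the language of the identifiers and keywords around it.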

And voila!

Conclusion

Integrating AI into the localization process significantly simplifies an otherwise cumbersome task. This automation of identifying and replacing hardcoded strings not only conserves time and effort but also enhances project consistency and reliability. The AI-powered localizator tool provides a promising solution to the challenges of project localization, facilitating the adaptation of projects to various languages and expanding their reach to a global audience. As we delve further into AI capabilities, we anticipate more innovative solutions like this that transform software development approaches. The potential of AI in this field is huge, and we've only just begun to tap into it. Exciting times are ahead!

Cheers 🍻