i don't care, i love it
The first approach (of three) I want to talk about is using LLMs to translate to as many languages as possible without worrying too much about the final result.
Given that LLMs have proven to be good enough in the languages I can evaluate, I am going to assume that they will work similarly in others.
Obviously, this is not scientifically correct.
It is well known that LLMs perform better in languages with abundant text available on the internet, and the gap between English and everything else is significant.
I do not have enough data, but from what I have seen, I tend to trust that the Pareto principle holds to some extent; 80/20 is close enough to 70/30.
As for the cost of translation, it is a bit difficult to calculate, but possible.
The tokenization process (transforming the text into “numbers” so that the LLM can interpret it) varies depending on the provider and the language of origin.
For OpenAI, for example, English is cheaper than Spanish:
- English: 1 word ~ 1.3 tokens
- Spanish: 1 word ~ 2 tokens
Depending on the model we use and the number of tokens, we can calculate the costs.
I can already say that for an interactive fiction game, the costs are one or two orders of magnitude lower when translating with LLMs than with humans.
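The estimate above can be sketched as a back-of-the-envelope calculation. The tokens-per-word ratios come from the list above; the per-token price, the per-word human rate, and the word count of the game are all illustrative assumptions I am making for the example, not real quotes:

```python
# Back-of-the-envelope translation cost estimate.
# Tokens-per-word ratios are from the text above; all prices and the
# word count are illustrative assumptions, not real quotes.

TOKENS_PER_WORD = {"english": 1.3, "spanish": 2.0}

def llm_cost(words: int, language: str, usd_per_million_tokens: float) -> float:
    """Estimated LLM cost for translating `words` in `language`."""
    tokens = words * TOKENS_PER_WORD[language]
    return tokens / 1_000_000 * usd_per_million_tokens

def human_cost(words: int, usd_per_word: float) -> float:
    """Estimated cost at a (hypothetical) per-word human translation rate."""
    return words * usd_per_word

game_words = 50_000  # hypothetical interactive fiction script
llm = llm_cost(game_words, "spanish", usd_per_million_tokens=10.0)
human = human_cost(game_words, usd_per_word=0.10)
print(f"LLM: ${llm:.2f}, human: ${human:.2f}")
```

Under these assumed numbers the gap is dramatic; real pricing varies by model, and a serious estimate should also count input plus output tokens and any retries, which narrows the gap somewhat.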
I do not want to make ethical judgments on this [yet]; I am just presenting the data.
I will not give my personal opinion until the end of the series; for now, I am just presenting strategies.
As I said in the previous email: the result needs to be playtested, unless you really don’t care and love it.
PS: In one way or another, I managed to bring Charli XCX into this 😅.