Developer communities are often reluctant to write documentation because, for many developers, the fun part of community activities is writing code, not documents. When it comes to localisation, even fewer people are interested or able to help.
Here, the Linux Foundation's Noriaki Fukuyasu shares his advice on how non-native English-speaking communities can help to make localisation fun and productive.
Noriaki Fukuyasu: Thank you so much for inviting me to speak at DevRelCon. I'm really excited. Nakatsugawa-san mentioned that he cried when he had to cancel the previous event, so I was really excited that DevRelCon came back as DevRelCon Earth. I'm really happy to be here.
So, again, my name is Noriaki Fukuyasu. I work for the Linux Foundation, where I'm the VP of Japan operations. I only have twenty-five minutes, so let me go right into today's agenda. Today I'm going to talk about how it takes community and technologies to make document translation work more fun and productive.
So basically, I would like to talk about how we can make community translation more fun and more productive. That's the main agenda for today. All right. This is not so much a table of contents as the key topics that I would like to share today. One is how technology can actually solve the translation problem we have in the open source community, particularly in local open source user communities.
The second is the technology itself: what tools can we take advantage of to solve the issue? And finally, I would like to talk about how we can make translation work more fun and productive. Those are the three things I would like to cover today.
The big headache for local open source communities has always been translation. As Taishi-san mentioned in the previous session, if the content is in English, people won't read it. Sadly, this is true, particularly in Japan. So I would say the impact of the language barrier on non-English-speaking countries like Japan is huge.
Without translation, new technology that originates overseas, in the United States and Europe, will not spread in non-English-speaking countries such as Japan. It is impossible to make every Japanese engineer read English content fluently, but it is possible to localize more documents a lot faster by taking advantage of technology. So this is the headache we are currently facing, and my hope is that technology and community work together can actually solve it. So how can technology potentially solve the issue of translation?
To tackle that issue, we first have to touch upon the nature of translation work. What is the nature of translation work, and what are the areas where technology is actually able to intervene? I would like to touch upon that. Today, translation work is mainly human work.
The major part of translation is done by humans. But if you take a closer look at the work we call translation, it actually splits into two pieces. One is translation proper, which requires only language skill. I would call it lower-value work. The other part of translation work is what we can call interpretation.
This interpretation part requires a specialty, such as technical expertise. For example, if you would like to translate blockchain documentation into Japanese, many of the English terms are quite new, or are used in a new way, in Japanese. This is where people who have a specialty, meaning technical skill and community skill, come in and decide which Japanese term best fits a specific English word. Because of that, interpretation work requires a specialty, and I would consider it high-value-added work.
Today, interpretation and translation are lumped together and done mainly by people. But with technology, we can actually change things. The translation part requires only language skill. It is the lower-value part, and it can be done by technology such as machine translation and AI.
The other part, interpretation, remains human work, and this is the high-value-added work. For people who have technical skill and an understanding of open source, this is a good place to exercise their specialty, so it is a good place to invest their time. Now let me look at the same thing, translation work, from the perspective of the business model.
On this slide, I am comparing the business model of translation with the open source business today, which is on the left-hand side. I would like to ask: could the business model of translation change, just as open source has changed business models? In open source business today, we can say that products are divided into two parts.
One part is the non-core-competence part, and the other is the core competence, your solution part. Today, the non-core-competence part has largely been replaced by open source. People take advantage of open source for the non-core-competence part.
Therefore, companies focus their investment on their solution, their core competence. That is what is happening today in the open source industry. A similar model can be applied to the future translation business. As I mentioned on the previous slide, translation work can be divided into two parts.
One is translation, which is the non-core value, and the other is interpretation, which is where the value is. The non-core-competence part can be translated by a machine translation engine. It is not the high-value part.
We do not have to reinvent the wheel. Let a machine translate this part, and people who have a specialty in the technology area should focus on the solution. That is the model I think will emerge in the translation industry in the future. When it comes to our industry, the open source industry, we need an open and common platform that the open source community can take advantage of. This is the translation tool and translation engine with which we can translate the non-core-competence part.
If there is an open and common platform to take care of this part, it should make our translation work a lot easier. And because we can then focus on our core competence, the solution part, it actually makes sense for people who have the specialty to spend time on it. Naturally, that work is more fun than just translating the non-core-competence part. So we would like to have something that we can take advantage of for free.
So I would like to shout out: let's build an open and common platform together. A platform does not mean just a translation tool. It should include a machine translation engine, a community translation process and, more importantly, people's mindset to do the translation together, not by themselves. All together, I call it the common platform.
Since I only have twenty-five minutes, I am trying to be careful with the time. In order to build our common platform, what are the things we need? I think there are three things that we will need.
One is that everyone uses a tool to translate and shares the translation memory. This is probably the most important part: share the translation memory by using the translation tool. The second is to train a machine translation engine optimized for the open source industry.
That's number two. And three: I said sharing the translation memory is very important, but probably even more important is to build the mindset and process so that translation is done by the community. Never do it alone. If people do the translation alone, many people will reinvent the wheel.
And translation of documentation isn't the core competence of anyone's business. So let's not do it by ourselves. Do it together, and never do it alone.
Those are the three things that will be quite important for building the common platform. As I said, translation memory is really important. But what is translation memory, and why does it matter?
I'd like to talk about that. From here on I will just say TM for translation memory, and on the slides I have written just TM.
TM stands for translation memory. A translation memory is basically a database of text that has already been translated. An entry can be a word, a sentence, a paragraph, or a sentence-like unit such as a heading, a title, or a list element.
Text that has previously been translated becomes translation memory if you use a tool when you translate. If you take a look at the image on the right-hand side, that is what a translation memory actually looks like. And there is an industry-standard file format called TMX, Translation Memory eXchange.
Because there is an industry-standard format, we can easily share and merge everybody's translation memory. So why does it matter? Why do we use translation memory? There are several benefits. One is that we avoid translating the same words and sentences over and over.
Basically, once you have translated something, as long as you use the translation memory, you don't have to translate it again. This is actually a huge benefit. We can also use translation memory as guidance for how to translate a given thing into the local language. Say you would like a specific term to be translated in a specific way in Japanese. If we do not use a translation memory, translator A may translate it one way, but translator B may translate it differently.
By using translation memory, and by sharing it with the community, we can avoid that. We can also set translation rules as a community. You probably want to keep the same format and style, following a style guide, and by using translation memories we can set those kinds of rules. This is why keeping the translation memory and sharing it with others is quite important.
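To make the idea of a translation memory as a database concrete, here is a minimal Python sketch. It is not taken from the talk: the sample TMX content, the `load_tm` helper, and the exact-match lookup are all illustrative assumptions, and real TMX files usually carry more metadata and may use language codes such as `en-US`. It only shows that each translation unit (`tu`) pairs a source segment with a target segment, and that an already-translated sentence can be reused rather than translated again.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny, hand-written TMX document used only for illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0.1" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="ja"><seg>ファイルを開きます。</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Save your changes.</seg></tuv>
      <tuv xml:lang="ja"><seg>変更を保存します。</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_tm(tmx_text, src_lang="en", tgt_lang="ja"):
    """Return a dict mapping source segments to target segments."""
    root = ET.fromstring(tmx_text)
    memory = {}
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segments[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src_lang in segments and tgt_lang in segments:
            memory[segments[src_lang]] = segments[tgt_lang]
    return memory


memory = load_tm(SAMPLE_TMX)
print(memory.get("Open the file."))   # already translated: reuse it
print(memory.get("Close the file."))  # None: still needs a translator
```

Tools such as OmegaT also offer fuzzy matching on near-identical sentences; the exact-match dictionary here is a deliberate simplification.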
So what do we do with the translation memory? OK, we have translation memories, but what do we do with them?
This is how you do it. You get the translation memories from the community, perhaps from GitHub, and translate your document using them. If you translate new content, you will have new, additional translation memory, and you share it with the community.
On the community side, the new translation memory is merged with the existing translation memory. It is like committing additional translation memory to the mainline translation memory, which updates the community translation memory. The next time you translate, make sure you get the updated translation memory from the community repository before you start. You will gain a lot of productivity by spinning this cycle over and over.
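The merge step in this cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: translation memories are represented as plain source-to-target dictionaries, and `merge_tm` is a hypothetical helper, not a function of OmegaT or any other tool. New entries are accepted as-is, while entries that disagree with the mainline are set aside for a human to decide, which is the translator A versus translator B situation described earlier.

```python
def merge_tm(mainline, contribution):
    """Merge a contributed translation memory into the mainline one.

    Returns (merged, conflicts). New source segments are added directly.
    When both memories translate the same source differently, the mainline
    entry is kept and the pair is reported so a person can choose.
    """
    merged = dict(mainline)
    conflicts = {}
    for source, target in contribution.items():
        if source not in merged:
            merged[source] = target
        elif merged[source] != target:
            conflicts[source] = (merged[source], target)
    return merged, conflicts


# Hypothetical data: two translators disagree on how to write "repository".
mainline = {"repository": "リポジトリ", "commit": "コミット"}
contribution = {"repository": "レポジトリ", "maintainer": "メンテナー"}

merged, conflicts = merge_tm(mainline, contribution)
print(sorted(merged))  # ['commit', 'maintainer', 'repository']
print(conflicts)       # {'repository': ('リポジトリ', 'レポジトリ')}
```

Keeping the mainline entry on conflict is a design choice for this sketch; a community could equally decide that the newest or the most-reviewed translation wins.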
Now I would like to touch upon the available tools that we can actually use to make this happen. One of the tools we have been using is OmegaT. It is an open source translation tool, and we have been using it a lot. It is compatible with a variety of translation memory formats. As I mentioned, TMX is the industry standard, but there are other formats.
OmegaT is compatible with those other formats as well. OmegaT can also handle over 30 document formats, such as Microsoft Word, Excel, and PowerPoint. PowerPoint support is really helpful because I do a lot of translation of PowerPoint files created by my American coworkers. OmegaT is able to take in the PowerPoint text, and we can work on the translation there.
So it's really helpful. And it is not limited to Microsoft Office documents: it can be used with HTML, OpenDocument format, and MediaWiki, and of course plain text. One of the good things about OmegaT is that it has interfaces to popular machine translation engines such as Google Translate and DeepL. So when we do the translation, we connect OmegaT to a translation engine, such as Google's translation API.
The translation API then translates the text, and OmegaT shows the result of the machine translation. So this is one good and interesting tool that you may be able to use. One other tool we found recently, which we haven't really tried much yet but which seems quite interesting, is the TexTra translation editor. In Japanese it is called Minna no Jido Hon'yaku.
This is one of the tools we found recently, and we are currently studying it a lot. It is basically an online tool. OmegaT, which I explained on the previous page, is software you install on your PC. TexTra is an online tool provided by the National Institute of Information and Communications Technology. We call it NICT.
NICT is a research and development agency affiliated with the Japanese government. TexTra is not an open source tool, but anyone can use it for free. The TexTra translation editor has a machine translation engine behind it, so we can take advantage of that engine to translate. And once again, there is no need to install software on your PC.
That is one good thing. It is also capable of translating many different file formats, just like OmegaT. Another good thing about TexTra is that it has a WYSIWYG mode, that is, what you see is what you get. So we can do the translation work while seeing the document in its original format.
It is also capable of creating translation memory in multiple file formats, just as OmegaT does. And it is capable of generating a glossary from the translation memory, which is a really good function for a community. My future dream is to have a translation engine optimized for open source usage. To get there, we need to train a machine translation engine optimized for open source.
So it really matters that you share translation memories if you eventually would like to have something like this. If many people share their translation memories with others, if the entire Japanese community starts habitually sharing translation memory, then we should be able to collect a lot of translation memories. And that may help optimize the translation engine.
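One way to picture how shared translation memories could feed an engine is to flatten them into a parallel corpus, one source and target pair per line. The sketch below is only an assumption about what such an export might look like: `tm_to_parallel_tsv` is a hypothetical helper, tab-separated text is one common interchange layout, and the input format a particular engine actually expects is not stated in the talk.

```python
import csv
import io


def tm_to_parallel_tsv(memory):
    """Serialize a source-to-target dict as tab-separated parallel text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Sort by source segment so repeated exports are stable and diff-friendly.
    for source, target in sorted(memory.items()):
        writer.writerow([source, target])
    return buffer.getvalue()


# Hypothetical community memory, merged from several contributors.
community_tm = {"commit": "コミット", "branch": "ブランチ"}
corpus = tm_to_parallel_tsv(community_tm)
print(corpus, end="")
```

The more contributors share their memories, the larger this corpus becomes, which is the reason sharing matters for any later engine optimization.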
One cool thing about TexTra, which I explained on the previous slide, is that it can actually take those translation memories and optimize the engine for that industry's usage. That is one reason why I think TexTra is quite interesting. One more thing I would like to add: my future dream is to build an ecosystem around that machine translation engine, an ecosystem of tools that we can use together with the engine. They don't have to be open source; they could be commercial tools as well.
For example, a CMS with which we can automatically translate our web pages using the industry-optimized translation engine, or a Git repository that we can commit to automatically from the translation engine, something like that. But after all, in order to make translation work more fun and productive, people matter most. We have the tool, and we have the translation engine, but people matter most.
This is the most important thing. To keep people motivated and working smoothly within the community, we need to do a few things. One is to build a process to share translation results and translation memory. To do that, we have to have the mindset that translation is fun, collaborative work.
Translation is sometimes considered less interesting, time-consuming work. But we need to have the mindset that translation is fun, collaborative work. We also need a common repository. Today people share the translation outcome, but they don't share the translation memory. From now on, what we need to share is not only the translation outcome but also the translation memories.
We also need the right work process. For example, we need a maintainer for the translation project. The maintainer merges the translation memory: there is a mainline translation memory, and new translation memory needs to be merged into it by the maintainer. There are tools to easily merge translation memories.
So it's not a complicated task, but the translation maintainer will be a good person to judge what the right translation is. We may also have to consider some legal items, such as a CLA or DCO, so that people feel comfortable contributing their results. One more thing I would like to touch upon is that community translation can become fun work. The people I spoke to about translation said that translation is an opportunity to learn technology. When they translate and learn new things, they feel it is fun to translate.
There is also collaboration with others. If translation is done collaboratively, we get more feedback on our work, and translators can earn a good reputation and a lot of appreciation. Then translation work becomes more fun. And as I mentioned earlier in my presentation, we can focus on the high-value-added work and let the tool take care of the low-value part. This way, we can focus on the part where it makes sense for talented people to spend their time.
If we do all this, we can probably make translation work more fun and a lot more productive. My one final word is: community translators everywhere, unite and share the translation memory. That is the final thought I would like to leave you with. Thank you so much, and back to Nakatsugawa-san. Thank you.