Tags diminishing machine translation results
Thread poster: Thijs Vissia
Thijs Vissia
Netherlands
Mar 2, 2019

I was wondering about the results I’m getting from Machine Translation through OmegaT. Currently I’m using one service that is available free of charge, MyMemory (Machine). I’m noticing that many idioms are not recognised and translated properly when tags sit in between the words that make up the idiom.

For example, I was translating the following sentence (in Dutch):

“Ieder instituut gaat beschikken over meer geld en meer gebouwen.”

With some tags/formatting strewn in, this became: “Ieder <f0>instituut gaat beschikken</f0><f1> </f1><f2>over meer geld</f2> en meer gebouwen.”

This sentence contains the widespread idiom “beschikken over” (meaning “have access to”, “have at one’s disposal”). I don’t know about the quality of the machine translation at MyMemory, but an idiom this common should be recognised and translated properly.

When the tags were in there, this was returned as:
“Everyone <f0> institute </f0><f1></f1><f2> about more money </f2> and more buildings.”

The connection between “ieder” and “instituut” was broken by the tag, so instead of “every institute” it rendered this as “everyone institute”. Similarly, the composite verb “beschikken over” was also interrupted by a tag, so the MT treated each piece separately and apparently left out the verb entirely.

However, after creating a new file without tags, it came back as:

“Every institute will have more money and more buildings.”

This may not be my phrasing of choice, but it is otherwise a fine translation.

So I was wondering why the tags get sent out to the machine translation services in the first place? Is it so that all the formatting doesn’t need to be put back in manually afterwards? Wouldn’t it be almost as easy to strip the strings of the tags before sending the query to the MT service?
Considering that OmegaT already needs to recognise tags as such (to treat them differently in the editor pane), wouldn't it be possible to make sending them to the MT service optional?

It seems to me that sending the tags along is seriously reducing the quality of the MT results.


[Edited at 2019-03-03 10:37 GMT]


 
Milan Condak
English to Czech
Translator can remove tags before translation Apr 1, 2019

Thijs Vissia wrote:

It seems to me that sending the tags along is seriously reducing the quality of the MT results.


The translator can remove tags before pretranslating against a TMX or using MT,

http://www.condak.cz/nove/2019-03/31/en/00.html

and put them back after pretranslation.

Milan


 
Thijs Vissia
Netherlands
TOPIC STARTER
ah Apr 1, 2019

Milan Condak wrote:

The translator can remove tags before pretranslating against a TMX or using MT, (...)
and put them back after pretranslation.

Milan


Hi Milan,
Ah, thank you for the clarification. I didn't realize you could put the tags back afterwards by toggling the option again, but of course the source file isn't changed. I somehow assumed this worked the same way as tagwipe, which does affect the source file.

I think the documentation could be a bit clearer about this, or even the option in Preferences itself: 'Remove tags' sounds rather definitive.

But clearly this solves my problem: I can translate, use MT, and manually put the tags back after translating.

cheers,
Thijs


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Fixed post (your membership fee will never buy fixed forum software) Apr 2, 2019

Thijs Vissia wrote:
I was wondering about the results I’m getting from Machine Translation through OmegaT. Currently I’m using one service that is available free of charge, MyMemory (Machine). I’m noticing that many idioms are not recognised and translated properly when tags sit in between the words that make up the idiom.

For example, I was translating the following sentence (in Dutch):
Ieder instituut gaat beschikken over meer geld en meer gebouwen.

With some tags/formatting strewn in, this became:
Ieder <f0>instituut gaat beschikken</f0><f1> </f1><f2>over meer geld</f2> en meer gebouwen.

This sentence contains the widespread idiom “beschikken over” (meaning “have access to”, “have at one’s disposal”). I don’t know about the quality of the machine translation at MyMemory, but an idiom this common should be recognised and translated properly.

When the tags were in there, this was returned as:
Everyone <f0> institute </f0><f1></f1><f2> about more money </f2> and more buildings.

The connection between “ieder” and “instituut” was broken by the tag, so instead of “every institute” it rendered this as “everyone institute”. Similarly, the composite verb “beschikken over” was also interrupted by a tag, so the MT treated each piece separately and apparently left out the verb entirely.
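
To make the fragmentation visible, here is a minimal Python sketch that splits the tagged segment at every tag. The regular expression `TAG` and the helper `text_fragments` are my own illustrative assumptions about what OmegaT-style placeholder tags look like; they are not OmegaT or MyMemory code, and whether the MT engine really handles each fragment in isolation is only a guess based on the output above.

```python
import re

# Assumed shape of OmegaT-style placeholder tags: <f0>, </f0>, <f1/> and so on.
TAG = re.compile(r"</?[a-zA-Z]+\d+/?>")

def text_fragments(segment: str) -> list[str]:
    """Return the non-empty runs of text that sit between tags."""
    return [piece.strip() for piece in TAG.split(segment) if piece.strip()]

tagged = ("Ieder <f0>instituut gaat beschikken</f0><f1> </f1>"
          "<f2>over meer geld</f2> en meer gebouwen.")

for fragment in text_fragments(tagged):
    print(repr(fragment))
```

The fragments come out as 'Ieder', 'instituut gaat beschikken', 'over meer geld' and 'en meer gebouwen.', so “Ieder” is cut off from “instituut” and “beschikken” from “over”, which matches the two errors described above.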

However, after creating a new file without tags, it came back as:
Every institute will have more money and more buildings.
This may not be my phrasing of choice, but it is otherwise a fine translation.

So I was wondering why the tags get sent out to the machine translation services in the first place? Is it so that all the formatting doesn’t need to be put back in manually afterwards? Wouldn’t it be almost as easy to strip the strings of the tags before sending the query to the MT service?

Considering that OmegaT already needs to recognise tags as such (to treat them differently in the editor pane), wouldn't it be possible to make sending them to the MT service optional?

It seems to me that sending the tags along is seriously reducing the quality of the MT results.
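
As a rough illustration of the stripping step suggested above, here is a minimal Python sketch that removes the tags from a segment before it would be handed to an MT service. The pattern `TAG` and the function `strip_tags` are hypothetical names of my own and only assume tags of the form `<f0>`, `</f0>` or `<f1/>`; this is not how OmegaT implements its own tag removal.

```python
import re

# Assumed shape of OmegaT-style placeholder tags: <f0>, </f0>, <f1/> and so on.
TAG = re.compile(r"</?[a-zA-Z]+\d+/?>")

def strip_tags(segment: str) -> str:
    """Remove placeholder tags and collapse the whitespace they leave behind."""
    without_tags = TAG.sub("", segment)
    return re.sub(r"\s+", " ", without_tags).strip()

tagged = ("Ieder <f0>instituut gaat beschikken</f0><f1> </f1>"
          "<f2>over meer geld</f2> en meer gebouwen.")

# The cleaned string is what would be sent to the MT service.
print(strip_tags(tagged))
```

For the example sentence this yields the plain sentence without tags, which is the version that came back translated correctly. The formatting would still have to be put back into the translation by hand afterwards.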


[Edited at 2019-04-02 05:54 GMT]


 

