Word 2003 / W7 — file anomaly when converting from PDF?
Thread poster: Tony M
Tony M
France
Local time: 22:25
Member
French to English
+ ...
SITE LOCALIZER
Oct 8, 2012

Help! I wonder if anyone can throw any light on this for me?

I have a customer (agency) complaining about a translation I've done, and I half suspect it is an unfounded claim just to try and get out of paying for my work.

The original document was a PDF file (editable text, NOT an image)

The customer supplied me with a .DOCX conversion of this — I don't know how it was done, but suspect OCR might have been involved, as there were some incorrect characters.

When I converted this file to .DOC and translated it using my Word (Office XP) running under Windows 7, everything appeared to be fine at my end; the document is a 5-page tabbed list, but has no especially exotic formatting, and all the lines are formatted the same.

However, when I sent the document back to the customer, he complained that 3 of the list items had not been translated (the total list covers 5 widely-spaced pages, around 60 items in all). He sent me back a PDF output of the document, indeed showing 3 list items I'd not seen before.

I was nonetheless able to copy the text from his PDF and correctly insert the translations into the DOC. However, the customer then complained that there was FR text left untranslated in the DOC; in fact, this was simply the original text of the items that I had now copied across and translated, which I couldn't delete because I couldn't access it.

The customer now wishes to re-assign the entire translation, on the basis of a few tens of words out of nearly 9,000 — and a slip which I immediately corrected once my attention had been drawn to it (and within the agreed deadline). This is why I strongly suspect a ploy to get out of paying.

So having explained the background, my technical question is: can anyone explain how text could be invisible to me in both a PDF file and a DOCX / DOC file converted from it, yet be visible to the customer on their system?

I have tried everything I can think of, like stripping out all formatting, globally changing the 'ink' colour to black, and showing 'hidden text'. What strikes me as odd is that the text isn't even visible in the initial PDF which is the ultimate 'original' file.

Although one of the missing items does occur at the bottom of a page, the others were in the middle of the pages — so I don't think it can be any kind of cropping problem...

Any comments or advice would be greatly appreciated, as I'm tearing my hair out here with a customer who is threatening not to pay me quite a large sum of money.


PS: Further investigation has shown that I can 'select all' and copy the text from the original PDF file, and that the missing items ARE present, but associated with something that looks like a hyperlink (which is not visible in either the original PDF or the DOC conversion of it). So it looks as if this text may have been invisible because it was 'buried' in a link.

PPS: Mystery solved!

In fact, the customer's PDF > DOC conversion was the culprit: the missing items (only) had for some unknown reason been converted into text boxes, and as I happened to have graphics objects hidden in my 'Options', I was unable to see them!
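
For anyone who runs into the same thing, here is a rough sketch of one way to check a converted .DOCX for text sitting inside text boxes before starting work. It assumes the file is a normal .docx (a zip archive with its main text in `word/document.xml`) and that text boxes are stored in `w:txbxContent` elements; the function name `textbox_paragraphs` is made up for this example, and it will not work on an old binary .DOC file.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def textbox_paragraphs(docx_path):
    """Return the text of every non-empty paragraph found inside a text box.

    Hypothetical helper for illustration: it only looks at the main
    document part, not headers, footers or footnotes.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    found = []
    # Text-box content lives in <w:txbxContent> elements.
    for box in root.iter(f"{{{W_NS}}}txbxContent"):
        for para in box.iter(f"{{{W_NS}}}p"):
            # A paragraph's text is split across one or more <w:t> runs.
            text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
            if text.strip():
                found.append(text)
    return found
```

Calling `textbox_paragraphs("job.docx")` (the file name is just a placeholder) returns a list of strings; an empty list suggests there is no text-box text in the body. The same item may show up twice, since Word can store one text box in two alternative forms inside the file, so treat the output as a checklist of things to look for rather than an exact count.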



[Edited at 2012-10-08 11:58 GMT]


 
Tina Vonhof (X)
Canada
Local time: 14:25
Dutch to English
+ ...
Compliments Oct 8, 2012

Hello Tony,

My compliments on your investigative talents! Persistence paid off for you.

I generally don't have much faith in converted documents: things can be left out, inserted, or moved in ways that can trip you up badly. I would much rather work with the original PDF file, even if that means starting from scratch.

I hope your client will appreciate what you have done and pay you accordingly.


 
Tony M
France
Local time: 22:25
Member
French to English
+ ...
TOPIC STARTER
SITE LOCALIZER
PDFs Oct 8, 2012

Thanks Tina!

Yes, I agree, and had I known the trouble it was going to cause, I'd have preferred to do the conversion myself.

It was in fact only after selecting, copying, and pasting the text from the PDF into a fresh DOC (the work of a couple of seconds) that I discovered the problem. If only the customer had done that in the first place, life would have been a lot simpler for both of us!


 
Peter Linton (X)  Identity Verified
Local time: 21:25
Swedish to English
+ ...
Similar experience Oct 8, 2012

I had a similar experience – I was offered a 3-page PDF file. I converted it to text, all looked straightforward. But the customer complained that I had omitted page 2. Investigation showed that pages 1 and 3 were text, but page 2 was a graphics snapshot of the text. Why anybody should do that is a mystery, but it does illustrate the potential hazards in PDF files.

 
Tony M
France
Local time: 22:25
Member
French to English
+ ...
TOPIC STARTER
SITE LOCALIZER
I sympathize, Peter! Oct 8, 2012

It really can be a headache sometimes, and as Tina says, it's often better just to start over!

What really caught me out here was that the original PDF did not contain any obvious graphics, so I wasn't expecting to find any text boxes. I did have a 'page number' one at the foot of each page, but for some reason, THOSE ones were visible OK, but not the 'main' ones with the important text in!

I often go down the OCR route, which can be a pain in the neck, as it can create all sorts of bizarre formatting all over the place; however, simply copying-and-pasting the content (where the PDF file even allows it) can be just as bad, as I've often found that titles, headers, etc. get shifted out of order.

For some customers who are simply too lazy to supply me with the source files, I've had to start charging a supplement for all this pre-processing where larger files are involved.