PDF to XML conversion

Budget: INR 12500–37500

Required Skills

XML, PDF, Image Processing, Technical Documentation, ABBYY FineReader, Adobe Acrobat

Project Description

I need a meticulous PDF-to-XML specialist to take a series of text-heavy documents and convert them into clean, schema-compliant XML. Each PDF contains embedded images, complex tables, and numbered footnotes, so the markup must preserve every element’s position and hierarchy exactly as it appears in the source file.

You will receive:
• A folder of searchable PDFs (all English, 10–30 pages each)
• The target XML structure / DTD
• A brief style guide that shows how images, tables, and footnotes should be referenced

Your task is to:
1. Extract content from each PDF without losing formatting or hidden characters.
2. Tag body text, headings, lists, images, tables, and footnotes to match the supplied DTD.
3. Run an internal QC pass so the delivered XML validates immediately on my end (I use Oxygen XML Editor for final checks).
4. Return a mirrored folder structure containing the finished XML files plus any linked image assets.

Acceptance criteria
• 100% validation against the provided DTD
• Images linked with correct file names and extensions
• Tables rendered as true table markup, not images
• Footnotes cross-referenced to their in-text callouts
• No stray fonts, soft line breaks, or OCR artefacts

If you are comfortable using tools such as ABBYY FineReader, Acrobat, and Oxygen (or equivalents) and can commit to consistent, detail-focused output, I’d love to hear how quickly you can turn around the first batch and what workflow you prefer.
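As one possible part of the QC pass, the footnote and image criteria can be spot-checked with a short script alongside DTD validation in Oxygen. The sketch below uses only the Python standard library. The element and attribute names (`footnote`/`id`, `fnref`/`ref`, `image`/`href`) and the helpers `list_assets` and `qc_report` are hypothetical placeholders, because the actual DTD is not reproduced here. If the DTD declares the footnote attributes as ID/IDREF, a validating parser already rejects dangling callouts. Unreferenced footnotes and missing image files are not caught by DTD validation.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# HYPOTHETICAL element/attribute names: replace with the ones in the supplied DTD.
FOOTNOTE_TAG, FOOTNOTE_ID = "footnote", "id"   # <footnote id="fn1">...</footnote>
CALLOUT_TAG, CALLOUT_REF = "fnref", "ref"      # <fnref ref="fn1"/> in body text
IMAGE_TAG, IMAGE_HREF = "image", "href"        # <image href="images/fig1.png"/>


def list_assets(asset_dir: Path) -> set[str]:
    """Relative POSIX paths of every file under asset_dir, e.g. 'images/fig1.png'."""
    return {p.relative_to(asset_dir).as_posix()
            for p in asset_dir.rglob("*") if p.is_file()}


def qc_report(xml_text: str, available_assets: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    root = ET.fromstring(xml_text)
    problems: list[str] = []

    footnote_ids = {fn.get(FOOTNOTE_ID) for fn in root.iter(FOOTNOTE_TAG)}
    callout_refs = {c.get(CALLOUT_REF) for c in root.iter(CALLOUT_TAG)}

    # Callouts whose target footnote does not exist (dangling references).
    for ref in sorted(r for r in callout_refs - footnote_ids if r):
        problems.append(f"callout points to missing footnote: {ref}")
    # Footnotes that no in-text callout points to.
    for fid in sorted(f for f in footnote_ids - callout_refs if f):
        problems.append(f"footnote never referenced in text: {fid}")

    # Every image must name a file that was actually delivered.
    for img in root.iter(IMAGE_TAG):
        href = img.get(IMAGE_HREF)
        if not href:
            problems.append("image element without href")
        elif href not in available_assets:
            problems.append(f"missing image asset: {href}")
    return problems
```

In this example, a call would look like `qc_report(Path("out/doc01.xml").read_text(encoding="utf-8"), list_assets(Path("out")))`. The example paths assume image `href` values are relative to the output folder. The script supplements validation in Oxygen but does not replace it.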
