OCRmyPDF does not convert to PDF/A under Bullseye

Post by **alerce** » 19.08.2021 05:48:10

Under Buster, arbitrary PDF files were converted to PDF/A with either

ocrmypdf input.pdf output.pdf

or

ocrmypdf --force-ocr input.pdf output.pdf

This no longer works under Bullseye:

/media/festplatte/Arbeitsfläche$ ocrmypdf --force-ocr input.pdf output.pdf
Scanning contents: 100%|███████████████████████| 1/1 [00:00<00:00, 126.30page/s]
Using Tesseract OpenMP thread limit 3
    1 page already has text! - rasterizing text and running OCR anyway          
OCR: 100%|██████████████████████████████████| 1.0/1.0 [00:01<00:00,  1.86s/page]
Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
JPEGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
Optimize ratio: 1.00 savings: 0.0%
Output file is a PDF/A-2B (as expected)
The output file size is 4.68× larger than the input file.
Possible reasons for this include:
The argument --force-ocr was issued, causing transcoding.
The optional dependency 'jbig2' was not found, so some image optimizations could not be attempted.

The output does confirm: Output file is a PDF/A-2B (as expected)

However, if you open the file and look at its properties, it is reported as PDF-1.7. A check with an online tool shows the same result: https://www.pdfen.com/pdf-a-validator
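A note that may help with the check: the version shown in the file properties comes from the `%PDF-x.y` header, while PDF/A conformance is declared separately in the XMP metadata (`pdfaid:part` and `pdfaid:conformance`). PDF/A-2 is based on PDF 1.7, so a header of PDF-1.7 does not by itself rule out PDF/A-2B. The following is only a rough sketch for looking at both values locally, not a validator. It assumes the XMP packet is stored uncompressed in the file; the function name `describe_pdf` is made up for this example.

```python
import re
import sys
from pathlib import Path


def describe_pdf(data: bytes) -> dict:
    """Report the header version and any PDF/A markers found in raw PDF bytes.

    Assumption: the XMP metadata is stored uncompressed, so the markers
    can be found with a plain byte search. This does not validate anything.
    """
    # The header normally sits at the very start of the file.
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    # XMP can carry the identification as elements or as attributes:
    #   <pdfaid:part>2</pdfaid:part>   or   pdfaid:part="2"
    part = re.search(rb"pdfaid:part(?:>|=[\"'])\s*(\d)", data)
    conf = re.search(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])", data)
    return {
        "header_version": header.group(1).decode() if header else None,
        "pdfa_part": part.group(1).decode() if part else None,
        "pdfa_conformance": conf.group(1).decode() if conf else None,
    }


if __name__ == "__main__":
    # Usage: python3 check_pdfa.py output.pdf
    for name in sys.argv[1:]:
        print(name, describe_pdf(Path(name).read_bytes()))
```

If `pdfa_part` and `pdfa_conformance` come back empty for output.pdf, the file carries no PDF/A declaration that this simple search can see. If they are present, the file at least claims PDF/A, and whether it really conforms is a question for a proper validator.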

On installing OCRmyPDF: https://wiki.ubuntuusers.de/OCRmyPDF/