pdfbox/readme.md
Andy Bunce 9f0bed7cd8
2025-02-10 17:17:30 +00:00

Pdfbox

A BaseX interface for Pdfbox version 3. It is packaged in the EXPath format and is tested against BaseX 10.7 and 11.7. Note: the current release (v0.1.5) also works on BaseX 9.7.

  • The Pdfbox 3 FAQ may be useful.

Features

The features focus on extracting information from PDFs rather than creation or editing.

  • Read the PDF page count.
  • Read any PDF outline and return it as map(s) or XML.
  • Read page labels.
  • Read page text.
  • Save a PDF page range to a new PDF.
  • Save an image of a rendered PDF page.

Install

Pre-built pdfbox-x.y.z.xar files are available on the releases page. They can be installed using the standard repository functions or using the GUI.
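
As a minimal sketch of the repository-function route, the query below uses the BaseX Repository Module. The file name is a placeholder, not a real release: substitute the path of the package file you downloaded.

```xquery
(: Placeholder path: replace x.y.z with the version downloaded from the releases page. :)
repo:install("pdfbox-x.y.z.xar"),

(: List the installed packages to confirm that the install succeeded. :)
repo:list()
```

A package installed this way can later be removed with repo:delete, passing the package name reported by repo:list().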

Usage

import module namespace pdfbox="org.expkg_zone58.Pdfbox3";

pdfbox:with-pdf("...path/to/pdf.pdf",
 function($pdf){
  (1 to pdfbox:page-count($pdf))!pdfbox:page-text($pdf,.)
 }
)
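
A variation on the query above, using only the functions already shown, collects the text into a map keyed by page number. This is a sketch that assumes pdfbox:with-pdf returns the result of the supplied function and that pages are numbered from 1, as the example above suggests.

```xquery
import module namespace pdfbox="org.expkg_zone58.Pdfbox3";

pdfbox:with-pdf("...path/to/pdf.pdf",
 function($pdf){
  (: Build one map entry per page: page number -> page text. :)
  map:merge(
   for $page in 1 to pdfbox:page-count($pdf)
   return map:entry($page, pdfbox:page-text($pdf, $page))
  )
 }
)
```

Looking up a key in the resulting map, for example with ?1, should then give the text of the first page.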

Build

  • scripts/make-xar.xq packages the required jars and xqm files into a xar file in the dist folder.

Action support

The workflow ci-basex.yaml builds and tests the package. It can be used as an action on GitHub or on a local Gitea installation.