apb/pdfbox

basex pdf

Andy Bunce 68a6b99b85	[mod] use action	2025-02-12 15:26:51 +00:00
.gitea	[mod] use action	2025-02-12 15:26:51 +00:00
.github	[mod] use action	2025-02-12 15:26:51 +00:00
.vscode	[add] image save	2024-04-11 12:45:23 +01:00
jars	[fix] ignore	2025-02-03 14:34:48 +00:00
samples.pdf	[add] github workflow	2025-01-03 16:03:13 +00:00
scripts	[add]basex action	2025-02-12 11:49:50 +00:00
src	[fix] test	2025-02-10 18:01:04 +00:00
.gitignore	[mod] git ignore	2025-02-03 14:31:30 +00:00
.xqdoca	[mod] update to pdfbox 3.0.3	2025-01-25 22:19:46 +00:00
changelog.md	[mod] tidy	2025-02-10 17:17:30 +00:00
doc.md	[add] doc	2025-02-11 21:17:21 +00:00
LICENSE	[mod] back to v11	2025-02-10 12:24:34 +00:00
package.json	[fix] version	2025-02-10 21:49:32 +00:00
readme.md	[add] doc	2025-02-11 21:17:21 +00:00

readme.md

Pdfbox

A BaseX interface for version 3 of the Apache PDFBox library.

The Apache PDFBox® library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents.

This interface is packaged in the EXPath format. A test suite is available, and workflow actions run it on BaseX 10.7 and 11.7.

Note

Currently (v0.1.5) works with BaseX 10.7, but this may change in future versions.

The Apache PDFBox 3 FAQ may be useful.

Features

The features focus on extracting information from PDFs rather than creation or editing.

read the PDF page count.
read any PDF outline and return it as map(s) or XML.
read page labels.
read page text.
save a PDF page range to a new PDF.
save an image of a rendered PDF page.

AI (Deepseek) generated documentation

Install

Pre-built pdfbox-x.y.z.xar files are available on the releases page. They can be installed using the standard repository functions or using the GUI.
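
As a rough sketch of the repository-function route, the BaseX Repository Module can install a downloaded package from a query. The path below is a hypothetical placeholder, not a real location; substitute the file taken from the releases page.

```xquery
(: Sketch only: the path is a hypothetical placeholder for the downloaded package file. :)
repo:install("...path/to/downloaded-package-file"),

(: List the installed packages to check that the install succeeded. :)
repo:list()
```

The `repo` prefix is bound by default in BaseX, so no import should be needed here.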

Usage

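import module namespace pdfbox="org.expkg_zone58.Pdfbox3";

(: apply the function to the PDF at the given path;
   here it returns the text of each page in turn :)
pdfbox:with-pdf("...path/to/pdf.pdf",
 function($pdf){
  (1 to pdfbox:page-count($pdf)) ! pdfbox:page-text($pdf, .)
 }
)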
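
As a variation on the usage example, the page text can be wrapped in XML so that each result keeps its page number. This is a minimal sketch that uses only the functions shown in the example. It assumes that pdfbox:page-text returns a string and that pdfbox:with-pdf returns whatever the supplied function returns. The page element and its n attribute are illustrative names, not part of the module.

```xquery
import module namespace pdfbox="org.expkg_zone58.Pdfbox3";

pdfbox:with-pdf("...path/to/pdf.pdf",
  function($pdf){
    (: one illustrative <page> element per page, numbered from 1 :)
    for $n in 1 to pdfbox:page-count($pdf)
    return <page n="{$n}">{ pdfbox:page-text($pdf, $n) }</page>
  }
)
```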

Build

scripts/make-xar.xq packages the required jars and xqm files into a xar file in the dist folder.

Action support

The workflow ci-basex.yaml builds and tests the package. It can be used as an action on GitHub or on a local Gitea installation.