
Pdfbox

A BaseX interface for Pdfbox version 3. It is packaged in the EXPath format and is tested against BaseX 10.7 and 11.7. Note: the current release (v0.1.5) also works on BaseX 9.7.

  • The Pdfbox 3 FAQ may be useful.

Features

The features focus on extracting information from PDFs rather than creation or editing.

  • read the PDF page count.
  • read any PDF outline and return it as map(s) or XML.
  • read page labels.
  • read page text.
  • save a PDF page range to a new PDF.
  • save an image of a rendered PDF page.

Install

Pre-built pdfbox-x.y.z.xar files are available on the releases page. They can be installed using the standard BaseX repository functions or the GUI.
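
As a minimal sketch of the repository-function route, the query below calls `repo:install` from the BaseX Repository Module. The path is a placeholder, not a real location: replace it with wherever the package file from the releases page was saved.

```xquery
(: Install the downloaded package into the BaseX repository.
   The path is a placeholder (assumption); point it at the
   package file downloaded from the releases page. :)
repo:install("...path/to/downloaded-package-file")
```

Running `repo:list()` as a separate query afterwards should show the package among the installed ones.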

Usage

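(: Import the Pdfbox module. :)
import module namespace pdfbox="org.expkg_zone58.Pdfbox3";

(: Open the PDF and return the text of every page, in page order. :)
pdfbox:with-pdf("...path/to/pdf.pdf",
 function($pdf){
  (1 to pdfbox:page-count($pdf))!pdfbox:page-text($pdf,.)
 }
)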
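
A variation on the query above, using only the three functions it already shows, wraps each page's text in an XML element so the page number travels with the text. This is a sketch under two assumptions: that `pdfbox:with-pdf` returns whatever the callback returns, and that `pdfbox:page-text` returns a string. The `pages` and `page` element names are illustrative and not part of the module.

```xquery
import module namespace pdfbox="org.expkg_zone58.Pdfbox3";

pdfbox:with-pdf("...path/to/pdf.pdf",
  function($pdf){
    (: Read the page count once and reuse it. :)
    let $count := pdfbox:page-count($pdf)
    return
      (: Element names are illustrative, not defined by the module. :)
      <pages count="{ $count }">{
        for $n in 1 to $count
        return <page n="{ $n }">{ pdfbox:page-text($pdf, $n) }</page>
      }</pages>
  }
)
```

Returning XML rather than a flat sequence of strings makes it straightforward to store the result in a BaseX database or to filter pages with XPath afterwards.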

Build

  • scripts/make-xar.xq packages the required jars and xqm files to a xar file in the dist folder.

Action support

The workflow ci-basex.yaml builds and tests the package. It can be run as an action on GitHub or on a local Gitea installation.