[mod] docs
parent
0ae74baba3
commit
4bcac97ae0
4 changed files with 40 additions and 41 deletions
57
doc.md
|
@ -45,18 +45,12 @@ import module namespace pdfbox="org.expkg_zone58.Pdfbox3";
|
|||
---
|
||||
|
||||
### Opening a PDF Document
|
||||
To open a PDF document, use the `pdfbox:open` function. This function can handle local files, URLs, or binary data.
|
||||
To open a PDF document, use the `pdfbox:open` function. This function can handle local files, URLs, or binary data.
|
||||
|
||||
```xquery
|
||||
let $pdf := pdfbox:open("path/to/document.pdf")
|
||||
```
|
||||
|
||||
If the PDF is encrypted, you can provide a password:
|
||||
|
||||
```xquery
|
||||
let $pdf := pdfbox:open("path/to/encrypted.pdf", map{"password": "your_password"})
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Closing a PDF Document
|
||||
|
@ -109,15 +103,19 @@ let $author := pdfbox:property($pdf, "author")
|
|||
```
|
||||
|
||||
Supported properties include:
|
||||
- `pageCount`: Number of pages.
|
||||
- `title`: Document title.
|
||||
- `#bookmarks`: Number of bookmarks.
|
||||
- `#labels`: Number of labels.
|
||||
- `#pages`: Number of pages.
|
||||
- `author`: Document author.
|
||||
- `creator`: Document creator.
|
||||
- `producer`: Document producer.
|
||||
- `subject`: Document subject.
|
||||
- `keywords`: Document keywords.
|
||||
- `creationDate`: Document creation date.
|
||||
- `modificationDate`: Document modification date.
|
||||
- `creator`: Document creator.
|
||||
- `keywords`: Document keywords.
|
||||
- `labels`: Document labels formatted as a string.
|
||||
- `modificationDate`: Document modification date.
|
||||
- `producer`: Document producer.
|
||||
- `specification`: PDF specification version used in the document.
|
||||
- `subject`: Document subject.
|
||||
- `title`: Document title.
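
As a minimal sketch of reading several of these properties in one pass, the query below collects a few of them into a map. It assumes the `pdfbox:open` and `pdfbox:property` calls behave as shown elsewhere in this guide. The file path and the choice of property names are placeholders, and closing the document is omitted here.

```xquery
import module namespace pdfbox="org.expkg_zone58.Pdfbox3";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Placeholder path; replace with a real PDF. :)
let $pdf := pdfbox:open("path/to/document.pdf")
return map:merge(
  (: One map entry per requested property name. :)
  for $name in ("title", "author", "#pages")
  return map:entry($name, pdfbox:property($pdf, $name))
)
```

Any name from the list above can be substituted. A property that is not set in the document may yield an empty value for its entry.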
|
||||
|
||||
---
|
||||
|
||||
|
@ -133,22 +131,15 @@ The outline is returned as a sequence of maps, where each map represents a bookm
|
|||
---
|
||||
|
||||
### Saving a PDF Document
|
||||
To save a PDF document to the filesystem, use the `pdfbox:save` function.
|
||||
To save a PDF document to the filesystem, use the `pdfbox:pdf-save` function.
|
||||
|
||||
```xquery
|
||||
let $savedPath := pdfbox:save($pdf, "path/to/save/document.pdf")
|
||||
let $savedPath := pdfbox:pdf-save($pdf, "path/to/save/document.pdf")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Advanced Usage
|
||||
|
||||
### Handling Encrypted PDFs
|
||||
If the PDF is encrypted, you can provide a password when opening the document.
|
||||
|
||||
```xquery
|
||||
let $pdf := pdfbox:open("path/to/encrypted.pdf", map{"password": "your_password"})
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
|
@ -165,7 +156,7 @@ let $labels := pdfbox:labels($pdf)
|
|||
To get the size of a specific page, use the `pdfbox:page-media-box` function.
|
||||
|
||||
```xquery
|
||||
let $size := pdfbox:page-media-box($pdf, 1) (: Get size of page 1 :)
|
||||
let $size := pdfbox:page-media-box($pdf, 1) (: Get size of page 0, the cover :)
|
||||
```
|
||||
|
||||
---
|
||||
|
@ -177,10 +168,17 @@ You can generate a CSV-style report of properties for multiple PDFs using the `p
|
|||
let $report := pdfbox:report(("path/to/doc1.pdf", "path/to/doc2.pdf"))
|
||||
```
|
||||
|
||||
The report includes properties like `title`, `author`, `pageCount`, etc., for each PDF.
|
||||
The report includes all properties by default, such as `title`, `author`, and `#pages`, for each PDF.
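
The list of paths passed to `pdfbox:report` does not have to be written out by hand. The sketch below assumes a BaseX environment, where `file:children` from the File Module returns the full paths of a directory's direct children. The folder name is hypothetical.

```xquery
import module namespace pdfbox="org.expkg_zone58.Pdfbox3";

(: Hypothetical folder holding the PDFs to report on. :)
let $dir := "path/to/pdfs/"
(: Keep only entries whose name ends in .pdf, ignoring case. :)
let $pdfs := file:children($dir)[ends-with(lower-case(.), ".pdf")]
return pdfbox:report($pdfs)
```

Filtering on the file extension keeps non-PDF files in the folder out of the report.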
|
||||
|
||||
---
|
||||
## Advanced Usage
|
||||
|
||||
### Handling Encrypted PDFs
|
||||
If the PDF is encrypted, you can provide a password when opening the document.
|
||||
|
||||
```xquery
|
||||
let $pdf := pdfbox:open("path/to/encrypted.pdf", map{"password": "your_password"})
|
||||
```
|
||||
## Error Handling
|
||||
The library includes error handling to manage issues such as failed PDF loads or unsupported operations. Errors are thrown with descriptive messages to help diagnose problems.
|
||||
|
||||
|
@ -194,12 +192,3 @@ try {
|
|||
```
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
The `Pdfbox3.xqm` library, distributed as a XAR file with included PDFBox JAR files, provides a comprehensive interface for working with PDF documents in XQuery. By leveraging Apache PDFBox, it offers powerful features for text extraction, image rendering, and document manipulation. With this guide, you should be able to integrate PDF processing into your XQuery applications effectively.
|
||||
|
||||
For more detailed information, refer to the [Apache PDFBox documentation](https://pdfbox.apache.org/docs/3.0.0/javadocs/) and the [BaseX documentation](https://docs.basex.org/).
|
||||
|
||||
---
|
||||
|
||||
This user guide provides a starting point for using the `Pdfbox3.xqm` library. For further assistance, consult the official documentation or reach out to the community for support.
|
|
@ -51,6 +51,11 @@ pdfbox:with-pdf("...path/to/pdf.pdf",
|
|||
* `scripts/make-xar.xq` packages the required `jar`s and `xqm` files to a `xar` file in the `dist` folder.
|
||||
|
||||
The `package.json` is (ab)used as a configuration source. Non-standard information is held in the `expkg_zone58` section. This is experimental and may change.
|
||||
|
||||
`package.json` contains scripts to run:
|
||||
1. The XAR build.
|
||||
2. The tests.
|
||||
3. The documentation.
|
||||
### Action support
|
||||
|
||||
The workflow `ci-basex.yaml` builds and tests the package. It can be run as an action on [GitHub](https://github.com/features/actions), or on a local [Gitea](https://docs.gitea.com/usage/actions/overview) or [Forgejo](https://forgejo.org/) installation.
|
||||
|
|
|
@ -1,9 +1,13 @@
|
|||
# Example PDFs with pageLabels and outlines
|
||||
|
||||
## Sources
|
||||
* [BaseX100.pdf](https://files.basex.org/releases/10.0/BaseX100.pdf)
|
||||
* [icelandic-dictionary.pdf](http://css4.pub/2015/icelandic/dictionary.pdf)
|
||||
* [page-numbers.pdf](https://www.w3.org/WAI/WCAG22/working-examples/pdf-page-numbers/page-numbers).
|
||||
* [page-numbers-password.pdf](https://www.w3.org/WAI/WCAG22/working-examples/pdf-page-numbers/page-numbers).
|
||||
* [Sentience-in-Cephalopod-Molluscs-and-Decapod-Crustaceans](https://www.lse.ac.uk/News/News-Assets/PDFs/2021/Sentience-in-Cephalopod-Molluscs-and-Decapod-Crustaceans-Final-Report-November-2021.pdf)
|
||||
* [Legal RAG Hallucinations](https://law.stanford.edu/wp-content/uploads/2024/05/Legal_RAG_Hallucinations.pdf)
|
||||
| Name | bookmarks | labels | password | source |
|
||||
|------|-----------|--------|----------|---|
|
||||
|[BaseX100.pdf](BaseX100.pdf)||☑||https://files.basex.org/releases/10.0/BaseX100.pdf|
|
||||
|[icelandic-dictionary.pdf](icelandic-dictionary.pdf)|☑|| |http://css4.pub/2015/icelandic/dictionary.pdf|
|
||||
|[page-numbers.pdf](https://www.w3.org/WAI/WCAG22/working-examples/pdf-page-numbers/page-numbers)||☑||https://www.w3.org/WAI/WCAG22/working-examples/pdf-page-numbers/page-numbers|
|
||||
|[page-numbers-password.pdf](page-numbers-password.pdf)||☑|☑(password)|https://www.w3.org/WAI/WCAG22/working-examples/pdf-page-numbers/page-numbers|
|
||||
|[Sentience-in-Cephalopod-Molluscs-and-Decapod-Crustaceans](Sentience-in-Cephalopod-Molluscs-and-Decapod-Crustaceans.pdf)|☑|||https://www.lse.ac.uk/News/News-Assets/PDFs/2021/Sentience-in-Cephalopod-Molluscs-and-Decapod-Crustaceans-Final-Report-November-2021.pdf|
|
||||
|[Legal RAG Hallucinations](Legal_RAG_Hallucinations.pdf)|☑|||https://law.stanford.edu/wp-content/uploads/2024/05/Legal_RAG_Hallucinations.pdf|
|
||||
|
||||
|
||||
|
|
|
@ -177,7 +177,8 @@ declare %private variable $pdfbox:property-map:=map{
|
|||
"modificationDate": (PDDocument:getDocumentInformation#1,
|
||||
PDDocumentInformation:getModificationDate#1,
|
||||
pdfbox:gregToISO#1),
|
||||
"labels": pdfbox:labels-as-strings#1
|
||||
|
||||
"labels": pdfbox:labels-as-string#1
|
||||
};
|
||||
|
||||
(:~ Defined property names, sorted :)
|
||||
|
|