This toolbox exports content from a Confluence instance (Cloud or Data Center) into a static, navigable HTML archive and converts it into professional, hierarchical PDF documents.
Key Features:
- Visual Fidelity: Fetches rendered HTML (export_view) to preserve macros, layouts, and formatting.
- Navigation: Injects a fully functional, static navigation sidebar into every HTML page.
- Offline Browsing: Localizes images and links, and downloads all attachments (PDFs, Office docs, etc.) for complete offline access.
- Sort Order: Recursively scans the tree to ensure the manual sort order from Confluence is preserved.
- Metadata Injection: Automatically adds Page Title, Author, and Modification Date to the top of every page.
- Versioning: Creates timestamped output folders (e.g., 2025-11-21 1400 Space IT) for clean history management.
- Professional PDF: Merges the content into a single PDF with TOC, Bookmarks, and mixed Portrait/Landscape orientation.
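The timestamped folder naming from the Versioning feature can be sketched as follows. This is a minimal illustration, assuming the pattern `YYYY-MM-DD HHMM <title>` inferred from the example name above; `build_output_dir_name` is a hypothetical helper, not a function taken from the toolbox.

```python
import re
from datetime import datetime


def build_output_dir_name(title: str, now: datetime) -> str:
    """Return a folder name such as '2025-11-21 1400 Space IT' (assumed format)."""
    # Replace characters that are not allowed in Windows folder names.
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return f"{now.strftime('%Y-%m-%d %H%M')} {safe_title}"


print(build_output_dir_name("Space IT", datetime(2025, 11, 21, 14, 0)))
```

Because the timestamp comes first and is zero-padded, sorting folder names alphabetically also sorts them chronologically.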
- confluenceDumpToHTML.py: The main downloader. Connects to Confluence, scrapes content, and creates the folder structure.
- htmlToDoc.py: The publisher. Converts the downloaded HTML folder into a single PDF or a Master-HTML file for LLMs.
- confluence_products.ini: Configuration file for API URLs (Cloud vs. Data Center).
- styles/: Contains CSS files.
  - site.css (if present) is applied automatically.
  - pdf_settings.css configures the PDF layout (A4/Letter, Margins).
Follow these steps to create your first PDF export of a single page tree.
Install requirements and set your credentials.
pip install -r requirements.txt
# Windows Users: Install GTK3 Runtime for PDF generation!
Linux/Mac:
export CONFLUENCE_TOKEN="YourPersonalAccessToken"
Windows (Powershell):
$env:CONFLUENCE_TOKEN="YourPersonalAccessToken"
Run the dumper for a specific page tree. This will create a new folder in output/.
# Example for Data Center
python3 confluenceDumpToHTML.py --base-url "https://confluence.corp.com" --profile dc --context-path "/wiki" -o "./output" tree -p "123456"
Look in the output folder. You will see a new folder named like 2025-01-27 0900 My Page Title. Pass this path to the PDF generator.
python3 htmlToDoc.py --site-dir "./output/2025-01-27 0900 My Page Title" --pdf
Result: You now have a ... .pdf inside that folder.
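If you would rather not copy the folder name by hand, the newest export can be located programmatically. The sketch below assumes every export folder name starts with a `YYYY-MM-DD HHMM` stamp, as in the example above; `latest_export_dir` is a hypothetical helper and not part of the toolbox.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed folder-name prefix: 'YYYY-MM-DD HHMM '
STAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{4} ")


def latest_export_dir(output_root: str) -> Optional[Path]:
    """Return the most recent timestamped export folder, or None if none exist."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    candidates = [p for p in root.iterdir() if p.is_dir() and STAMP.match(p.name)]
    # The zero-padded stamp makes alphabetical order equal to chronological order.
    candidates.sort(key=lambda p: p.name)
    return candidates[-1] if candidates else None
```

The returned path can then be passed to `htmlToDoc.py` via `--site-dir`.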
This script supports both Confluence Cloud and Confluence Data Center.
⚠️ Note on Cloud Verification: The Cloud support has been ported to the new architecture but was primarily developed and tested against a Confluence Data Center environment.
Define API paths in confluence_products.ini. Authentication is handled via Environment Variables:
- Cloud: CONFLUENCE_USER (Email) and CONFLUENCE_TOKEN (API Token).
- Data Center: CONFLUENCE_TOKEN (Personal Access Token).
⚠️ Troubleshooting Note for Data Center: If authentication fails, ensure you are connected to the VPN and that your admin allows Personal Access Tokens (PAT).
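To illustrate how these environment variables typically map onto HTTP requests, here is a minimal sketch. It assumes the usual Atlassian conventions (Basic auth with email and API token for Cloud, a Bearer header for Data Center PATs); the function `build_auth_headers` and the profile name `cloud` are hypothetical, and the script's actual implementation may differ.

```python
import base64
import os


def build_auth_headers(profile: str, env=os.environ) -> dict:
    """Build an Authorization header from the environment (illustrative only)."""
    token = env.get("CONFLUENCE_TOKEN")
    if not token:
        raise RuntimeError("CONFLUENCE_TOKEN is not set")
    if profile == "cloud":
        user = env.get("CONFLUENCE_USER")
        if not user:
            raise RuntimeError("CONFLUENCE_USER is not set (required for Cloud)")
        # Cloud: Basic auth over 'email:api_token', base64-encoded.
        basic = base64.b64encode(f"{user}:{token}".encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {basic}"}
    # Data Center: the personal access token is sent as a Bearer token.
    return {"Authorization": f"Bearer {token}"}
```

Checking for missing variables up front gives a clearer error than a 401 response from the server.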
Downloads pages, builds the index, and creates a clean HTML base.
python3 confluenceDumpToHTML.py [OPTIONS] <COMMAND> [ARGS]
- space: Dumps an entire space. (-sp SPACEKEY)
- tree: Dumps a specific page and its descendants. (-p PAGEID)
- single: Dumps a single page. (-p PAGEID)
- label: "Forest Mode". Dumps all pages with a specific label as root trees. (-l LABEL)
  - Use --exclude-label to prune specific subtrees (e.g. 'archived').
- all-spaces: Dumps all visible spaces.
- -o, --outdir: Base output directory.
- -t, --threads: Number of download threads (e.g., -t 8).
- --css-file: Path to custom CSS (applied after standard styles).
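As a rough picture of what a thread count like `-t 8` controls, the sketch below downloads pages through a thread pool. It is an assumption about the general technique, not the script's actual code; `download_all` and the `fetch_page` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(page_ids, fetch_page, threads=8):
    """Fetch pages concurrently and return the results in input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the order of page_ids,
        # so concurrency does not disturb the page order.
        return list(pool.map(fetch_page, page_ids))


# Usage with a stand-in fetch function instead of a real HTTP call:
pages = download_all(["1", "2", "3"], lambda pid: f"page-{pid}", threads=2)
print(pages)
```

More threads speed up I/O-bound downloads, but very high values may trigger rate limiting on the server side.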
The Problem: Some Confluence pages (e.g. complex Table Filters) fail to render via API due to server-side timeouts or heavy client-side JavaScript.
The Solution:
- Open the page in Chrome/Edge.
- Save as "Webpage, Single File (*.mhtml)".
- Save it as manual_overrides/[PageID].mhtml.
- Run the dumper with --manual-overrides-dir "./manual_overrides".
The script will extract the rendered state from the MHTML, clean it, and inject it into the pipeline.
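An MHTML file is a MIME multipart document, so the extraction step can be approximated with Python's standard `email` package. The following is a minimal sketch of that idea under this assumption; `extract_html_from_mhtml` is a hypothetical helper, and the toolbox's own extraction and cleaning logic may work differently.

```python
import email
from email import policy


def extract_html_from_mhtml(raw: bytes) -> str:
    """Return the first text/html part of an MHTML (MIME multipart) document."""
    message = email.message_from_bytes(raw, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the transfer encoding (e.g. quoted-printable)
            # and decodes the bytes using the declared charset.
            return part.get_content()
    raise ValueError("no text/html part found in MHTML data")
```

A saved override could then be read with `Path("manual_overrides/123456.mhtml").read_bytes()` and passed to this function, where `123456` stands in for the page ID.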
Allows re-organizing the structure (Index) locally without touching Confluence.
- Generate Editor: python3 create_editor.py --site-dir "./output/2025-01-01 Space IT"
- Edit: Open editor_sidebar.html. Use Drag & Drop to move pages/folders.
- Save: Click "Copy Markdown", paste into sidebar_edit.md.
- Apply: python3 patch_sidebar.py --site-dir "./output/2025-01-01 Space IT"
Converts the dumped pages into a single document.
python3 htmlToDoc.py --site-dir "./output/2025-01-01 Space IT" --pdf
- --pdf: Generate PDF (via WeasyPrint).
- --html: Generate single-file Master HTML (for LLM context windows).
- --preview: Generate debug HTML (linked to local CSS).
The layout is controlled by CSS files in the styles/ folder of your export:
- pdf_settings.css: Configure Page Size (A4/Letter), Orientation, and Margins.
- site.css: General styles (detected automatically).