Python Playwright export html file as single page PDF

2 min read 29-09-2024

Converting web pages to PDF makes them easier to share, print, or archive. Playwright, a browser automation library with Python bindings, can do this through Chromium's print-to-PDF support. In this article, we will discuss how to export an HTML file as a single-page PDF using Playwright in Python.

Understanding the Problem

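The goal is to convert an HTML file into a PDF using Python Playwright. The basic example below loads the file in headless Chromium and writes an A4 PDF. Note that a fixed paper format such as A4 splits long content across several pages, so a true single-page export needs page dimensions matched to the content.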

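from playwright.sync_api import sync_playwright

def export_html_to_pdf(url, output_file):
    """Render the page at `url` in headless Chromium and save it as a PDF."""
    with sync_playwright() as p:
        browser = p.chromium.launch()  # PDF export is Chromium-only
        page = browser.new_page()
        page.goto(url)  # waits for the load event by default
        # A fixed paper format paginates content taller than one sheet
        page.pdf(path=output_file, format='A4')
        browser.close()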

# Example usage:
export_html_to_pdf('file:///path/to/your/file.html', 'output.pdf')

Explanation of the Code

  • Importing Playwright: The script begins by importing the necessary modules from Playwright. We utilize sync_playwright for synchronous operations.

  • Defining the Function: The export_html_to_pdf function takes in two parameters: url, which is the location of the HTML file, and output_file, where the PDF will be saved.

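  • Launching the Browser: Inside the function, we launch a Chromium browser instance, which runs headless by default. Chromium is required here because Playwright supports PDF generation only in Chromium.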

  • Creating a New Page: A new page is created in the browser, where we navigate to the HTML file using the goto method.

  • Generating the PDF: Finally, we call the pdf method to export the current page as a PDF file. The path parameter sets the output filename, and format sets the paper size (A4 here). With a fixed paper size, content taller than one sheet is split across multiple pages.

  • Closing the Browser: After the export is complete, we close the browser.
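
The example above uses a fixed A4 format, which Chromium paginates. To keep the output on one page, one possible approach is to measure the rendered document height and pass explicit width and height to page.pdf() instead of format. The following is a minimal sketch under that assumption; export_single_page_pdf, css_px, and the 1280-pixel default width are illustrative names and values, not part of the original script.

```python
def css_px(value):
    """Format a pixel count as a CSS length string accepted by page.pdf()."""
    return f"{int(value)}px"


def export_single_page_pdf(url, output_file, width_px=1280):
    """Sketch: size the PDF page to the content so it is not paginated."""
    # Imported here so the pure helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": width_px, "height": 800})
        page.goto(url)
        # Use screen styles so the measured layout matches what gets printed.
        page.emulate_media(media="screen")
        # Full height of the rendered document, in CSS pixels.
        height_px = page.evaluate("document.documentElement.scrollHeight")
        page.pdf(
            path=output_file,
            width=css_px(width_px),
            # One extra pixel guards against rounding spilling onto a second page.
            height=css_px(height_px + 1),
            print_background=True,
        )
        browser.close()
```

It is called the same way as export_html_to_pdf. Pages with lazy-loaded content or their own CSS @page rules may need extra handling, such as waiting for content to load before measuring.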

Additional Insights

  1. Handling Local HTML Files: Ensure the HTML file's path is absolute when using the file:// protocol. For example, replace /path/to/your/file.html with the actual path on your system.

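  2. Customizing PDF Options: The page.pdf() method offers various options, including:

    • format: Set the paper format (e.g., A4, Letter).
    • width and height: Set explicit paper dimensions (e.g., '1280px'); these are ignored when format is set.
    • margin: Define the margins as a dictionary (e.g., {'top': '10px', 'bottom': '10px'}).
    • print_background: A boolean that controls whether background graphics are included (off by default).

  3. Asynchronous Version: If you prefer an asynchronous approach, use async_playwright() from playwright.async_api with async functions. This does not make a single export faster, but it lets you run several exports concurrently or call Playwright from an existing asyncio application.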
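
As an illustration of the options and the asynchronous API described above, here is a minimal sketch. The helper build_pdf_options and the function export_html_to_pdf_async are hypothetical names introduced for this example, and the 10-pixel margin is an arbitrary choice.

```python
import asyncio


def build_pdf_options(output_file, paper_format="A4", margin_px=10,
                      print_background=True):
    """Collect keyword arguments for page.pdf() in one place."""
    side = f"{margin_px}px"
    return {
        "path": output_file,
        "format": paper_format,
        "margin": {"top": side, "right": side, "bottom": side, "left": side},
        "print_background": print_background,
    }


async def export_html_to_pdf_async(url, output_file):
    """Async counterpart of export_html_to_pdf, using the options above."""
    # Imported lazily so build_pdf_options() works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url)
        await page.pdf(**build_pdf_options(output_file))
        await browser.close()


# Example usage (placeholder path, as in the article):
# asyncio.run(export_html_to_pdf_async("file:///path/to/your/file.html", "output.pdf"))
```

Keeping the options in a plain dictionary makes it easy to reuse the same settings across several exports, for instance when running many conversions with asyncio.gather.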

Practical Example

Imagine you have a company report in HTML format that you'd like to distribute to stakeholders. By utilizing the above code, you can create a professional-looking PDF in just a few lines. For instance:

export_html_to_pdf('file:///home/user/documents/report.html', 'company_report.pdf')
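
Writing file:// URLs by hand is error-prone, especially on Windows. A small helper built on the standard library's pathlib can derive the URL from an ordinary path. This is a sketch; html_file_to_url is a hypothetical name, not part of Playwright.

```python
from pathlib import Path


def html_file_to_url(html_path):
    """Return a file:// URL for a local HTML file, resolving relative paths."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(html_path).resolve().as_uri()
```

With this helper, the call above could be written as export_html_to_pdf(html_file_to_url('report.html'), 'company_report.pdf'), run from the directory that contains report.html.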

Conclusion

Exporting an HTML file to a single-page PDF with Python Playwright is straightforward, allowing for easy conversion and dissemination of web content. The ability to customize the output makes this method especially useful for developers and content creators alike.

By following the steps outlined in this article, you can efficiently convert HTML files to PDF format, enhancing your productivity and the quality of your deliverables. Happy coding!