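This article shows how to use Python to extract tables from a PDF into Excel. It covers three steps: splitting the PDF, extracting the PDF tables to Excel, and merging the resulting Excel files.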
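As a rough illustration of the splitting step, the page range for each part can be worked out with plain arithmetic before any PDF library is involved. The helper name `page_ranges` is hypothetical and does not come from the original code. It also assumes that any pages left over from the integer division go to the last part, which the original may handle differently.

```python
def page_ranges(total_pages, num_splits):
    """Return (start, end) page-index pairs, end exclusive, one per part."""
    if num_splits < 1 or total_pages < num_splits:
        raise ValueError("need 1 <= num_splits <= total_pages")
    # Floor division: every part gets at least this many pages.
    pages_per_split = total_pages // num_splits
    ranges = []
    for i in range(num_splits):
        start = i * pages_per_split
        # Assumption: the last part absorbs any remainder pages.
        end = total_pages if i == num_splits - 1 else start + pages_per_split
        ranges.append((start, end))
    return ranges


print(page_ranges(10, 3))  # [(0, 3), (3, 6), (6, 10)]
```

Without the remainder handling, a 10-page PDF split into 3 parts would cover only 9 pages, so the leftover page has to be assigned somewhere.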
1. Split the PDF
Split one large PDF into several smaller PDFs by page count:
# pip install PyPDF2
import os, pdfplumber, PyPDF2

# Split the PDF
def split_pdf(input_pdf_path, num_splits):
    # Create a PDF reader object
    pdf_reader = PyPDF2.PdfReader(open(input_pdf_path, 'rb'))
    total_pages = len(pdf_reader.pages)
    # Calculate the number of pages per split
    pages_per_split = total_pages // num_splits
    # Get the directory and base name of the input PDF
    base_dir = os.path.dirname(input_pdf_path)
    base_name = os.path.splitext(os.path.basename(input_pdf_path))[0]