Use pyPdf or its successor, PyPDF2:

A Pure-Python library built as a PDF toolkit. It is capable of:
* splitting documents page by page,
* merging documents page by page,
(and much more)

Here's a sample program that works with both versions.

    #!/usr/bin/env python
    import sys
    try:
        from PyPDF2 import PdfFileReader, PdfFileWriter
    except ImportError:
        from pyPdf import PdfFileReader, PdfFileWriter

    def pdf_cat(input_files, output_stream):
        input_streams = []
        try:
            # First open all the files, then produce the output file, and
            # finally close the input files. This is necessary because
            # the data isn't read from the input files until the write
            # operation.
            for input_file in input_files:
                # PDFs are binary data, so open them in binary mode.
                input_streams.append(open(input_file, 'rb'))
            writer = PdfFileWriter()
            for reader in map(PdfFileReader, input_streams):
                for n in range(reader.getNumPages()):
                    writer.addPage(reader.getPage(n))
            writer.write(output_stream)
        finally:
            for f in input_streams:
                f.close()

    if __name__ == '__main__':
        # On Python 3, sys.stdout is a text stream; write to its byte buffer.
        pdf_cat(sys.argv[1:], getattr(sys.stdout, 'buffer', sys.stdout))

The newer library has a PdfFileMerger class (named PdfMerger in later PyPDF2 releases), which can be used like so.

Example:

    from PyPDF2 import PdfFileMerger

    pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf', 'file4.pdf']

    merger = PdfFileMerger()

    for pdf in pdfs:
        merger.append(open(pdf, 'rb'))

    with open('result.pdf', 'wb') as fout:
        merger.write(fout)

The append method seems to read its input lazily. That is, it doesn't read the file immediately; it seems to wait until the write method is invoked. If you open each input with a scoped open (i.e. a with block), the input file is already closed by the time write runs, and blank pages are appended to the resultant file. If file handle lifetime is an issue, the easiest way to avoid this is to pass file name strings to append and let it handle the file lifetime.

    from PyPDF2 import PdfFileMerger

    pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf', 'file4.pdf']

    merger = PdfFileMerger()

    for pdf in pdfs:
        merger.append(pdf)

    merger.write('result.pdf')
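If you do need to pass open file objects rather than file names, one way to keep every input open until the write has finished is contextlib.ExitStack from the standard library. The following is only a sketch: open_all and merge_with are hypothetical helper names, not part of PyPDF2, and merger is assumed to be any object with append(fileobj) and write(path) methods, as PdfFileMerger has in the examples above.

```python
import contextlib


@contextlib.contextmanager
def open_all(paths, mode="rb"):
    """Open every path and keep all handles open until the block exits.

    Hypothetical helper, not part of PyPDF2.
    """
    with contextlib.ExitStack() as stack:
        # enter_context registers each file with the stack, so all of them
        # are closed together when the stack exits, even if opening a
        # later file raises.
        yield [stack.enter_context(open(path, mode)) for path in paths]


def merge_with(merger, paths, out_path):
    """Append every input to merger, then write while inputs are still open.

    merger is assumed to expose append(fileobj) and write(path).
    """
    with open_all(paths) as handles:
        for handle in handles:
            merger.append(handle)
        # The write must happen inside the with block: the inputs may not
        # be read until this point.
        merger.write(out_path)
```

Under those assumptions, merge_with(PdfFileMerger(), pdfs, 'result.pdf') would replace the explicit loop, and the input files are closed only after the output has been written.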
Is it possible, using Python, to merge separate PDF files?

A solution has been posted elsewhere. Similarly:

    from pyPdf import PdfFileWriter, PdfFileReader

    def append_pdf(input, output):
        # Copy every page of the input document to the output writer.
        for page_num in range(input.numPages):
            output.addPage(input.getPage(page_num))

    output = PdfFileWriter()

    # The inputs stay open until the final write, because pages are read lazily.
    append_pdf(PdfFileReader(open(r'C:\sample.pdf', 'rb')), output)
    append_pdf(PdfFileReader(open(r'c:\sample1.pdf', 'rb')), output)
    append_pdf(PdfFileReader(open(r'c:\sample2.pdf', 'rb')), output)
    append_pdf(PdfFileReader(open(r'c:\sample3.pdf', 'rb')), output)

    with open(r'c:\combined.pdf', 'wb') as fout:
        output.write(fout)

The pdfrw library can do this quite easily, assuming you don't need to preserve bookmarks and annotations, and your PDFs aren't encrypted. Its examples include a concatenation script and a page subsetting script.

The relevant part of the concatenation script, assuming inputs is a list of input file names and outfn is an output file name:

    from pdfrw import PdfReader, PdfWriter

    writer = PdfWriter()
    for inpfn in inputs:
        writer.addpages(PdfReader(inpfn).pages)
    writer.write(outfn)

As you can see from this, it would be pretty easy to leave out the last page, e.g. something like:

    writer.addpages(PdfReader(inpfn).pages[:-1])

Disclaimer: I am the primary pdfrw author.
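Generalising the pages[:-1] idea to "drop the last n pages" has one pitfall: pages[:-0] is an empty list, not the whole document. Below is a minimal sketch that handles that case; select_pages and concat are hypothetical names introduced here for illustration, and concat uses only the pdfrw calls shown above (PdfReader(...).pages, PdfWriter.addpages, PdfWriter.write).

```python
def select_pages(pages, drop_last=0):
    """Return the pages with the final drop_last pages removed.

    Hypothetical helper; works on any sequence of page objects.
    """
    if drop_last < 0:
        raise ValueError("drop_last must be non-negative")
    pages = list(pages)
    # Compute an explicit end index instead of slicing with -drop_last,
    # because pages[:-0] would return an empty list.
    end = max(len(pages) - drop_last, 0)
    return pages[:end]


def concat(inputs, outfn, drop_last=0):
    """Concatenate the input PDFs, dropping trailing pages from each one."""
    # pdfrw is a third-party package; it is imported here so that
    # select_pages can be used without it installed.
    from pdfrw import PdfReader, PdfWriter

    writer = PdfWriter()
    for inpfn in inputs:
        writer.addpages(select_pages(PdfReader(inpfn).pages, drop_last))
    writer.write(outfn)
```

With drop_last=0 this behaves like the plain concatenation loop; with drop_last=1 it matches the pages[:-1] variant.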