PyMuPDF: extract all text from a PDF

12 May 2019

import sys

import fitz  # PyMuPDF (recent releases also accept "import pymupdf")

if len(sys.argv) != 2:
    sys.exit(f"usage: {sys.argv[0]} FILE.pdf")

fname = sys.argv[1]  # get document filename
# open the document and the text output; both are closed automatically
with fitz.open(fname) as doc, open(fname + ".txt", "wb") as out:
    for page in doc:  # iterate the document pages
        text = page.get_text().encode("utf8")  # get plain text (a str) and encode it as UTF-8
        out.write(text)  # write text of page
        out.write(bytes((12,)))  # write page delimiter (form feed 0x0C)
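
Because the script writes a form feed (0x0C) after every page, the output file can be split back into pages. The sketch below shows one way to do that; `split_pages` and `read_pages` are hypothetical helper names, and it assumes the extracted text never contains a form feed of its own (if it did, that page would be split in two).

```python
from pathlib import Path

FORM_FEED = b"\x0c"  # page delimiter the script writes after every page


def split_pages(data: bytes) -> list[str]:
    """Split the bytes of a form-feed-delimited text dump into per-page strings."""
    chunks = data.split(FORM_FEED)
    # A form feed follows *every* page, including the last one, so the
    # chunk after the final delimiter is empty and is dropped here.
    if chunks and chunks[-1] == b"":
        chunks.pop()
    return [chunk.decode("utf8") for chunk in chunks]


def read_pages(txt_path: str) -> list[str]:
    """Read a .txt file written by the script above and return its pages."""
    return split_pages(Path(txt_path).read_bytes())
```

For example, `read_pages("report.pdf.txt")` would return a list with one string per page of `report.pdf`, with blank pages kept as empty strings so page indices still line up with the PDF.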
