How to Handle PDF in Python

Ashish Katri

10 months ago

Most of us are familiar with PDFs; they are among the most widely used digital document formats. PDF stands for Portable Document Format, and PDF files use the .pdf extension.

Why Python for PDF processing?

PDF processing falls under text analytics.
Many text analytics libraries and frameworks are written for Python, which makes it a convenient choice for this kind of work. Keep in mind that existing machine learning and natural language processing frameworks generally cannot process a PDF directly: unless they provide an explicit interface for it, we have to convert the PDF to text first.
So, in this article, I will take you through some basic PDF processing in Python.
Python can read a PDF file, extract the text from it, and print the content.
For that, we first have to install the required module, PyPDF2. The command below installs it; you should already have pip installed in your Python environment.

pip install pypdf2
Once the module is installed, we can read PDF files using the classes and methods it provides.
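import PyPDF2

# Raw string so the backslash in the Windows-style path is taken literally
pdfName = r'path\InsideAlMl.pdf'
# PdfReader is the current reader class; PdfFileReader was removed in PyPDF2 3.0
read_pdf = PyPDF2.PdfReader(pdfName)
# Pages are zero-indexed, so pages[0] is the first page
page = read_pdf.pages[0]
page_content = page.extract_text()
print(page_content)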
When we run the above program, we get the following output:
How to Do Face Animation Using Ai
(Mo-Cap, for short) is the process of recording with camera real-life movements of people for the purpose of recreating those exact movements in a computer-generated scene. As someone who is fascinated by the use of this tech in game development for creating animations, I was thrilled to see the massive improvements brought to this tech with the help of Deep Learning.
In this article, I want to share a quick overview of the recently published NeurIPS paper “First Order Motion Model for Image Animation” by A. Siarohin et. al. and demonstrate how its application to the Game Animation Industry will be “game-changing”.
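A page that contains only a scanned image may yield no text at all, and asking for a page index that does not exist can raise an error. The helper below is a minimal sketch of one way to guard against both cases. The names `page_text` and `read_page` are hypothetical and introduced only for illustration, and the sketch assumes the `PdfReader` class and `extract_text()` method available in PyPDF2 2.x and later.

```python
def page_text(pages, index):
    """Return the text of pages[index], or '' if the page is missing or has no text."""
    # Guard against an index outside the document.
    if not 0 <= index < len(pages):
        return ''
    # extract_text() may give back nothing for image-only pages.
    text = pages[index].extract_text()
    return text.strip() if text else ''


def read_page(pdf_path, index=0):
    """Hypothetical convenience wrapper: open a PDF and return one page's text."""
    # Imported lazily so page_text() can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader
    return page_text(PdfReader(pdf_path).pages, index)
```

Calling `read_page(pdfName)` would then return the first page's text, or an empty string when nothing could be extracted.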

Reading Multiple Pages

To read a PDF with multiple pages and print each page under its page number, we loop over every page of the reader. In the example below, the PDF file has two pages, and the contents are printed under two separate page headings.

import PyPDF2

# Raw string so the backslash in the Windows-style path is taken literally
pdfName = r'Path\Tutorialspoint2.pdf'
read_pdf = PyPDF2.PdfReader(pdfName)

# enumerate(..., start=1) gives page numbers that start at 1 instead of 0
for page_number, page in enumerate(read_pdf.pages, start=1):
    print('Page No - ' + str(page_number))
    page_content = page.extract_text()
    print(page_content)
When we run the above program, we get the following output:
Page No - 1
MotionScan Technology
It was way back in 2011 when the game L.A. Noire came out with absolutely amazing life-like facial animations that seemed so ahead of every other game. Now, almost a decade later, we still haven’t seen many other games come anywhere close to matching its level in terms of delivering realistic facial expressions.

Page No - 2
First Order Motion Model for Image Animation
full-text PDF: https://arxiv.org/pdf/2003.00196.pdf
In this research work, the authors present a Deep Learning Framework to create animations from a source image of a face, by following the motion of another face in a driving video, similar to the MotionScan technology. They propose a self-supervised a training method that can use unlabeled data-set of videos of a particular category to learn the important dynamics that define motion. Then, then show how these motion dynamics can be combined with a static image to generate a motion video.
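Since most text analytics tools expect plain text, a natural next step is to collect every page into one string and save it to a .txt file. The following is a minimal sketch of that idea, not a definitive implementation. The function names `pages_to_text` and `pdf_to_text_file` are hypothetical, and the code assumes the `PdfReader` class and `extract_text()` method from PyPDF2 2.x and later.

```python
def pages_to_text(pages):
    """Join the text of every page under a 'Page No - N' heading."""
    parts = []
    for number, page in enumerate(pages, start=1):
        # Fall back to an empty string when a page has no extractable text.
        text = page.extract_text() or ''
        parts.append('Page No - {}\n{}'.format(number, text.strip()))
    return '\n\n'.join(parts)


def pdf_to_text_file(pdf_path, txt_path):
    """Hypothetical helper: write the text of a whole PDF to a UTF-8 text file."""
    # Imported lazily so pages_to_text() can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader
    text = pages_to_text(PdfReader(pdf_path).pages)
    with open(txt_path, 'w', encoding='utf-8') as handle:
        handle.write(text)
    return text
```

The resulting text file can then be passed to whichever text analytics or NLP library you prefer.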
I hope you enjoyed reading this article and now know how to handle PDFs in Python.
For more blogs and courses on data science, machine learning, artificial intelligence, and emerging technologies, do visit us at InsideAIML.
Thanks for reading…
Happy Learning…
