
How to Extract Links, URL & HyperLinks from PDF File

Published By Nilesh Kumar
Approved By Debasish Pramanik
Published On August 23rd, 2023
Reading Time 3 Minutes
Category PDF Tips

A PDF file can contain various components such as text, images, links, and videos, which makes it a versatile file format. Sometimes, users want to extract specific components from a PDF file. In this article, we will discuss different methods to extract links from PDF files. Whether you call them URLs, hyperlinks, or web links, you can export them to a separate file.

Manual Methods to Extract Links from PDF File

There is no official method to extract hyperlinks from a PDF file, and Adobe does not provide a dedicated tool for this. Therefore, users often turn to Python or JavaScript to export URLs from PDF documents. Non-programmers may struggle with these methods, so we have tried to simplify the process as much as possible.

Here is the process to extract links from a PDF file:

  1. Use Python with libraries like PyPDF2 or pdfplumber to extract text from the PDF.
  2. Then, use regular expressions to search for URLs within the extracted text. The `re` library in Python is useful for this.

The Python code is given below. It uses the PdfReader class, which replaces the older PdfFileReader class that recent PyPDF2 releases no longer support:

import re
import PyPDF2

pdf_path = "your_pdf_file.pdf"

# 1. Open the PDF file
with open(pdf_path, "rb") as f:
    reader = PyPDF2.PdfReader(f)

    # 2. Extract text from every page
    pdf_text = ""
    for page in reader.pages:
        pdf_text += page.extract_text() or ""

# 3. Use a regular expression to find URLs
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', pdf_text)

# 4. Print the URLs
for url in urls:
    print(url)
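
The script above prints every match, so a link that appears several times is listed several times, and the pattern can also pick up a full stop or comma that follows a URL at the end of a sentence. Here is a minimal sketch of a clean-up step that saves the links to a separate text file. The function names `clean_urls` and `save_urls` and the file name `links.txt` are our own assumptions for illustration, not part of PyPDF2.

```python
from pathlib import Path


def clean_urls(urls):
    """Strip trailing punctuation and drop duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for url in urls:
        # Assumption: punctuation at the very end belongs to the sentence, not the URL
        url = url.rstrip(".,;:!?)")
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


def save_urls(urls, out_path):
    """Write one cleaned URL per line and return how many were written."""
    lines = clean_urls(urls)
    text = "\n".join(lines) + ("\n" if lines else "")
    Path(out_path).write_text(text, encoding="utf-8")
    return len(lines)
```

You could call `save_urls(urls, "links.txt")` in place of the final print loop. Note that stripping a closing bracket will also shorten the rare URL that genuinely ends with one, so adjust the characters passed to `rstrip` if your documents contain such links.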

Best Way to Extract Hyperlinks from PDF File

There are third-party tools available for extracting links and URLs from PDF files, such as the PDF Extractor Software. This tool is easy to use and requires no programming knowledge. It can export your URLs in PDF, DOC, or DOCX file format. You can also fetch links from specific PDF pages, since the tool provides multiple page options:

  • All Pages
  • Odd Pages
  • Even Pages
  • Page Range
  • Page Number

A free demo version of this software is available for download.

Steps to Extract Links from PDF Files:

  • Firstly, download the software and install it on your computer.
  • Run the tool and import your PDF files with the Add Files button.

  • Select the destination path with the Change button.

  • In the Extract tab, click on the Hyperlink tab.

  • Check the Extract Hyperlinks box and choose the file format (PDF, DOC, DOCX) in which you want to export the hyperlinks.

  • Now, select the pages from which you want to extract the links. You can choose all pages, specific pages, or page ranges.

  • Finally, click the Extract button to save the links in the destination path.

If the PDF file has attachments, you can even extract attachments from PDF using this tool.

Conclusion

There are two methods to extract links from PDF documents. Users can choose the method that suits them best. But do remember that the effectiveness of these methods can vary depending on the complexity of the PDF file and how the links are embedded in the document. Links can also be hidden behind buttons or icons and may require more advanced techniques to extract them.
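
As one example of such a technique, clickable links in a PDF are usually stored as link annotations rather than as visible text, so a text search will miss a link whose label reads "Click here". Below is a minimal sketch that reads those annotations. It assumes a PyPDF2-style reader object whose pages expose the `/Annots` entry. The function name `extract_link_annotations` is hypothetical, and the fallback for plain dictionaries exists only so the logic can be tried without a real PDF.

```python
def extract_link_annotations(reader):
    """Return (page_number, uri) pairs for every link annotation that has a URI action."""
    links = []
    for page_number, page in enumerate(reader.pages, start=1):
        if "/Annots" not in page:
            continue  # this page has no annotations at all
        for annot in page["/Annots"]:
            # PyPDF2 returns indirect references here; plain dicts have no get_object()
            obj = annot.get_object() if hasattr(annot, "get_object") else annot
            if "/Subtype" not in obj or obj["/Subtype"] != "/Link":
                continue  # skip comments, form fields, and other annotation types
            if "/A" not in obj:
                continue  # link without an action, e.g. a plain destination
            action = obj["/A"]
            if "/URI" in action:
                links.append((page_number, str(action["/URI"])))
    return links
```

With PyPDF2 installed, calling `extract_link_annotations(PyPDF2.PdfReader("your_pdf_file.pdf"))` should return the page number and target of each web link. Links that jump to another page inside the same document carry no `/URI` entry and are skipped by this sketch.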


By Nilesh Kumar

As a Chief Technical Writer, I know the technical issues faced by home and professional users. So, I decided to share all my knowledge via this blog. I love to help you with challenges while dealing with technical jargon.