Interesting Python Projects With Code for Beginners – Part 2

Gaurav Sharma Last Updated : 19 Jan, 2022

This article was published as a part of the Data Science Blogathon.

Introduction

In my previous article, I discussed three Python projects with code, explained them in detail, and gave you some examples to try. All of those projects were beginner-friendly. This time, we will look at three more Python projects with code. The more projects you build, the better you will get at programming and at the language itself.

[Image: Python Projects with Codes. Image source: https://realpython.com]

Let’s get started!

1. Text Extraction using OpenCV and OCR

OpenCV is a library of programming functions used mainly for computer vision tasks. With it, you can process and resize images, detect objects, and more. Here we will see how to extract text from an image quickly by finding contours.

Install these:

pip install pytesseract

pip install opencv-python

Python-tesseract (pytesseract) is a Python wrapper for Google's Tesseract-OCR engine, which extracts text from images. The wrapper calls the Tesseract executable, so you also need to download and install Tesseract-OCR itself on your machine.
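Instead of hard-coding the path in pytesseract.pytesseract.tesseract_cmd, you can let the script look for Tesseract on your PATH first and fall back to a fixed location. This is a small sketch; tesseract_command() is a hypothetical helper name, and the Windows path below is an assumption (a common install location), so adjust it for your system.

```python
import shutil

# Assumed install location on Windows; change this to where Tesseract lives on your machine.
DEFAULT_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def tesseract_command(fallback=DEFAULT_TESSERACT):
    """Return the Tesseract executable to use.

    shutil.which() returns the full path if "tesseract" is on the PATH,
    otherwise None, in which case the fallback path is returned.
    """
    found = shutil.which("tesseract")
    return found if found is not None else fallback
```

You would then write pytesseract.pytesseract.tesseract_cmd = tesseract_command() before calling image_to_string().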

Now let's go through the text extraction step by step:

1. Convert the image to grayscale with cv2.cvtColor() and the cv2.COLOR_BGR2GRAY flag (OpenCV loads images in BGR channel order).

gray = cv2.cvtColor(input_image, cv2.COLOR_BGR2GRAY)

2. Finding contours in the image:

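Before this step, the code below applies Otsu thresholding and dilation to the grayscale image so that neighbouring characters merge into solid text blocks. To find the contours of those blocks, use cv2.findContours(). It takes three parameters: the source image, the contour retrieval mode, and the contour approximation method. In OpenCV 4.x it returns two values, the contours and their hierarchy (OpenCV 3.x returns three values, with the modified image first). Each contour is a NumPy array of the (x, y) coordinates of the object's boundary points.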

3. Draw a box around each text block.

Loop through the contours and get each one's x, y, width, and height with cv2.boundingRect(). Then draw a rectangle on the image with cv2.rectangle(). It takes five parameters: the input image, the top-left corner (x, y), the bottom-right corner (x + w, y + h), the colour of the rectangle's border, and the border thickness in pixels.

4. Apply OCR: crop the rectangular region and pass it to Tesseract to extract the text. Save the text to a file by opening it in append mode.

For more details, see the comments in the code.

Code:

import cv2
import pytesseract

# Path to the Tesseract-OCR executable on your computer
pytesseract.pytesseract.tesseract_cmd = 'path_to_tesseract.exe'

img = cv2.imread("input.png")  # input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert the image to grayscale

# Otsu thresholding; THRESH_BINARY_INV turns dark text (on a light background) white
ret, img_thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)

# Structuring element shape and kernel size; the kernel size
# increases or decreases the area of the rectangles that get detected
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))
# Dilate the thresholded image so neighbouring characters merge into blocks
dilation = cv2.dilate(img_thresh, rect_kernel, iterations=1)
# OpenCV 4.x returns (contours, hierarchy)
img_contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_NONE)

im2 = img.copy()  # copy of the image on which the boxes are drawn

# Create (or empty) the text file that will hold the results
open("Output.txt", "w").close()

# Loop through each contour (one per detected text block)
for contour in img_contours:
    x, y, w, h = cv2.boundingRect(contour)
    # Draw a green box, 2 px thick, around the block
    cv2.rectangle(im2, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # Crop the block from the original image so the green box is not part of the crop
    cropped_image = img[y:y + h, x:x + w]
    text = pytesseract.image_to_string(cropped_image)  # apply OCR
    # Append this block's text to the results file
    with open("Output.txt", "a") as file:
        file.write(text)
        file.write("\n")

[Image: input image]

[Image: output image]

2. Convert your PDF File to Audio Speech

Say you have a book as a PDF, but you are feeling too lazy to scroll through it. Wouldn't it be nice if that PDF were converted to an audiobook? Let's implement this in Python.

We will need these two packages:

pyttsx3: a text-to-speech library; it lets the machine read text aloud.

PyPDF2: a PDF toolkit that can extract text and document information, merge documents, and more.

Install them using these commands:

pip install pyttsx3
pip install PyPDF2

Steps:

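Import the required modules.
Read the PDF file with PdfReader() (PdfFileReader() in older PyPDF2 releases).
Select the page to read with reader.pages[n] (getPage(n) in older releases).
Extract its text with extract_text() (extractText() in older releases).
Speak the text aloud with pyttsx3.

The older names raise an error in PyPDF2 3.0 and later, so use the newer ones with a current installation.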

Code:

# import the modules
import PyPDF2
import pyttsx3

# open your PDF file in binary mode
with open('Book.pdf', 'rb') as pdf_file:
    # PdfReader object (called PdfFileReader in PyPDF2 versions before 3.0)
    pdf_reader = PyPDF2.PdfReader(pdf_file)

    # the page to read; pages are numbered from 0, so 12 is the 13th page
    from_page = pdf_reader.pages[12]
    content = from_page.extract_text()

# read the text aloud
speak = pyttsx3.init()
speak.say(content)
speak.runAndWait()

That's it! This small script reads the chosen page aloud, which is handy when you would rather listen than read.
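The script above reads only one page. If you would rather have it continue from that page to the end of the book, here is a minimal sketch. collect_text() and speak_pdf() are hypothetical helper names; the sketch assumes PyPDF2 2.0 or later (for PdfReader and extract_text()), and it imports PyPDF2 and pyttsx3 inside speak_pdf() so that collect_text() works with any object exposing a pages sequence.

```python
def collect_text(reader, start_page):
    """Return the text of every page from start_page (0-indexed) to the end."""
    parts = []
    for index in range(start_page, len(reader.pages)):
        # `or ""` guards against a page that yields no text (e.g. a scanned image)
        parts.append(reader.pages[index].extract_text() or "")
    return "\n".join(parts)


def speak_pdf(pdf_path, start_page):
    """Read a PDF aloud from start_page to the last page."""
    # Imported here so collect_text() can be reused without these packages
    import PyPDF2
    import pyttsx3

    with open(pdf_path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        content = collect_text(reader, start_page)

    engine = pyttsx3.init()
    engine.say(content)
    engine.runAndWait()
```

For example, speak_pdf("Book.pdf", 12) reads from the 13th page to the end. Joining all pages first keeps the code simple; for a long book you may prefer to call say() and runAndWait() once per page so that speech starts before the whole book has been extracted.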

Next, you could add a GUI to this project with Tkinter or another toolkit, for example with fields for the PDF path and the page number to start from, plus a stop button. Try it!

Let’s move to the next project.

3. Reading mails and downloading attachments from the mailbox

First, why read a mailbox with Python at all? Suppose you are working on a project where data arrives daily as a Word or Excel attachment, and your script or machine learning model needs that file as input. Downloading the file by hand every day and passing it along is tedious. If we automate this step, finding the mail and downloading the required attachment, it saves a lot of effort. So, let's implement it.

We will use pywin32 to download attachments from a particular mail automatically. pywin32 gives Python access to Windows COM automation, so it can control Windows applications such as Excel, PowerPoint, Word, and Outlook. We will focus on Outlook and download attachments from the Outlook mailbox.

Note: This does not need authentication such as an email ID or password; it uses the Outlook account that is already logged in on your machine. It works only on Windows with the desktop Outlook app installed. (Keep the Outlook app open while running the script.)

Python's smtplib can only send emails; it cannot download attachments. So we will use pywin32 to download attachments from Outlook, which is pretty straightforward. Let's look at the code.

Command to install: pip install pywin32

Import the module:

import win32com.client

Now, establish a connection to Outlook.

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")

Let’s try to access Inbox:

inbox = outlook.GetDefaultFolder(number)

GetDefaultFolder() takes an integer that identifies which of Outlook's default folders (Inbox, Sent Items, and so on) to return.

To check the index of all folders, just run this code snippet:

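import win32com.client

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
# Try the numbers 0-49 and print the ones that are valid default-folder IDs
for i in range(50):
    try:
        box = outlook.GetDefaultFolder(i)
        print(i, box.Name)
    except Exception:
        pass  # not a valid folder ID (or the folder is not available)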

Output:

3 Deleted Items
4 Outbox
5 Sent Items
6 Inbox
9 Calendar

As the output shows, the Inbox index is 6, so we pass 6 to the function.

inbox = outlook.GetDefaultFolder(6)
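Rather than a bare 6, you can give the folder numbers names. The values come from Outlook's OlDefaultFolders enumeration (olFolderInbox is 6, olFolderSentMail is 5, and so on), which matches the output above. The constant and helper names below are our own, a small sketch rather than part of pywin32.

```python
# A subset of Outlook's OlDefaultFolders enumeration values
OL_FOLDER_DELETED_ITEMS = 3
OL_FOLDER_OUTBOX = 4
OL_FOLDER_SENT_MAIL = 5
OL_FOLDER_INBOX = 6
OL_FOLDER_CALENDAR = 9


def get_default_folder(namespace, folder_id=OL_FOLDER_INBOX):
    """Return a default folder (the Inbox unless another ID is given).

    `namespace` is the MAPI namespace returned by
    win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI").
    """
    return namespace.GetDefaultFolder(folder_id)
```

With this, inbox = get_default_folder(outlook) reads more clearly than a magic number, and get_default_folder(outlook, OL_FOLDER_SENT_MAIL) returns Sent Items.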

If you want to print the subject of all the emails in the inbox, use this:

messages = inbox.Items
# get the first email
message = messages.GetFirst()
# loop through all the emails in the inbox; GetNext() returns None after the last one
while message is not None:
    try:
        print(message.Subject)  # get the subject of the email
    except Exception:
        pass  # skip items whose subject cannot be read
    message = messages.GetNext()

Besides message.Subject, there are other properties such as message.SentOn (the date and time the mail was sent) that you can use in the same way.

Downloading Attachment

To print the names of all attachments in a mail:

for attachment in message.Attachments:
    print(attachment.FileName)

Now let's download an attachment (an Excel file with the .xlsx extension) from a specific sender.

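import os
import re
import win32com.client

# Placeholder: folder to save attachments in; use a full path that exists on your machine
download_folder_path = r"C:\path\to\download_folder"

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)  # 6 = Inbox
messages = inbox.Items
message = messages.GetFirst()
# GetNext() returns None after the last email, which ends the loop
while message is not None:
    try:
        # Subject and sender are lower-cased, so the patterns must be lower-case too
        if (re.search("data report", str(message.Subject).lower()) is not None
                and re.search("abc prasad", str(message.SenderName).lower()) is not None):
            for attachment in message.Attachments:
                # lower() makes the check match both .xlsx and .XLSX
                if attachment.FileName.lower().endswith(".xlsx"):
                    attachment_name = str(attachment.FileName).lower()
                    attachment.SaveAsFile(os.path.join(download_folder_path, attachment_name))
    except Exception as error:
        # Skip items whose properties cannot be read, but report them
        print("Skipping item:", error)
    message = messages.GetNext()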

Explanation

This is the complete code to download an attachment from the Outlook inbox. You can change the conditions inside the try block. For example, here I am looking for mails whose subject contains "Data Report" and whose sender name is "ABC prasad". The loop starts at the first mail in the inbox, and whenever both conditions are true, it checks whether that mail has an attachment with the extension .xlsx or .XLSX. You can change the subject, the sender, and the file type to download the file you want. Each matching file is saved to the folder given in download_folder_path.
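If you reuse this often, you can turn the subject, sender, and file type into parameters. The sketch below is one way to do that; wants_message(), wants_attachment(), and download_attachments() are hypothetical helper names, and it assumes mail items expose Subject, SenderName, and Attachments as in the code above. Pass an absolute folder path, since SaveAsFile() saves relative to Outlook rather than your script.

```python
import os


def wants_message(subject, sender_name, subject_keyword, sender_keyword):
    """Case-insensitive check that both keywords appear in subject and sender."""
    return (subject_keyword.lower() in str(subject).lower()
            and sender_keyword.lower() in str(sender_name).lower())


def wants_attachment(filename, extensions=(".xlsx",)):
    """True if the file name ends with one of the extensions, ignoring case."""
    return str(filename).lower().endswith(tuple(ext.lower() for ext in extensions))


def download_attachments(folder, download_folder_path, subject_keyword,
                         sender_keyword, extensions=(".xlsx",)):
    """Save matching attachments from an Outlook folder; return the saved paths."""
    saved = []
    # Keep one Items reference: GetFirst()/GetNext() track their position on it
    items = folder.Items
    message = items.GetFirst()
    while message is not None:  # GetNext() returns None after the last item
        try:
            if wants_message(message.Subject, message.SenderName,
                             subject_keyword, sender_keyword):
                for attachment in message.Attachments:
                    if wants_attachment(attachment.FileName, extensions):
                        path = os.path.join(download_folder_path, attachment.FileName)
                        attachment.SaveAsFile(path)
                        saved.append(path)
        except Exception as error:
            # Skip items whose properties cannot be read, but report them
            print("Skipping item:", error)
        message = items.GetNext()
    return saved
```

For example, download_attachments(outlook.GetDefaultFolder(6), download_folder_path, "Data Report", "ABC prasad") reproduces the search above, and adding ".csv" to extensions would also pick up CSV files. Plain substring checks replace the regular expressions here because the keywords are literal text.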

End Notes

We discussed three projects in the previous article and three more in this one. I hope these Python projects with code help you polish your skill set. Try them hands-on; you will enjoy coding them. I hope you find this article helpful. Let's connect on LinkedIn.

Thanks for reading 🙂

Happy coding!

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.

Gaurav Sharma

Love Programming, Blog writing and Poetry
