Mastering Image and Video Segmentation with SAM 2

Soumyadarshan Dash Last Updated : 10 Feb, 2025

This guide walks you through what Segment Anything Model 2 (SAM 2) is, how it works, and how you can use it to segment objects in images and videos. It offers state-of-the-art performance and flexibility in segmenting objects in images, making it a valuable resource for a variety of computer vision applications. The guide aims to provide a detailed, step-by-step walkthrough for setting up and using SAM 2 to perform image segmentation. By following it, you will be able to generate segmentation masks for images using both box and point prompts.

Learning Objectives

Describe the key features and applications of the Segment Anything Model 2 (SAM 2) in image and video segmentation.
Successfully configure a CUDA-enabled environment, install necessary dependencies, and clone the Segment Anything Model 2 repository for image segmentation tasks.
Apply SAM 2 to generate segmentation masks for images using both box and point prompts and visualize the results effectively.
Evaluate how SAM 2 can revolutionize photo and video editing by enabling real-time segmentation, automating complex tasks, and democratizing content creation for a broader audience.

This article was published as a part of the Data Science Blogathon.

Prerequisites
What is SAM 2?
Setting Up and Utilizing SAM 2 for Image Segmentation
Key Points to Remember When Working with SAM 2
Impressive Potential of SAM 2
Conclusion
Frequently Asked Questions

Prerequisites

Before you begin, make sure you have a CUDA-enabled GPU for faster processing. Also, verify that you have Python installed on your machine. This guide assumes you have some basic knowledge of Python and image processing concepts.

What is SAM 2?

Segment Anything Model 2 is an advanced tool for image segmentation developed by Facebook AI Research (FAIR). On July 29th, 2024, Meta AI released SAM 2, an advanced image and video segmentation foundation model. SAM 2 enables users to provide points or boxes in an image or video to generate segmentation masks for specific objects.

Key Features of SAM 2

Advanced Mask Generation: SAM 2 generates high-quality segmentation masks based on user inputs, such as points or bounding boxes.
Flexibility: The model supports both image and video segmentation.
Speed and Efficiency: With CUDA support, SAM 2 can perform segmentation tasks rapidly, making it suitable for real-time applications.

Core Components of SAM 2

Image Encoder: Encodes the input image for processing.
Prompt Encoder: Converts user-provided points or boxes into a format the model can use.
Mask Decoder: Generates the final segmentation mask based on the encoded inputs.
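
The split between a heavy image encoder and a lighter prompt encoder and mask decoder is what makes interactive prompting cheap: the image is embedded once, and each new prompt reuses that embedding. The sketch below is a toy illustration of that data flow only. `ToyPredictor` and its methods are hypothetical stand-ins, not the SAM 2 API, and the "mask" it returns is simply the prompt box filled in.

```python
class ToyPredictor:
    """Hypothetical stand-in that mimics the encode-once, prompt-many flow."""

    def __init__(self):
        self.encoder_calls = 0
        self._embedding = None

    def set_image(self, image):
        # "Image encoder": the expensive step, run once per image
        self.encoder_calls += 1
        self._embedding = {"height": len(image), "width": len(image[0])}

    def predict(self, box):
        # "Prompt encoder": turn the user's box into bounds clamped to the image
        if self._embedding is None:
            raise RuntimeError("call set_image() before predict()")
        h, w = self._embedding["height"], self._embedding["width"]
        x0, y0, x1, y1 = box
        x0, x1 = max(0, x0), min(w, x1)
        y0, y1 = max(0, y0), min(h, y1)
        # "Mask decoder": produce a binary mask (here, simply the box area)
        return [[x0 <= x < x1 and y0 <= y < y1 for x in range(w)] for y in range(h)]


image = [[0] * 6 for _ in range(4)]  # dummy 4x6 "image"
predictor = ToyPredictor()
predictor.set_image(image)
mask_a = predictor.predict((0, 0, 2, 2))
mask_b = predictor.predict((3, 1, 6, 4))
print(predictor.encoder_calls)  # the image was encoded once for both prompts
```

The same pattern shows up later in this guide, where the image is set on the predictor once and then prompted with boxes and points.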

Applications of SAM 2

Let us now look into the applications of SAM 2 below:

Photo and Video Editing: SAM 2 allows for precise object segmentation, enabling detailed edits and creative effects in photos and videos.
Autonomous Vehicles: In autonomous driving, SAM 2 can be used to identify and track objects like pedestrians, vehicles, and road signs in real-time.
Medical Imaging: SAM 2 can assist in segmenting anatomical structures in medical images, aiding in diagnostics and treatment planning.

What is Image Segmentation?

Image segmentation is a computer vision technique that involves dividing an image into multiple segments or regions to simplify its analysis. Each segment represents a different object or part of an object within the image, making it easier to identify and analyze specific elements.

Types of Image Segmentation

Semantic Segmentation: Classifies each pixel into a predefined category.
Instance Segmentation: Differentiates between different instances of the same object category.
Panoptic Segmentation: Combines semantic and instance segmentation.
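
To make the three definitions concrete, here is a small sketch on a made-up 3x4 scene with two cats on grass. The scene, the class names, and the choice of nested lists are all illustrative assumptions; real pipelines typically store these maps as arrays.

```python
# Toy 3x4 scene: two cats (instances 1 and 2) on a grass background (instance 0).
# All values here are made up for illustration.
instance_map = [
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [0, 0, 2, 2],
]
instance_to_class = {0: "grass", 1: "cat", 2: "cat"}

# Semantic segmentation: one class label per pixel; the two cats are indistinguishable
semantic_map = [[instance_to_class[i] for i in row] for row in instance_map]

# Instance segmentation: one binary mask per countable object, ignoring the background
instance_masks = {
    inst: [[pixel == inst for pixel in row] for row in instance_map]
    for inst, cls in instance_to_class.items()
    if cls != "grass"
}

# Panoptic segmentation: every pixel gets both a class label and an instance id
panoptic_map = [[(instance_to_class[i], i) for i in row] for row in instance_map]

print(semantic_map[1])
print(sorted(instance_masks))
print(panoptic_map[1])
```

In the second row, the semantic map labels three pixels as "cat" without saying which cat, while the panoptic map keeps the class and tells the two animals apart.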

Setting Up and Utilizing SAM 2 for Image Segmentation

We’ll guide you through the process of setting up the Segment Anything Model 2 (SAM 2) in your environment and utilizing its powerful capabilities for precise image segmentation tasks. From ensuring your GPU is ready to configuring the model and applying it to real images, each step will be covered in detail to help you harness the full potential of SAM 2.

Step 1: Check GPU Availability and Set Up the Environment

First, let’s ensure that your environment is properly set up, starting with checking for GPU availability and setting the current working directory.

# Check GPU availability and CUDA version
!nvidia-smi
!nvcc --version

# Import necessary modules
import os

# Set the current working directory
HOME = os.getcwd()
print("HOME:", HOME)

Explanation

!nvidia-smi and !nvcc --version: These commands check whether your system has a CUDA-enabled GPU and display the CUDA version.
os.getcwd(): This function returns the current working directory, which can be used for managing file paths.
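
The `!` prefix only works inside a notebook. If you run the steps as a plain Python script instead, a check along the following lines may help. It is a minimal sketch using only the standard library; the helper name `gpu_tools_available` is hypothetical, and a successful `nvidia-smi` call only suggests that a driver is present, not that PyTorch can use the GPU.

```python
import shutil
import subprocess


def gpu_tools_available():
    """Return True if the nvidia-smi executable is on PATH and runs successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if gpu_tools_available():
    print("NVIDIA driver detected; GPU acceleration should be usable.")
else:
    print("nvidia-smi not found or failed; expect a slow CPU-only run.")
```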

Step 2: Clone the SAM 2 Repository and Install Dependencies

Next, we need to clone the SAM 2 repository from GitHub and install the required dependencies.

# Clone the SAM 2 repository
!git clone https://github.com/facebookresearch/segment-anything-2.git

# Change to the repository directory
%cd segment-anything-2

# Install the SAM 2 package
!pip install -e .

# Install additional packages
!pip install supervision jupyter_bbox_widget

Explanation

!git clone: Clones the SAM 2 repository to your local machine.
%cd: Changes the directory to the cloned repository.
!pip install -e .: Installs the SAM 2 package in editable mode.
!pip install supervision jupyter_bbox_widget: Installs additional packages required for visualization and bounding box widget support.
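
After installation it can be worth confirming that the packages are importable before moving on. The sketch below is one way to do that with the standard library; `missing_modules` is a hypothetical helper, and the module names are taken from the imports used in the later steps of this guide.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level module names assumed from the imports used later in this guide
required = ["sam2", "supervision", "jupyter_bbox_widget", "cv2", "torch", "numpy"]
missing = missing_modules(required)
print("Missing modules:", missing or "none")
```

If anything is reported missing, rerunning the corresponding `pip install` command (and restarting the notebook runtime if needed) is usually the first thing to try.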

Step 3: Download Model Checkpoints

Model checkpoints are essential, as they contain the trained parameters of SAM 2. We will download multiple checkpoints for different model sizes.

# Create a directory for checkpoints
!mkdir -p checkpoints

# Download the model checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt -P checkpoints

Explanation

!mkdir -p checkpoints: Creates a directory for storing model checkpoints.
!wget -q … -P checkpoints: Downloads the model checkpoints into the checkpoints directory. Different checkpoints represent models of varying sizes and capabilities.
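
Each checkpoint has to be paired with a matching config file when the model is built. The sketch below records that pairing and reports which checkpoints are present locally. Only the "large" pairing appears in this guide; the other three config names are assumptions that should be verified against the cloned repository, and `checkpoint_status` is a hypothetical helper.

```python
from pathlib import Path

BASE_URL = "https://dl.fbaipublicfiles.com/segment_anything_2/072824"

# Checkpoint file -> config file. Only the "large" pairing appears in this guide;
# the other config names are assumptions to verify against the cloned repository.
CHECKPOINT_TO_CONFIG = {
    "sam2_hiera_tiny.pt": "sam2_hiera_t.yaml",
    "sam2_hiera_small.pt": "sam2_hiera_s.yaml",
    "sam2_hiera_base_plus.pt": "sam2_hiera_b+.yaml",
    "sam2_hiera_large.pt": "sam2_hiera_l.yaml",
}


def checkpoint_status(directory="checkpoints"):
    """Map each checkpoint name to True if a non-empty file exists in `directory`."""
    root = Path(directory)
    return {
        name: (root / name).is_file() and (root / name).stat().st_size > 0
        for name in CHECKPOINT_TO_CONFIG
    }


for name, present in checkpoint_status().items():
    state = "ok" if present else f"missing, download from {BASE_URL}/{name}"
    print(f"{name}: {state}")
```

Checking for a non-empty file is a cheap guard against interrupted downloads, which `wget -q` will not announce.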

Step 4: Download Sample Images

For demonstration purposes, we’ll use some sample images. You can also use your own images by following similar steps.

# Create a directory for data
!mkdir -p data

# Download sample images
!wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-2.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-3.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-4.jpeg -P data

Explanation

!mkdir -p data: Creates a directory for storing sample images.
!wget -q … -P data: Downloads the sample images into the data directory.
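
Because `wget -q` suppresses its output, a failed download can go unnoticed until the image fails to load later. A quick sanity check, sketched below, is to confirm that each file exists and starts with the JPEG signature bytes. `looks_like_jpeg` is a hypothetical helper, and the check only inspects the first three bytes, so it does not prove the file is a complete image.

```python
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # first three bytes of a JPEG file


def looks_like_jpeg(path):
    """Return True if `path` exists and starts with the JPEG signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(3) == JPEG_MAGIC


for name in ["dog.jpeg", "dog-2.jpeg", "dog-3.jpeg", "dog-4.jpeg"]:
    status = "ok" if looks_like_jpeg(Path("data") / name) else "missing or not a JPEG"
    print(f"data/{name}: {status}")
```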

Step 5: Set Up the SAM 2 Model and Load an Image

Now, we will set up the SAM 2 model, load an image, and prepare it for segmentation.

import cv2
import torch
import numpy as np
import supervision as sv

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

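# Use the GPU when one is available, otherwise fall back to the CPU
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

if DEVICE.type == 'cuda':
    # Run subsequent operations under bfloat16 autocast for the rest of the session
    torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()

    # GPUs with compute capability 8.0 or higher support TF32 matrix math
    if torch.cuda.get_device_properties(0).major >= 8:
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True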

# Define the model checkpoint and configuration
CHECKPOINT = "checkpoints/sam2_hiera_large.pt"
CONFIG = "sam2_hiera_l.yaml"

# Build the SAM 2 model
sam2_model = build_sam2(CONFIG, CHECKPOINT, device=DEVICE, apply_postprocessing=False)

# Create the automatic mask generator
mask_generator = SAM2AutomaticMaskGenerator(sam2_model)

# Load an image for segmentation (replace the path with your own image, e.g. data/dog.jpeg)
IMAGE_PATH = "/content/WhatsApp Image 2024-08-02 at 14.17.11_2b223e01.jpg"
image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# Generate segmentation masks
sam2_result = mask_generator.generate(image_rgb)

Explanation

CUDA Setup: Enables bfloat16 autocasting (and TF32 on supported GPUs) for faster processing and sets the device to the GPU if available.
Model Setup: Builds the SAM 2 model using the specified configuration and checkpoint.
Image Loading: Loads and converts the sample image to RGB format.
Mask Generation: Uses the automatic mask generator to generate segmentation masks for the loaded image.
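
The generator returns a list of records, one per mask, and this guide later reads the 'segmentation' and 'area' keys from each record. Automatic mask generation often produces many tiny regions, so a common follow-up is to drop masks below some size. The sketch below shows that idea on mock records; `filter_by_area` and the 50% threshold are hypothetical choices for illustration, and the mock data only imitates the two keys used in this guide.

```python
def filter_by_area(results, image_area, min_fraction=0.01):
    """Keep mask records covering at least `min_fraction` of the image, largest first."""
    kept = [r for r in results if r["area"] / image_area >= min_fraction]
    return sorted(kept, key=lambda r: r["area"], reverse=True)


# Mock records shaped like the generator output (only the keys used in this guide)
mock_result = [
    {"segmentation": [[True, True], [False, False]], "area": 2},
    {"segmentation": [[False, False], [False, True]], "area": 1},
    {"segmentation": [[True, True], [True, False]], "area": 3},
]

large_masks = filter_by_area(mock_result, image_area=4, min_fraction=0.5)
print([r["area"] for r in large_masks])
```

With a real result, `image_area` would be the image height multiplied by its width.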

Step 6: Visualize the Segmentation Masks

We will now visualize the segmentation masks generated by SAM 2.

# Annotate the masks on the image
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
detections = sv.Detections.from_sam(sam_result=sam2_result)
annotated_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

# Plot the original and segmented images side by side
sv.plot_images_grid(
    images=[image_bgr, annotated_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

# Extract and plot individual masks
masks = [
    mask['segmentation']
    for mask in sorted(sam2_result, key=lambda x: x['area'], reverse=True)
]

sv.plot_images_grid(
    images=masks[:16],
    grid_size=(4, 4),
    size=(12, 12)
)

Explanation

Mask Annotation: Annotates the segmentation masks on the original image.
Visualization: Plots the original and segmented images side by side and also plots individual masks.
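
The 4x4 grid above is sized for exactly 16 masks. When you want to show a different number of masks, the grid shape can be derived from the count instead of being hard-coded. The helper below is a hypothetical sketch of that calculation; the near-square layout and the `max_cols` default are arbitrary choices.

```python
import math


def grid_for(n_images, max_cols=4):
    """Return (rows, cols) for a near-square grid that fits `n_images` tiles."""
    if n_images < 1:
        raise ValueError("need at least one image")
    cols = min(max_cols, math.ceil(math.sqrt(n_images)))
    rows = math.ceil(n_images / cols)
    return rows, cols


for n in (1, 5, 16):
    print(n, grid_for(n))
```

The returned tuple follows the same (rows, columns) order as the `grid_size=(4, 4)` argument used above.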

Step 7: Use Box Prompts for Segmentation

Box prompts allow us to specify regions of interest in the image for segmentation.

# Define the SAM 2 Image Predictor
predictor = SAM2ImagePredictor(sam2_model)

# Reload the image
image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# Encode the image for bounding box input
import base64

def encode_image(filepath):
    with open(filepath, 'rb') as f:
        image_bytes = f.read()
    encoded = str(base64.b64encode(image_bytes), 'utf-8')
    return "data:image/jpg;base64,"+encoded

# Enable custom widget manager in Colab
IS_COLAB = True

if IS_COLAB:
    from google.colab import output
    output.enable_custom_widget_manager()

from jupyter_bbox_widget import BBoxWidget

# Create a bounding box widget
widget = BBoxWidget()
widget.image = encode_image(IMAGE_PATH)

# Display the widget
widget

Explanation

Image Predictor: Defines the SAM 2 image predictor.
Image Encoding: Encodes the image for use with the bounding box widget.
Widget Setup: Sets up a bounding box widget for specifying regions of interest.
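
The widget receives the image as a base64 data URI, which is a string of the form `data:<MIME type>;base64,<payload>`. The function above hard-codes the MIME type; the sketch below is a variant that guesses it from the file name, which may be useful if you also work with PNG files. `to_data_uri` is a hypothetical helper, and the byte string passed to it is a fake payload rather than a real image.

```python
import base64
import mimetypes


def to_data_uri(image_bytes, filename):
    """Build a base64 data URI, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


uri = to_data_uri(b"\xff\xd8\xff\xe0 fake jpeg bytes", "dog.jpeg")
header, payload = uri.split(",", 1)
print(header)
print(base64.b64decode(payload)[:3])
```

Decoding the payload returns the original bytes, which confirms that the encoding is lossless.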

Step 8: Get Bounding Boxes and Perform Segmentation

After specifying the bounding boxes, we can use them to generate segmentation masks.

# Get the bounding boxes from the widget
boxes = widget.bboxes
boxes = np.array([
    [
        box['x'],
        box['y'],
        box['x'] + box['width'],
        box['y'] + box['height']
    ] for box in boxes
])

# Example value of widget.bboxes (output from the author's run):
# [{'x': 457, 'y': 341, 'width': 0, 'height': 0, 'label': ''},
#  {'x': 205, 'y': 79, 'width': 0, 'height': 1, 'label': ''}]

# Set the image in the predictor
predictor.set_image(image_rgb)

# Generate masks using the bounding boxes
masks, scores, logits = predictor.predict(
    box=boxes,
    multimask_output=False
)

# Drop the singleton mask dimension: N boxes yield (N, 1, H, W), while one box already yields (1, H, W)
if boxes.shape[0] != 1:
    masks = np.squeeze(masks)

# Annotate and visualize the masks
box_annotator = sv.BoxAnnotator(color=sv.Color.white())
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)

detections = sv.Detections(
    xyxy=boxes,
    mask=masks.astype(bool)
)

source_image = box_annotator.annotate(scene=image_bgr.copy(), detections=detections)
segmented_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

# Plot the annotated images
sv.plot_images_grid(
    images=[source_image, segmented_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

Explanation

Bounding Boxes: Retrieves the bounding boxes specified using the widget.
Mask Generation: Uses the bounding boxes to generate segmentation masks.
Visualization: Annotates and visualizes the masks on the original image.
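
The widget reports each box as a top-left corner plus a width and height, while the predictor expects corner coordinates in [x_min, y_min, x_max, y_max] order. The sample widget output shown above also contains boxes with zero width or height, which typically come from a click rather than a drag. The sketch below performs the same conversion in plain Python and skips such degenerate boxes; `widget_boxes_to_xyxy` and the `min_size` threshold are hypothetical.

```python
def widget_boxes_to_xyxy(bboxes, min_size=2):
    """Convert widget boxes to [x_min, y_min, x_max, y_max], skipping tiny ones."""
    xyxy = []
    for box in bboxes:
        if box["width"] < min_size or box["height"] < min_size:
            continue  # an accidental click rather than a dragged box
        xyxy.append([
            box["x"],
            box["y"],
            box["x"] + box["width"],
            box["y"] + box["height"],
        ])
    return xyxy


# The first two entries copy the sample widget output; the third is made up
sample = [
    {"x": 457, "y": 341, "width": 0, "height": 0, "label": ""},
    {"x": 205, "y": 79, "width": 0, "height": 1, "label": ""},
    {"x": 100, "y": 50, "width": 200, "height": 120, "label": ""},
]
print(widget_boxes_to_xyxy(sample))
```

The resulting list can be wrapped in `np.array(...)` before being passed to the predictor, as in the code above.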

Step 9: Use Point Prompts for Segmentation

Point prompts allow us to specify individual points of interest for segmentation.

# Create point prompts based on bounding boxes
input_point = np.array([
    [
        box['x'] + (box['width'] // 2),
        box['y'] + (box['height'] // 2)
    ] for box in widget.bboxes
])
input_label = np.array([1] * len(input_point))

# Generate masks using the point prompts
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True
)

# Remove any singleton dimensions from the mask array
masks = np.squeeze(masks)

# Annotate and visualize the masks
# DotAnnotator draws a dot at the center of each detection
point_annotator = sv.DotAnnotator(color_lookup=sv.ColorLookup.INDEX)
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)

detections = sv.Detections(
    xyxy=sv.mask_to_xyxy(masks=masks),
    mask=masks.astype(bool)
)

source_image = point_annotator.annotate(scene=image_bgr.copy(), detections=detections)
segmented_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

# Plot the annotated images
sv.plot_images_grid(
    images=[source_image, segmented_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

Explanation

Point Prompts: Creates point prompts based on the bounding boxes.
Mask Generation: Uses the point prompts to generate segmentation masks.
Visualization: Annotates and visualizes the masks on the original image.
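
Two details of point prompting are easy to miss. First, each point carries a label, where 1 marks a foreground point to include in the mask. Second, with `multimask_output=True` the predictor returns several candidate masks with one score each, so a typical follow-up is to keep the highest-scoring candidate. The sketch below illustrates both on made-up values; `box_centers` and `best_mask_index` are hypothetical helpers, and the scores are invented.

```python
def box_centers(bboxes):
    """Return one (x, y) point at the center of each widget box."""
    return [
        (box["x"] + box["width"] // 2, box["y"] + box["height"] // 2)
        for box in bboxes
    ]


def best_mask_index(scores):
    """Return the index of the highest-scoring candidate mask."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up values for illustration
bboxes = [{"x": 100, "y": 50, "width": 200, "height": 120, "label": ""}]
points = box_centers(bboxes)
labels = [1] * len(points)   # 1 marks a foreground (include) point
scores = [0.62, 0.91, 0.87]  # one score per candidate mask

print(points, labels, best_mask_index(scores))
```

With real predictor output, the selected index can be used to pick a single mask from the returned candidates before visualization.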

Key Points to Remember When Working with SAM 2

Let us now look at a few key points:

Revolutionizing Photo and Video Editing

Potential to transform the photo and video editing industry.
Future enhancements may include improved precision, lower computational requirements, and advanced AI integration.

Real-Time Segmentation and Editing

Evolution could lead to real-time segmentation and editing capabilities.
Allows seamless alterations in videos and images with minimal effort.

Creative Possibilities for All

Opens up new creative possibilities for both professionals and amateurs.
Simplifies the manipulation of visual content, the creation of stunning effects, and the production of high-quality media.

Automating Complex Tasks

Automates intricate segmentation tasks.
Significantly accelerates workflows, making sophisticated editing more accessible and efficient.

Democratizing Content Creation

Makes high-level editing tools available to a broader audience.
Empowers storytellers and inspires innovation across various sectors, including entertainment, advertising, and education.

Impact on VFX Industry

Enhances visual effects (VFX) production by streamlining complex processes.
Reduces the time and effort required for creating intricate VFX, enabling more ambitious projects and improving overall quality.

Impressive Potential of SAM 2

The Segment Anything Model 2 (SAM 2) stands poised to revolutionize the fields of photo and video editing by introducing significant advancements in precision and computational efficiency. By integrating advanced AI capabilities, SAM 2 will enable more intuitive user interactions and real-time segmentation and editing, allowing seamless alterations with minimal effort. This groundbreaking technology promises to democratize content creation, empowering both professionals and amateurs to manipulate visual content, create stunning effects, and produce high-quality media with ease.

As SAM 2 automates complex segmentation tasks, it will accelerate workflows and make sophisticated editing accessible to a wider audience. This transformation will inspire innovation across various industries, from entertainment and advertising to education. In the realm of visual effects (VFX), SAM 2 will streamline intricate processes, reducing the time and effort needed to create elaborate VFX. This will enable more ambitious projects, elevate the quality of visual storytelling, and open up new creative possibilities in the VFX world.

Conclusion

By following this guide, you have learned how to set up and use the Segment Anything Model 2 (SAM 2) for image segmentation using both box and point prompts. SAM 2 provides powerful and flexible tools for segmenting objects in images, making it a valuable asset for various computer vision tasks. Feel free to experiment with your images and explore the capabilities of SAM 2 further.

Key Takeaways

SAM 2 is an advanced tool developed by Meta AI that enables precise and flexible image and video segmentation using both box and point prompts.
The model can significantly enhance photo and video editing by automating complex segmentation tasks, making it more accessible and efficient.
Setting up SAM 2 requires a CUDA-enabled GPU and a basic understanding of Python and image processing concepts.
SAM 2’s capabilities open new possibilities for both professionals and amateurs in content creation, offering real-time segmentation and creative control.
The model has the potential to transform various industries, including visual effects, entertainment, advertising, and education, by democratizing high-level editing tools.

Frequently Asked Questions

Q1. What is SAM 2?

A. SAM 2, or Segment Anything Model 2, is an image and video segmentation model developed by Meta AI that allows users to generate segmentation masks for specific objects by providing box or point prompts.

Q2. What are the prerequisites for utilizing SAM 2?

A. To use SAM 2, you need a CUDA-enabled GPU for faster processing and Python installed on your machine. Basic knowledge of Python and image processing concepts is also helpful.

Q3. How do I set up SAM 2?

A. Set up SAM 2 by checking GPU availability, cloning the SAM 2 repository from GitHub, installing required dependencies, and downloading model checkpoints and sample images for testing.

Q4. What types of prompts can be used with SAM 2 for segmentation?

A. SAM 2 supports both box prompts and point prompts. Box prompts involve specifying regions of interest using bounding boxes, while point prompts involve selecting specific points in the image.

Q5. How can SAM 2 impact photo and video editing?

A. SAM 2 can revolutionize photo and video editing by automating complex segmentation tasks, enabling real-time editing, and making advanced editing tools available to a broader audience, thereby enhancing creative possibilities and workflow efficiency.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Soumyadarshan Dash

Hello there! I'm Soumyadarshan Dash, a passionate and enthusiastic person when it comes to data science and machine learning. I'm constantly exploring new topics and techniques in this field, always striving to expand my knowledge and skills. In fact, upskilling myself is not just a hobby, but a way of life for me.
