Introduction
This guide will walk you through what Segment Anything Model 2 (SAM 2) is, how it works, and how you can use it to segment objects in images and videos. It offers state-of-the-art performance and flexibility in segmenting objects in images, making it a valuable resource for a variety of computer vision applications. The guide provides a detailed, step-by-step walkthrough for setting up and using SAM 2 to perform image segmentation. By following it, you will be able to generate segmentation masks for images using both box and point prompts.
Learning Objectives
- Describe the key features and applications of the Segment Anything Model 2 (SAM 2) in image and video segmentation.
- Successfully configure a CUDA-enabled environment, install the necessary dependencies, and clone the Segment Anything Model 2 repository for image segmentation tasks.
- Apply SAM 2 to generate segmentation masks for images using both box and point prompts, and visualize the results effectively.
- Evaluate how SAM 2 can revolutionize photo and video editing by enabling real-time segmentation, automating complex tasks, and democratizing content creation for a broader audience.
This article was published as a part of the Data Science Blogathon.
Prerequisites
Before you begin, make sure you have a CUDA-enabled GPU for faster processing. Also, verify that you have Python installed on your machine. This guide assumes you have some basic knowledge of Python and image processing concepts.
What is SAM 2?
Segment Anything Model 2 is an advanced tool for image segmentation created by Facebook AI Research (FAIR). On July 29th, 2024, Meta AI released SAM 2, an advanced foundation model for image and video segmentation. SAM 2 lets users provide points or boxes in an image or video to generate segmentation masks for specific objects.
Key Features of SAM 2
- Advanced Mask Generation: SAM 2 generates high-quality segmentation masks based on user inputs, such as points or bounding boxes.
- Flexibility: The model supports both image and video segmentation.
- Speed and Efficiency: With CUDA support, SAM 2 can perform segmentation tasks quickly, making it suitable for real-time applications.
Core Components of SAM 2
- Image Encoder: Encodes the input image for processing.
- Prompt Encoder: Converts user-provided points or boxes into a format the model can use.
- Mask Decoder: Generates the final segmentation masks based on the encoded inputs.
Applications of SAM 2
Let us now look at some applications of SAM 2:
- Image and Video Editing: SAM 2 allows for precise object segmentation, enabling detailed edits and creative effects in photos and videos.
- Autonomous Vehicles: In autonomous driving, SAM 2 can be used to identify and track objects like pedestrians, vehicles, and road signs in real time.
- Medical Imaging: SAM 2 can help segment anatomical structures in medical images, aiding diagnostics and treatment planning.
What is Image Segmentation?
Image segmentation is a computer vision technique that involves dividing an image into multiple segments or regions to simplify its analysis. Each segment represents a different object or part of an object within the image, making it easier to identify and analyze specific elements.
Types of Image Segmentation
- Semantic Segmentation: Classifies each pixel into a predefined class.
- Instance Segmentation: Differentiates between different instances of the same object class.
- Panoptic Segmentation: Combines semantic and instance segmentation.
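To make the difference between the three types concrete, here is a minimal sketch on a toy 4x6 label grid. The class ids, the array names, and the `class_id * 1000 + instance_id` encoding are illustrative assumptions chosen for this example; they are not something SAM 2 itself produces.

```python
import numpy as np

# Toy scene: 0 = background, 1 = "dog" class. Two separate dogs are present.
semantic = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance segmentation gives each object of the same class its own id.
instance = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Panoptic segmentation keeps both pieces of information. Here they are packed
# as class_id * 1000 + instance_id (an arbitrary convention for this sketch).
panoptic = semantic * 1000 + instance

print("classes:  ", np.unique(semantic))
print("instances:", np.unique(instance))
print("panoptic: ", np.unique(panoptic))
```

The semantic map cannot tell the two dogs apart, the instance map can, and the panoptic map carries both the class and the instance identity for every pixel.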
Setting Up and Using SAM 2 for Image Segmentation
We’ll guide you through the process of setting up the Segment Anything Model 2 (SAM 2) in your environment and using its capabilities for precise image segmentation tasks. From making sure your GPU is ready to configuring the model and applying it to real images, each step is covered in detail to help you harness the full potential of SAM 2.
Step 1: Check GPU Availability and Set Up the Environment
First, let’s make sure your environment is properly set up, starting with checking for GPU availability and recording the current working directory.
# Check GPU availability and CUDA version
!nvidia-smi
!nvcc --version
# Import necessary modules
import os
# Store the current working directory
HOME = os.getcwd()
print("HOME:", HOME)
Explanation
- !nvidia-smi and !nvcc --version: These commands check whether your system has a CUDA-enabled GPU and display the CUDA version.
- os.getcwd(): This function returns the current working directory, which can be used for managing file paths.
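The `!` commands above only work inside a notebook. As a rough sketch of the same check in plain Python, the helper below (the name `gpu_summary` is hypothetical) looks for the `nvidia-smi` executable and, if it is present, asks it to list the detected GPUs. It assumes nothing beyond the standard library.

```python
import shutil
import subprocess

def gpu_summary():
    """Return the nvidia-smi GPU listing, or None if it cannot be obtained."""
    if shutil.which("nvidia-smi") is None:
        return None  # driver tools are not on PATH
    result = subprocess.run(
        ["nvidia-smi", "-L"],  # -L lists detected GPUs, one per line
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else None

summary = gpu_summary()
print(summary if summary else "No NVIDIA GPU detected; expect much slower CPU processing.")
```

Returning `None` instead of raising keeps the check usable on machines without a GPU, where the rest of the setup can still be prepared.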
Step 2: Clone the SAM 2 Repository and Install Dependencies
Next, we need to clone the SAM 2 repository from GitHub and install the required dependencies.
# Clone the SAM 2 repository
!git clone https://github.com/facebookresearch/segment-anything-2.git
# Change to the repository directory
%cd segment-anything-2
# Install the SAM 2 package
!pip install -e .
# Install additional packages
!pip install supervision jupyter_bbox_widget
Explanation
- !git clone: Clones the SAM 2 repository to your local machine.
- %cd: Changes the directory to the cloned repository.
- !pip install -e .: Installs the SAM 2 package in editable mode.
- !pip install supervision jupyter_bbox_widget: Installs additional packages required for visualization and bounding box widget support.
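Before moving on, it can help to confirm that the installs actually succeeded. The sketch below uses the standard library's `importlib.util.find_spec` to report which of the top-level modules imported later in this guide cannot be found. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that are not on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level modules imported in the later steps of this guide.
required = ["sam2", "supervision", "jupyter_bbox_widget", "cv2", "torch", "numpy"]
missing = missing_modules(required)
print("Missing modules:", missing if missing else "none")
```

`find_spec` only locates a module without importing it, so the check is quick and does not initialize CUDA or load any model code.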
Step 3: Download Model Checkpoints
Model checkpoints are essential, as they contain the trained parameters of SAM 2. We’ll download several checkpoints for different model sizes.
# Create a directory for checkpoints
!mkdir -p checkpoints
# Download the model checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt -P checkpoints
Explanation
- !mkdir -p checkpoints: Creates a directory for storing model checkpoints.
- !wget -q … -P checkpoints: Downloads the model checkpoints into the checkpoints directory. Different checkpoints represent models of different sizes and capabilities.
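Because `wget -q` runs quietly, a failed download can go unnoticed. The following sketch checks that each expected checkpoint file exists and is not empty. Only the large checkpoint and its `sam2_hiera_l.yaml` config are paired in this guide (in Step 5); the other three config names are assumptions that follow the same naming pattern and should be verified against the cloned repository. The helper name `missing_checkpoints` is hypothetical.

```python
from pathlib import Path

# Checkpoint file -> config name. Only the "large" pair appears in this guide;
# the other three config names are assumed from the same naming pattern.
CHECKPOINT_CONFIGS = {
    "sam2_hiera_tiny.pt": "sam2_hiera_t.yaml",
    "sam2_hiera_small.pt": "sam2_hiera_s.yaml",
    "sam2_hiera_base_plus.pt": "sam2_hiera_b+.yaml",
    "sam2_hiera_large.pt": "sam2_hiera_l.yaml",
}

def missing_checkpoints(checkpoint_dir="checkpoints"):
    """Return expected checkpoint files that are absent or empty."""
    directory = Path(checkpoint_dir)
    missing = []
    for name in CHECKPOINT_CONFIGS:
        path = directory / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing

print("Missing checkpoints:", missing_checkpoints())
```

An empty list means all four files are in place; any name that is printed should be downloaded again before building the model.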
Step 4: Download Sample Images
For demonstration purposes, we’ll use some sample images. You can also use your own images by following similar steps.
# Create a directory for data
!mkdir -p data
# Download sample images
!wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-2.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-3.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-4.jpeg -P data
Explanation
- !mkdir -p data: Creates a directory for storing sample images.
- !wget -q … -P data: Downloads the sample images into the data directory.
Step 5: Set Up the SAM 2 Model and Load an Image
Now, we will set up the SAM 2 model, load an image, and prepare it for segmentation.
import cv2
import torch
import numpy as np
import supervision as sv
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator
# Select CUDA if available, otherwise fall back to the CPU
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# On a GPU, enable bfloat16 autocast and, for compute capability 8.0+, TF32
if DEVICE.type == 'cuda':
    torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()
    if torch.cuda.get_device_properties(0).major >= 8:
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True
# Define the model checkpoint and configuration
CHECKPOINT = "checkpoints/sam2_hiera_large.pt"
CONFIG = "sam2_hiera_l.yaml"
# Build the SAM 2 model
sam2_model = build_sam2(CONFIG, CHECKPOINT, device=DEVICE, apply_postprocessing=False)
# Create the automatic mask generator
mask_generator = SAM2AutomaticMaskGenerator(sam2_model)
# Load an image for segmentation (replace this path with your own image)
IMAGE_PATH = "/content/WhatsApp Image 2024-08-02 at 14.17.11_2b223e01.jpg"
image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
# Generate segmentation masks
sam2_result = mask_generator.generate(image_rgb)
Explanation
- CUDA Setup: Enables CUDA-specific optimizations for faster processing and sets the device to the GPU if one is available.
- Model Setup: Builds the SAM 2 model using the specified configuration and checkpoint.
- Image Loading: Loads the image and converts it to RGB format.
- Mask Generation: Uses the automatic mask generator to generate segmentation masks for the loaded image.
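The color conversion matters because `cv2.imread` returns pixels in BGR order, while the masks here are generated from an RGB image. For a three-channel image, the conversion amounts to reversing the channel axis, which the small NumPy sketch below illustrates on a two-pixel stand-in image (the pixel values are made up for the example).

```python
import numpy as np

# A 1x2 "image" in OpenCV's BGR order: one pure-blue pixel, one pure-red pixel.
image_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB order.
image_rgb = image_bgr[..., ::-1]

print("blue pixel as RGB:", image_rgb[0, 0])
print("red pixel as RGB: ", image_rgb[0, 1])
```

Skipping this step does not raise an error, but the model would see red and blue swapped, which can degrade the resulting masks.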
Step 6: Visualize the Segmentation Masks
We’ll now visualize the segmentation masks generated by SAM 2.
# Annotate the masks on the image
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
detections = sv.Detections.from_sam(sam_result=sam2_result)
annotated_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)
# Plot the original and segmented images side by side
sv.plot_images_grid(
    images=[image_bgr, annotated_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

# Extract the individual masks, largest area first, and plot the top 16
masks = [
    mask['segmentation']
    for mask in sorted(sam2_result, key=lambda x: x['area'], reverse=True)
]
sv.plot_images_grid(
    images=masks[:16],
    grid_size=(4, 4),
    size=(12, 12)
)

Explanation
- Mask Annotation: Annotates the segmentation masks on the original image.
- Visualization: Plots the original and segmented images side by side, and also plots the individual masks.
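To show what the sorting and annotation steps do with the data, here is a small sketch on synthetic inputs. The list of dictionaries mimics only the two keys used above (`segmentation` and `area`), and the `overlay` helper is a hypothetical, simplified stand-in for mask annotation, not how the supervision library implements it.

```python
import numpy as np

# Synthetic stand-in for the mask generator output: a list of dicts, each
# with a boolean 'segmentation' array and its pixel 'area'.
small = np.zeros((4, 4), dtype=bool)
small[0, 0] = True
large = np.zeros((4, 4), dtype=bool)
large[1:4, 1:4] = True
fake_result = [
    {"segmentation": small, "area": int(small.sum())},
    {"segmentation": large, "area": int(large.sum())},
]

# Largest masks first, as in the visualization step.
ordered = sorted(fake_result, key=lambda m: m["area"], reverse=True)

def overlay(image, mask, color, alpha=0.5):
    """Blend `color` into `image` wherever `mask` is True."""
    out = image.astype(np.float32)
    out[mask] = (1 - alpha) * out[mask] + alpha * np.array(color, dtype=np.float32)
    return out.astype(np.uint8)

image = np.zeros((4, 4, 3), dtype=np.uint8)
blended = overlay(image, ordered[0]["segmentation"], color=(0, 200, 0))
print("areas:", [m["area"] for m in ordered])
print("inside mask:", blended[2, 2], "outside mask:", blended[0, 0])
```

Only pixels inside the mask are tinted, so the underlying image stays visible through the semi-transparent color.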
Step 7: Use Box Prompts for Segmentation
Box prompts allow us to specify regions of interest in the image for segmentation.
# Define the SAM 2 image predictor
predictor = SAM2ImagePredictor(sam2_model)
# Reload the image
image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
# Encode the image as a base64 data URI for the bounding box widget
import base64
def encode_image(filepath):
    with open(filepath, 'rb') as f:
        image_bytes = f.read()
    encoded = str(base64.b64encode(image_bytes), 'utf-8')
    return "data:image/jpg;base64,"+encoded
# Enable the custom widget manager in Colab
IS_COLAB = True
if IS_COLAB:
    from google.colab import output
    output.enable_custom_widget_manager()
from jupyter_bbox_widget import BBoxWidget
# Create a bounding box widget
widget = BBoxWidget()
widget.image = encode_image(IMAGE_PATH)
# Display the widget
widget

Explanation
- Image Predictor: Defines the SAM 2 image predictor.
- Image Encoding: Encodes the image for use with the bounding box widget.
- Widget Setup: Sets up a bounding box widget for specifying regions of interest.
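The encoding step builds a data URI of the form `data:<mime>;base64,<payload>`. As a hedged variation on `encode_image`, the sketch below guesses the MIME type from the file name instead of hard-coding it. The helper name `to_data_uri` and the three-byte payload are illustrative only.

```python
import base64
import mimetypes

def to_data_uri(image_bytes, filename):
    """Build a base64 data URI, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Tiny stand-in payload; a real call would pass the bytes read from IMAGE_PATH.
payload = b"\xff\xd8\xff"
uri = to_data_uri(payload, "photo.jpg")
print(uri)
```

Guessing the type lets the same helper handle PNG or JPEG inputs without editing the prefix by hand.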
Step 8: Get Bounding Boxes and Perform Segmentation
After specifying the bounding boxes, we can use them to generate segmentation masks.
# Get the bounding boxes from the widget
boxes = widget.bboxes
# Example of the raw widget.bboxes value:
# [{'x': 457, 'y': 341, 'width': 0, 'height': 0, 'label': ''},
#  {'x': 205, 'y': 79, 'width': 0, 'height': 1, 'label': ''}]
# Convert each box from (x, y, width, height) to (x_min, y_min, x_max, y_max)
boxes = np.array([
    [
        box['x'],
        box['y'],
        box['x'] + box['width'],
        box['y'] + box['height']
    ] for box in boxes
])
# Set the image in the predictor
predictor.set_image(image_rgb)
# Generate masks using the bounding boxes
masks, scores, logits = predictor.predict(
    box=boxes,
    multimask_output=False
)
# With several boxes the masks come back as (N, 1, H, W); drop the extra axis.
# With a single box they are already (1, H, W) and must be left as they are.
if boxes.shape[0] != 1:
    masks = np.squeeze(masks)
# Annotate and visualize the masks
box_annotator = sv.BoxAnnotator(color=sv.Color.white())
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
detections = sv.Detections(
    xyxy=boxes,
    mask=masks.astype(bool)
)
source_image = box_annotator.annotate(scene=image_bgr.copy(), detections=detections)
segmented_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)
# Plot the annotated images
sv.plot_images_grid(
    images=[source_image, segmented_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

Explanation
- Bounding Boxes: Retrieves the bounding boxes specified using the widget and converts them to corner coordinates.
- Mask Generation: Uses the bounding boxes to generate segmentation masks.
- Visualization: Annotates and visualizes the masks on the original image.
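The sample widget output shown in this step contains boxes with zero width or height, which is what a stray click rather than a drag tends to produce. A box like that gives the model almost nothing to work with, so it may be worth filtering such entries before prediction. The sketch below is one way to do that; the helper name `widget_boxes_to_xyxy`, the `min_size` threshold, and the second example box are all hypothetical.

```python
import numpy as np

def widget_boxes_to_xyxy(bboxes, min_size=2):
    """Convert widget-style dicts (x, y, width, height) to an (N, 4) xyxy array,
    dropping boxes narrower or shorter than `min_size` pixels."""
    rows = [
        [b["x"], b["y"], b["x"] + b["width"], b["y"] + b["height"]]
        for b in bboxes
        if b["width"] >= min_size and b["height"] >= min_size
    ]
    return np.array(rows, dtype=np.float32).reshape(-1, 4)

example = [
    {"x": 457, "y": 341, "width": 0, "height": 0, "label": ""},     # stray click
    {"x": 100, "y": 50, "width": 200, "height": 120, "label": ""},  # hypothetical drawn box
]
xyxy = widget_boxes_to_xyxy(example)
print(xyxy)
```

The `reshape(-1, 4)` keeps the result two-dimensional even when every box is filtered out, so later code can check `xyxy.shape[0]` before calling the predictor.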
Step 9: Use Point Prompts for Segmentation
Point prompts allow us to specify individual points of interest for segmentation.
# Create point prompts from the centers of the bounding boxes
input_point = np.array([
    [
        box['x'] + (box['width'] // 2),
        box['y'] + (box['height'] // 2)
    ] for box in widget.bboxes
])
# Label 1 marks every point as a foreground point
input_label = np.array([1] * len(input_point))
# Generate masks using the point prompts
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True
)
# Remove any singleton dimensions from the masks
masks = np.squeeze(masks)
# Annotate and visualize the masks
point_annotator = sv.PointAnnotator(color_lookup=sv.ColorLookup.INDEX)
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
detections = sv.Detections(
    xyxy=sv.mask_to_xyxy(masks=masks),
    mask=masks.astype(bool)
)
source_image = point_annotator.annotate(scene=image_bgr.copy(), detections=detections)
segmented_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)
# Plot the annotated images
sv.plot_images_grid(
    images=[source_image, segmented_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

Explanation
- Point Prompts: Creates point prompts from the centers of the bounding boxes.
- Mask Generation: Uses the point prompts to generate segmentation masks.
- Visualization: Annotates and visualizes the masks on the original image.
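With `multimask_output=True`, the predictor returns several candidate masks along with a score for each, and the code above visualizes all of them. If only one mask is wanted, a common approach is to keep the highest-scoring candidate. The sketch below shows that selection on synthetic arrays; the `(3, H, W)` mask shape, the score values, and the helper name `best_mask` are assumptions made for the example.

```python
import numpy as np

def best_mask(masks, scores):
    """Return the mask with the highest score, together with that score."""
    idx = int(np.argmax(scores))
    return masks[idx], float(scores[idx])

# Synthetic stand-ins shaped like a three-candidate prediction.
masks = np.zeros((3, 4, 4), dtype=bool)
masks[1, 1:3, 1:3] = True
scores = np.array([0.42, 0.91, 0.65])

mask, score = best_mask(masks, scores)
print("pixels in best mask:", int(mask.sum()), "score:", score)
```

Ambiguous prompts, such as a single point on an object that has distinct parts, are the case where the candidates differ most, so inspecting all of them first can be worthwhile.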
Key Points to Remember When Working with SAM 2
Let us now look at a few important points:
Revolutionizing Image and Video Editing
- Potential to transform the photo and video editing industry.
- Future enhancements may include improved precision, lower computational requirements, and advanced AI integration.
Real-Time Segmentation and Editing
- Further development could lead to real-time segmentation and editing capabilities.
- Allows seamless alterations in videos and images with minimal effort.
Creative Possibilities for All
- Opens up new creative possibilities for both professionals and amateurs.
- Simplifies the manipulation of visual content, the creation of stunning effects, and the production of high-quality media.
Automating Complex Tasks
- Automates intricate segmentation tasks.
- Significantly accelerates workflows, making sophisticated editing more accessible and efficient.
Democratizing Content Creation
- Makes high-level editing tools available to a broader audience.
- Empowers storytellers and inspires innovation across various sectors, including entertainment, advertising, and education.
Impact on the VFX Industry
- Enhances visual effects (VFX) production by streamlining complex processes.
- Reduces the time and effort required to create intricate VFX, enabling more ambitious projects and improving overall quality.
Impressive Potential of SAM 2
The Segment Anything Model 2 (SAM 2) stands poised to revolutionize the fields of photo and video editing by introducing significant advances in precision and computational efficiency. By integrating advanced AI capabilities, SAM 2 will enable more intuitive user interactions and real-time segmentation and editing, allowing seamless alterations with minimal effort. This technology promises to democratize content creation, empowering both professionals and amateurs to manipulate visual content, create stunning effects, and produce high-quality media with ease.
As SAM 2 automates complex segmentation tasks, it will accelerate workflows and make sophisticated editing accessible to a wider audience. This transformation will encourage innovation across various industries, from entertainment and advertising to education. In the realm of visual effects (VFX), SAM 2 will streamline intricate processes, reducing the time and effort needed to create elaborate VFX. This will enable more ambitious projects, elevate the quality of visual storytelling, and open up new creative possibilities in the VFX world.
Conclusion
By following this guide, you’ve learned how to set up and use the Segment Anything Model 2 (SAM 2) for image segmentation using both box and point prompts. SAM 2 offers powerful and flexible tools for segmenting objects in images, making it a valuable asset for various computer vision tasks. Feel free to experiment with your own images and explore the capabilities of SAM 2 further.
Key Takeaways
- SAM 2 is an advanced tool developed by Meta AI that enables precise and flexible image and video segmentation using both box and point prompts.
- The model can significantly enhance photo and video editing by automating complex segmentation tasks, making editing more accessible and efficient.
- Setting up SAM 2 requires a CUDA-enabled GPU and a basic understanding of Python and image processing concepts.
- SAM 2’s capabilities open new possibilities for both professionals and amateurs in content creation, offering real-time segmentation and creative control.
- The model has the potential to transform various industries, including visual effects, entertainment, advertising, and education, by democratizing high-level editing tools.
Frequently Asked Questions
A. SAM 2, or Segment Anything Model 2, is an image and video segmentation model created by Meta AI that lets users generate segmentation masks for specific objects by providing box or point prompts.
A. To use SAM 2, you need a CUDA-enabled GPU for faster processing and Python installed on your machine. Basic knowledge of Python and image processing concepts is also helpful.
A. Set up SAM 2 by checking GPU availability, cloning the SAM 2 repository from GitHub, installing the required dependencies, and downloading model checkpoints and sample images for testing.
A. SAM 2 supports both box prompts and point prompts. Box prompts involve specifying regions of interest using bounding boxes, while point prompts involve selecting specific points in the image.
A. SAM 2 can revolutionize photo and video editing by automating complex segmentation tasks, enabling real-time editing, and making advanced editing tools available to a broader audience, thereby improving creative possibilities and workflow efficiency.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.