Real-time object detection tools such as YOLO, SSD, and DETR have long been key to monitoring the movement and actions of objects within a frame. Industries such as traffic management, retail, security, and personal protective equipment monitoring rely on this mechanism for tracking and analytics. The biggest challenge in such systems, however, is that bounding boxes often lose track of an object when another object overlaps it. This occlusion changes the identification tags assigned to objects, and such spurious tag switches inflate counts in tracking pipelines, especially when the output feeds analytics. In this article, we will look at how Re-ID in YOLO can be adopted to address this.
Re-identification (Re-ID) plays an important role here: Re-ID in YOLO enables us to preserve the identity of a tracked object. Re-identification allows for the short-term recovery of lost tracks. It is usually done by comparing the visual similarity between objects using embeddings, which are generated by a separate model that processes cropped object images. However, this extra model adds latency to the pipeline, which can hurt FPS in real-time detection.
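To make the embedding comparison concrete, here is a minimal sketch of matching a lost track to a new detection via cosine similarity. The 128-D size, the random vectors, and the 0.5 cutoff are placeholders, not output or defaults of any specific Re-ID model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two appearance embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder 128-D embeddings standing in for real model outputs
emb_lost_track = np.random.rand(128)
emb_new_detection = np.random.rand(128)

# A high similarity (the threshold is model-specific; 0.5 here is
# illustrative) suggests the new detection is the previously lost object
score = cosine_similarity(emb_lost_track, emb_new_detection)
print(f"appearance similarity: {score:.3f}")
```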
Researchers often train these embeddings on large-scale person or object Re-ID datasets, allowing them to capture fine-grained details like clothing texture, colour, or structural features that stay consistent despite changes in pose and lighting. Several deep learning approaches have combined tracking and Re-ID in earlier work. Popular tracker models include DeepSORT, Norfair, FairMOT, ByteTrack, and others.
Some older strategies store each ID locally along with its corresponding frame and picture snippet, then reassign IDs to objects based on visual similarity. However, this strategy consumes significant time and memory. Moreover, because this manual Re-ID logic does not handle changes in viewpoint, background clutter, or resolution degradation well, it lacks the robustness needed for scalable or real-time systems.
ByteTrack’s core idea is really simple. Instead of ignoring all low-confidence detections, it retains the non-background low-score boxes for a second association pass, which boosts track consistency under occlusion. After the initial detection stage, the system partitions boxes into high-confidence, low-confidence (but non-background), and background (discarded) sets.
First, it matches high-confidence boxes to both active and recently lost tracklets using IoU or optionally feature-similarity affinities, applying the Hungarian algorithm with a strict threshold. The system then uses any unmatched high-confidence detections to either spawn new tracks or queue them for a single-frame retry.
In the secondary pass, the system matches low-confidence boxes to the remaining tracklet predictions using a lower threshold. This step recovers objects whose confidence has dropped due to occlusion or appearance shifts. If any tracklets still remain unmatched, the system moves them into a “lost” buffer for a certain duration, allowing it to reincorporate them if they reappear. This generic two-stage framework integrates seamlessly with any detector model (YOLO, Faster-RCNN, etc.) and any association metric, delivering 50–60 FPS with minimal overhead.
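Below is a condensed sketch of this two-pass association, assuming axis-aligned boxes in (x1, y1, x2, y2) format and using SciPy's Hungarian solver. The helper names and thresholds are illustrative, not ByteTrack's actual implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(tracks: np.ndarray, dets: np.ndarray) -> np.ndarray:
    """Pairwise IoU between predicted track boxes and detections (x1, y1, x2, y2)."""
    ious = np.zeros((len(tracks), len(dets)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            x1, y1 = np.maximum(t[:2], d[:2])
            x2, y2 = np.minimum(t[2:], d[2:])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            union = ((t[2] - t[0]) * (t[3] - t[1])
                     + (d[2] - d[0]) * (d[3] - d[1]) - inter)
            ious[i, j] = inter / (union + 1e-9)
    return ious

def associate(tracks, dets, iou_thresh):
    """Hungarian matching on (1 - IoU) cost; returns matches and leftovers."""
    if len(tracks) == 0 or len(dets) == 0:
        return [], list(range(len(tracks))), list(range(len(dets)))
    cost = 1.0 - iou_matrix(np.asarray(tracks, float), np.asarray(dets, float))
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - iou_thresh]
    unmatched_tracks = [i for i in range(len(tracks)) if i not in {r for r, _ in matches}]
    unmatched_dets = [j for j in range(len(dets)) if j not in {c for _, c in matches}]
    return matches, unmatched_tracks, unmatched_dets

def bytetrack_step(track_boxes, det_boxes, det_scores, hi=0.5, lo=0.1):
    """One frame of two-pass association. Background boxes (score < lo) are dropped."""
    hi_dets = [b for b, s in zip(det_boxes, det_scores) if s >= hi]
    lo_dets = [b for b, s in zip(det_boxes, det_scores) if lo <= s < hi]
    # Pass 1: high-confidence detections, strict IoU threshold;
    # unmatched high-confidence detections would spawn new tracks.
    first, lost, new_dets = associate(track_boxes, hi_dets, iou_thresh=0.5)
    # Pass 2: remaining tracks vs. low-confidence detections, looser threshold;
    # tracks still unmatched afterwards go to the "lost" buffer.
    second, still_lost, _ = associate([track_boxes[i] for i in lost], lo_dets, iou_thresh=0.3)
    return first, second, new_dets, [lost[i] for i in still_lost]
```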
However, ByteTrack still suffers identity switches when objects cross paths, disappear for longer periods, or undergo drastic appearance changes. Adding a dedicated Re-ID embedding network can mitigate these errors, but at the cost of an extra 15–25 ms per frame and increased memory usage.
If you want to refer to the ByteTrack GitHub, click here: ByteTrack
DeepSORT enhances the classic SORT tracker by fusing deep appearance features with motion and spatial cues to significantly reduce ID switches, especially under occlusions or sudden motion changes. To see how DeepSORT builds on SORT, we need to understand the four core components of SORT:

- Detection: an off-the-shelf detector supplies per-frame bounding boxes.
- Estimation: a Kalman filter propagates each track's state to the next frame.
- Data association: the Hungarian algorithm matches predicted boxes to detections using IoU.
- Track lifecycle management: new tracks are created for unmatched detections and stale tracks are deleted.
SORT achieves real-time performance on modern hardware due to its speed, but it relies solely on motion and spatial overlap. This often causes it to swap object identities when they cross paths, become occluded, or remain blocked for extended periods. To address this, DeepSORT trains a discriminative feature embedding network offline—typically using large-scale person Re-ID datasets—to generate 128-D appearance vectors for each detection crop. During association, DeepSORT computes a combined affinity score that incorporates:

- Motion affinity: the Mahalanobis distance between each Kalman-predicted track state and the new detection.
- Appearance affinity: the cosine distance between the detection's 128-D embedding and the gallery of past embeddings stored for each track.
Because the cosine metric remains stable even when motion cues fail, such as during long‑term occlusions or abrupt changes in velocity, DeepSORT can correctly reassign the original track ID once an object re‑emerges.
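A minimal sketch of how the two cues can be fused into one cost matrix is shown below. The weighting factor, the gates, and the 4-DoF chi-square value follow the spirit of the DeepSORT paper, but treat the exact numbers as illustrative (in practice the motion weight is often set near zero so appearance dominates):

```python
import numpy as np

def combined_cost(maha: np.ndarray, cosine_dist: np.ndarray,
                  lam: float = 0.2,
                  maha_gate: float = 9.4877,  # chi-square 95% quantile, 4 DoF
                  cos_gate: float = 0.3) -> np.ndarray:
    """Blend motion (Mahalanobis) and appearance (cosine) distance matrices.

    Pairs failing either gate are marked infeasible so the Hungarian
    solver never selects them.
    """
    cost = lam * maha + (1.0 - lam) * cosine_dist
    cost[(maha > maha_gate) | (cosine_dist > cos_gate)] = 1e6
    return cost
```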
FairMOT is a truly single‑shot multi‑object tracker which simultaneously performs object detection and Re‑identification in one unified network, delivering both high accuracy and efficiency. When an input image is fed into FairMOT, it passes through a shared backbone and then splits into two homogeneous branches: the detection branch and the Re‑ID branch. The detection branch adopts an anchor‑free CenterNet‑style head with three sub‑heads – Heatmap, Box Size, and Center Offset.
Parallel to this, the Re‑ID branch projects the same intermediate features into a lower‑dimensional embedding space, generating discriminative feature vectors that capture object appearance.
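A toy PyTorch sketch of this shared-backbone, two-branch layout is shown below. The tiny convolutional backbone, channel sizes, and 128-D embedding width are placeholders rather than FairMOT's actual DLA-34 architecture:

```python
import torch
import torch.nn as nn

class ToyFairMOTHead(nn.Module):
    """Shared features feed an anchor-free detection head and a Re-ID head."""

    def __init__(self, in_ch: int = 64, num_classes: int = 1, emb_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for DLA-34
            nn.Conv2d(3, in_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
        )
        # Detection branch: CenterNet-style sub-heads
        self.heatmap = nn.Conv2d(in_ch, num_classes, 1)  # object centers
        self.box_size = nn.Conv2d(in_ch, 2, 1)           # width, height
        self.offset = nn.Conv2d(in_ch, 2, 1)             # sub-pixel center offset
        # Re-ID branch: per-pixel appearance embeddings
        self.reid = nn.Conv2d(in_ch, emb_dim, 1)

    def forward(self, x):
        feats = self.backbone(x)
        return {
            "heatmap": torch.sigmoid(self.heatmap(feats)),
            "box_size": self.box_size(feats),
            "offset": self.offset(feats),
            "embeddings": self.reid(feats),  # L2-normalized at lookup time
        }

out = ToyFairMOTHead()(torch.randn(1, 3, 256, 256))
print({k: tuple(v.shape) for k, v in out.items()})
```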
After producing detection and embedding outputs for the current frame, FairMOT begins its two-stage association process. In the first stage, it propagates each prior tracklet’s state using a Kalman filter to predict its current position. Then, it compares those predictions with the new detections in two ways. It computes appearance affinities as cosine distances between the stored embeddings of each tracklet and the current frame’s Re-ID vectors. At the same time, it calculates motion affinities using the Mahalanobis distance between the Kalman-predicted bounding boxes and the fresh detections. FairMOT fuses these two distance measures into a single cost matrix and solves it using the Hungarian algorithm to link existing tracks to new detections, provided the cost stays below a preset threshold.
Suppose any track remains unassigned after this first pass due to abrupt motion or weak appearance cues. FairMOT invokes a second, IoU‑based matching stage. Here, the spatial overlap (IoU) between the previous frame’s boxes and unmatched detections is evaluated; if the overlap exceeds a lower threshold, the original ID is retained, otherwise a new track ID is issued. This hierarchical matching—first appearance + motion, then pure spatial—allows FairMOT to handle both subtle occlusions and rapid reappearances while keeping computational overhead low (only ~8 ms extra per frame compared to a vanilla detector). The result is a tracker that maintains high MOTA and ID‑F1 on challenging benchmarks, all without the heavy separate embedding network or complex anchor tuning required by many two‑stage methods.
Before diving into the changes behind this efficient re-identification strategy, we have to understand how object-level features are retrieved in YOLO and BoT-SORT.
BoT‑SORT (Robust Associations Multi‑Pedestrian Tracking) was introduced by Aharon et al. in 2022 as a tracking‑by‑detection framework that unifies motion prediction and appearance modeling, along with explicit camera motion compensation, to maintain stable object identities across challenging scenarios. It combines three key innovations: an enhanced Kalman filter state, global motion compensation (GMC), and IoU‑Re-ID fusion. BoT‑SORT achieves superior tracking metrics on standard MOT benchmarks.
You can read the research paper from here.
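To give a rough sense of what camera motion compensation involves, the sketch below estimates a global affine transform between consecutive grayscale frames with OpenCV and warps a Kalman-predicted box accordingly. It mirrors the idea behind GMC rather than BoT-SORT's exact implementation, and the feature-tracking parameters are placeholders:

```python
import cv2
import numpy as np

def compensate_camera_motion(prev_gray, curr_gray, box):
    """Warp a predicted box (x1, y1, x2, y2) by the estimated global camera motion."""
    # Track sparse corners from the previous frame to the current one
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=10)
    if pts is None:
        return box
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good_prev = pts[status.flatten() == 1]
    good_next = nxt[status.flatten() == 1]
    if len(good_prev) < 4:
        return box  # not enough correspondences; leave the box unchanged
    # Robustly fit a partial affine transform (rotation, translation, scale)
    M, _ = cv2.estimateAffinePartial2D(good_prev, good_next, method=cv2.RANSAC)
    if M is None:
        return box
    # Apply the transform to the box corners
    corners = np.array([[box[0], box[1]], [box[2], box[3]]], dtype=np.float32)
    warped = cv2.transform(corners[None], M)[0]
    return warped.flatten().tolist()
```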
This modular design also allows hybrid tracking systems where different tracking logic (e.g., occlusion recovery or reactivation thresholds) can be embedded directly in each object instance.
This dual-threshold approach allows greater flexibility in tuning for specific scenes—e.g., high occlusion (lower appearance threshold), or high motion blur (lower IoU threshold).
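A simplified version of this gated IoU-Re-ID fusion might look like the following. The gating logic loosely follows Ultralytics' BoT-SORT implementation, with the thresholds matching the config values used later in this article; treat it as an illustration of the dual-threshold idea rather than the exact code:

```python
import numpy as np

def fuse_iou_reid(iou_dist: np.ndarray, emb_dist: np.ndarray,
                  proximity_thresh: float = 0.7,
                  appearance_thresh: float = 0.5) -> np.ndarray:
    """Gate the appearance cost, then take the element-wise minimum of
    the motion (IoU) and appearance costs.

    Both inputs are distance matrices in [0, 1]: iou_dist = 1 - IoU,
    emb_dist = cosine distance between track and detection embeddings.
    """
    emb = emb_dist.copy()
    emb[emb_dist > appearance_thresh] = 1.0  # appearance too dissimilar
    emb[iou_dist > proximity_thresh] = 1.0   # candidates too far apart spatially
    return np.minimum(iou_dist, emb)
```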
The YAML file looks as follows:

```yaml
tracker_type: botsort     # use BoT-SORT
track_high_thresh: 0.25   # confidence threshold for the first association pass
track_low_thresh: 0.10    # confidence threshold for the second association pass
new_track_thresh: 0.25    # confidence threshold to start new tracks
track_buffer: 30          # frames to wait before deleting lost tracks
match_thresh: 0.80        # matching threshold for association
```
### CLI Example
```bash
# Run BoT-SORT tracking on a video using the default YAML config
yolo track model=yolov8n.pt tracker=botsort.yaml source=path/to/video.mp4 show=True
```
### Python API Example
```python
from types import SimpleNamespace

from ultralytics import YOLO
from ultralytics.trackers import BOTSORT

# Load a YOLOv8 detection model
model = YOLO("yolov8n.pt")

# BOTSORT expects an argparse-style namespace mirroring the botsort.yaml keys
args = SimpleNamespace(
    with_reid=True,              # enable appearance-based Re-ID
    gmc_method="sparseOptFlow",  # global motion compensation method
    proximity_thresh=0.7,        # spatial gate for valid matches
    appearance_thresh=0.5,       # appearance gate for valid matches
    fuse_score=True,             # fuse detection confidence into the cost
    track_high_thresh=0.25,
    track_low_thresh=0.10,
    new_track_thresh=0.25,
    track_buffer=30,
    match_thresh=0.80,
)
tracker = BOTSORT(args, frame_rate=30)

# model.track() configures its tracker from a YAML file, so point it at the
# (edited) botsort.yaml rather than passing the tracker instance directly
results = model.track(source="path/to/video.mp4", tracker="botsort.yaml", show=True)
```
You can read more about compatible YOLO trackers here.
The system usually performs re-identification by comparing visual similarities between objects using embeddings. A separate model typically generates these embeddings by processing cropped object images. However, this approach adds extra latency to the pipeline. Alternatively, the system can use object-level features directly for re-identification, eliminating the need for a separate embedding model. This change improves efficiency while keeping latency virtually unchanged.
Resource: YOLO in Re-ID Tutorial
Colab Notebook: Link to Colab
Do try running your own videos to see how Re-ID in YOLO works. In the Colab notebook, just replace the path to “occluded.mp4” with your video path 🙂
To see all of the diffs in context and grab the complete botsort.py patch, check out the Link to Colab and this Tutorial. Be sure to review it alongside this guide so you can follow each change step‑by‑step.
Adjust the botsort.yaml parameters for improved occlusion handling and matching tolerance:
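The exact parameter diff is in the linked notebook; as one example, an occlusion-friendly adjustment might extend the lost-track buffer and relax the matching gates along these lines (illustrative values, not the tutorial's exact numbers):

```yaml
track_buffer: 60          # keep lost tracks alive longer across occlusions
match_thresh: 0.85        # tolerate slightly looser association matches
proximity_thresh: 0.5     # accept lower spatial overlap when re-identifying
appearance_thresh: 0.25   # accept weaker appearance similarity
```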
With these concise modifications, Ultralytics YOLO with BoT‑SORT now natively supports feature-based re-identification without adding a second Re-ID network, achieving robust identity preservation with minimal performance overhead. Feel free to experiment with the thresholds in Step 5 to tailor matching strictness to your application.
Also read: Roboflow’s RF-DETR: Bridging Speed and Accuracy in Object Detection
⚠️ Note: These changes are not part of the official Ultralytics release. They need to be implemented manually to enable efficient re-identification.
Here, the fire hydrant (id8), the woman near the truck (id67), and the truck (id3) on the left side of the frame have been re-identified accurately.
While some objects are identified correctly (id4, id5, id60), a few police officers in the background received different IDs, possibly due to frame rate limitations.
The ball (id3) and the shooter (id1) are tracked and identified well, but the goalkeeper (id2 -> id8), occluded by the shooter, was given a new ID due to lost visibility.
A new open‑source toolkit called Trackers is being developed to simplify multi‑object tracking workflows, offering clean, import-ready implementations of popular trackers behind a unified API.
DeepSORT and SORT are already import-ready in the GitHub repository, and the remaining trackers will be added in subsequent weeks.
Github Link – Roboflow
The comparison section shows that Re-ID in YOLO performs reliably, maintaining object identities across frames. Occasional mismatches stem from occlusions or low frame rates, which are common in real-time tracking. The adjustable proximity_thresh and appearance_thresh parameters offer flexibility for varied use cases.
The key advantage is efficiency: leveraging object-level features from YOLO removes the need for a separate Re-ID network, resulting in a lightweight, deployable pipeline.
This approach delivers a robust and practical multi-object tracking solution. Future improvements may include adaptive thresholds, better feature extraction, or temporal smoothing.
Note: These updates aren’t part of the official Ultralytics library yet and must be applied manually, as shown in the shared resources.
Kudos to Yasin, M. (2025) for the insightful tutorial on Tracking with Efficient Re-Identification in Ultralytics, published on Yasin's Keep. Check here.