- Facebook has open sourced the code for DensePose, a technique that understands human images in terms of surface-based 3D models of the body.
- The DensePose project includes the DensePose-COCO dataset and the DensePose-RCNN architecture.
- It has been implemented using Facebook's Detectron framework and is powered by Caffe2.
Imagine a world where you open an apparel application on your phone, tap on clothes you like, and the app shows images of you wearing those clothes. It might sound like magic, but we are very close to seeing this kind of technology turn into a real-life application.
Data scientists can already annotate human poses in images, but existing approaches locate only a sparse set of joints, like the wrists or elbows, which suffices for applications like gesture or action recognition. Facebook's AI Research division (FAIR) has taken this technique to another level altogether.
To map all human pixels in 2D images to a surface-based 3D model of the body, they have pioneered a new approach called DensePose. Current approaches to human pose estimation operate with 10 or 20 human joints (wrists, elbows, etc.), whereas DensePose represents the human body with a surface mesh of more than 5,000 nodes! The image below illustrates my point pretty well.
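To make the idea concrete, a dense pose estimate is often described as an "IUV" map: for every pixel, a body-part index I plus (U, V) coordinates that locate the pixel on that part's surface. The sketch below shows how such a map might be decoded; the `(3, H, W)` layout, the 24-part count, and the `[0, 1]` UV range are assumptions for illustration, not the released API.

```python
import numpy as np

NUM_PARTS = 24  # assumed: the body surface is divided into 24 parts

def decode_iuv(iuv):
    """Split a hypothetical (3, H, W) IUV array into its components.

    Channel 0: part index per pixel (0 = background, 1..NUM_PARTS = body part)
    Channels 1-2: U and V surface coordinates in [0, 1] for that part.
    """
    part_index = iuv[0].astype(int)
    u, v = iuv[1], iuv[2]
    return part_index, u, v

# Toy example: a 2x2 "image" where one pixel lies on part 3
iuv = np.zeros((3, 2, 2))
iuv[:, 0, 0] = [3, 0.25, 0.75]  # part 3, surface coords (0.25, 0.75)
parts, u, v = decode_iuv(iuv)
print(parts[0, 0], u[0, 0], v[0, 0])
```

Given the part index and (U, V) of a pixel, looking up the corresponding vertex on the 3D body model is what turns a flat image into a surface correspondence.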
As mentioned by the researchers in the paper DensePose: Dense Human Pose Estimation In The Wild, presented at the Computer Vision and Pattern Recognition conference (CVPR) 2018 in Utah, the DensePose project includes:
DensePose-COCO: A large-scale dataset with image-to-surface correspondences. The team gathered annotations for 50K humans, collecting more than 5 million manually annotated correspondences, and followed the exact same train/validation/test split as the COCO challenge. Below is an example visualization of annotations from the validation set.
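For a feel of what a per-person image-to-surface correspondence record might contain, here is a toy, hand-built annotation. The field names (`dp_x`, `dp_y`, `dp_I`, `dp_U`, `dp_V`) and the convention that point coordinates are normalized to a 256-unit grid over the person's bounding box are assumptions about the released format, so treat this as a sketch rather than a loader for the real files.

```python
import numpy as np

# Toy annotation for one person with two annotated correspondence points.
# All field names and conventions are assumed for illustration.
ann = {
    "bbox": [40.0, 30.0, 120.0, 200.0],  # [x, y, width, height] in image pixels
    "dp_x": [64.0, 192.0],               # point x, normalized to [0, 256) over the box
    "dp_y": [128.0, 64.0],               # point y, same normalization
    "dp_I": [2, 15],                     # body-part index of each point
    "dp_U": [0.1, 0.9],                  # surface U coordinate of each point
    "dp_V": [0.5, 0.2],                  # surface V coordinate of each point
}

def points_in_image(ann):
    """Map bbox-normalized annotation points back to image-pixel coordinates."""
    x0, y0, w, h = ann["bbox"]
    xs = x0 + np.asarray(ann["dp_x"]) / 256.0 * w
    ys = y0 + np.asarray(ann["dp_y"]) / 256.0 * h
    return xs, ys

xs, ys = points_in_image(ann)
print(list(zip(xs, ys)))  # pixel locations of the two annotated points
```

Each annotated point thus pairs a pixel location with a (part, U, V) triple on the body surface, which is exactly the supervision a dense regression model needs.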
DensePose-RCNN: A variant of Mask-RCNN with a Feature Pyramid Network and Region-of-Interest pooling, followed by fully-convolutional processing (architecture shown below), to obtain dense part labels and coordinates within each of the selected regions.
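As a rough sketch of how such a region-based head is organized: each region of interest yields a fixed-size pooled feature map, and fully-convolutional layers on top of it predict, per pixel, a part label and (U, V) coordinates. The snippet below only traces the output tensor shapes; the 24-part count and 14x14 RoI grid are assumptions for illustration, not the exact configuration of the released model.

```python
import numpy as np

NUM_PARTS = 24   # assumed body-part count (plus one background class)
ROI_SIZE = 14    # assumed spatial size of the pooled RoI feature grid

def densepose_head_outputs(n_rois):
    """Shape-only sketch of the per-RoI outputs of a DensePose-RCNN-style head.

    A real head would compute these with convolutions over pooled RoI
    features; here we only allocate tensors of the corresponding shapes.
    """
    # Per-pixel part-classification logits: background + NUM_PARTS classes.
    part_logits = np.zeros((n_rois, NUM_PARTS + 1, ROI_SIZE, ROI_SIZE))
    # Per-part U and V regression maps (one channel per body part).
    u_maps = np.zeros((n_rois, NUM_PARTS, ROI_SIZE, ROI_SIZE))
    v_maps = np.zeros((n_rois, NUM_PARTS, ROI_SIZE, ROI_SIZE))
    return part_logits, u_maps, v_maps

part_logits, u_maps, v_maps = densepose_head_outputs(n_rois=3)
print(part_logits.shape, u_maps.shape, v_maps.shape)
```

At inference time, the per-pixel part argmax would select which part's U and V channels to read, yielding the dense IUV output.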
The team has shared a GitHub repository in which they have open sourced the code to train and evaluate DensePose-RCNN. The notebooks used to visualize the collected DensePose-COCO dataset have also been provided. This technique has been implemented using Facebook's own Detectron framework and is powered by Caffe2.
Below is a video in which they have provided an overview of the technique.
Our Take on this
I can see this technique being put to good use for improving virtual reality experiences or for motion capture devices. And not just that: it could help doctors make decisions about physical ailments in patients and accelerate recent advances in sports science, among other things.
As usual, the code is available on GitHub for you to play around with. Can you improve on what they've released? Where else can you apply this technique? Share your thoughts in the comments section below.
Subscribe to AVBytes here to get regular data science, machine learning and AI updates in your inbox!