guest_blog — Published On July 23, 2020 and Last Modified On July 28th, 2020

Overview

Introduction

The MobileBERT architectures


Architecture visualization of transformer blocks within (a) BERT, (b) MobileBERT teacher, and (c) MobileBERT student. The green trapezoids marked with “Linear” are referred to as bottlenecks. Source

Linear


Multi-Head Attention


Stacked FFN
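Putting the three components above together, the sketch below shows the general bottleneck pattern in PyTorch: a Linear layer projects the wide inter-block features down, multi-head attention and a stack of small feed-forward networks operate on the narrow features, and a second Linear layer projects back up. The dimensions (512 inter-block, 128 intra-block, 4 heads, 4 stacked FFNs) and the exact placement of the residual connections are my own illustrative assumptions; the wiring in the paper differs in detail.

import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    # Rough sketch of a MobileBERT-style block: Linear bottlenecks wrap a
    # narrow multi-head attention + stacked-FFN body. Illustrative only.
    def __init__(self, inter_size=512, intra_size=128, num_heads=4, num_ffn=4):
        super().__init__()
        self.down = nn.Linear(inter_size, intra_size)  # input bottleneck ("Linear")
        self.attn = nn.MultiheadAttention(intra_size, num_heads, batch_first=True)
        self.ffns = nn.ModuleList([
            nn.Sequential(nn.Linear(intra_size, 4 * intra_size), nn.ReLU(),
                          nn.Linear(4 * intra_size, intra_size))
            for _ in range(num_ffn)])
        self.up = nn.Linear(intra_size, inter_size)    # output bottleneck ("Linear")

    def forward(self, x):                  # x: (batch, T, inter_size)
        h = self.down(x)
        attn_out, _ = self.attn(h, h, h)
        h = h + attn_out                   # residual around attention
        for ffn in self.ffns:
            h = h + ffn(h)                 # residual around each stacked FFN
        return x + self.up(h)              # project back to inter-block width

A quick shape check: BottleneckBlock()(torch.randn(2, 8, 512)).shape gives torch.Size([2, 8, 512]), so blocks can be stacked while the heavy computation happens at the narrow intra-block width.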

Operational optimizations


NoNorm equation that replaces the layer normalization operation in the transformer blocks. The "dot" denotes the Hadamard product, i.e. element-wise multiplication of the two vectors.
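For reference, the NoNorm operation can be written out as follows (notation mine; γ and β are learned vectors of the same dimension as the hidden state h):

\text{NoNorm}(\mathbf{h}) = \boldsymbol{\gamma} \circ \mathbf{h} + \boldsymbol{\beta}

Since this is just an element-wise scale and shift, it avoids computing the mean and variance statistics of a normalization layer, which is what makes it cheap at inference time.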

The motivation behind the teacher and student sizes

Proposed knowledge distillation objectives


Feature map transfer objective function. T is the sequence length, N the feature map size, and l the layer index.
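Written out (my transcription, with H^{tr} and H^{st} denoting the teacher's and student's feature maps at layer ℓ), the objective is the mean squared error between the two feature maps:

\mathcal{L}_{FMT}^{\ell} = \frac{1}{TN} \sum_{t=1}^{T} \sum_{n=1}^{N} \left( H_{t,\ell,n}^{tr} - H_{t,\ell,n}^{st} \right)^{2}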


Attention map transfer objective function. T is the sequence length, A the number of attention heads, and l the layer index.
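Similarly (again my transcription, with a^{tr} and a^{st} denoting the teacher's and student's per-head attention distributions), the objective is the KL divergence between the attention maps, averaged over positions and heads:

\mathcal{L}_{AT}^{\ell} = \frac{1}{TA} \sum_{t=1}^{T} \sum_{a=1}^{A} D_{KL}\left( a_{t,\ell,a}^{tr} \,\Vert\, a_{t,\ell,a}^{st} \right)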


Knowledge transfer techniques. (a) Auxiliary knowledge transfer, (b) joint knowledge transfer, (c) progressive knowledge transfer. Source
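To make the layer-wise transfer concrete, here is a minimal PyTorch sketch of the two per-layer losses on toy tensors. The function names, tensor shapes, and the toy usage at the end are my own illustrative assumptions, not the authors' code.

import torch
import torch.nn.functional as F

def feature_map_transfer_loss(h_teacher, h_student):
    # Mean squared error between the teacher's and student's feature maps
    # of one layer, averaged over batch, sequence length T and feature size N.
    return F.mse_loss(h_student, h_teacher)

def attention_transfer_loss(att_teacher, att_student, eps=1e-9):
    # KL(teacher || student) between the per-head attention distributions of
    # one layer, averaged over batch, heads A and query positions T.
    # Shapes: (batch, A, T, T); the last dimension sums to 1.
    kl = att_teacher * (att_teacher.clamp_min(eps).log()
                        - att_student.clamp_min(eps).log())
    return kl.sum(dim=-1).mean()

# Toy usage: batch of 2, T = 8 tokens, N = 16 features, A = 4 heads.
h_t, h_s = torch.randn(2, 8, 16), torch.randn(2, 8, 16)
att_t = torch.softmax(torch.randn(2, 4, 8, 8), dim=-1)
att_s = torch.softmax(torch.randn(2, 4, 8, 8), dim=-1)
layer_loss = feature_map_transfer_loss(h_t, h_s) + attention_transfer_loss(att_t, att_s)

In progressive knowledge transfer (panel (c) above), a per-layer loss like this is minimized one layer at a time, with the already-trained lower layers frozen, rather than optimizing all layers jointly from the start.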

Experimental results


Experimental results on the GLUE benchmark. Source

It is therefore safe to conclude that it is possible to create a distilled model that is both performant and fast on resource-limited devices!

The model was fine-tuned on GLUE on its own, which shows that the proposed distillation process can produce a task-agnostic model!

Conclusion

If you found this summary helpful in understanding the broader picture of this particular research paper, please consider reading my other articles! I’ve already written a bunch and more will definitely be added. I think you might find this one interesting👋🏼🤖

About the Author

Viktor Karlsson – Software Engineer

I am a Software Engineer with an MSc in Machine Learning and a growing interest in NLP. I try to stay on top of recent developments within the ML field in general, and NLP in particular. Writing to learn!
