GhostFaceNets: Efficient Face Recognition on Edge Devices

Saiprakash Payyavula 25 Apr, 2024


GhostFaceNets is a family of lightweight face recognition models that relies on cheap operations to cut computation without giving up much accuracy. Its GhostNetV2-based variants also borrow ideas from attention-based models. This blog post walks through the architecture with figures and code, from the underlying Ghost modules to the full GhostFaceNets models.

Learning Objectives

  • Comprehend the underlying challenges and motivations driving the development of lightweight face recognition (FR) models tailored for low-compute devices (e.g., edge devices).
  • Articulate the in-depth architectural elements of GhostFaceNets, including the Ghost modules, the DFC attention branch, and the specific adaptations introduced to the backbone GhostNets architectures.
  • Discuss the advantages of GhostFaceNets compared to traditional face recognition models, in terms of efficiency, accuracy, and computational complexity.
  • Acknowledge the major contributions made by GhostFaceNets to the field of face recognition and face verification, and imagine its potential applications across different real-time scenarios.

This article was published as a part of the Data Science Blogathon.

Introduction of GhostFaceNets

In today’s era of ubiquitous computing and the Internet of Things (IoT), face recognition (FR) technology plays an important role in many applications, including seamless user authentication, personalized experiences, and stronger security measures. However, traditional facial recognition systems consume substantial computational resources, which makes them unsuitable for deployment on devices with limited compute and memory. This is where GhostFaceNets comes into play: it aims to make accurate face recognition practical on such devices.

Evolution of Lightweight Face Recognition Models

As the demand for edge computing and real-time applications soared, the need for efficient and lightweight models became paramount. Researchers and engineers alike sought to strike a delicate balance between model complexity and performance, giving rise to a plethora of lightweight architectures tailored for specific tasks, including face recognition.

Deep learning models such as Convolutional Neural Networks (CNNs) have transformed face recognition research, improving accuracy over traditional methods. However, these models often struggle to balance performance and complexity, especially in real-world applications on resource-constrained devices. The Labeled Faces in the Wild (LFW) dataset is a standard benchmark for evaluating new FR models. Light CNN architectures were an early attempt to reduce parameters and computational complexity, but their best reported accuracy on LFW is 99.33%.

ShiftFaceNet introduced a “Shift” operation to reduce the number of parameters in image classification models, at the cost of an accuracy drop of about 2 points. Other models built upon image classification backbones, such as MobileFaceNets, ShuffleFaceNet, VarGFaceNet, and MixFaceNets, have shown improved trade-offs between performance and complexity. MobileFaceNets achieved 99.55% LFW accuracy with 1M parameters, while ShuffleFaceNet achieved 99.67% LFW accuracy with 2.6M parameters and 557.5 MFLOPs.

VarGFaceNet leveraged VarGNet and achieved 99.85% LFW accuracy with 5M parameters and 1.022 GFLOPs. MixFaceNets achieved 99.68% LFW accuracy with 3.95M parameters and 626.1 MFLOPs. Other notable models include AirFace, QuantFace, and PocketNets, which achieved, respectively, 99.27% LFW accuracy with 1 GFLOPs, 99.43% LFW accuracy with 1.1M parameters, and 99.58% LFW accuracy with 0.925M parameters and 587.11 MFLOPs.

Understanding GhostFaceNets Architecture

Building upon the efficient GhostNets architectures (GhostNetV1 and GhostNetV2), the authors propose GhostFaceNets, a new set of lightweight architectures tailored for face recognition and face verification. Several key modifications were made:

  • The Global Average Pooling (GAP) layer, pointwise convolution layer (1×1 convolution), and Fully Connected (FC) layer were replaced with a modified Global Depthwise Convolution (GDC) recognition head to generate discriminative feature vectors.
  • The ReLU activation function used in GhostNets was replaced with PReLU, which alleviates the vanishing gradient problem and improves performance.
  • The conventional Fully Connected layers in the Squeeze-and-Excitation (SE) modules were replaced with convolution layers to improve the discriminative power of GhostFaceNets.
  • The ArcFace loss function was employed for training to enforce intra-class compactness and inter-class discrepancy, which improves the discriminative power of the learned features. For background on the ArcFace loss function, see my previous blog post.
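As a rough illustration of the ArcFace idea in the last item, the sketch below computes additive angular margin logits for a single sample in plain Python. The function names, the scale `s=64.0`, and the margin `m=0.5` are assumptions chosen for illustration (they are not taken from this article), and the sketch ignores the special handling real implementations apply when the angle plus margin exceeds π.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def arcface_logits(embedding, class_weights, target, s=64.0, m=0.5):
    """Return scaled cosine logits, with margin m added to the target angle."""
    emb = l2_normalize(embedding)
    logits = []
    for idx, weight in enumerate(class_weights):
        w = l2_normalize(weight)
        cos_theta = sum(a * b for a, b in zip(emb, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if idx == target:
            # Penalize the true class by enlarging its angle.
            cos_theta = math.cos(math.acos(cos_theta) + m)
        logits.append(s * cos_theta)
    return logits

def softmax_cross_entropy(logits, target):
    """Numerically stable cross-entropy for one sample."""
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[target]

# Hypothetical 2-D embedding and two class centers.
embedding = [0.8, 0.6]
class_weights = [[1.0, 0.0], [0.0, 1.0]]

plain = arcface_logits(embedding, class_weights, target=0, m=0.0)
with_margin = arcface_logits(embedding, class_weights, target=0, m=0.5)
loss_plain = softmax_cross_entropy(plain, 0)
loss_margin = softmax_cross_entropy(with_margin, 0)
print(loss_plain, loss_margin)
```

With the margin enabled, the target logit shrinks and the loss grows, so training has to pull the embedding closer to its class center, which is what produces compact classes.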

The authors designed a set of GhostFaceNets models by varying the training dataset, the width of the GhostNets architectures, and the stride of the first convolution layer (stem). The resulting models outperform most lightweight state-of-the-art (SOTA) models on different benchmarks, as discussed in the next sections.

A. GhostNetV1 and Ghost Modules (Feature Map Pattern Redundancy)

GhostNetV1, the backbone architecture of GhostFaceNets, is built from Ghost modules. A Ghost module generates a certain percentage (denoted as x%) of the output feature maps with an ordinary convolution and derives the remaining feature maps from those using a low-cost linear operation, a depthwise convolution (DWConv).

In a traditional convolutional layer, a 2D filter (kernel) is applied to a 2D channel of the input tensor to generate a 2D channel of the output tensor, directly generating a tensor of feature maps with C’ channels from an input tensor of C channels. However, Ghost modules take a different approach.

The Ghost module generates the first x% of the output tensor channels using a sequential block of three layers: normal convolution, batch normalization, and a nonlinear activation function (default: ReLU). The output of this block is then passed to a second block consisting of depthwise convolution, batch normalization, and ReLU, and the final output tensor is formed by concatenating the outputs of the two blocks along the channel axis.


As shown in Figure 1, there are clearly similar and redundant feature map pairs (ghosts) that can be generated using linear operations, reducing computational complexity without decreasing performance. The authors of GhostNetV1 exploit this observation by generating these similar and redundant features using cheap operations, rather than discarding them.


By employing Ghost modules, GhostNetV1 can generate the same number of feature maps as a standard convolutional layer with a significant reduction in the number of parameters and FLOPs. Ghost modules can therefore be integrated into existing neural network architectures to reduce computational complexity.
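To make the saving concrete, here is a back-of-the-envelope sketch that counts weights for a standard convolution and for a Ghost module with a half/half channel split. The helper names and the example sizes (64 input channels, 128 output channels) are hypothetical, and batch normalization parameters and biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution without bias."""
    return c_in * c_out * k * k

def depthwise_params(channels, k):
    """Weights of a k x k depthwise convolution: one filter per channel."""
    return channels * k * k

def ghost_module_params(c_in, c_out, k=1, dw_k=3):
    """Weights of a Ghost module that splits c_out in half (c_out assumed even)."""
    primary = c_out // 2       # channels from the ordinary convolution
    cheap = c_out - primary    # channels from the cheap depthwise operation
    return conv_params(c_in, primary, k) + depthwise_params(cheap, dw_k)

standard = conv_params(64, 128, 3)
ghost = ghost_module_params(64, 128, k=3, dw_k=3)
print(standard, ghost, round(standard / ghost, 2))  # 73728 37440 1.97
```

For a fixed feature map size, multiply-accumulate counts scale with these weight counts, so the same ratio of roughly 2x applies to the compute of this layer.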

Implementation with Python Code

The code below is from the ghost_model module in the backbones folder.

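import math

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (
    Activation, Add, BatchNormalization, Concatenate, Conv2D,
    DepthwiseConv2D, GlobalAveragePooling2D, Input, Multiply, Reshape,
)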

CONV_KERNEL_INITIALIZER = keras.initializers.VarianceScaling(scale=2.0, mode="fan_out", distribution="truncated_normal")

def _make_divisible(v, divisor=4, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by `divisor` (default 4).
    It can be seen here:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

def activation(inputs):
    return Activation("relu")(inputs)

def se_module(inputs, se_ratio=0.25):
    #get the channel axis
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    #filters = channel axis shape
    filters = inputs.shape[channel_axis]

    reduction = _make_divisible(filters * se_ratio)

    #from None x H x W x C to None x C
    se = GlobalAveragePooling2D()(inputs)

    #Reshape None x C to None 1 x 1 x C
    se = Reshape((1, 1, filters))(se)

    #Squeeze by using C*se_ratio. The size will be 1 x 1 x C*se_ratio 
    se = Conv2D(reduction, kernel_size=1, use_bias=True, kernel_initializer=CONV_KERNEL_INITIALIZER)(se)
    # se = PReLU(shared_axes=[1, 2])(se)
    se = Activation("relu")(se)

    #Excitation using C filters. The size will be 1 x 1 x C
    se = Conv2D(filters, kernel_size=1, use_bias=True, kernel_initializer=CONV_KERNEL_INITIALIZER)(se)
    se = Activation("hard_sigmoid")(se)
    return Multiply()([inputs, se])

def ghost_module(inputs, out, convkernel=1, dwkernel=3, add_activation=True):
    # conv_out_channel = math.ceil(out * 1.0 / 2)
    conv_out_channel = out // 2
    # tf.print("[ghost_module] out:", out, "conv_out_channel:", conv_out_channel)
    cc = Conv2D(conv_out_channel, convkernel, use_bias=False, strides=(1, 1), padding="same", kernel_initializer=CONV_KERNEL_INITIALIZER)(
        inputs
    )  # padding=kernel_size//2
    cc = BatchNormalization(axis=-1)(cc)
    if add_activation:
        cc = activation(cc)

    channel = int(out - conv_out_channel)
    nn = DepthwiseConv2D(dwkernel, 1, padding="same", use_bias=False, depthwise_initializer=CONV_KERNEL_INITIALIZER)(cc)  # padding=dw_size//2
    nn = BatchNormalization(axis=-1)(nn)
    if add_activation:
        nn = activation(nn)
    return Concatenate()([cc, nn])

def ghost_bottleneck(inputs, dwkernel, strides, exp, out, se_ratio=0, shortcut=True):
    nn = ghost_module(inputs, exp, add_activation=True)  # ghost1 = GhostModule(in_chs, exp, relu=True)
    if strides > 1:
        # Extra depth conv if strides higher than 1
        nn = DepthwiseConv2D(dwkernel, strides, padding="same", use_bias=False, depthwise_initializer=CONV_KERNEL_INITIALIZER)(nn)
        nn = BatchNormalization(axis=-1)(nn)
        # nn = Activation('relu')(nn)

    if se_ratio > 0:
        # Squeeze and excite
        nn = se_module(nn, se_ratio)  # se = SqueezeExcite(exp, se_ratio=se_ratio)

    # Point-wise linear projection
    nn = ghost_module(nn, out, add_activation=False)  # ghost2 = GhostModule(exp, out, relu=False)
    # nn = BatchNormalization(axis=-1)(nn)

    if shortcut:
        # Project the input so its shape matches the main branch
        xx = DepthwiseConv2D(dwkernel, strides, padding="same", use_bias=False, depthwise_initializer=CONV_KERNEL_INITIALIZER)(
            inputs
        )  # padding=(dw_kernel_size-1)//2
        xx = BatchNormalization(axis=-1)(xx)
        xx = Conv2D(out, (1, 1), strides=(1, 1), padding="valid", use_bias=False, kernel_initializer=CONV_KERNEL_INITIALIZER)(xx)  # padding=0
        xx = BatchNormalization(axis=-1)(xx)
    else:
        # Shapes already match, so use an identity shortcut
        xx = inputs
    return Add()([xx, nn])

#1.3 is the width of the GhostNet as in the paper (Table 7)
def GhostNet(input_shape=(224, 224, 3), include_top=True, classes=0, width=1.3, strides=2, name="GhostNet"):
    inputs = Input(shape=input_shape)
    out_channel = _make_divisible(16 * width, 4)
    nn = Conv2D(out_channel, (3, 3), strides=strides, padding="same", use_bias=False, kernel_initializer=CONV_KERNEL_INITIALIZER)(inputs)  # padding=1
    nn = BatchNormalization(axis=-1)(nn)
    nn = activation(nn)
    dwkernels = [3, 3, 3, 5, 5, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5]
    exps = [16, 48, 72, 72, 120, 240, 200, 184, 184, 480, 672, 672, 960, 960, 960, 512]
    outs = [16, 24, 24, 40, 40, 80, 80, 80, 80, 112, 112, 160, 160, 160, 160, 160]
    use_ses = [0, 0, 0, 0.25, 0.25, 0, 0, 0, 0, 0.25, 0.25, 0.25, 0, 0.25, 0, 0.25]
    strides = [1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1]

    pre_out = out_channel
    for dwk, stride, exp, out, se in zip(dwkernels, strides, exps, outs, use_ses):
        out = _make_divisible(out * width, 4) # [ 20 32 32 52 52 104 104 104 104 144 144 208 208 208 208 208 ]
        exp = _make_divisible(exp * width, 4) # [ 20 64 92 92 156 312 260 240 240 624 872 872 1248 1248 1248 664 ]
        shortcut = False if out == pre_out and stride == 1 else True
        nn = ghost_bottleneck(nn, dwk, stride, exp, out, se, shortcut)
        pre_out = out # [ 20 32 32 52 52 104 104 104 104 144 144 208 208 208 208 208 ]

    out = _make_divisible(exps[-1] * width, 4) #664
    nn = Conv2D(out, (1, 1), strides=(1, 1), padding="valid", use_bias=False, kernel_initializer=CONV_KERNEL_INITIALIZER)(nn)  # padding=0
    nn = BatchNormalization(axis=-1)(nn)
    nn = activation(nn)

    if include_top:
        nn = GlobalAveragePooling2D()(nn)
        nn = Reshape((1, 1, int(nn.shape[1])))(nn)
        nn = Conv2D(1280, (1, 1), strides=(1, 1), padding="same", use_bias=False, kernel_initializer=CONV_KERNEL_INITIALIZER)(nn)
        nn = BatchNormalization(axis=-1)(nn)
        nn = activation(nn)
        nn = Conv2D(classes, (1, 1), strides=(1, 1), padding="same", use_bias=False, kernel_initializer=CONV_KERNEL_INITIALIZER)(nn)
        nn = K.squeeze(nn, 1)
        nn = Activation("softmax")(nn)

    return Model(inputs=inputs, outputs=nn, name=name)

B. GhostNetV2

GhostNetV2 introduces significant improvements to the Ghost module of GhostNetV1, aiming to capture long-range dependencies more effectively. The key innovation is the incorporation of a novel attention-based layer called the DFC attention branch, designed to generate attention maps with global receptive fields using convolutions. Unlike traditional self-attention layers, the DFC attention branch achieves high efficiency while capturing dependencies between pixels across different spatial locations. This efficiency is crucial for hardware compatibility and inference speed, as many prior attention modules relied on computationally intensive tensor operations.


GhostNetV2’s architecture features a new bottleneck structure in which the Ghost module and the DFC attention branch operate in parallel on the same input. The two branches extract information from different viewpoints, and their outputs are combined by an element-wise product, so the attention map reweights the features produced by the Ghost module across all spatial positions.


The DFC attention branch consists of five operations: downsampling, convolution, horizontal and vertical fully connected (FC) layers, and sigmoid activation (see the figure above). To limit computational overhead, the branch uses native average pooling for downsampling and bilinear interpolation for upsampling. Decomposing the FC layer into horizontal and vertical components reduces complexity while still capturing long-range dependencies along both dimensions.
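The benefit of the decomposition can be estimated with simple counting. The sketch below compares the per-channel multiply-accumulate count of a full FC attention map, where every position aggregates all H×W positions, with the decoupled horizontal-plus-vertical version. The function names and the 14×14 example size are assumptions for illustration, and the counts cover only the aggregation step.

```python
def full_fc_attention_macs(h, w):
    """Each of the h*w positions aggregates all h*w positions."""
    return (h * w) * (h * w)

def decoupled_fc_attention_macs(h, w):
    """Each position aggregates its row (w values), then its column (h values)."""
    return h * w * w + h * w * h

h, w = 14, 14
full = full_fc_attention_macs(h, w)
decoupled = decoupled_fc_attention_macs(h, w)
print(full, decoupled)  # 38416 5488

# Downsampling by 2 before attention shrinks the map to 7 x 7.
downsampled = decoupled_fc_attention_macs(h // 2, w // 2)
print(downsampled)  # 686
```

In the implementation shown later, the horizontal and vertical FC layers are realized as depthwise convolutions with `(1, 5)` and `(5, 1)` kernels on the downsampled map, which keeps the cost lower still.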

Overall, GhostNetV2 represents a significant advancement in attention-based models, offering improved efficiency and effectiveness in capturing long-range dependencies.

Implementation with Python Code

The code below is from the ghostv2 module in the backbones folder.

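!pip install keras_cv_attention_models
import tensorflow as tf
from tensorflow import keras
from keras_cv_attention_models.attention_layers import (
    activation_by_name, add_pre_post_process, batchnorm_with_activation, conv2d_no_bias,
    depthwise_conv2d_no_bias, make_divisible, se_module,
)
from keras_cv_attention_models.download_and_load import reload_model_weights

# Hashes of the pretrained weight files, keyed by model name
PRETRAINED_DICT = {
    "ghostnetv2_1x": {"imagenet": "4f28597d5f72731ed4ef4f69ec9c1799"},
    "ghostnet_1x": {"imagenet": "df1de036084541c5b8bd36b179c74577"},
}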

def ghost_module(inputs, out_channel, activation="relu", name=""):
    ratio = 2
    hidden_channel = int(tf.math.ceil(float(out_channel) / ratio))
    primary_conv = conv2d_no_bias(inputs, hidden_channel, name=name + "prim_")
    primary_conv = batchnorm_with_activation(primary_conv, activation=activation, name=name + "prim_")
    cheap_conv = depthwise_conv2d_no_bias(primary_conv, kernel_size=3, padding="SAME", name=name + "cheap_")
    cheap_conv = batchnorm_with_activation(cheap_conv, activation=activation, name=name + "cheap_")
    return keras.layers.Concatenate()([primary_conv, cheap_conv])

def ghost_module_multiply(inputs, out_channel, activation="relu", name=""):
    nn = ghost_module(inputs, out_channel, activation=activation, name=name)

    # shortcut = keras.layers.AvgPool2D(pool_size=2, strides=2, padding="SAME")(inputs)
    shortcut = keras.layers.AvgPool2D(pool_size=2, strides=2)(inputs)
    shortcut = conv2d_no_bias(shortcut, out_channel, name=name + "short_1_")
    shortcut = batchnorm_with_activation(shortcut, activation=None, name=name + "short_1_")
    shortcut = depthwise_conv2d_no_bias(shortcut, (1, 5), padding="SAME", name=name + "short_2_")
    shortcut = batchnorm_with_activation(shortcut, activation=None, name=name + "short_2_")
    shortcut = depthwise_conv2d_no_bias(shortcut, (5, 1), padding="SAME", name=name + "short_3_")
    shortcut = batchnorm_with_activation(shortcut, activation=None, name=name + "short_3_")
    shortcut = activation_by_name(shortcut, "sigmoid", name=name + "short_")
    shortcut = tf.image.resize(shortcut, tf.shape(inputs)[1:-1], antialias=False, method="bilinear")
    return keras.layers.Multiply()([shortcut, nn])

def ghost_bottleneck(
    inputs, out_channel, first_ghost_channel, kernel_size=3, strides=1, se_ratio=0, shortcut=True, use_ghost_module_multiply=False, activation="relu", name=""
):
    if shortcut:
        # Project the input so its shape matches the main branch
        shortcut = depthwise_conv2d_no_bias(inputs, kernel_size, strides, padding="same", name=name + "short_1_")
        shortcut = batchnorm_with_activation(shortcut, activation=None, name=name + "short_1_")
        shortcut = conv2d_no_bias(shortcut, out_channel, name=name + "short_2_")
        shortcut = batchnorm_with_activation(shortcut, activation=None, name=name + "short_2_")
    else:
        shortcut = inputs

    if use_ghost_module_multiply:
        # GhostNetV2 style: Ghost module gated by the DFC attention branch
        nn = ghost_module_multiply(inputs, first_ghost_channel, activation=activation, name=name + "ghost_1_")
    else:
        # GhostNetV1 style: plain Ghost module
        nn = ghost_module(inputs, first_ghost_channel, activation=activation, name=name + "ghost_1_")

    if strides > 1:
        nn = depthwise_conv2d_no_bias(nn, kernel_size, strides, padding="same", name=name + "down_")
        nn = batchnorm_with_activation(nn, activation=None, name=name + "down_")

    if se_ratio > 0:
        nn = se_module(nn, se_ratio=se_ratio, divisor=4, activation=("relu", "hard_sigmoid_torch"), name=name + "se_")

    nn = ghost_module(nn, out_channel, activation=None, name=name + "ghost_2_")
    return keras.layers.Add(name=name + "output")([shortcut, nn])

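def GhostNetV2(
    # Only `num_ghost_module_v1_stacks` and `input_shape` appear in the source listing.
    # The other parameter names follow the function body; their defaults are assumptions.
    stem_width=16,
    stem_strides=2,
    width_mul=1.0,
    num_ghost_module_v1_stacks=2,  # num of `ghost_module` stacks on the head, others are `ghost_module_multiply`, set `-1` for all using `ghost_module`
    input_shape=(224, 224, 3),
    num_classes=1000,
    activation="relu",
    classifier_activation="softmax",
    dropout=0,
    pretrained=None,
    model_name="ghostnetv2",
    kwargs=None,  # lets wrappers forward `**locals()`
):
    inputs = keras.layers.Input(input_shape)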
    stem_width = make_divisible(stem_width * width_mul, divisor=4)
    nn = conv2d_no_bias(inputs, stem_width, 3, strides=stem_strides, padding="same", name="stem_")
    nn = batchnorm_with_activation(nn, activation=activation, name="stem_")

    """ stages """
    kernel_sizes = [3, 3, 3, 5, 5, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5]
    first_ghost_channels = [16, 48, 72, 72, 120, 240, 200, 184, 184, 480, 672, 672, 960, 960, 960, 960]
    out_channels = [16, 24, 24, 40, 40, 80, 80, 80, 80, 112, 112, 160, 160, 160, 160, 160]
    se_ratios = [0, 0, 0, 0.25, 0.25, 0, 0, 0, 0, 0.25, 0.25, 0.25, 0, 0.25, 0, 0.25]
    strides = [1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1]

    for stack_id, (kernel, stride, first_ghost, out_channel, se_ratio) in enumerate(zip(kernel_sizes, strides, first_ghost_channels, out_channels, se_ratios)):
        stack_name = "stack{}_".format(stack_id + 1)
        out_channel = make_divisible(out_channel * width_mul, 4)
        first_ghost_channel = make_divisible(first_ghost * width_mul, 4)
        shortcut = False if out_channel == nn.shape[-1] and stride == 1 else True
        use_ghost_module_multiply = True if num_ghost_module_v1_stacks >= 0 and stack_id >= num_ghost_module_v1_stacks else False
        nn = ghost_bottleneck(
            nn, out_channel, first_ghost_channel, kernel, stride, se_ratio, shortcut, use_ghost_module_multiply, activation=activation, name=stack_name
        )

    nn = conv2d_no_bias(nn, make_divisible(first_ghost_channels[-1] * width_mul, 4), 1, strides=1, name="pre_")
    nn = batchnorm_with_activation(nn, activation=activation, name="pre_")

    if num_classes > 0:
        nn = keras.layers.GlobalAveragePooling2D(keepdims=True)(nn)
        nn = conv2d_no_bias(nn, 1280, 1, strides=1, use_bias=True, name="features_")
        nn = activation_by_name(nn, activation, name="features_")
        nn = keras.layers.Flatten()(nn)
        if dropout > 0 and dropout < 1:
            nn = keras.layers.Dropout(dropout)(nn)
        nn = keras.layers.Dense(num_classes, dtype="float32", activation=classifier_activation, name="head")(nn)

    model = keras.models.Model(inputs, nn, name=model_name)
    add_pre_post_process(model, rescale_mode="torch")
    reload_model_weights(model, PRETRAINED_DICT, "ghostnetv2", pretrained)
    return model

def GhostNetV2_1X(input_shape=(224, 224, 3), num_classes=1000, activation="relu", classifier_activation="softmax", pretrained="imagenet", **kwargs):
    return GhostNetV2(**locals(), model_name="ghostnetv2_1x", **kwargs)

""" GhostNet V1 """

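def GhostNet(
    # Only `num_ghost_module_v1_stacks` and `input_shape` appear in the source listing.
    # The other parameter names mirror `GhostNetV2`; their defaults are assumptions.
    stem_width=16,
    stem_strides=2,
    width_mul=1.0,
    num_ghost_module_v1_stacks=-1,  # num of `ghost_module` stacks on the head, others are `ghost_module_multiply`, set `-1` for all using `ghost_module`
    input_shape=(224, 224, 3),
    num_classes=1000,
    activation="relu",
    classifier_activation="softmax",
    dropout=0,
    pretrained=None,
    model_name="ghostnet",
    kwargs=None,  # lets wrappers forward `**locals()`
):
    return GhostNetV2(**locals())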

def GhostNet_1X(input_shape=(224, 224, 3), num_classes=1000, activation="relu", classifier_activation="softmax", pretrained="imagenet", **kwargs):
    return GhostNet(**locals(), model_name="ghostnet_1x", **kwargs)

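The Ghost module in GhostNetV1 does not include the DFC attention branch, whereas GhostNetV2 adds it. In the code above, setting `num_ghost_module_v1_stacks=-1` makes every stack use the plain `ghost_module`, which yields GhostNetV1.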

C. GhostFaceNets Architecture

Building upon the GhostNetV1 and GhostNetV2 architectures, the authors of GhostFaceNets made several key modifications to tailor the models for face recognition and face verification tasks.

GhostFaceNets are a significant advancement in lightweight face recognition and face verification models, incorporating key modifications to improve performance and efficiency. One notable change is the modified Global Depthwise Convolution (GDC) layer, which replaces the Global Average Pooling (GAP) layer used in image classification models. This allows the network to learn different weights for different feature map units, enhancing discriminative power and performance.
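The difference between average pooling and a global depthwise convolution can be shown on a single channel. In this minimal sketch, the function names and the 2×2 values are hypothetical; a real GDC layer learns one such kernel per channel, with the kernel covering the whole feature map.

```python
def global_average_pool(channel):
    """Reduce an H x W channel to one value, weighting every position equally."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)

def global_depthwise_conv(channel, kernel):
    """Reduce an H x W channel to one value using a learned H x W kernel."""
    return sum(v * k for row, krow in zip(channel, kernel) for v, k in zip(row, krow))

channel = [[1.0, 2.0],
           [3.0, 4.0]]

# A uniform kernel reproduces average pooling exactly.
uniform_kernel = [[0.25, 0.25],
                  [0.25, 0.25]]

# A learned kernel can instead emphasize specific positions of the face crop.
focused_kernel = [[0.0, 0.0],
                  [0.0, 1.0]]

gap_value = global_average_pool(channel)
gdc_uniform = global_depthwise_conv(channel, uniform_kernel)
gdc_focused = global_depthwise_conv(channel, focused_kernel)
print(gap_value, gdc_uniform, gdc_focused)  # 2.5 2.5 4.0
```

Average pooling is therefore a special case of GDC, and training is free to move away from it when some spatial positions carry more identity information than others.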

GhostFaceNets use the Parametric Rectified Linear Unit (PReLU) activation function instead of ReLU. PReLU keeps a small learnable slope for negative inputs, which lets the network retain negative activations and learn more complex nonlinear functions, improving performance in face recognition tasks. In addition, convolutions replace the conventional FC layers in the Squeeze-and-Excitation (SE) modules.
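For reference, the two activations differ only in how they treat negative inputs. The scalar sketch below uses a slope of 0.25, the initial value used in the conversion code later in this post. In a real layer the slope is a trainable parameter, and the function names here are illustrative.

```python
def relu(x):
    """Standard ReLU: negative inputs are zeroed out."""
    return max(0.0, x)

def prelu(x, alpha=0.25):
    """PReLU: negative inputs are scaled by a (normally learnable) slope alpha."""
    return x if x > 0 else alpha * x

for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), prelu(value))
```

Because the negative side keeps a nonzero slope, gradients still flow through units whose inputs are negative, which is the property credited with easing the vanishing gradient problem.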

The SE modules act as a channel attention mechanism that models channel interdependencies at minimal computational cost. They adjust channel weights to prioritize important features and reduce sensitivity to less relevant ones.

GhostFaceNets variants are designed with configurable backbones, width multipliers, and stem stride parameters, which offers flexibility in downsampling strategy and model size. The authors experimented with different hyperparameters and training datasets, including MS1MV2 and MS1MV3, and trained with the ArcFace loss function, which minimizes the intra-class gap and enhances inter-class differentiation.

Requirements to Run Python Code

Use the following requirements to run the code. The Python version is 3.9.12.

  • TensorFlow==2.8.0
  • Keras==2.8.0
  • keras_cv_attention_models
  • glob2
  • pandas
  • tqdm
  • scikit-image

Implementation with Python Code

The code below is from a module in the main folder.

import tensorflow as tf
from tensorflow import keras
import tensorflow.keras.backend as K

def __init_model_from_name__(name, input_shape=(112, 112, 3), weights="imagenet", **kwargs):
    name_lower = name.lower()
    """ Basic model """
    if name_lower == "ghostnetv1":
        from backbones import ghost_model

        xx = ghost_model.GhostNet(input_shape=input_shape, include_top=False, width=1, **kwargs)
    elif name_lower == "ghostnetv2":
        from backbones import ghostv2

        xx = ghostv2.GhostNetV2(
            stem_width=16,
            num_ghost_module_v1_stacks=2,  # num of `ghost_module` stacks on the head, others are `ghost_module_multiply`, set `-1` for all using `ghost_module`
            input_shape=(112, 112, 3),
            # remaining arguments are not shown in this listing
        )
    else:
        return None
    xx.trainable = True
    return xx

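def buildin_models(
    # Only `input_shape` appears in the source listing.
    # The other parameter names follow the function body; their defaults are assumptions.
    stem_model,
    dropout=1,
    emb_shape=512,
    input_shape=(112, 112, 3),
    bn_momentum=0.99,
    bn_epsilon=0.001,
    add_pointwise_conv=False,
    pointwise_conv_act="relu",
    use_bias=False,
    scale=True,
    weights="imagenet",
    **kwargs
):
    if isinstance(stem_model, str):
        xx = __init_model_from_name__(stem_model, input_shape, weights, **kwargs)
        name = stem_model
    else:
        name = stem_model.name
        xx = stem_model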

    if bn_momentum != 0.99 or bn_epsilon != 0.001:
        print(">>>> Change BatchNormalization momentum and epsilon default value.")
        for ii in xx.layers:
            if isinstance(ii, keras.layers.BatchNormalization):
                ii.momentum, ii.epsilon = bn_momentum, bn_epsilon
        xx = keras.models.clone_model(xx)

    inputs = xx.inputs[0]
    nn = xx.outputs[0]

    if add_pointwise_conv:  # Model using `pointwise_conv + GDC` / `pointwise_conv + E` is smaller than `E`
        filters = nn.shape[-1] // 2 if add_pointwise_conv == -1 else 512  # Compitable with previous models...
        nn = keras.layers.Conv2D(filters, 1, use_bias=False, padding="valid", name="pw_conv")(nn)
        nn = keras.layers.BatchNormalization(momentum=bn_momentum, epsilon=bn_epsilon, name="pw_bn")(nn)
        if pointwise_conv_act.lower() == "prelu":
            nn = keras.layers.PReLU(shared_axes=[1, 2], name="pw_" + pointwise_conv_act)(nn)
        else:
            nn = keras.layers.Activation(pointwise_conv_act, name="pw_" + pointwise_conv_act)(nn)
    """ GDC """
    nn = keras.layers.DepthwiseConv2D(nn.shape[1], use_bias=False, name="GDC_dw")(nn)
    nn = keras.layers.BatchNormalization(momentum=bn_momentum, epsilon=bn_epsilon, name="GDC_batchnorm")(nn)
    if dropout > 0 and dropout < 1:
        nn = keras.layers.Dropout(dropout)(nn)
    nn = keras.layers.Conv2D(emb_shape, 1, use_bias=use_bias, kernel_initializer="glorot_normal", name="GDC_conv")(nn)
    nn = keras.layers.Flatten(name="GDC_flatten")(nn)
    embedding = keras.layers.BatchNormalization(momentum=bn_momentum, epsilon=bn_epsilon, scale=scale, name="pre_embedding")(nn)
    embedding_fp32 = keras.layers.Activation("linear", dtype="float32", name="embedding")(embedding)

    basic_model = keras.models.Model(inputs, embedding_fp32, name=xx.name)
    return basic_model

def add_l2_regularizer_2_model(model, weight_decay, custom_objects={}, apply_to_batch_normal=False, apply_to_bias=False):
    if 0:
        regularizers_type = {}
        for layer in model.layers:
            rrs = [kk for kk in layer.__dict__.keys() if "regularizer" in kk and not kk.startswith("_")]
            if len(rrs) != 0:
                # print(, layer.__class__.__name__, rrs)
                if layer.__class__.__name__ not in regularizers_type:
                    regularizers_type[layer.__class__.__name__] = rrs

    for layer in model.layers:
        attrs = []
        if isinstance(layer, keras.layers.Dense) or isinstance(layer, keras.layers.Conv2D):
            # print(">>>> Dense or Conv2D", layer.name, "use_bias:", layer.use_bias)
            attrs = ["kernel_regularizer"]
            if apply_to_bias and layer.use_bias:
                attrs.append("bias_regularizer")
        elif isinstance(layer, keras.layers.DepthwiseConv2D):
            # print(">>>> DepthwiseConv2D", layer.name, "use_bias:", layer.use_bias)
            attrs = ["depthwise_regularizer"]
            if apply_to_bias and layer.use_bias:
                attrs.append("bias_regularizer")
        elif isinstance(layer, keras.layers.SeparableConv2D):
            attrs = ["pointwise_regularizer", "depthwise_regularizer"]
            if apply_to_bias and layer.use_bias:
                attrs.append("bias_regularizer")
        elif apply_to_batch_normal and isinstance(layer, keras.layers.BatchNormalization):
            if layer.scale:
                attrs.append("gamma_regularizer")
        elif apply_to_batch_normal and isinstance(layer, keras.layers.PReLU):
            attrs = ["alpha_regularizer"]

        for attr in attrs:
            if hasattr(layer, attr) and layer.trainable:
                setattr(layer, attr, keras.regularizers.L2(weight_decay / 2))
    return keras.models.clone_model(model)

def replace_ReLU_with_PReLU(model, target_activation="PReLU", **kwargs):
    from tensorflow.keras.layers import ReLU, PReLU, Activation

    def convert_ReLU(layer):
        # print(
        if isinstance(layer, ReLU) or (isinstance(layer, Activation) and layer.activation == keras.activations.relu):
            if target_activation == "PReLU":
                layer_name = layer.name.replace("_relu", "_prelu")
                print(">>>> Convert ReLU:", layer.name, "-->", layer_name)
                # Default initial value in mxnet and pytorch is 0.25
                return PReLU(shared_axes=[1, 2], alpha_initializer=tf.initializers.Constant(0.25), name=layer_name, **kwargs)
            elif isinstance(target_activation, str):
                layer_name = layer.name.replace("_relu", "_" + target_activation)
                print(">>>> Convert ReLU:", layer.name, "-->", layer_name)
                return Activation(activation=target_activation, name=layer_name, **kwargs)
            else:
                # target_activation is a layer class
                act_class_name = target_activation.__name__
                layer_name = layer.name.replace("_relu", "_" + act_class_name)
                print(">>>> Convert ReLU:", layer.name, "-->", layer_name)
                return target_activation(**kwargs)
        return layer

    input_tensors = keras.layers.Input(model.input_shape[1:])
    return keras.models.clone_model(model, input_tensors=input_tensors, clone_function=convert_ReLU)

def convert_to_mixed_float16(model, convert_batch_norm=False):
    policy = keras.mixed_precision.Policy("mixed_float16")
    policy_config = keras.utils.serialize_keras_object(policy)
    from tensorflow.keras.layers import InputLayer, Activation
    from tensorflow.keras.activations import linear, softmax

    def do_convert_to_mixed_float16(layer):
        if not convert_batch_norm and isinstance(layer, keras.layers.BatchNormalization):
            return layer
        if isinstance(layer, InputLayer):
            return layer
        if isinstance(layer, Activation) and layer.activation == softmax:
            return layer
        if isinstance(layer, Activation) and layer.activation == linear:
            return layer

        aa = layer.get_config()
        aa.update({"dtype": policy_config})
        bb = layer.__class__.from_config(aa)
        return bb

    input_tensors = keras.layers.Input(model.input_shape[1:])
    mm = keras.models.clone_model(model, input_tensors=input_tensors, clone_function=do_convert_to_mixed_float16)
    if model.built:
        mm.compile(optimizer=model.optimizer, loss=model.compiled_loss, metrics=model.compiled_metrics)
        # mm.optimizer, mm.compiled_loss, mm.compiled_metrics = model.optimizer, model.compiled_loss, model.compiled_metrics
        # mm.built = True
    return mm

def convert_mixed_float16_to_float32(model):
    from tensorflow.keras.layers import InputLayer, Activation
    from tensorflow.keras.activations import linear

    def do_convert_to_mixed_float16(layer):
        if not isinstance(layer, InputLayer) and not (isinstance(layer, Activation) and layer.activation == linear):
            aa = layer.get_config()
            aa.update({"dtype": "float32"})
            bb = layer.__class__.from_config(aa)
            return bb
        return layer

    input_tensors = keras.layers.Input(model.input_shape[1:])
    return keras.models.clone_model(model, input_tensors=input_tensors, clone_function=do_convert_to_mixed_float16)

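def convert_to_batch_renorm(model):
    def do_convert_to_batch_renorm(layer):
        # Only BatchNormalization layers are rebuilt; every other layer is reused as-is
        if isinstance(layer, keras.layers.BatchNormalization):
            aa = layer.get_config()
            # An empty renorm_clipping dict leaves the renorm corrections unclipped
            aa.update({"renorm": True, "renorm_clipping": {}, "renorm_momentum": aa["momentum"]})
            bb = layer.__class__.from_config(aa)
            # The new layer has no weights until it is built, so build it before copying
            bb.build(layer.input_shape)
            # Keep the trained weights and append the freshly initialized trailing renorm weights
            bb.set_weights(layer.get_weights() + bb.get_weights()[-3:])
            return bb
        return layer

    input_tensors = keras.layers.Input(model.input_shape[1:])
    return keras.models.clone_model(model, input_tensors=input_tensors, clone_function=do_convert_to_batch_renorm)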
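
To make the effect of `renorm=True` more concrete, here is a minimal pure-Python sketch of the batch renormalization forward pass for a single feature, without the learned scale and offset. The function name `batch_renorm_forward` and its arguments are illustrative and are not part of the GhostFaceNets code. It assumes that missing clipping bounds default to "no clipping", which is how the empty `renorm_clipping` dict above is expected to behave.

```python
import math
from statistics import fmean, pstdev

def batch_renorm_forward(batch, moving_mean, moving_std,
                         rmin=0.0, rmax=math.inf, dmax=math.inf, eps=1e-3):
    """Normalize one feature over a batch using batch renormalization (illustrative sketch)."""
    mu_b = fmean(batch)
    sigma_b = math.sqrt(pstdev(batch) ** 2 + eps)  # batch standard deviation with epsilon
    # Correction terms r and d; a real framework treats them as constants during backprop
    r = min(max(sigma_b / moving_std, rmin), rmax)
    d = min(max((mu_b - moving_mean) / moving_std, -dmax), dmax)
    return [(x - mu_b) / sigma_b * r + d for x in batch]

# Hypothetical values, chosen only to show the call
out = batch_renorm_forward([1.0, 2.0, 3.0, 4.0], moving_mean=2.0, moving_std=2.0)
print(out)
```

With no clipping, the corrections cancel the batch statistics, so the output matches normalizing by the moving statistics. Setting `rmin=rmax=1` and `dmax=0` reduces the function to ordinary batch normalization.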

Key Features and Benefits of GhostFaceNets

  • Lightweight and Efficient: GhostFaceNets leverage efficient GhostNet architectures and modules, ideal for real-time, mobile, and embedded deployment.
  • Accurate and Robust: They deliver accurate and robust face recognition and verification performance, outperforming many state-of-the-art models on different benchmarks.
  • Modified GDC Recognition Head: The modified GDC recognition head generates discriminative feature vectors, enhancing the model’s performance.
  • PReLU Activation: The use of PReLU as the nonlinear activation function alleviates the vanishing gradient problem. It also improves performance compared to ReLU.
  • Attention-based Enhancements: Incorporating the DFC attention branch in GhostNetV2 enhances performance by capturing long-range dependencies and contextual information.
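
As a rough illustration of why Ghost modules are cheap, the sketch below compares the multiply-accumulate count of a standard convolution with that of a Ghost module, following the cost analysis in the GhostNet paper. The function names, the ratio `s`, the cheap-operation kernel size `d`, and the layer sizes in the example are assumptions chosen for illustration; they are not taken from the GhostFaceNets configuration.

```python
def conv_flops(h, w, c_in, c_out, k):
    """Multiply-accumulates of a standard k x k convolution with an h x w x c_out output."""
    return h * w * c_out * c_in * k * k

def ghost_module_flops(h, w, c_in, c_out, k, s=2, d=3):
    """Approximate cost of a Ghost module with ratio s and d x d depthwise cheap operations.

    Assumes c_out is divisible by s.
    """
    intrinsic = c_out // s  # feature maps produced by the ordinary convolution
    primary = conv_flops(h, w, c_in, intrinsic, k)
    # Each intrinsic map yields (s - 1) "ghost" maps through a d x d depthwise convolution
    cheap = h * w * intrinsic * (s - 1) * d * d
    return primary + cheap

# Hypothetical layer: 14 x 14 output, 64 input channels, 128 output channels, 3 x 3 kernel
standard = conv_flops(14, 14, 64, 128, 3)
ghost = ghost_module_flops(14, 14, 64, 128, 3, s=2, d=3)
print(f"speed-up ratio: {standard / ghost:.2f}")
```

For this layer the ratio comes out a little under `s = 2`, because the depthwise cheap operations add only a small cost on top of the halved primary convolution.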

Experimental Validation and Performance Metrics

The authors of GhostFaceNets tested the model on several benchmark datasets, including the widely used Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) datasets. GhostFaceNets achieved state-of-the-art performance while keeping a substantially smaller model size and lower computational complexity than existing face recognition models.
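
Verification benchmarks such as LFW are typically scored by comparing the embeddings of two face images and deciding whether they belong to the same person. The sketch below shows that idea with cosine similarity and a fixed threshold. It is a simplified illustration, not the authors' evaluation code: the function names are hypothetical, the embeddings are made-up toy vectors rather than model outputs, and real protocols usually select the threshold by cross-validation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verification_accuracy(pairs, same_identity, threshold):
    """Fraction of pairs whose same/different decision matches the ground truth."""
    correct = 0
    for (emb_a, emb_b), is_same in zip(pairs, same_identity):
        predicted_same = cosine_similarity(emb_a, emb_b) >= threshold
        correct += predicted_same == is_same
    return correct / len(pairs)

# Toy 3-dimensional "embeddings", invented purely for illustration
toy_pairs = [
    ([1.0, 0.0, 0.2], [0.9, 0.1, 0.25]),  # similar directions
    ([0.0, 1.0, 0.0], [1.0, 0.0, 0.0]),   # orthogonal
    ([0.5, 0.5, 0.0], [0.4, 0.6, 0.1]),   # similar directions
    ([1.0, 0.0, 0.0], [-1.0, 0.2, 0.0]),  # nearly opposite
]
toy_labels = [True, False, True, False]
print(verification_accuracy(toy_pairs, toy_labels, threshold=0.5))
```

The threshold trades false accepts against false rejects: raising it makes the system stricter, so more genuine pairs are rejected.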

Applications and Future Prospects

GhostFaceNets opens up a wide range of face recognition applications on edge devices, from secure user authentication on mobile devices to intelligent surveillance systems.

As the demand for edge computing and real-time face recognition continues to grow, GhostFaceNets represents a major step forward in the field, paving the way for future advancements and innovations. Researchers and engineers can build upon this groundbreaking work, exploring new architectures, optimization techniques, and applications to further push the boundaries of efficient and accurate face recognition.


GhostFaceNets combines deep learning techniques with an edge-friendly design to produce lightweight face recognition models. Its Ghost modules deliver accurate and robust recognition while keeping the computational footprint small. As ubiquitous computing and the Internet of Things spread, GhostFaceNets makes it practical to integrate face recognition into daily life, improving experiences and security without sacrificing performance or efficiency.

Key Takeaways

  • GhostFaceNets is a groundbreaking advancement in lightweight face recognition. It balances efficiency and accuracy, making it ideal for deploying facial recognition technology on devices with limited computational resources.
  • The architecture improves face recognition efficiency by incorporating Ghost modules, the DFC attention branch, and PReLU activation, without sacrificing accuracy.
  • The DFC attention branch in GhostNetV2 efficiently captures long-range dependencies, enhancing contextual understanding with minimal computational burden.
  • GhostFaceNets excel on benchmarks, with compact sizes and efficient computation, ideal for real-world applications.

Frequently Asked Questions

Q1. How does GhostFaceNets achieve efficiency in face recognition?

A. GhostFaceNets achieves efficiency through innovative architectural enhancements, leveraging Ghost modules, modified GDC recognition heads, and attention-based mechanisms like the DFC attention branch. These optimizations reduce computational complexity while maintaining accuracy.

Q2. What sets GhostFaceNets apart from traditional face recognition models?

A. GhostFaceNets distinguishes itself by balancing efficiency and accuracy. Unlike traditional models requiring substantial computational resources, GhostFaceNets uses lightweight architectures and attention mechanisms to achieve high performance on edge devices.

Q3. What are some key features of GhostFaceNets architecture?

A. GhostFaceNets architecture includes Ghost modules for efficient feature map generation and modified GDC recognition heads for discriminative feature vectors. It also employs PReLU activation and attention-based mechanisms like the DFC attention branch for capturing dependencies.

Q4. How was GhostFaceNets validated and evaluated?

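A. GhostFaceNets was evaluated on benchmark datasets including LFW and YTF, where it showed strong performance with smaller model sizes and lower computational complexity than existing face recognition models.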


The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
