Guest Blog — September 16, 2020

Introduction

Figure: An example of what the dataset looks like. The 3 entries shown in the figure refer to the same software.

 

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Import the dataset
df = pd.read_csv("your dataset")

# Extract the software name column
field_extracted = df['softwarename']

# Initialize a character-level TF-IDF vectorizer and fit it on the names
vectorizer = TfidfVectorizer(analyzer='char')
vectors = vectorizer.fit_transform(field_extracted)

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
feature_names = vectorizer.get_feature_names_out()

# Convert the sparse matrix into a DataFrame with one column per character
dense = vectors.todense()
denselist = dense.tolist()
result = pd.DataFrame(denselist, columns=feature_names)

Figure: The result of the TF-IDF script above (with a mix of different languages, e.g. Korean and Chinese)
import math
from collections import Counter

# Function for calculating entropy
def eta(data, unit='natural'):
    base = {
        'shannon': 2.,
        'natural': math.exp(1),
        'hartley': 10.
    }

    if len(data) <= 1:
        return 0

    counts = Counter()
    for d in data:
        counts[d] += 1

    ent = 0
    probs = [float(c) / len(data) for c in counts.values()]
    for p in probs:
        if p > 0.:
            ent -= p * math.log(p, base[unit])
    return ent

entropy = [eta(x) for x in field_extracted]
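The entropy function above measures how evenly the characters of a string are distributed. As a rough sanity check, the sketch below applies the same calculation to a few made-up strings. The helper name `char_entropy`, its numeric `base` parameter (used here in place of the `unit` lookup), and the sample strings are all assumptions for illustration.

```python
import math
from collections import Counter

def char_entropy(text, base=math.e):
    """Entropy of the character distribution of `text` (0 for empty or 1-char input)."""
    if len(text) <= 1:
        return 0.0
    total = len(text)
    ent = 0.0
    for count in Counter(text).values():
        p = count / total
        ent -= p * math.log(p, base)
    return ent

# Hypothetical samples: one repeated character, two characters, four distinct characters.
samples = ["aaaa", "abab", "abcd"]
scores = {s: char_entropy(s, base=2) for s in samples}
for s, score in scores.items():
    print(s, round(score, 3))
```

With base 2, a string made of one repeated character scores 0 bits, and the score rises as the characters become more varied (about 1 bit for "abab", about 2 bits for "abcd").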

About the Author

Elaine Hung
Elaine is a machine learning enthusiast and a digital forensics and incident response consultant. She is interested in applying ML and NLP to cyber security topics.
