What were you thinking when you chose your Data Scientist Profile?

Mrinal Singh Last Updated : 07 Mar, 2022


This article was published as a part of the Data Science Blogathon.

Source: Unsplash.com

Data science spans many fields, and in each of them you will have to work closely with the business to identify issues. You will find many articles offering tips for building a solid profile, but few tell you which profile to pick, even though your entire professional career depends on that choice.

In today’s article, I want to share four significant reasons for exploring different data scientist profiles before you select one for yourself:

Reason 1: Cultivating Self-Awareness

https://unsplash.com/@jareddrice

– I want you to think about who you are now when it comes to data science.
– I want you to think about your goals regarding data science and how you would like your data scientist profile to change over the next 6 months.

Do you want to become a specialist in one thing, a generalist, or some mix? Each has career benefits and disadvantages, regardless of whether you’re in academia or industry.

Reason 2: Illustrate the Importance of Standardization in Visualization

https://unsplash.com/@goumbik

I wanted to show how to standardize visualizations of users, each described as a mix of characteristics. (You should think about how you would do it, and also ask yourself whether you think a standardized visualization has any significance.)

In this particular case:

(a) Standardizing The X-Axis: I used the main buckets that I thought were roughly the skills one needs as a data scientist. I’m not tied to these buckets, but they seemed helpful as a starting point, and we can revise them going forward.

The chosen buckets, “Data Viz,” “Software Engineer,” “Math,” “Statistics,” “Machine Learning (ML),” “Communication skills,” and “Field expertise,” are convenient but contestable.

Also, I said, “maybe software engineer should be CS, I don’t know,” and then didn’t really make a decision, and you didn’t seem to mind (thanks!), but it did result in some people having different labels than others.

I pointed out that we had to decide whether the labels would be ordered or not. One way would be to go from left to right, from harder to softer skills. But I felt that claiming Software Engineering was a harder (more technical) skill than ML or Mathematics was problematic.

Alternatively, we could consider ordering according to the “data science pipeline,” starting with engineering, moving towards analysis with math, statistics, ML (we would have to choose an order), and then moving into visualization, reporting, storytelling, and communication.

The complexity of the pipeline makes a left-to-right ordering non-obvious. So rather than resolve this on the spot, because I could see it going any of several ways, I decided not to think of the buckets as ordered.

Once we decide not to interpret them as ordered, we have to be careful not to see patterns that aren’t there but are just an artifact of the (arbitrarily) selected order.

Also, some people in the industry might feel that I wasn’t being granular or broad enough, depending on their frame of reference. So I believe this is flawed, but again you have to start somewhere, and usually somewhere reasonably simple, and that’s part of EDA!
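To make the idea of a standardized x-axis concrete, here is a minimal sketch that fixes the bucket order once and draws every profile against it as a text bar chart. The bucket names come from the list above; the 0-10 scores, the `render_profile` helper, and the example profile are assumptions invented purely for illustration.

```python
# Fixed bucket order shared by every profile (the order itself is arbitrary).
BUCKETS = [
    "Data Viz", "Software Engineer", "Math", "Statistics",
    "Machine Learning", "Communication skills", "Field expertise",
]

def render_profile(scores, width=10):
    """Return a text bar chart with one row per bucket, in BUCKETS order.

    `scores` maps bucket name -> self-assessed level on a hypothetical
    0..width scale; buckets missing from `scores` are drawn as empty bars.
    """
    label_width = max(len(b) for b in BUCKETS)
    rows = []
    for bucket in BUCKETS:
        # Clamp the level so a stray value cannot break the layout.
        level = max(0, min(width, scores.get(bucket, 0)))
        rows.append(f"{bucket:<{label_width}} | {'#' * level}")
    return "\n".join(rows)

# Hypothetical example profile (made-up numbers).
example = {"Math": 7, "Statistics": 8, "Machine Learning": 6, "Data Viz": 3}
print(render_profile(example))
```

Because every profile is rendered with the same rows in the same order, two charts can be compared line by line, which is the whole point of standardizing the axis.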

(b) Standardizing The Y-Axis: I drew my profile on the panel and showed what my data scientist profile looked like when I completed my bachelor’s and how it changed after working on a great data science team, learning from my collaborators and colleagues.

Here the comparison is before and after. I decided not to label the scale because I didn’t want my notion of expertise to influence you. One man’s expert is another man’s poser.

A student just learning this stuff has a different scale than someone who has been doing this for years. Each would have a different interpretation of “expertise,” reflecting over- or under-confidence.

So we have to accept that our scales will be subjective if we label them. (We should think about what it would mean to standardize the scale. How would we do it? What would the consequences of it be? How do we define “expert”?)
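One possible answer to the questions above is to rescale each person’s scores relative to their own range, so that only the shape of the profile remains. The sketch below assumes numeric self-reported scores; the function name `rescale_within_person` and the two example people are hypothetical, and min-max rescaling is just one of several reasonable choices.

```python
def rescale_within_person(scores):
    """Min-max rescale one person's scores to the range 0..1.

    This removes each person's private notion of "expert": only the
    relative shape of the profile survives. A flat profile maps to zeros.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {skill: 0.0 for skill in scores}
    return {skill: (value - lo) / span for skill, value in scores.items()}

# Two hypothetical people with the same shape but different confidence.
cautious = {"Math": 2, "Statistics": 4, "Machine Learning": 3}
confident = {"Math": 6, "Statistics": 10, "Machine Learning": 8}

print(rescale_within_person(cautious))
print(rescale_within_person(confident))
```

Both people end up with the same rescaled profile, which illustrates the consequence of this choice: over- or under-confidence disappears, but so does any information about absolute skill level.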

Reason 3: Our First Step to Thinking about Data Science Teams

I want you to join a data science community. One way to go about it is to combine complementary profiles. It helps you understand the role, meet like-minded people, and learn beforehand.
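As a rough illustration of what “combining complementary profiles” could mean numerically, the sketch below treats a team’s coverage of a skill as the best score anyone on the team has, and picks the candidate who adds the most to my own coverage. The helper names, the scoring rule, and the example numbers are all assumptions made for this example.

```python
def team_coverage(profiles):
    """Per-skill maximum across a list of profiles (dicts of skill -> score)."""
    skills = set().union(*profiles)
    return {s: max(p.get(s, 0) for p in profiles) for s in skills}

def most_complementary(me, candidates):
    """Name of the candidate whose profile adds the most to my coverage."""
    def gain(other):
        combined = team_coverage([me, other])
        return sum(combined.values()) - sum(me.values())
    return max(candidates, key=lambda name: gain(candidates[name]))

# Hypothetical profiles on a made-up 0-10 scale.
me = {"Math": 8, "Statistics": 7, "Communication skills": 3}
candidates = {
    "A": {"Math": 9, "Statistics": 8, "Communication skills": 2},
    "B": {"Math": 4, "Statistics": 3, "Communication skills": 9},
}
print(most_complementary(me, candidates))
```

Candidate A is stronger overall in the areas I already cover, but candidate B fills the gap in communication, so B is the more complementary partner under this rule.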

Reason 4: Demonstrate your Thought Process before you do EDA

It’s a mix of intuition and math/stats know-how. I first came up with a simple, standardized visualization with which I could then compare different profiles. The lack of a standardized scale means I would try to focus on relative comparisons. Did I know what I would see before I did it? No. But I had a hunch that some of the following would happen:
(a) I’d discover something new
(b) I’d witness natural clusters of profiles. Some people are similar to each other. (Think: what does “similar” mean? What is the “distance” between two profiles? How do I measure similarity?)
(c) I’d obtain a sense of the distribution across profiles
(d) I’d begin getting an intuition for joining a data science community.
(e) I’d begin thinking of machine learning or analysis problems I could potentially work on with this data set or a generalized version of it.
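The questions in (b) about “similar” and “distance” can be made concrete in more than one way. The sketch below shows two common candidates, Euclidean distance and cosine similarity, on profiles turned into vectors over a fixed bucket order. The helper names, bucket subset, and scores are illustrative assumptions, not something prescribed by the article.

```python
import math

def as_vector(profile, buckets):
    """Turn a dict profile into a list of floats in a fixed bucket order."""
    return [float(profile.get(b, 0)) for b in buckets]

def euclidean(u, v):
    """Straight-line distance: sensitive to overall level as well as shape."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: compares shape, ignores scale."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

buckets = ["Math", "Statistics", "Data Viz"]
# Hypothetical profiles: same shape, one person simply scores themselves higher.
junior = as_vector({"Math": 3, "Statistics": 4}, buckets)
senior = as_vector({"Math": 6, "Statistics": 8}, buckets)

print(euclidean(junior, senior))
print(cosine_similarity(junior, senior))
```

The two measures disagree on purpose here: Euclidean distance says the profiles are far apart, while cosine similarity says they have the same shape. Which one counts as “similar” depends on whether you trust the unlabeled scale.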

Just let your imagination go here as a data scientist. How would you use these profiles, or something along these lines, as a method to think about or construct functional teams?

My Meta-thoughts And Analysis Before You Show The Results

My thoughts about this, who I am as a data scientist, my strengths relative to others, and what I contribute to a team have been shaped and influenced by many conversations I’ve had with my collaborators, mentors, and friends.

Final Things for you to Think About

Thought experiment: Generalize this problem by visualizing a team rather than a person.

Thought experiment: Some data sets could contain millions of users (though a set of millions of potential data scientists is unlikely!). So how would you think about scaling this process? Is there a difference in what you would do if the numbers were self-reported vs. logged user actions on a website?

Think of a social networking or online dating website to get concrete about this. How would you explore a data set of users and their attributes? If the attributes were self-reported, like “how happy are you on a scale of 1-10,” how would you handle the subjectivity of “10”? How would you visualize it, cluster it, and represent the distribution over it?

Scaling also suggests that you start by sampling and exploring by eye yourself to gain intuition, and then build an algorithm to automate the process. (This is one place where machine learning comes in.)
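A minimal sketch of that workflow, under stated assumptions: draw a small reproducible sample to inspect by eye, write down a few prototype profiles from what you see, and then automatically assign every profile to its nearest prototype. The prototypes, the three-number profile vectors, and the function names are hypothetical, and nearest-prototype assignment is only a simple stand-in for whatever algorithm you would actually build.

```python
import random

def sample_profiles(profiles, k, seed=0):
    """Draw up to k profiles reproducibly for manual inspection."""
    rng = random.Random(seed)
    return rng.sample(profiles, min(k, len(profiles)))

def nearest_prototype(profile, prototypes):
    """Label of the prototype closest to `profile` (squared Euclidean)."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(prototypes, key=lambda label: sq_dist(profile, prototypes[label]))

# Hypothetical prototypes one might write down after eyeballing a sample;
# vectors are (engineering, statistics, communication) scores.
prototypes = {
    "engineer": (9, 3, 4),
    "statistician": (3, 9, 4),
    "communicator": (4, 4, 9),
}
population = [(8, 2, 5), (2, 8, 3), (5, 5, 9), (9, 4, 2)]

print(sample_profiles(population, 2))
labels = [nearest_prototype(p, prototypes) for p in population]
print(labels)
```

The manual step stays small no matter how large the population grows, while the assignment step runs over every profile, which is what makes the approach scale.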

Also, remind yourself that I asked you to question standardization and think about how having un-standardized input might impact all this. Does the importance of standardization change for you when we are dealing with smaller data sets vs millions?

Final Words

I hope this article was helpful for you to understand the importance of visualization and EDA before you select any Data Scientist Profile.

Thanks for reading my article on data scientist profiles, and have a good day 🙂


About Author

I am a Data Scientist with a Bachelor’s degree in computer science, specializing in Machine Learning, Artificial Intelligence, and Computer Vision. I am also a freelance blogger, author, and geek with five years of experience. With a background working through most areas of computer science, I am currently pursuing a Master’s in Applied Computing with a specialization in AI at the University of Windsor, and I work as a freelance content writer and content analyst.

Connect with me on my social media profiles and follow me for a quick virtual cup of coffee.

LinkedIn | Github | Email | Medium | Instagram | Facebook | Portfolio

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
