What were you thinking when you chose your Data Scientist Profile?
Data science spans many fields, and in any of them you will have to work closely with your business to identify issues. You will find many articles offering tips for building a solid profile, but few will tell you which profile to pick, even though your professional career depends on that choice.
In today’s article, I want to share four significant reasons for exploring different data scientist profiles before you settle on one:
Reason 1: Cultivating Self-Awareness
– I want you to think about who you are now when it comes to data science.
– I want you to think about your goals regarding data science and how you would like your data scientist profile to change over the next 6 months.
Do you want to become a specialist in one thing, a generalist, or some mix? Each has career benefits and disadvantages, regardless of whether you’re in academia or industry.
Reason 2: Illustrate the Importance of Standardization in Visualization
I wanted to show how to standardize the visualization of users as a mix of characteristics. (Think about how you would do it, and also ask yourself whether a standardized visualization has any significance.)
In this particular case:
(a) Standardizing The X-Axis: I used the main buckets that I thought roughly covered the skills one needs as a data scientist. I’m not tied to these buckets, but they seemed helpful as a starting point, and we can revise them going forward.
The chosen buckets, “Data Viz,” “Software Engineer,” “Math,” “Statistics,” “Machine Learning (ML),” “Communication skills,” and “Field expertise,” are convenient and contestable.
Also, I said, “maybe software engineer should be CS, I don’t know,” and then didn’t really make a decision, and you didn’t seem to mind (thanks!), but it did result in some people having different labels than others.
I pointed out that we had to decide whether the labels would be ordered or not. One way would be to go from left to right, from harder to softer skills. But I felt that calling Software Engineering a harder (more technical) skill than ML or Mathematics was problematic.
Alternatively, we could consider ordering according to the “data science pipeline,” starting with engineering, moving towards analysis with math, statistics, and ML (we would have to choose an order), and then moving into visualization, reporting, storytelling, and communication.
The complexity of the pipeline makes a left-to-right ordering non-obvious. So rather than resolve this on the spot, since I could see it going any of several ways, I decided not to think of the buckets as ordered.
Once we decide not to interpret them as ordered, we have to be careful not to see patterns that aren’t there and are just an artifact of the (arbitrarily) selected order.
Also, some people in the industry might feel that I wasn’t being granular or broad enough, depending on their frame of reference. So I believe this is flawed, but again, you have to start somewhere, usually someplace reasonably uncomplicated, and that’s part of EDA!
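To make the idea of a standardized x-axis concrete, here is a minimal sketch that always draws the buckets in the same fixed order, whatever the person's scores are. The function name render_profile, the 0-to-1 score range, and the example numbers are my own assumptions for illustration, not part of the original exercise.

```python
# Fixed bucket order shared by every profile. The order is arbitrary,
# so no meaning should be read into which bucket sits next to which.
BUCKETS = [
    "Data Viz", "Software Engineer", "Math", "Statistics",
    "Machine Learning", "Communication skills", "Field expertise",
]

def render_profile(scores, width=20):
    """Return a text bar chart with one row per bucket, always in BUCKETS order.

    `scores` maps bucket name -> value assumed to lie in [0, 1];
    missing buckets are drawn as empty bars.
    """
    label_width = max(len(bucket) for bucket in BUCKETS)
    rows = []
    for bucket in BUCKETS:
        value = min(max(scores.get(bucket, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        bar = "#" * round(value * width)
        rows.append(f"{bucket:<{label_width}} |{bar}")
    return "\n".join(rows)

# Made-up scores, purely for illustration.
example = {"Math": 0.5, "Statistics": 0.75, "Communication skills": 0.25}
print(render_profile(example))
```

Because every chart uses the same rows in the same positions, two profiles can be compared at a glance, which is the whole point of standardizing the axis.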
(b) Standardizing The Y-Axis: I drew my profile on the panel and showed what my data scientist profile looked like when I completed my bachelor’s, and how it changed after working on a great data science team and learning from my collaborators and colleagues.
Here the comparison is before and after. I decided not to label the scale because I didn’t want my notion of expertise to influence you. One person’s expert is another person’s poser.
A student just learning this stuff has a different scale than someone who has been doing this for years. Each would have a different interpretation of “expertise,” reflecting over- or under-confidence.
So we have to accept that our scales will be subjective if we label them. (We should think about what it would mean to standardize the scale. How would we do it? What would the consequences of it be? How do we define “expert”?)
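One possible answer to "how would we do it?" is to drop the absolute scale and keep only the shape of each profile, by dividing every score by the person's own strongest skill. This is just one option among many, and the function name and numbers below are hypothetical.

```python
def relative_profile(scores):
    """Rescale a profile so that its strongest skill equals 1.0.

    This discards the absolute scale (which is subjective) and keeps
    only how the skills compare to each other within one person.
    """
    if not scores:
        return {}
    top = max(scores.values())
    if top <= 0:
        return {skill: 0.0 for skill in scores}
    return {skill: value / top for skill, value in scores.items()}

# Made-up example: a cautious student and a confident veteran use very
# different absolute scales but describe the same relative shape.
student = {"Math": 2, "Statistics": 4, "Machine Learning": 1}
veteran = {"Math": 5, "Statistics": 10, "Machine Learning": 2.5}
print(relative_profile(student) == relative_profile(veteran))  # prints True
```

The consequence is worth noticing: after this rescaling we can no longer tell the student from the veteran, which is exactly the kind of trade-off the questions above ask you to think about.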
Reason 3: Our First Step to Thinking about Data Science Teams
I want you to join a data science community. One way to go about it is to combine complementary profiles. Doing so helps you understand the role, meet like-minded people, and learn beforehand.
Reason 4: Demonstrate your Thought Process before you do EDA
It’s a mix of intuition and math/stats know-how. I first came up with a simple, standardized visualization, which I could then use to compare different profiles. Because the scale is not standardized, I would try to focus on relative comparisons. Did I know what I would see before I did it? No. But I had a hunch that some of the following would happen:
(a) I’d discover something new
(b) I’d witness natural clusters of profiles. Some people are similar to each other. (Think: what does “similar” mean? What is the “distance” between two profiles? How do I measure similarity?)
(c) I’d obtain a sense of the distribution across profiles
(d) I’d begin getting an intuition for joining a data science community.
(e) I’d begin thinking of machine learning or analysis problems I could potentially work on with this data set or a generalized version of it.
Just let your imagination go here as a data scientist. How would you use these profiles, or something along these lines, as a method to think about or construct functional teams?
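The questions in (b) about "similar" and "distance" can be made concrete in more than one way. Here is a minimal sketch of two common choices, Euclidean distance and cosine similarity, treating each profile as a mapping from skill to score. The profile format and the example numbers are assumptions for illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two profiles; sensitive to scale."""
    skills = set(a) | set(b)
    return math.sqrt(sum((a.get(s, 0.0) - b.get(s, 0.0)) ** 2 for s in skills))

def cosine_similarity(a, b):
    """Similarity of profile *shape*; ignores overall scale.

    Returns 0.0 when either profile is all zeros (similarity is undefined there).
    """
    skills = set(a) | set(b)
    dot = sum(a.get(s, 0.0) * b.get(s, 0.0) for s in skills)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up profiles: q has the same shape as p but double the scale.
p = {"Math": 3, "Statistics": 4}
q = {"Math": 6, "Statistics": 8}
print(euclidean_distance(p, q))
print(cosine_similarity(p, q))
```

The two measures disagree on purpose here: Euclidean distance says p and q are far apart, while cosine similarity says they are the same kind of profile. Since our scales are subjective, a shape-based measure may be the safer starting point, but that is a judgment call.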
My Meta-thoughts And Analysis Before You Show The Results
My thoughts about this, about who I am as a data scientist, my strengths relative to others, and what I contribute to a team, have been shaped and influenced by many conversations I’ve had with my collaborators, mentors, and friends.
Final Things for you to Think About
Thought experiment: Generalize this problem by visualizing a team rather than a person.
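As one possible starting point for this thought experiment, a team could be summarized by the best score any member has in each skill, on the assumption that one strong person per skill is enough to cover it. That assumption, the 0-to-1 scale, the threshold, and the names below are all hypothetical.

```python
def team_profile(members):
    """Combine member profiles by taking the best score per skill."""
    skills = set().union(*(member.keys() for member in members))
    return {s: max(member.get(s, 0.0) for member in members) for s in sorted(skills)}

def gaps(profile, threshold):
    """List the skills whose score falls below the given threshold."""
    return [skill for skill, value in profile.items() if value < threshold]

# Made-up members, purely for illustration.
alice = {"Statistics": 0.9, "Communication skills": 0.3}
bob = {"Software Engineer": 0.8, "Communication skills": 0.4}

team = team_profile([alice, bob])
print(team)
print(gaps(team, 0.5))  # prints ['Communication skills']
```

Taking the maximum is only one way to aggregate. Averaging would instead describe the typical member, and it would answer a different question about the team.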
Thought experiment: Some data sets could contain millions of users (though a set of millions of potential data scientists is unlikely!). So how would you think about scaling this process? Is there a difference in what you would do if the numbers were self-reported vs. logged user actions on a website?
Think of a social networking or online dating website to get concrete about this. How would you explore a data set of users and their attributes? If the attributes were self-reported, like “how happy are you on a scale of 1-10,” how would you handle the subjectivity of “10”? How would you visualize it, cluster it, or represent the distribution over it?
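One hedged way to approach the subjectivity of “10” is to standardize each user's ratings against that user's own average and spread (a z-score), so that a generous rater and a strict rater become comparable. This only works if each user has given several ratings, which is an assumption I am adding here; the function name and the numbers are also made up.

```python
from statistics import mean, pstdev

def standardize_ratings(ratings):
    """Convert one user's ratings to z-scores relative to that user's own scale.

    Expects a non-empty list of numbers. If the user gave the same rating
    every time, there is no spread to measure, so every z-score is 0.0.
    """
    mu = mean(ratings)
    sigma = pstdev(ratings)
    if sigma == 0:
        return [0.0 for _ in ratings]
    return [(r - mu) / sigma for r in ratings]

# Made-up users: one rates everything high, the other rates everything low,
# but both rank their three answers the same way.
generous = standardize_ratings([8, 9, 10])
strict = standardize_ratings([4, 5, 6])
print(generous)
print(strict)
```

After standardizing, the two users look identical, because what survives is how each answer compares with that person's usual answer. Whether that is the right thing to keep depends on the question you are asking of the data.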
Scaling also suggests that you start by taking a sample and exploring it by eye to gain intuition, and then build an algorithm to automate the process. (This is one place where machine learning comes in.)
Also, remind yourself that I asked you to question standardization and think about how having un-standardized input might impact all this. Does the importance of standardization change for you when we are dealing with smaller data sets vs millions?
I hope this article was helpful for you to understand the importance of visualization and EDA before you select any Data Scientist Profile.
Thanks for reading my article on data scientist profiles, and have a good day 🙂
I am Mrinal, a Data Scientist with a Bachelor’s degree in computer science, specializing in Machine Learning, Artificial Intelligence, and Computer Vision. I am also a freelance blogger, author, and geek with five years of experience. With a background working through most areas of computer science, I am currently pursuing a Master’s in Applied Computing with a specialization in AI at the University of Windsor, and I work as a freelance content writer and content analyst.
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.