The Dilution and Death of the Data Scientist
Why one title to rule them all is no longer a valid approach
A few weeks ago, I stumbled upon a social media post in which someone vented their frustration about the wide variety of titles currently in use in the data field. The comment had gained quite a bit of traction and shed some light on the perplexing state of data science. While I was unable to track down the original content, I’ll paraphrase the message as I remember it:
“Data Scientist. Machine Learning Engineer. Research Scientist. Applied Scientist. Machine Learning Scientist. There are too many titles for Data Scientists! We should end the confusion and just call everyone a Data Scientist.”
In the past, this may have sufficed, but with how the field stands today, I’m afraid I cannot agree with this position. With the face of data science evolving, I believe granular titling is more necessary than ever. To explain why, I will revisit the perception of what data science was in the past and propose how we should think about the term today. I will outline why the suggestion of uniform titling is only a catalyst for confusion. Finally, I will speculate on why the post’s author could have an ulterior motive, even if subconsciously, in their messaging.
Historical Context
In the earlier days of the data gold rush, there was an unspoken but well-defined set of expectations within the data science community as to what the role entailed. The scope was narrow, with the focal point being machine learning and modeling. As the field expanded to include additional proficiencies, many people, myself included, recoiled a bit and started to act as gatekeepers, deciding who could and could not work as a data scientist. Do they know statistics? No? Not a data scientist! Have they built a machine learning model? No? Not a data scientist! In retrospect, this view was flawed and worked against the very thing we all deeply wanted: a world where data science could profoundly impact organizations. Instead of balking, we should have focused our energy on helping companies understand these nuances within data science. But how could we when our stance on data science was defensive? Was our definition of data science defective?
What is Data Science?
If we toss the historical context aside and only focus on the two words, data and science, the possibilities are numerous; if a person builds experiments with data or runs experiments and collects data, one could argue that they are doing data science. Technically, it doesn’t make sense to partition out any set of data methods and exclaim: “that isn’t real data science!” While this pushes us away from a biased opinion of data science, this expansive vision still has an unfavorable effect. It complicates the perception of data science work amongst others.
Imagine for a moment the mess this creates in other departments: based on their exposure to and interaction with different data scientists, each person will have a unique perception of how data science could help their organization. As a result, there are dozens of ideas about what data science entails. Here are a few examples:
- “We need someone to help us set up A/B tests and assess them.”
- “We need someone to do cutting-edge research on images.”
- “We need someone to make us pretty dashboards.”
- “We need someone to build an API interface for live predictions.”
- “We need someone to cluster our customers into groups.”
- “We need someone to construct recommendation systems.”
- “We need someone, anyone, to just pull some data we need.”
- “We need someone to teach this autonomous vehicle to drive.”
- “We need someone to create a deck and present it to the team.”
- “We need someone who is a natural language processing expert.”
- “We need someone who will tell us if this customer will leave.”
- “We need someone who can train robots to replace our employees.”
Unfortunately, this range of data science perspectives has opened a floodgate that crushes many data scientists. While there are unicorns in the field, it is unrealistic to expect a single data scientist to have the skills and interest to solve all of the above problems handily. To fix this, we need to find a way to organize work that helps companies understand, at the very least, the capabilities of their data teams so they can seek the right assistance to make their products better. The quickest path to achieve this is to move away from a single ambiguous data scientist title and to reenvision data science as a broader spectrum.
The Data Science Spectrum
If we selected two axes to plot data science expertise onto, one possibility is to use a range of job responsibilities from business -> engineering on one axis against applied -> research on the other. Certain technical aptitudes correlate with each axis, although not to the same degree. Don’t overthink it, though; this is an informal construction to make a quick point! If we were to plot out some relevant titles, it might look something like this:
Individuals who focus on applied business work fall into the bottom left corner, historically having data or business analyst titles. In comparison, the bottom right corner could represent software engineers. Along the top axis, we would find a variety of research scientists. We can pepper in other standard titles in their rough locations. Statisticians, perhaps those focused on inference, are more business-oriented than most but are more likely to skew towards research than an analyst. Decision scientists exist in a similar space. Machine learning engineers would be more research-focused than software engineers, and machine learning scientists would also be research-oriented, except with less engineering expertise.
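As a rough sketch, the placements described above can be written down as coordinates. Everything here is an assumption made for illustration: the names `SPECTRUM`, `more_research_than`, and `more_engineering_than` are hypothetical, and the numbers are guesses chosen only to respect the relative positions in the text, not measurements of anything.

```python
# Hypothetical coordinates for the informal spectrum.
# x: 0.0 = business, 1.0 = engineering
# y: 0.0 = applied,  1.0 = research
# The values are illustrative guesses, not data.
SPECTRUM = {
    "Data Analyst": (0.10, 0.10),
    "Business Analyst": (0.15, 0.05),
    "Statistician": (0.25, 0.45),
    "Decision Scientist": (0.30, 0.40),
    "Software Engineer": (0.90, 0.10),
    "Machine Learning Engineer": (0.80, 0.45),
    "Machine Learning Scientist": (0.55, 0.80),
    "Research Scientist": (0.50, 0.95),
}


def more_research_than(a, b):
    """Return True if title `a` sits higher on the applied -> research axis than `b`."""
    return SPECTRUM[a][1] > SPECTRUM[b][1]


def more_engineering_than(a, b):
    """Return True if title `a` sits further along the business -> engineering axis than `b`."""
    return SPECTRUM[a][0] > SPECTRUM[b][0]


if __name__ == "__main__":
    # List titles from most research-focused to most applied.
    for title, (x, y) in sorted(SPECTRUM.items(), key=lambda item: -item[1][1]):
        print(f"{title:<28} business->engineering={x:.2f}  applied->research={y:.2f}")
```

Comparing any two titles with the helpers reproduces the relationships stated in the paragraph, for example that machine learning engineers are more research-focused than software engineers. Changing the numbers changes nothing about the argument; only the relative ordering matters.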
However, the question remains: where would we place the data scientist title? Even though our argument suggests data science as a field should encompass the entire chart, the current distribution of individuals with a data scientist title should have a focal point on the spectrum. My answer is that it depends on the year: the typical job responsibilities of a data scientist have been changing, and the title has slowly migrated from a central location to the bottom left, something like below:
What is causing this drift? Are the data scientists of yore just now realizing that being more applied and business-focused is the key to success, or are there additional forces at play?
Death to the Data Scientists of Yore
I would speculate that the drift is due to two significant factors. The first factor was a massive rise in news coverage and popularity. Data became a centerpiece of business articles emphasizing how to build a successful company. The data scientist title was topping many job lists, and HBR declared it the sexiest job of the 21st century. Of course, companies didn’t want to miss the boat, or at the very least, they didn’t want their investors to believe they had missed the boat. Many pushed to hire more data-focused roles or attempted to transform pre-existing functions to ensure they were active in this “data revolution.” Some companies had the expertise to pull this off, but many lacked leadership with enough exposure to data initiatives and struggled to grow these “new” organizations. With data science becoming all the rage, use of the data scientist title quickly exploded, diluting the initial identity of the role.
The second factor, which is a consequence of the first, was that many individuals felt required to rebrand themselves in a world that seemed to reward “data science” more heavily than the now less shiny world of analytics. As the role gained popularity, many data analysts, business analysts, and business intelligence professionals started labeling themselves as data scientists, either officially due to company title changes or disingenuously to give a better impression of their credentials.
The two factors above worked together to create a vicious cycle, provoking the death of the former data scientist title.
Data Science is Alive
While the identity of the data scientist may have perished many years ago, data science is very much alive. As a way to address title dilution, more descriptive roles started to pop up: titles that explicitly mentioned machine learning, titles that noted the scale of engineering work required, and titles that indicated whether the position was more research-focused or user-focused. Each of these additions has helped candidates seek new jobs relevant to them. Even though there is still education to be done, other teams within a company are slowly starting to understand this additional granularity, further unlocking their potential to use data effectively.
I believe the spectrum of data science outlined above paints a more expressive and healthier picture of the entire landscape than ever before. In the past, infighting between the old guard and new data science inductees caused unnecessary pain. However, as we look to the future, it is essential to remind ourselves that we each have strengths that can grow the field of data science and opportunities to grow as individuals. We should interpret more accurate titling not as a cause for confusion but as a cause for celebration.
Final note: Who would suggest sticking to one title?
My original focus was to take a spicier stance against the post’s author, criticizing their underlying intent. As I wrote down those thoughts, I ended up with a more holistic view of the titling landscape as laid out above. However, I still would like to address who would benefit from non-existent title granularity.
While my recollection of the original message is somewhat fuzzy, I am relatively sure of two things:
- The post did not mention data analysts, business analysts, business intelligence, or other typical roles associated with “data science.”
- The person posting this complaint had, at least on LinkedIn, the title of data scientist.
These may seem like minor details, but they indicate an unhealthy and egocentric mindset about dispersing titles. Let me unravel that a bit.
If you’re making a post emphasizing the number of titles, why would you leave some of them out? You would think that laying out more tags would further prove your point on the overall confusion that titling brings. On top of that, why are all the missing ones from the lower left quadrant — roles that consumed the data scientist title and made it their own in the first place?
It appears that history is trying to repeat itself. It is human nature to upsell one’s skills, but the tactic this time can’t be the same as before. After all, the roles across the rest of the spectrum are not quite as ambiguous as “data scientist.” You would have to squint pretty hard to confuse a data analyst with a research scientist! However, that is what we see: a pursuit of being perceived as an all-powerful wizard of data. To accomplish that, some people want to collapse all the skills on the spectrum back into a single title. Even worse, the incentive to dilute the overall spectrum is far greater for someone who has fewer skills across the board.
The truth is, if you are discouraged by your current title, why not just make up another one? It is the current meta and requires less smoke and mirrors. Alternatively, you could start learning the skills you want others to perceive you as having, but that might require some additional work. It is a toss-up, but the choice is up to you!
P.S. — if anyone happens to know the social media post I’m referencing, please leave me a message so I can correctly attribute the origin and update this article. I am sure there are additional iterations of the same statement, and I’d love to know about those as well!