November 7, 2023
Will Data Scientists be Replaced by Artificial Intelligence?
It’s been nearly a year since ChatGPT took the world by storm, impressing industry experts and casual users alike with its ability to quickly generate paragraphs of text in response to a wide array of prompts. ChatGPT can write code, poems, and songs, summarize information, invent stories, and answer quizzes, while AI image generators like DALL-E and Stable Diffusion produce hyper-realistic images. With the advent and broad adoption of these tools, many articles have speculated about whether they will spell the end of entire professions. Generative AI’s ability to process large amounts of data and identify patterns makes it powerful. Its accessibility to anyone with an internet connection makes it more useful, and potentially more threatening, than earlier Large Language Models (LLMs). As AI rewrites roles and shifts expectations for knowledge workers in many industries, what effect will it have on the field of data science?
ChatGPT will complement data science, not replace it. Its ability to automate many of the lower-level tasks that are foundational to higher-level data science problem solving makes it useful in speeding up work, but it won’t (yet) replace human ingenuity. For example, while AI can be used for data collection (scraping the web), data scientists often still need to prepare the data. That means getting it into the right format and defining the task by assigning labels. Let’s quickly run through some of AI’s strengths and limitations to see how it might impact the jobs of data scientists:
- As noted, AI works fast. It can do complex tasks in seconds that would take humans hours or days to complete.
- AI is accessible and available to work 24/7 (barring outages).
- AI can assist Data Scientists in producing hundreds or thousands of options or variations of models to ultimately select the best one.
- Generative AI’s penchant for what’s called hallucination – making up facts and figures – makes it risky to use for professional purposes.
- ChatGPT is not open source and cannot be run locally. This means you have to upload your potentially private data to a web server, which puts that data at risk. Although there are open-source models that can be run locally, they are not yet as capable.
- The most advanced Large Language Models and Natural Language Processing tools today are no doubt impressive, but all of their intelligence comes from the same place – data from the past. Genuine novelty remains out of reach (for the time being).
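To make the point about producing many model variations and selecting the best one a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the toy data, the one-parameter threshold "model", and the function names (`make_data`, `accuracy`, `select_best`) do not come from any particular tool. It only shows the general pattern of generating candidates automatically and keeping the one that scores best on validation data.

```python
import random

def make_data(n, seed):
    """Toy 1-D dataset (hypothetical): label is 1 when x > 0.6, with 5% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.6)
        if rng.random() < 0.05:  # flip a few labels to mimic noisy real-world data
            label = 1 - label
        data.append((x, label))
    return data

def accuracy(threshold, data):
    """Fraction of points that the rule 'predict 1 if x > threshold' gets right."""
    correct = sum(int(x > threshold) == y for x, y in data)
    return correct / len(data)

def select_best(candidates, validation):
    """Score every candidate on the validation set and return the best one."""
    return max(candidates, key=lambda t: accuracy(t, validation))

validation = make_data(500, seed=0)
holdout = make_data(500, seed=1)

# 99 model "variations": thresholds 0.01, 0.02, ..., 0.99
candidates = [i / 100 for i in range(1, 100)]
best = select_best(candidates, validation)

print(f"best threshold: {best}")
print(f"validation accuracy: {accuracy(best, validation):.3f}")
print(f"holdout accuracy: {accuracy(best, holdout):.3f}")
```

The search itself is cheap to automate. Deciding what counts as "best" (here, plain accuracy) and checking the winner against data it has not seen are still choices a person has to make.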
Data professionals fall into two main groups – Data Engineers and Data Scientists. Data Engineers extract and prepare data for use by Data Scientists, who use that data to design, build and test advanced models based on machine learning algorithms. As AI takes over many of the lower-level data processing tasks traditionally performed by Data Engineers, they will need to shift their skills towards data science. The upside is that AI’s speed will enable humans to focus on other tasks that require more creativity, like the design of novel algorithms. AI also enables one to perform more testing and evaluation of existing algorithms since everything can be run faster now.
To prepare for the future of Data Science enabled by generative AI, data scientists must embrace these new AI tools and learn how to use them. Just as the invention of cameras didn’t kill art, but instead opened up new creative opportunities, AI holds the same potential. Humans still need to define the task, and they also need to do most of the testing and verification (only some of that can be properly automated). Data science starts with figuring out the problem you want to solve, defining it and writing the problem statement. These are key skills for data scientists to hone in today’s fast-changing world. While AI tools will make data scientists more powerful in their work, the ability to think critically and strategically about how to use the data remains essential, and it is not easily replaced by AI.