What is Data Science?

Data science is a much-debated and comparatively new term in the realm of data analysis and statistics. As the students of the MSA class of 2016 graduate, many of us will start working under the title of ‘data scientist.’ But, as William Shakespeare asked, “What’s in a name?” I was curious to explore what data science really is, and what it is not.
To understand the term ‘data science’ better, let us look at the following four attributes.

  1. Interdisciplinary field: data science is an interdisciplinary field where concepts from statistics, computer science, and mathematics overlap. Data is at the core of this discipline, and what we do with the data, using the tools and techniques offered by these overlapping fields, defines the field of data science.
    Figure: Data science is an interdisciplinary field where statistics, mathematics, and computer science overlap
  1. Many layers of data challenges: Data science doesn’t start with advanced data modelling techniques. Rather, it comprises many layers of data challenges: developing data infrastructure, analyzing data, extracting information and insights, automating the flow of information, applying advanced techniques, solving problems using data, and finally communicating the results effectively.
    Figure: Layers of Data Science Challenge


  1. It’s an art: Data science doesn’t belong to the black-and-white realm of fundamental science, where things are either right or wrong. The discipline is focused on application and results. While data gives us facts, generating value from those facts requires creativity, innovative thinking, and attention to detail. That’s the beauty of data science.
  1. It’s not all about big data: With the big data explosion, we tend to associate data science with big data. But big data belongs to the very first layer of the data science challenge: developing data infrastructure. An organization may or may not have a large volume of data, but it can always make a difference in the way it solves problems by using data science.

Critics have said that there is no distinction between data science and statistics, and data scientists have been dismissed as no more than computer-literate statisticians. This may be a valid point of view, but the challenges of the 21st century differ from those traditional statisticians faced in the past. The focus is no longer on drawing inferences about a population from small samples of data. With the help of computer science, data science takes on the challenges of the new century: storing, manipulating, and visualizing large amounts of data, and applying advanced machine learning algorithms to extract value from it.
People may have different opinions about the use of the term ‘data science,’ but the business world has already accepted it. Data science may not be a pure form of science, but I hope these four attributes add to the discussion about how to define it.
Columnist: Mallika Dey