Data science is a sometimes difficult-to-define, increasingly popular, and buzzword-y professional field.
People have told me (namely my boss at Vertex Pharmaceuticals and a speaker from 3M who came in the other week) that data science is the intersection of three fields: computer science, mathematics and statistics, and domain expertise.
This seems like a reasonably satisfactory formulation of the field. I’ve found that computer science knowledge is particularly important for accessing and manipulating data. Whether you are using APIs, formatting CSVs, or scraping, the vast majority of backend, database-facing work is done programmatically, as is the development of dashboards and visualizations.
The other two areas in the Venn diagram are less critical and more interchangeable. For instance, is mathematical and statistical analysis of data not still data science when it is done without domain expertise? And from a business standpoint, much can be accomplished with computer science and domain expertise alone, especially when the business is not particularly data-savvy. Most of the work at my internship last summer involved accessing and transforming data and then doing exploratory data analysis, which I would not classify as particularly mathematical. My analysis nevertheless produced useful business insights.
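As a rough illustration of what that access, transform, and explore loop can look like, here is a minimal Python sketch using only the standard library. Everything in it is hypothetical: the sample data, the column names (`region`, `units_sold`), and the helper functions are made up for illustration and do not come from any real project.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical sample data standing in for an export from a database or API.
RAW_CSV = """region,units_sold
North,120
South,95
North,80
East,
South,105
"""

def load_rows(text):
    """Access: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Transform: drop rows with a missing count and cast counts to int."""
    return [
        {"region": r["region"], "units_sold": int(r["units_sold"])}
        for r in rows
        if r["units_sold"].strip()
    ]

def summarize(rows):
    """Explore: compute the mean units sold per region."""
    by_region = defaultdict(list)
    for r in rows:
        by_region[r["region"]].append(r["units_sold"])
    return {region: statistics.mean(vals) for region, vals in by_region.items()}

summary = summarize(clean(load_rows(RAW_CSV)))
print(summary)
```

Nothing here goes beyond taking an average, which is the point: most of the effort is in getting the data into a usable shape, and a simple per-group summary can already be enough to start a business conversation.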
A fourth category I might add to this chart would be “Data Visualization and Social Skills.” An extremely important facet of a data scientist’s job is communicating with businesspeople and folks from other areas who may not be familiar with the techniques and vocabulary behind data science. The two things that have the most utility here are the proper and clear representation of data and the ability to convey quantitative ideas to a non- or less-quantitative audience.