
Are you a data scientist?

Posted: Thu May 20, 2021 1:43 pm
by Paul Hockett

[Image: data science Venn diagram]

It recently struck me that I've always been tending towards being a data scientist, at least by some current definitions (see Drew Conway's Venn diagram, above; there's also a more thorough diagrammatic comparison and discussion/debunking available from David Taylor). Initially this was in the guise of trying to bridge theory and experimental results (aka "experiments with a lot of analysis"), and latterly with more focus on method development, code-base building and open-source/open-data. For me, this has largely been about improving my hacking skills, as this was the main thing I was missing (except in the derogatory sense, inasmuch as all my code was hacky). I take the data science box as more-or-less the core overlap of computation (writing code, using existing packages etc.), domain expertise (physics & chemistry for me), and application of analysis tools. For the last of these, stats certainly feature, but my key aims are really fitting and modelling: extracting fundamental quantities from experimental results, comparing in detail with theory, and/or modelling experiments from first principles. Machine learning doesn't really feature at all for me thus far.

This also has me thinking about the bigger questions: are we all data scientists now? Is it possible to be a good scientist without being a data scientist of some flavour in the information era? Is this just the old "big computer, small brain" approach in new clothes, with even bigger computers (I really hope not... although for me AI/ML is very much on that end of the spectrum in many cases)? And, perhaps most interestingly, what new capabilities do we open up with a more rounded approach (assuming that there is enough time in the day to learn all the relevant skills)?

A more eloquent definition from Jake Vanderplas (via the Python Data Science Handbook):

While some of the intersection labels are a bit tongue-in-cheek, this diagram captures the essence of what I think people mean when they say "data science": it is fundamentally an interdisciplinary subject. Data science comprises three distinct and overlapping areas: the skills of a statistician who knows how to model and summarize datasets (which are growing ever larger); the skills of a computer scientist who can design and use algorithms to efficiently store, process, and visualize this data; and the domain expertise—what we might think of as "classical" training in a subject—necessary both to formulate the right questions and to put their answers in context.

With this in mind, I would encourage you to think of data science not as a new domain of knowledge to learn, but a new set of skills that you can apply within your current area of expertise. Whether you are reporting election results, forecasting stock returns, optimizing online ad clicks, identifying microorganisms in microscope photos, seeking new classes of astronomical objects, or working with data in any other field, the goal of this book is to give you the ability to ask and answer new questions about your chosen subject area.