THE SQL Server Blog Spot on the Web


Buck Woody

Carpe Datum!

Is Data Science “Science”?

I hold the term “science” in very high esteem. I grew up on the Space Coast in Florida, and eventually worked at the Kennedy Space Center, surrounded by very intelligent people who worked in various scientific fields.

Recently a new term has entered the computing dialog – “Data Scientist”. Since it’s not a standard term, it has a lot of definitions, and in fact has been disputed as a correct term. After all, the reasoning goes, if there’s no such thing as “Data Science” then how can there be a Data Scientist?

This argument has been made before, albeit with a different term – “Computer Science”. In Peter Denning’s excellent article “Is Computer Science Science?” (Communications of the ACM, April 2005, Vol. 48, No. 4) he makes many points that separate “science” from “engineering” and even “art”. I won’t repeat the content of that article here (I recommend you read it on your own) but will draw on the points he makes there.

Definition of Science

To ask the question “is data science ‘science’?” we need to start with a definition of terms. Various references put the definition into the same basic areas:

  • Study of the physical world
  • Systematic and/or disciplined study of a subject area
  • ...and then they include the things studied, the bodies of knowledge and so on.

The word itself comes from Latin, and means merely “to know” or “to study to know”. Greek divides knowledge further into “truth” (episteme), and practical use or effects (tekhne). Normally computing falls into the second realm.

Definition of Data Science

And now a more controversial definition: Data Science. This term is so new and perhaps so niche that the major dictionaries haven’t yet picked it up (my OED reference is older – can’t afford to pop for the online registration at present).

Researching the term's general use, I created an amalgam of the definitions this way:

“Studying and applying mathematical and other techniques to derive information from complex data sets.”

Using this definition, data science certainly seems to be science - it's learning about and studying some object or area using systematic methods. But implicit within the definition is the word “application”, which makes the process more akin to engineering or even technology than science. In fact, I find that using these techniques – and data itself – is part of science, not science itself.

I leave out the concept of studying data patterns or algorithms as part of this discipline. That is actually a domain I see within research, mathematics or computer science. That of course is a type of science, but it does not seek practical applications.

As part of the argument against calling it “Data Science”, some point to the scientific method of creating a hypothesis, testing with controls, comparing results against the hypothesis, and documenting for repeatability. These are not steps that we often take in working with data. We normally start with a question, and fit patterns and algorithms to predict outcomes and find correlations. In this way the process of Data Science is more akin to statistics (and in fact makes heavy use of it) than to starting with an assumption and following it through.

So, is Data Science “Science”? I’m uncertain – and I’m uncertain it matters. Even if we are facing rampant “title inflation” these days (does anyone introduce themselves as a secretary or supervisor anymore?) I can tolerate the term, at least from the intent that we use data to study problems across a wide spectrum, rather than restricting it to a single domain. And I also understand those who have worked hard to achieve the very honorable title of “scientist” and who take issue with those who borrow the term without asking.

What do you think? Science, or not? Does it matter?

Published Tuesday, October 16, 2012 7:29 AM by BuckWoody


Comments

 

Chris Nelson said:

Seems to be a title inflation issue promoted by some of the current hype surrounding "Big Data" similar to the "Web 2.0" hype. O'Reilly is one of the current pimps of the "Data Science" meme.

http://shop.oreilly.com/category/get/data-science-kit.do

October 16, 2012 12:22 PM
 

Adam Machanic said:

Chris: dead on. Great term, "title inflation." Similar to how every senior developer decided, about 5 years ago, that they were actually "architects."

October 16, 2012 2:42 PM
 

Chris Nelson said:

Adam: Buck is dead on with his opinions. Pioneers like Codd who created the field could be considered data scientists. Not someone who's read a few books and won a contest.

October 16, 2012 4:29 PM
 

RichB said:

I think it is a fair title.

However, the limits to where it should be applicable are pretty restrictive.

I suspect that most people who want to use it should really be called something far less flattering.

Far from mere title inflation, this is an insidious attempt to create a 'new' discipline, with all the opportunities to throw away the old key tenets of the previous discipline.

Maybe a good thing, maybe not. Should we fight it or embrace it? Anything that makes people take their data more seriously is a good thing - but a lot of this tends to reek of short-termist, OO-style "bung stuff in and see what falls out", rather than taking the effort of figuring out what you want to put in and what you want out.

The ultimate code and fix?

October 17, 2012 5:13 AM
 

Ben said:

at best, the person who derives information from data is a statistician - or the guy that uses statistics in the back office to prove what we are doing is right.

Statistics is a science - why need the term data scientist?

and what about good ol' operations research?

too much jargon and title inflation lead to people not using common terms for the same things - which leads to boatloads of miscommunication and executives buying the wrong products.

the most adequate terms are:

Report/cube/ETL/database/etc. developer

Analyst

Statistician

October 19, 2012 11:59 AM


About BuckWoody

http://buckwoody.com/BResume.html
