Theories Behind Data Science

If you’d like to perform data science, there are several theories and principles that you need to understand. Once you understand them, you can learn the practices and step-by-step skills that data scientists use. Without them, the practices and skills won’t make sense. So first let me teach you a few of the theories and principles involved, and then I can teach you a simple step-by-step method for doing data science.

Database Theory

Firstly, let’s talk about database theory. Database theory is about organising data in a way that makes storing and retrieving it efficient. Data can be categorised into objects, objects can be put into collections, and objects and collections can have relationships with each other and with themselves. The one thing you need to know about this theory is that the way you organise your data will impact the effort required to get answers from it.
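
To make that last point concrete, here is a minimal sketch in Python. The customers, orders and function names are all invented for illustration, and plain dictionaries and lists stand in for a real database. It answers the same question twice: once by scanning a flat collection, and once using a collection that has been organised by customer in advance.

```python
# Hypothetical data: each dict is an "object", each list is a "collection",
# and customer_id is the relationship between the two collections.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 101, "customer_id": 1, "total": 25.0},
    {"order_id": 102, "customer_id": 2, "total": 40.0},
    {"order_id": 103, "customer_id": 1, "total": 15.0},
]


def total_spent_scan(customer_id):
    """Answer the question by checking every order in the flat collection."""
    return sum(o["total"] for o in orders if o["customer_id"] == customer_id)


def build_orders_index(all_orders):
    """Organise orders by customer once, so later questions are cheap."""
    index = {}
    for order in all_orders:
        index.setdefault(order["customer_id"], []).append(order)
    return index


orders_by_customer = build_orders_index(orders)


def total_spent_indexed(customer_id):
    """Answer the same question by looking only at that customer's orders."""
    return sum(o["total"] for o in orders_by_customer.get(customer_id, []))
```

Both functions give the same answer, but the first touches every order each time it is asked, while the second pays the organising cost once and then only looks at the orders that matter.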

Agile Manifesto

Now let’s talk about the Agile Manifesto. The Agile Manifesto is a set of values and principles that aims to ensure high quality outputs in environments subject to high levels of change and ambiguity. Agile methods cope with rapid change and ambiguity by adopting an iterative development process. They rely on self-managed teams, and people who are passionate about technological advancement are drawn to them like scientists to the Big Bang theory. The Agile Manifesto looks to remove cultural barriers between developer, client and end user, and focuses on using the latest technology to make things simple but not simpler. The one thing you need to know about this set of principles is that all things change, and the longer you take to test your solution in the live environment, the higher the risk of failure.

Spiral Dynamics

The last theory I’d like to touch on is Spiral Dynamics. Spiral Dynamics is a theory of human development and behaviour that explains why we do what we do. It explains the psychology behind why we get out of bed in the morning, why we feel compelled to create things, and why we seek to better ourselves and better serve our loved ones. The theory talks about two mental states, one of “facts” and one of “values”. Facts are what we believe. Our beliefs are based on the knowledge we currently have and the environment we are currently in. Values are what we desire. Our desires are driven by our intentions and/or concerns, which are also based on the knowledge we currently have and the environment we are currently in. The one thing you need to know about this theory is that our facts and our desires come from the data that is presented to us.

Data Scientists

Data Scientists perform data science. They use technology and skills to increase awareness, clarity and direction for those working with data. The data scientist role exists to accommodate the rapid changes that occur in our modern-day environment, and it is given the task of minimising the disruption that technology and data are having on the way we work, play and learn. Data Scientists don’t just present data. They present data with an intelligent awareness of the consequences of presenting that data.

How To Do Data Science

The three components involved in data science are organising, packaging and delivering data (the OPD of data). Organising is where the physical location and structure of the data are planned and executed. Packaging is where the prototypes are built, the statistics are performed and the visualisations are created. Delivering is where the story gets told and the value is obtained. However, what separates data scientists from other existing roles is that they also need to have a continual awareness of What, How, Who and Why. A data scientist needs to know what the output of the data science process will be and have a clear vision of this output. A data scientist needs to have a clearly defined plan for how this output will be achieved within the constraints of available resources and time. A data scientist needs to deeply understand who the people are that will be involved in creating the output. And most of all, the data scientist must know why the output is being created, that is, the motivation behind attempting to bring this vision to life.

The 3-Step OPD Data Science Process

Step 1. Organise Data.
Organising data involves the physical storage and format of data and incorporates best practices in data management.

Step 2. Package Data.
Packaging data involves logically manipulating and joining the underlying raw data into a new representation and package.

Step 3. Deliver Data.
Delivering data involves ensuring that the message in the data reaches those who need to hear it.

Plus, at every step, have answers to these questions:
  • What is being created?
  • How will it be created?
  • Who will be involved in creating it?
  • Why is it to be created?
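
As a rough illustration of the three steps, here is a small Python sketch. Everything in it is an assumption made up for the example: the sales lines, the manager names and the function names `organise`, `package` and `deliver`. A real project would use proper storage and tooling, but the shape of the process is the same.

```python
from statistics import mean

# Hypothetical raw inputs, invented for this example.
raw_sales = ["north,2024-01,120", "south,2024-01,80", "north,2024-02,150"]
region_managers = {"north": "Priya", "south": "Tom"}


def organise(lines):
    """Step 1: give the raw text a consistent structure and data types."""
    rows = []
    for line in lines:
        region, month, amount = line.split(",")
        rows.append({"region": region, "month": month, "amount": float(amount)})
    return rows


def package(rows, managers):
    """Step 2: join and summarise the raw rows into a new representation."""
    amounts_by_region = {}
    for row in rows:
        amounts_by_region.setdefault(row["region"], []).append(row["amount"])
    return [
        {
            "region": region,
            "manager": managers.get(region, "unknown"),
            "average": mean(amounts),
        }
        for region, amounts in sorted(amounts_by_region.items())
    ]


def deliver(summary):
    """Step 3: turn the package into a message for the people who need it."""
    return [
        f"{item['manager']}: average sales in {item['region']} were {item['average']:.1f}"
        for item in summary
    ]


messages = deliver(package(organise(raw_sales), region_managers))
```

Here the What is a one-line message per region, the How is the three functions, the Who is the regional managers, and the Why is helping them see how their region is doing. The code cannot answer those questions for you, which is why they sit alongside every step.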

The Data Science Model

Data Science in action

Data science in action is simply about moving people and/or systems between current and new technologies and between beginner and expert skills.

Step 1. Organising Data.

Organising data involves moving people and systems from current to new (left to right) and from beginner to expert (top to bottom). Advancing technologies and skills is the essence of innovation.

Step 2. Packaging Data.

Packaging data is the reverse of organising data and involves moving people and systems from new to current (right to left) and from expert to beginner (bottom to top). This is the art of making things simple but not simpler.

Step 3. Delivering Data.

Delivering data is enabling the movement from one view to another: enabling a beginner to become an expert, enabling current technology to seem new, enabling expert data to be understood by beginners and enabling new technology to seem like it has been a part of your life since you were born. This is transformational education.