
Introduction

What's the “full stack” (WTF)?

Full stack

The “full stack” describes all the technical components and tools that data scientists interact with end to end, from databases to visualizations and everything in between. Along the way there are specialist trades: DevOps, analysts, programmers, database administrators, software engineers, statisticians; the list is long. This guide series is not a source of expertise on each of these trades. However, I've worked on large technical projects with teams that span these specialties, and with this guide I seek to share ways to combine these approaches for real-world problems.

As a leader or founder in this space, it is worthwhile to be versed in these technologies at a high level and to be able to rapidly prototype your ideas. You don’t need to master all the parts to get underway; the guide is set up to provide entry points to the sections of interest. That said, the guides and chapters in this series are not completely independent; the core components are required for all of them. You should install the "Core" setup first to get your machine fully operational. The "Quickstart" install is the fastest way to get underway.

> This guide is opinionated, by design

I express strongly held opinions in this guide. I’ve found the tools herein to be effective and to work well together. They provide a starting point for your data science adventures. You may eventually come to prefer alternative editors, package managers, etc., and the landscape is constantly evolving. Similarly, my recommendations may change over time, as they have in the eight (8) previous versions of this guide; however, the core principles have remained the same.

Core principles

  1. Simple

    ... a simple system is easier to set up and maintain

  2. Reproducible

    ... scientific work and results must be reproducible

  3. Collaborative

    ... it should be easy to work with others, in real time

  4. Accessible

    ... code should be easy for others to run and read, across platforms

  5. Transparent

    ... algorithms, assumptions, and logic must be transparent

Core components

To launch into data science, you should be familiar with the following components:

0. Terminal

> ... to interact with your computer and run commands

Recommendation: Terminal (Mac | Linux) & PowerShell (Windows)

1. Package Manager

> ... to install and manage software and files

Recommendation: Homebrew (Mac) & Winget (Windows)

2. Version control

> ... to “save” snapshots of your work history and collaborate

Recommendation: Git is distributed and promotes reusability

3. Code editor

> ... to read, write, and edit code and data

Recommendation: Visual Studio is extensible, reliable, and intuitive

4. Programming environment

> ... to shape, analyze, and display data

Recommendation: Python is versatile (statistics, maps, models, text, etc.)

5. Notebook

> ... to explore data and annotate your work

Recommendation: Jupyter supports inline graphs, multiple programming languages, markdown, and is easy to share and use

> ... everything else is pretty much optional

tl;dr This guide takes a choose-your-own-adventure approach to data science; follow your interests and needs. However, I strongly recommend you install the "Core" components first (Terminal, Package Manager, Git, Visual Studio, Python, and Jupyter); they are necessary for all chapters!

> ... alternatively, you can follow along with the “cloud” option on Google Colab.
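Before installing anything, it can help to see which of the Core components are already on your machine. The following is a minimal sketch for Mac and Linux terminals, not part of the official install steps. The helper name `check_tool` is hypothetical, and the command names `git`, `python3`, and `jupyter` are assumptions about how these tools are exposed on your system.

```shell
# Hypothetical helper: report whether a command is available on the PATH
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Assumed command names for three of the Core components
for tool in git python3 jupyter; do
  check_tool "$tool"
done
```

Anything reported as missing is a candidate for the "Core" install; a tool that is found may still be an older version than the one a chapter expects.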

Caveats

  • Installation guides age quickly. I try to keep up, but help and feedback are appreciated (please submit a ticket or pull request on GitHub).
  • This Guide leans towards Mac; I spend less time on Windows and Linux
  • Installing software can damage your computer. Use virtual environments (venv) wherever possible to avoid clobbering your settings or bare metal operating system (OS).
  • I recently rewrote this guide quickly to support colleagues, so some sources may not be properly referenced. This will be resolved; in the interim, a blanket acknowledgement to “The Interwebs.”
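To make the virtual environment caveat concrete, here is a minimal sketch using Python's built-in `venv` module on Mac or Linux. It assumes `python3` is already installed, and the folder name `.venv` is a common convention rather than a requirement of this guide.

```shell
# Create an isolated environment in the current folder (the name .venv is an assumption)
python3 -m venv .venv

# Activate it for this terminal session (Mac/Linux shells)
. .venv/bin/activate

# While active, "python" resolves to the copy inside .venv
python -c 'import sys; print(sys.prefix)'

# Leave the environment; the system Python is untouched
deactivate
```

Packages installed while the environment is active land inside `.venv`, so deleting that folder removes them without affecting the rest of the machine.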

Syntax in this guide

Code is embedded in a grey box. Hover on the right side of the box to reveal a document icon for copy and pasta:

echo "hello world"

Code may be presented in tabs that denote separate commands by language or operating system. For example, the following code block shows how to list files on Unix/MacOS and Windows:

## Show files
ls -la