This week Dr Matthew Brett gives an introduction to data science and its uses in the classroom

I have not tried to justify my assertion that data science is going to be ubiquitous in undergraduate education, but this is the view of the US National Academies of Sciences, Engineering and Medicine in their 2018 report:

The term “data science” is famously hard to define, so for the sake of brevity I will illustrate it with the example of Berkeley’s “Foundations of Data Science” course. Quoting from : “The course is designed for entry-level students from any major. It is designed specifically for students who have not previously taken statistics or computer science courses.”

You can find their free online textbook “Computational and Inferential Thinking” at

After a short introduction to data science, the students leap into running code via a web browser. They start by clicking a button labelled “Interact”. This connects their web browsers to the Berkeley servers running the Jupyter Notebook : . The Jupyter Notebook is an award-winning open-source tool that is widely used in industry and academia to make interactive web-based documents that mix code and text.

It is fairly simple to install this system on your own laptop. For example, you can run the free Anaconda installer, available at . If you then get the content of the Berkeley course from , you can run the course notebooks, with your laptop running the code and displaying the results in your web browser.

After the students have clicked “Interact”, they are looking at the pages of the textbook, but in a form where they can edit and run the code themselves to see the output in their web browser.

In the second page I refer to, the students plot the counts of character names in each chapter: . The plots reveal part of the story arc; for example, we can immediately see that Tom Sawyer joins the action after chapter 30 of Huckleberry Finn.
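To give a flavour of what counting character names per chapter involves, here is a minimal sketch in plain Python. It is not the code from the Berkeley textbook: the function names `count_names_by_chapter` and `plot_counts`, the `"CHAPTER "` marker, and the tiny `SAMPLE_TEXT` are all assumptions made up for illustration, and the sample text is a stand-in rather than an excerpt from the novel.

```python
# Hypothetical stand-in text, NOT taken from Huckleberry Finn.
SAMPLE_TEXT = (
    "CHAPTER I. Huck talks about Tom. Tom has a plan. "
    "CHAPTER II. Jim and Huck go down the river. "
    "CHAPTER III. Huck, Jim, and Tom meet again. Tom laughs."
)


def count_names_by_chapter(text, names, marker="CHAPTER "):
    """Return {name: [count in chapter 1, count in chapter 2, ...]}.

    Assumes every chapter starts with `marker`. Anything before the
    first marker (title page, preface) is discarded.
    """
    chapters = text.split(marker)[1:]
    # str.count counts substrings, so "Tom" would also match "Tomorrow".
    # That is acceptable for a rough first look at the data.
    return {name: [chapter.count(name) for chapter in chapters]
            for name in names}


def plot_counts(counts):
    """Draw one line per name; requires matplotlib to be installed."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    for name, per_chapter in counts.items():
        chapter_numbers = range(1, len(per_chapter) + 1)
        plt.plot(chapter_numbers, per_chapter, label=name)
    plt.xlabel("Chapter")
    plt.ylabel("Number of mentions")
    plt.legend()
    plt.show()


counts = count_names_by_chapter(SAMPLE_TEXT, ["Huck", "Jim", "Tom"])
print(counts)
```

In a notebook, calling `plot_counts(counts)` in the next cell would draw the chart. With the full text of the novel in place of the sample, a character who enters late would show up as a line that stays at zero until that point.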