Introduction to machine learning in Python

Length: 4 days (32 hours)

Description: Machine learning is changing the world — from predictive typing on your cellphone, to Amazon’s Alexa voice recognition, to the spam detector in our e-mail programs. In this course, we’ll learn the basic principles behind machine learning, and will see how we can put these ideas into practice using Python and its popular “scikit-learn” library.

The course will discuss the main uses of machine learning: classification, regression, and clustering, including specific use cases such as classification of text, classification of images, and clustering for outlier detection. We’ll create models, and then test those models to make sure that they aren’t overfit. We’ll look at ways in which we can transform our data for better model results.
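As a rough illustration of what testing a model for overfitting can look like, here is a minimal sketch. It assumes scikit-learn's bundled iris dataset and a decision-tree classifier, neither of which is prescribed by the course; any dataset and estimator would serve. The idea is to hold back part of the data, train on the rest, and compare the two scores.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the iris dataset stands in for whatever data you actually have.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows so the model is scored on data it has never seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Assumption: a decision tree is used only because it overfits easily.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# score() returns accuracy for classifiers.
train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

A training score that is much higher than the test score suggests that the model has memorized its training data rather than learned a pattern that generalizes.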

We will also discuss some of the obstacles to successful machine learning, and the transformations we can use to overcome them, including scaling and one-hot encoding. We’ll then see how these can be automated, using such tools as ColumnTransformer, and how we can package such transformations into a “pipeline.”
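To give a sense of how scaling, one-hot encoding, ColumnTransformer, and a pipeline can fit together, here is a small sketch. The data frame, its column names, and the choice of LogisticRegression are all invented for illustration; they are assumptions, not course material.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data, made up purely for this example.
df = pd.DataFrame({
    "age": [22, 35, 58, 41, 29, 63],
    "income": [28_000, 52_000, 91_000, 60_000, 39_000, 75_000],
    "city": ["Haifa", "Paris", "Paris", "Boston", "Haifa", "Boston"],
    "bought": [0, 0, 1, 1, 0, 1],
})
X = df[["age", "income", "city"]]
y = df["bought"]

# Scale the numeric columns and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["age", "income"]),
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Package the transformations and the estimator into a single object.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("classify", LogisticRegression()),
])

pipeline.fit(X, y)

# Two scaled numeric columns plus one column per distinct city.
transformed = pipeline.named_steps["preprocess"].transform(X)
predictions = pipeline.predict(X)

print(transformed.shape)
print(predictions)
```

Because the transformations live inside the pipeline, the same scaling and encoding learned from the training data are applied automatically whenever the pipeline is asked to predict on new rows.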

While we will discuss a number of machine-learning algorithms in the class, the discussion will be at a relatively high level, and will not go into mathematical detail.

By the end of this course, participants will not only understand and appreciate what machine learning is and how it works, but will also know how to use it to solve a variety of problems.

Audience: Participants are expected to have at least minimal experience with Python: knowledge of the basic data types, an ability to write loops, familiarity with writing and executing basic functions, and a basic understanding of object-oriented programming. In addition, participants should be familiar with Python’s Pandas library, especially retrieving, selecting, and modifying data in data frames.


  • What is machine learning?
  • Intro to scikit-learn
  • Types of machine learning
  • Classification problems
    • Common classification estimators
    • Choosing an estimator
    • Training and testing models
    • Variance and bias
    • Hyperparameters
    • Programmatic comparison of models
    • Visualization and models
    • Model persistence
  • Regression problems
    • Common regression estimators
    • Testing regression models
    • Post-fitting attributes
    • Scaling
    • Pipelines
    • Ensemble estimators
    • Visualization and models
  • Transforming data
    • One-hot encoding
    • Missing values
    • ColumnTransformer
  • Classification of documents
  • Classification of images
  • Clustering and unsupervised learning
  • Novelty and outlier detection