Machine Learning for Environmental Engineering
Explore and test state-of-the-art machine learning methods applied to environmental sciences and engineering challenges.
Course Description
This course aims to develop a solid understanding of state-of-the-art machine learning methods and their application to problems in environmental science and engineering. Potential areas of application include, but are not limited to, remote sensing, environmental modeling, and geophysical fluid dynamics.
The first part of the course will focus on applying "vanilla" machine learning algorithms to simple problems, while introducing key tools such as PyTorch and Jupyter notebooks. We will cover feedforward neural networks, shallow versus deep architectures, regression trees, random forests, and XGBoost through hands-on examples. In parallel, we will discuss essential machine learning concepts, including hyperparameter tuning, batch sizes, optimization techniques, and assumptions about data distributions.
Next, the course will explore more advanced neural network architectures, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
The course will then cover probabilistic models and uncertainty quantification using Gaussian Processes, Bayesian Neural Networks, and ensemble methods, emphasizing the distinction between aleatoric uncertainty (irreducible noise in the data) and epistemic uncertainty (uncertainty about the model that more data can reduce).
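As a rough illustration of the aleatoric versus epistemic distinction, the sketch below splits the predictive variance of an ensemble at a single input point using the law of total variance. It is not taken from the course materials: the function name `decompose_uncertainty` and the sample numbers are hypothetical, and it assumes each ensemble member outputs both a predicted mean and a predicted noise variance.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split an ensemble's predictive variance at one input point.

    means[i] and variances[i] are the predicted mean and predicted noise
    variance of ensemble member i (hypothetical inputs for illustration).
    """
    if len(means) != len(variances) or not means:
        raise ValueError("need one mean and one variance per ensemble member")
    aleatoric = fmean(variances)  # average noise level the members predict
    epistemic = pvariance(means)  # disagreement between the members' means
    return aleatoric, epistemic, aleatoric + epistemic


# Five hypothetical ensemble members predicting one quantity at one site.
member_means = [0.30, 0.32, 0.28, 0.31, 0.29]
member_variances = [0.010, 0.012, 0.011, 0.009, 0.008]

aleatoric, epistemic, total = decompose_uncertainty(member_means, member_variances)
print(f"aleatoric={aleatoric:.4f} epistemic={epistemic:.4f} total={total:.4f}")
```

In this decomposition, collecting more training data would be expected to shrink the epistemic term as the members come to agree, while the aleatoric term reflects noise that remains however much data is added.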
Finally, we will transition to cutting-edge topics in generative AI, with a focus on variational autoencoders and diffusion models, as well as transfer learning and metalearning.
This course is available at no cost and includes full access to all instructional materials, videos, and assessments. Learners who successfully complete all course requirements will have the option to purchase a verified certificate of completion for $20.

Course Prerequisites
- Programming language: Python
- College-level linear algebra
Technology Requirement
Although a group project is not required for this Columbia+ online asynchronous course, you are welcome to do the project individually for practice. The following technology requirements are optional and only needed if you plan to do the individual project:
- Coding Environment: Jupyter Notebook (optional)
- Estimated GPU Cost: $9.99 for 100 Compute Units (optional)
- Based on a conservative estimate, the Google Colab “Pay As You Go” plan at $9.99 for 100 compute units should be sufficient. The coursework is expected to require less than 10 hours of GPU usage.
What You Will Learn
By the end of this course, learners will be able to:
- Experiment with basic machine learning algorithms using PyTorch and notebooks, focusing on "vanilla" models applied to simple environmental problems.
- Gain practical experience with feedforward neural networks, shallow vs. deep networks, regression trees, random forests, and XGBoost through hands-on examples.
- Understand core machine learning concepts such as hyperparameter tuning, batch sizing, optimization, and the distributional assumptions underlying different algorithms.
- Explore intermediate deep learning models, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), and their applications to environmental datasets.
- Analyze uncertainty in machine learning, using Gaussian Processes, Bayesian Neural Networks, and ensemble methods to distinguish aleatoric from epistemic uncertainty.
- Explore advanced machine learning topics, including generative models (e.g., variational autoencoders, diffusion models), as well as transfer learning and metalearning techniques.
- Module 1: Regression Trees
  - Bagging, boosting, random forests
  - Gradient boosting
  - Limiting overfitting
- Module 2: Neural Networks and Shallow vs. Deep Networks
  - Shallow feedforward networks
  - Backpropagation
  - Deep networks
  - Training and overfitting
- Module 3: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs)
  - Images and Convolutional Neural Networks
  - Pooling, convolutions
  - Recurrent neural networks, Long Short-Term Memory (LSTM)
- Module 4: Gaussian Processes
  - Bayesian formulation
  - Families of Gaussian processes
- Module 5: Generative AI
  - Autoencoders
  - Variational autoencoders
  - Generative adversarial networks (GANs)
  - Diffusion models
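To give a flavor of the gradient boosting topic listed under Module 1, here is a minimal sketch of boosting for squared-error regression on one-dimensional data, using one-split regression trees (stumps) as weak learners. It is an illustration under stated assumptions rather than course code: the helper names (`fit_stump`, `gradient_boost`, `predict`, `mse`) and the toy data are hypothetical, and libraries such as XGBoost add regularization and many optimizations that are omitted here.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a stump) to the residuals.

    Returns (threshold, left_value, right_value) with the lowest squared error.
    """
    best = None
    # Skip the largest x: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)


def mse(model, xs, ys):
    """Mean squared error of the model on the given points."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data: a noisy step function.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

baseline = gradient_boost(xs, ys, n_rounds=0)  # predicts the mean only
boosted = gradient_boost(xs, ys, n_rounds=50)
print(mse(baseline, xs, ys), mse(boosted, xs, ys))
```

The small learning rate means each stump corrects only a fraction of the remaining error, which is one simple way to limit overfitting; this sketch reports training error only, so a held-out set would be needed to check generalization.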
Please note: Lecture videos will be released weekly through December 3, 2025. Please check back each week for new content. Each module includes only one quiz. No new lecture videos will be posted the week of November 24 due to the Thanksgiving holiday.
Some course materials are not available for public viewing due to licensing or privacy considerations. These items may appear as unavailable (e.g. 404 not found). If you encounter unavailable content, please note that you will need to explore alternative resources independently to support your learning. Thank you for your understanding.
Instructors
Pierre Gentine is a Professor in the Department of Earth and Environmental Engineering and in the Department of Earth and Environmental Sciences. He is director of the National Science Foundation Science and Technology Center "Learning the Earth with Artificial Intelligence and Physics" and a director of the Graduate Program in Earth and Environmental Engineering. Dr. Gentine and his group investigate the multiscale nature of the continental hydrologic and carbon cycles, using observations (remote sensing and in situ), models, and machine learning.
Dr. Gentine received his undergraduate degree from SupAéro, in France. He earned his PhD in Civil and Environmental Engineering at MIT in 2010. He joined the faculty at Columbia in 2009 as an instructor in applied mathematics and then became a tenure-track assistant professor in Earth and Environmental Engineering in 2011.
