Inmas Data Science Workshop
Welcome to the website of the Inmas Workshop on Data Science 2021. This website collects the links and files for all relevant content of the workshop.
Overview
This course is designed to provide a glimpse of modern computational approaches for the analysis of data sets. We cover the concepts of supervised learning and unsupervised learning, and illustrate how some popular methods in these frameworks are used by means of popular toolboxes in Python.
As many data sets that are encountered in practice are inherently high-dimensional, we aim to gain intuition about the geometry of high-dimensional spaces and distributions, and shed light on computational aspects of some of the covered methods.
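One way to build some of this intuition is a small numerical experiment. The sketch below is not taken from the workshop materials; it assumes NumPy is installed, and the helper name `norm_spread` is a hypothetical one chosen for illustration. It draws standard Gaussian points in several dimensions and measures how much their rescaled lengths vary.

```python
import numpy as np


def norm_spread(dim, n_samples=2000, seed=0):
    """Return (mean, std) of ||x|| / sqrt(dim) for standard Gaussian samples x.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Each row is one sample point in `dim`-dimensional space.
    x = rng.standard_normal((n_samples, dim))
    # Euclidean norm of every sample, rescaled by sqrt(dim).
    scaled_norms = np.linalg.norm(x, axis=1) / np.sqrt(dim)
    return scaled_norms.mean(), scaled_norms.std()


for dim in (2, 20, 200, 2000):
    mean, std = norm_spread(dim)
    print(f"dim={dim:5d}  mean={mean:.3f}  std={std:.3f}")
```

In runs of this kind, the spread of the rescaled norms should shrink as the dimension grows, which suggests that most of the mass of a high-dimensional Gaussian sits near a sphere of radius roughly the square root of the dimension.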
Workshop Schedule
In preparation for the workshop, we encourage you to complete the pre-work before the first session on Friday, March 19.
Thursday, March 18
- 7:00 PM ET: (Optional) Office Hour: Feedback & help with pre-work
Friday, March 19
Session I
- 2:00 PM - 3:00 PM ET: Framework of Statistical Learning, Feature Design, Regression in High Dimensions
- 3:00 PM - 5:00 PM ET: Project work in small groups
Saturday, March 20
Morning Session (Session II)
- 9:00 AM - 10:00 AM ET: Classification Problems, Natural Language Processing
- 10:00 AM - 12:00 PM ET: Project work in small groups
Afternoon Session (Session III)
- 2:00 PM - 3:00 PM ET: Principal Component Analysis, Clustering
- 3:00 PM - 5:00 PM ET: Project work in small groups
Sunday, March 21
Session IV
- 9:00 AM - 10:00 AM ET: Neural Networks and Deep Learning
- 10:00 AM - 12:00 PM ET: Project work in small groups
All sessions will be held via Zoom.
Instructor: Christian Kümmerle kuemmerle@jhu.edu (Johns Hopkins University)
Teaching Assistants: Daniel Fuentes-Keuthan, Patrick Martin
After-Workshop Office Hours
- Sunday, March 28, 11 AM ET
- Monday, April 5, 7 PM ET
Computational Tools
The practice exercises in this workshop use the Python language. Python is widely used for data science and machine learning because it is a general-purpose programming language and because its modularity has attracted the development of a variety of powerful libraries.
The most relevant libraries we will use are:
- NumPy: Basic manipulation of vectors and matrices.
- SciPy: Scientific computing, particularly useful for linear algebra, optimization, and signal and image processing.
- matplotlib: Visualization and plotting.
- seaborn: Package for visualization, more high-level than matplotlib.
- scikit-learn: Implementations of a wide range of machine learning methods.
- keras: Interface to deep learning libraries.
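As a rough illustration of how two of these libraries fit together, the following minimal sketch generates synthetic data with NumPy and fits a linear regression model with scikit-learn. It is not part of the workshop exercises: the data, the coefficient values in `true_coef`, and the noise level are made-up assumptions, and it presumes that both libraries are installed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): 200 samples with 3 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_coef = np.array([1.5, -2.0, 0.5])  # made-up ground-truth coefficients
y = X @ true_coef + 0.1 * rng.standard_normal(200)  # linear signal plus noise

# Supervised learning: estimate the coefficients from (X, y) pairs.
model = LinearRegression().fit(X, y)

print("estimated coefficients:", np.round(model.coef_, 2))
print("R^2 on the training data:", round(model.score(X, y), 3))
```

With a noise level this low, the estimated coefficients should land close to the values in `true_coef`. The same fit-then-score pattern applies to many other scikit-learn estimators.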