DS - Data Science
This course teaches critical concepts and skills in data collection, manipulation, exploration, and analysis, and explores the ethical and social considerations inherent in today's "big data" revolution, such as privacy, design, reproducibility, and bias. Using real-world datasets, students will explore, visualize, and pose questions about data. Prerequisites: Intermediate Algebra.
In this course we will learn about data analytics through the lens of data wrangling, statistical graphics, modern regression tools, and the ethical issues associated with the entire data pipeline. Our statistical graphics focus will examine both single-variable and multivariable plots, using both discrete and continuous variables. We will review simple linear regression before delving into model validity and transformations, weighted least squares, multiple linear regression, variable selection, logistic regression, and mixed models. This class will include several writing projects.
Machine and Statistical Learning is a collection of mathematical and statistical techniques used to detect, classify, and infer patterns in large and/or complex data sets. Examples of machine and statistical learning algorithms are all around us: speech recognition on your phone, text prediction in internet searches, medical school placement algorithms, and the prediction of what you may want to watch next on your video stream. This course gives an overview of many concepts, techniques, and algorithms in modern machine and statistical learning, including both supervised and unsupervised learning. Topics include linear regression, classification, cross-validation, dimension reduction, nonlinear regression, tree-based methods, support vector machines, principal component analysis, artificial neural networks, and clustering. The course will give students the ideas and intuition behind modern machine and statistical learning methods as well as a more formal understanding of how, why, and when they work. The underlying theme of the course is the application of the algorithms to real data sets.
This course will explore current techniques in computation for data science, including (but not limited to) parallel and distributed coding algorithms and modern programming languages.
This course will give students substantial experience in data analysis. Students will investigate and analyze data from a variety of sources, working both as individuals and in project teams. This course serves as a capstone experience for the DS major.