An introduction to programming using Python, a modern programming language used in data science. Computational thinking will be emphasized through solving problems by writing, testing and debugging programs.
Fundamental statistical concepts used in data science, including types of data, the collection of data, summarizing data, estimation and an introduction to hypothesis testing.
Data analytics is a process that turns data into usable information for answering questions. This course will introduce the process of acquiring, managing and analyzing data. Readily available real-world data sets will be analyzed using supervised and unsupervised learning methods.
Appropriate visualizations of data are key to revealing patterns and communicating important findings in research. This course will build on statistical and analytical thinking by emphasizing the role and use of visualizations in the analysis of data. Theories, techniques and software for managing, exploring, analyzing, displaying and communicating information about various types of data will be introduced. Visualizations will be produced using readily available real-world data sets.
Organizational concepts and terminology of data models and the underlying data structures needed to support them. Presentation of relational database management systems, including an introduction to SQL programming, normalization and database design. Introduction to programming interfaces for databases.
Fundamentals of the research process including formulating questions to assess data needs, determining how to collect and manage the necessary data, and putting results in the correct context.
An examination of algorithmic bias and of the legal and privacy issues involving data that arise in each phase of a data science project, as well as how data relates to social issues. Case studies from various disciplines will be used to explore these issues.
The tools and techniques for managing and analyzing big data will be covered. Students learn how to use cloud services and data mining techniques to analyze big data.
An overview of the key concepts of machine learning through practical examples and applications. Programming projects will be used to learn techniques, interpret results and understand how to scale up from thousands of records to millions or billions.
Hands-on experience as a part of a data science team covering all phases of a data science project, with a focus on the design and data collection phases.