Course Description
Python is a popular, easy-to-learn programming language. It is commonly used in data analysis because efficient libraries are available for processing large amounts of data. This so-called data analysis stack includes libraries such as NumPy, Pandas, and Matplotlib, which we will become familiar with. This course gives an overview of the different phases of the data analysis pipeline using Python and its data analysis ecosystem. What is typically done in data analysis? We assume that the data is already available, so we only need to download it. After downloading, the data needs to be cleaned to enable further analysis. In the cleaning phase, the data is converted to a uniform and consistent format. After that, the data can, for instance, be combined or divided into smaller chunks, grouped or sorted, or condensed into a small number of summary statistics, and numerical or string operations can be performed on it.
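
To make these phases concrete, here is a minimal sketch of such a pipeline in Pandas. It is an illustration only, not material taken from the course: the city names, temperature values, and region table are invented, and a small in-memory table stands in for a downloaded file.

```python
import pandas as pd

# Hypothetical raw data standing in for a downloaded file (invented values).
raw = pd.DataFrame({
    "city": [" Helsinki", "helsinki", "Turku ", "TURKU", "Tampere"],
    "temperature": ["3.5", "4.5", "2.0", None, "1.0"],
})

# Cleaning: bring the data into a uniform and consistent format.
clean = raw.copy()
clean["city"] = clean["city"].str.strip().str.title()   # string operations
clean["temperature"] = pd.to_numeric(clean["temperature"], errors="coerce")
clean = clean.dropna(subset=["temperature"])            # drop rows with no reading

# Combining: attach a second (also hypothetical) table by city name.
regions = pd.DataFrame({
    "city": ["Helsinki", "Turku", "Tampere"],
    "region": ["Uusimaa", "Southwest Finland", "Pirkanmaa"],
})
combined = clean.merge(regions, on="city", how="left")

# Grouping, condensing into summary statistics, and sorting.
summary = (
    combined.groupby("city")["temperature"]
    .agg(["count", "mean"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

Each step maps onto one phase named above: the string and numeric conversions are the cleaning, merge is the combining, and groupby with agg condenses the rows into a count and a mean per city.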