This module is optional. It has been introduced because Python is fast emerging as another preferred tool for data scientists. Extra sessions will be held to cover this module; the days and timings of classes will be fixed after discussion with the participants.
Python: Basic data types and data structures in Python; Loops and conditionals in Python; Defining functions. NumPy: Arrays; Basic array operations; Comparison operators and value testing for arrays; Array item selection and manipulation; Statistics; Random numbers. Pandas: 10 minutes to pandas (a quick start); Data structures in pandas; Essential basic functionality of pandas, including indexing and accessing data; Working with missing data. Data visualization: Graphics with matplotlib, ggplot and seaborn.
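As a rough illustration of the NumPy and pandas topics listed above (comparison operators on arrays, item selection, statistics, and working with missing data), a minimal sketch might look like the following. The variable names and data values are made up for illustration and are not part of the course material.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the values are invented for this sketch.
scores = np.array([72.0, 85.0, 90.0, 64.0])

# A comparison operator returns a boolean array, which can then be
# used to select the matching items.
high = scores[scores > 80]

# Basic statistics are available as array methods.
mean_score = scores.mean()

# A small DataFrame with one missing value in the "score" column.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.0, np.nan, 3.0]})

# Count the missing values, then fill them with the column mean.
n_missing = int(df["score"].isna().sum())
filled = df["score"].fillna(df["score"].mean())
```

Filling with the column mean is only one of several possible strategies; dropping the affected rows with `dropna()` is another common choice.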
Machine learning with scikit-learn: Classification: Modeling using SVM, nearest neighbors, random forest and XGBoost; Regression: decision tree regressor, ensemble regressors, Naïve Bayes and Gaussian regressors; Clustering: K-means and agglomerative clustering; Model selection: cross-validation and grid search; Dimensionality reduction and preprocessing.
(Note: Most of the modeling concepts will be covered separately while developing models with R. Scikit-learn will therefore serve as a medium for implementing what has already been learnt.)
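To give a flavour of how the scikit-learn topics fit together (preprocessing, a nearest-neighbors classifier, cross-validation and grid search), here is a minimal sketch. The choice of the bundled iris dataset, the candidate values for `n_neighbors`, and the train/test split are assumptions made for illustration, not the exercises used in class.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small bundled dataset, chosen here only for illustration.
X, y = load_iris(return_X_y=True)

# Hold out a test set so the final score is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and the classifier are chained in one pipeline, so the
# scaler is fitted only on the training folds during cross-validation.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Candidate hyperparameter values (an arbitrary choice for this sketch).
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7]}

# Grid search with 5-fold cross-validation over the candidates.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
test_accuracy = search.score(X_test, y_test)
```

The same pattern should carry over to the other estimators named in the outline: swap `KNeighborsClassifier` for another estimator and adjust the keys in `param_grid` to match its parameters.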