VTEC Training

Portland, ME

Class Dates: 1/1/0001

Length: 4 Days

Cost: $2195.00

Class Time:

Technology: Database

Delivery: Instructor-Led Training, Virtual Instructor-Led Training

- Course Overview
- This Data Science training course is complemented by a variety of hands-on exercises that help attendees reinforce the theoretical material covered in each lesson.

**TOPICS**

Applied data science, business analytics, and data engineering

Common data science/machine learning algorithms for supervised and unsupervised machine learning

NumPy, pandas, matplotlib, seaborn, scikit-learn

Python REPLs

Jupyter notebooks

Data analytics life-cycle phases

Data repairing and normalizing

Data aggregation and grouping

Data visualization and EDA

Operational data analytics

Distributed and scalable data processing

Cloud machine learning and data engineering capabilities

- Audience
- IT architects and technical managers

- Prerequisites
- Participants should have a working knowledge of Python (or the programming background to pick up Python's syntax quickly) and be familiar with core statistical concepts (variance, correlation, etc.)

- Lesson 1. Python for Data Science
- Python Data Science-Centric Libraries
- SciPy
- NumPy
- pandas
- Scikit-learn
- Matplotlib
- Seaborn
- Python Dev Tools and REPLs
- IPython
- Jupyter Notebooks
- Anaconda
- Summary
- Lesson 2. Data Visualization in Python
- Why Do I Need Data Visualization?
- Data Visualization in Python
- Getting Started with matplotlib
- A Basic Plot, Scatter Plots, Figures
- Saving Figures to a File, Seaborn
- Getting Started with seaborn
- Histograms and KDE
- Plotting Bivariate Distributions
- Scatter Plots in seaborn, Pair plots in seaborn
- Heatmaps
- A Seaborn Scatterplot with Varying Point Sizes and Hues
- Summary
- Lesson 3. Introduction to NumPy
- What is NumPy?
- The First Take on NumPy Arrays, The ndarray Data Structure
- Understanding Axes, Indexing Elements in a NumPy Array
- Re-Shaping, Commonly Used Array Metrics
- Commonly Used Aggregate Functions
- Sorting Arrays, Vectorization, Vectorization Visually
- Broadcasting, Broadcasting Visually
- Filtering, Array Arithmetic Operations
- Reductions: Finding the Sum of Elements by Axis
- Array Slicing, 2-D Array Slicing
- The Linear Algebra Functions
- Summary
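
The NumPy concepts listed above (re-shaping, axis-wise reductions, broadcasting, filtering, and slicing) can be sketched in a few lines; the array values here are illustrative:

```python
import numpy as np

# A 2-D array (3 rows x 4 columns) built by re-shaping a 1-D range
a = np.arange(12).reshape(3, 4)

# Reductions by axis: axis=0 sums down the rows (one result per column),
# axis=1 sums across the columns (one result per row)
col_sums = a.sum(axis=0)   # [12, 15, 18, 21]
row_sums = a.sum(axis=1)   # [6, 22, 38]

# Broadcasting: the 1-D array is "stretched" across every row of `a`
shifted = a + np.array([10, 20, 30, 40])

# Boolean filtering keeps only the elements matching a condition
evens = a[a % 2 == 0]

# 2-D slicing: first two rows, last two columns
corner = a[:2, 2:]
```

Because these operations are vectorized, they run in compiled code rather than Python loops, which is the main reason NumPy appears throughout the rest of the course.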
- Lesson 4. Introduction to pandas
- What is pandas?
- The DataFrame Object, The DataFrame's Value Proposition
- Creating a pandas DataFrame, Getting DataFrame Metrics
- Accessing DataFrame Columns, Accessing DataFrame Rows
- Accessing DataFrame Cells, Deleting Rows and Columns
- Adding a New Column to a DataFrame
- Getting Descriptive Statistics of DataFrame Columns
- Getting Descriptive Statistics of DataFrames
- Reading From CSV Files
- Writing to a CSV File
- Summary
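
A compact sketch of the DataFrame operations named above, using a small hypothetical dataset:

```python
import pandas as pd

# Creating a pandas DataFrame from a dict of columns (hypothetical data)
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [34, 28, 45],
    "score": [88.0, 92.5, 79.0],
})

n_rows, n_cols = df.shape           # basic DataFrame metrics

ages = df["age"]                    # accessing a column (returns a Series)
first_row = df.loc[0]               # accessing a row by label
cell = df.at[1, "score"]            # accessing a single cell

df["passed"] = df["score"] >= 80    # adding a new, derived column

stats = df["score"].describe()      # descriptive statistics of one column

# Round-tripping through CSV
df.to_csv("people.csv", index=False)
df2 = pd.read_csv("people.csv")
```

`describe()` on the whole DataFrame (rather than one column) gives the per-column statistics discussed in the "Descriptive Statistics of DataFrames" topic.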
- Lesson 5. Repairing and Normalizing Data
- Repairing and Normalizing Data
- Dealing with the Missing Data
- Sample Data Set, Getting Info on Null Data
- Dropping a Column
- Interpolating Missing Data in pandas
- Replacing the Missing Values with the Mean Value
- Scaling (Normalizing) the Data
- Data Preprocessing with scikit-learn
- Scaling with the scale() Function
- The MinMaxScaler Object
- Summary
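
A minimal sketch of the repair-then-normalize sequence from this lesson, with a hypothetical column containing a missing value:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# A column with one missing value (hypothetical data)
df = pd.DataFrame({"height": [150.0, np.nan, 170.0, 180.0]})

# Getting info on null data, then replacing the missing value with the mean
n_missing = int(df["height"].isnull().sum())
df["height"] = df["height"].fillna(df["height"].mean())

# Scaling (normalizing) the repaired column into [0, 1] with MinMaxScaler
scaler = MinMaxScaler()
scaled = scaler.fit_transform(df[["height"]])
```

`fillna` with the mean is one option among several; the lesson also covers interpolation, which may be more appropriate for ordered data such as time series.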
- Lesson 6. Defining Data Science
- What is Data Science?
- Data Science, Machine Learning, AI?, The Data Science Ecosystem
- Tools of the Trade, The Data-Related Roles, Data Scientists at Work
- Examples of Data Science Projects, The Concept of a Data Product
- Applied Data Science at Google
- Data Science and ML Terminology: Features and Observations
- Terminology: Labels and Ground Truth, Label Examples
- Terminology: Continuous and Categorical Features
- Encoding Categorical Features using One-Hot Encoding Scheme
- Example of 'One-Hot' Encoding Scheme
- Gartner's Magic Quadrant for Data Science and Machine Learning Platforms (a Labeling Example)
- Machine Learning in a Nutshell, Common Distance Metrics
- The Euclidean Distance, Decision Boundary Examples (Object Classification)
- What is a Model?, Training a Model to Make Predictions
- Types of Machine Learning, Supervised vs Unsupervised Machine Learning, Supervised Machine Learning Algorithms
- Unsupervised Machine Learning Algorithms, Which ML Algorithm to Choose?
- Bias-Variance (Underfitting vs Overfitting) Trade-off
- Underfitting vs Overfitting (a Regression Model Example) Visually
- ML Model Evaluation, Mean Squared Error (MSE) and Mean Absolute Error (MAE)
- Coefficient of Determination, Confusion Matrix
- The Binary Classification Confusion Matrix, The Typical Machine Learning Process
- A Better Algorithm or More Data?, The Typical Data Processing Pipeline in Data Science
- Data Discovery Phase, Data Harvesting Phase
- Data Cleaning/Priming/Enhancing Phase, Exploratory Data Analysis and Feature Selection
- Exploratory Data Analysis and Feature Selection Cont'd
- ML Model Planning Phase, Feature Engineering
- ML Model Building Phase, Capacity Planning and Resource Provisioning
- Communicating the Results
- Production Roll-out
- Data Science Gotchas
- Summary
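
A few of the recurring encodings and metrics from this lesson, worked through with toy values (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

# One-hot encoding of a categorical feature: one binary column per category
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
one_hot = pd.get_dummies(df["color"], prefix="color")

# The Euclidean distance between two observations (a common distance metric)
p, q = np.array([1.0, 2.0]), np.array([4.0, 6.0])
dist = float(np.sqrt(((p - q) ** 2).sum()))   # sqrt(3^2 + 4^2) = 5.0

# Mean Squared Error (MSE) and Mean Absolute Error (MAE) computed by hand
y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5, 0.0, 2.0])
mse = float(((y_true - y_pred) ** 2).mean())
mae = float(np.abs(y_true - y_pred).mean())
```

One-hot encoding avoids imposing a false ordering on categories, which is why it is preferred over integer codes for nominal features like color.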
- Lesson 7. Overview of the scikit-learn Library
- The scikit-learn Library
- The Navigational Map of ML Algorithms Supported by scikit-learn
- Developer Support
- scikit-learn Estimators, Models, and Predictors
- Annotated Example of the LinearRegression Estimator
- Annotated Example of the Support Vector Classification Estimator
- Data Splitting into Training and Test Datasets
- Data Splitting in scikit-learn
- Cross-Validation Technique
- Summary
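
The estimator pattern, data splitting, and cross-validation described above can be sketched on synthetic data (generated here purely for illustration, with y roughly 3x + 2 plus noise):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 0.5, size=200)

# Data splitting into training and test datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The estimator/predictor pattern: fit on training data, score on test data
model = LinearRegression()
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)   # coefficient of determination on held-out data

# The cross-validation technique: 5-fold R^2 scores on the full dataset
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)
```

Every scikit-learn estimator follows this same `fit`/`predict`/`score` interface, which is what makes the algorithms in the next two lessons interchangeable in code.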
- Lesson 8. Classification Algorithms (Supervised Machine Learning)
- Classification (Supervised ML) Use Cases
- Classifying with k-Nearest Neighbors
- k-Nearest Neighbors Algorithm Visually
- Decision Trees, Decision Tree Terminology, Decision Tree Classification in the Context of Information Theory
- Using Decision Trees, Properties of the Decision Tree Algorithm
- The Simplified Decision Tree Algorithm
- Random Forest, Properties of the Random Forest Algorithm
- Support Vector Machines (SVMs), SVM Classification Visually
- Properties of SVMs, Dealing with Non-Linear Class Boundaries
- Logistic Regression (Logit), The Sigmoid Function
- Logistic Regression Classification Example
- Logistic Regression's Problem Domain
- Naive Bayes Classifier (SL)
- Naive Bayesian Probabilistic Model in a Nutshell
- Bayes Formula
- Document Classification with Naive Bayes
- Summary
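
To show one of the classifiers above end to end, here is a k-Nearest Neighbors sketch on scikit-learn's built-in iris dataset (150 labeled observations, 3 classes):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labeled data: features X and ground-truth labels y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# kNN classifies each test point by majority vote of its k closest neighbors
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)

# The confusion matrix summarizes per-class hits and misses
cm = confusion_matrix(y_test, knn.predict(X_test))
```

Swapping `KNeighborsClassifier` for `DecisionTreeClassifier`, `RandomForestClassifier`, `SVC`, or `LogisticRegression` leaves the rest of the code unchanged, which is the basis of the algorithm comparison done in Lab 12.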
- Lesson 9. Unsupervised Machine Learning Algorithms
- PCA, PCA and Data Variance, PCA Properties
- Importance of Feature Scaling Visually
- Unsupervised Learning Type: Clustering
- Clustering vs Classification
- Clustering Examples
- k-means Clustering
- k-means Clustering in a Nutshell
- k-means Characteristics
- Global vs Local Minimum Explained
- Summary
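
A sketch tying together feature scaling, k-means clustering, and PCA on two well-separated synthetic blobs of points (generated here for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Two clearly separated blobs (illustrative, unlabeled data)
rng = np.random.default_rng(7)
blob_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Feature scaling matters for distance-based methods like k-means and PCA
X_scaled = StandardScaler().fit_transform(X)

# k-means partitions the points into k=2 clusters; no labels are used
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
labels = km.labels_

# PCA projects the data onto the directions of greatest variance
pca = PCA(n_components=2).fit(X_scaled)
explained = pca.explained_variance_ratio_
```

Note the contrast with Lesson 8: clustering discovers the two groups without any ground-truth labels, which is the clustering-vs-classification distinction this lesson draws.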
- Lab Exercises
- Lab 1. Learning the Google Colab Jupyter Notebook Environment
- Lab 2. Data Visualization in Python
- Lab 3. Understanding NumPy
- Lab 4. Data Repairing
- Lab 5. Understanding Common Metrics
- Lab 6. Coding the kNN Algorithm in NumPy (Optional)
- Lab 7. Understanding Machine Learning Datasets in scikit-learn
- Lab 8. Building Linear Regression Models
- Lab 9. Spam Detection with Random Forest
- Lab 10. Spam Detection with Support Vector Machines
- Lab 11. Spam Detection with Logistic Regression
- Lab 12. Comparing Classification Algorithms
- Lab 13. Feature Engineering and EDA
- Lab 14. Understanding PCA