Book Description

From the Inside Flap

Table of Contents

1. Why is Cross Validation important?
    Solution; Code
2. Why is Grid Search important?
    Solution; Code
3. What are the new Spark DataFrame and the Spark Pipeline? And how can we use the new ML library for Grid Search?
    Solution; Code
4. How to deal with categorical features? And what is one-hot-encoding?
    Solution; Code
5. What are generalized linear models and what is an R Formula?
    Solution; Code
6. What are the Decision Trees?
    Solution; Code
7. What are the Ensembles?
    Solution
8. What is a Gradient Boosted Tree?
    Solution
9. What is a Gradient Boosted Trees Regressor?
    Solution; Code
10. Gradient Boosted Trees Classification
    Solution; Code
11. What is a Random Forest?
    Solution; Code
12. What is an AdaBoost classification algorithm?
    Solution
13. What is a recommender system?
    Solution
14. What is a collaborative filtering ALS algorithm?
    Solution; Code
15. What is the DBSCAN clustering algorithm?
    Solution; Code
16. What is a Streaming K-Means?
    Solution; Code
17. What is Canopy Clustering?
    Solution
18. What is Bisecting K-Means?
    Solution
19. What is the PCA Dimensional reduction technique?
    Solution; Code
20. What is the SVD Dimensional reduction technique?
    Solution; Code
21. What is a Latent Semantic Analysis (LSA)?
    Solution
22. What is Parquet?
    Solution; Code
23. What is an Isotonic Regression?
    Solution; Code
24. What is LARS?
    Solution
25. What is GLMNET?
    Solution
26. What is SVM with soft margins?
    Solution
27. What is the Expectation Maximization Clustering algorithm?
    Solution
28. What is a Gaussian Mixture?
    Solution; Code
29. What is the Latent Dirichlet Allocation topic model?
    Solution; Code
30. What is the Associative Rule Learning?
    Solution
31. What is FP-growth?
    Solution; Code
32. How to use the GraphX Library
    Solution
33. What is PageRank? And how to compute it with GraphX
    Solution; Code; Code
34. What is a Power Iteration Clustering?
    Solution; Code
35. What is a Perceptron?
    Solution
36. What is an ANN (Artificial Neural Network)?
    Solution
37. What are the activation functions?
    Solution
38. How many types of Neural Networks are known?
39. How to train a Neural Network
    Solution
40. Which are the possible ANNs applications?
    Solution
41. Can you code a simple ANN in Python?
    Solution; Code
42. What support has Spark for Neural Networks?
    Solution; Code
43. What is Deep Learning?
    Solution
44. What are autoencoders and stacked autoencoders?
    Solution
45. What are convolutional neural networks?
    Solution
46. What are Restricted Boltzmann Machines, Deep Belief Networks and Recurrent networks?
    Solution
47. What is pre-training?
    Solution
48. An example of Deep Learning with Nolearn and Lasagne package
    Solution; Code; Outcome; Code
49. Can you compute an embedding with Word2Vec?
    Solution; Code; Code
50. What are Radial Basis Networks?
    Solution; Code
51. What are Splines?
    Solution; Code
52. What are Self-Organized-Maps (SOMs)?
    Solution; Code
53. What is Conjugate Gradient?
    Solution
54. What is exploitation-exploration? And what is the armed bandit method?
    Solution
55. What is Simulated Annealing?
    Solution; Code
56. What is a Monte Carlo experiment?
    Solution; Code
57. What is a Markov Chain?
    Solution
58. What is Gibbs sampling?
    Solution; Code
59. What is Locality Sensitive Hashing (LSH)?
    Solution; Code
60. What is minHash?
    Solution; Code
61. What are Bloom Filters?
    Solution; Code
62. What is Count Min Sketches?
    Solution; Code
63. How to build a news clustering system
    Solution
64. What is A/B testing?
    Solution
65. What is Natural Language Processing?
    Solution; Code; Outcome
66. Where to go from here

Appendix A

67. Ultra-Quick introduction to Python
68. Ultra-Quick introduction to Probabilities
69. Ultra-Quick introduction to Matrices and Vectors
70. Ultra-Quick summary of metrics
    Classification Metrics; Clustering Metrics; Scoring Metrics; Rank Correlation Metrics; Probability Metrics; Ranking Models
71. Comparison of different machine learning techniques
    Linear regression; Logistic regression; Support Vector Machines; Clustering; Decision Trees, Random Forests, and GBTs; Associative Rules; Neural Networks and Deep Learning