[Scala] E-book - Scala for Machine Learning

Book contents

Cover

Copyright

Credits

About the Author

About the Reviewers

www.PacktPub.com

Table of Contents

Preface

Chapter 1: Getting Started

Mathematical notation for the curious

Why machine learning?

Classification

Prediction

Optimization

Regression

Why Scala?

Abstraction

Scalability

Configurability

Maintainability

Computation on demand

Model categorization

Taxonomy of machine learning algorithms

Unsupervised learning

Clustering

Dimension reduction

Supervised learning

Generative models

Discriminative models

Reinforcement learning

Tools and frameworks

Java

Scala

Apache Commons Math

Description

Licensing

Installation

JFreeChart

Description

Licensing

Installation

Other libraries and frameworks

Source code

Context versus view bounds

Presentation

Primitives and implicits

Primitive types

Type conversions

Operators

Immutability

Performance of Scala iterators

Let's kick the tires

Overview of computational workflows

Writing a simple workflow

Selecting a dataset

Loading the dataset

Preprocessing the dataset

Creating a model (learning)

Classify the data

Summary

Chapter 2: Hello World!

Modeling

A model by any other name

Model versus design

Selecting a model's features

Extracting features

Designing a workflow

The computational framework

The pipe operator

Monadic data transformation

Dependency injection

Workflow modules

The workflow factory

Examples of workflow components

The preprocessing module

The clustering module

Assessing a model

Validation

Key metrics

Implementation

K-fold cross-validation

Bias-variance decomposition

Overfitting

Summary

Chapter 3: Data Preprocessing

Time series

Moving averages

The simple moving average

The weighted moving average

The exponential moving average

Fourier analysis

Discrete Fourier transform (DFT)

DFT-based filtering

Detection of market cycles

The Kalman filter

The state space estimation

The transition equation

The measurement equation

The recursive algorithm

Prediction

Correction

Kalman smoothing

Experimentation

Alternative preprocessing techniques

Summary

Chapter 4: Unsupervised Learning

Clustering

K-means clustering

Measuring similarity

Overview of the K-means algorithm

Step 1 – cluster configuration

Step 2 – cluster assignment

Step 3 – iterative reconstruction

Curse of dimensionality

Experiment

Tuning the number of clusters

Validation

Expectation-maximization (EM) algorithm

Gaussian mixture model

EM overview

Implementation

Testing

Online EM

Dimension reduction

Principal components analysis (PCA)

Algorithm

Implementation

Test case

Evaluation

Other dimension reduction techniques

Performance considerations

K-means

EM

PCA

Summary

Chapter 5: Naïve Bayes Classifiers

Probabilistic graphical models

Naïve Bayes classifiers

Introducing the multinomial Naïve Bayes

Formalism

The frequentist perspective

The predictive model

The zero-frequency problem

Implementation

Software design

Training

Classification

Labeling

Results

Multivariate Bernoulli classification

Model

Implementation

Naïve Bayes and text mining

Basics of information retrieval

Implementation

Extraction of terms

Scoring of terms

Testing

Retrieving textual information

Evaluation

Pros and cons

Summary

Chapter 6: Regression and Regularization

Linear regression

One-variate linear regression

Implementation

Test case

Ordinary least squares (OLS) regression

Design

Implementation

Test case 1 – trending

Test case 2 – feature selection

Regularization

Ln roughness penalty

The ridge regression

Implementation

The test case

Numerical optimization

The logistic regression

The logit function

Binomial classification

Software design

The training workflow

Configuring the least squares optimizer

Computing the Jacobian matrix

Defining the exit conditions

Defining the least squares problem

Minimizing the loss function

Test

Classification

Summary

Chapter 7: Sequential Data Models

Markov decision processes

The Markov property

The first-order discrete Markov chain

The hidden Markov model (HMM)

Notation

The lambda model

HMM execution state

Evaluation (CF-1)

Alpha class (the forward variable)

Beta class (the backward variable)

Training (CF-2)

Baum-Welch estimator (EM)

Decoding (CF-3)

The Viterbi algorithm

Putting it all together

Test case

The hidden Markov model for time series analysis

Conditional random fields

Introduction to CRF

Linear chain CRF

CRF and text analytics

The feature functions model

Software design

Implementation

Building the training set

Generating tags

Extracting data sequences

CRF control parameters

Putting it all together

Tests

The training convergence profile

Impact of the size of the training set

Impact of the L2 regularization factor

Comparing CRF and HMM

Performance considerations

Summary

Chapter 8: Kernel Models and Support Vector Machines

Kernel functions

Overview

Common discriminative kernels

The support vector machine (SVM)

The linear SVM

The separable case (hard margin)

The nonseparable case (soft margin)

The nonlinear SVM

Max-margin classification

The kernel trick

Support vector classifier (SVC)

The binary SVC

LIBSVM

Software design

Configuration parameters

SVM implementation

C-penalty and margin

Kernel evaluation

Application to risk analysis

Anomaly detection with one-class SVC

Support vector regression (SVR)

Overview

SVR versus linear regression

Performance considerations

Summary

Chapter 9: Artificial Neural Networks

Feed-forward neural networks (FFNN)

The biological background

The mathematical background

The multilayer perceptron (MLP)

The activation function

The network architecture

Software design

Model definition

Layers

Synapses

Connections

Training cycle/epoch

Step 1 – input forward propagation

Step 2 – sum of squared errors

Step 3 – error backpropagation

Step 4 – synapse/weights adjustment

Step 5 – convergence criteria

Configuration

Putting it all together

Training strategies and classification

Online versus batch training

Regularization

Model instantiation

Prediction

Evaluation

Impact of learning rate

Impact of the momentum factor

Test case

Implementation

Model evaluation

Impact of hidden layers architecture

Benefits and limitations

Summary

Chapter 10: Genetic Algorithms

Evolution

The origin

NP problems

Evolutionary computing

Genetic algorithms and machine learning

Genetic algorithm components

Encodings

Value encoding

Predicate encoding

Solution encoding

The encoding scheme

Genetic operators

Selection

Crossover

Mutation

Fitness score

Implementation

Software design

Key components

Selection

Controlling population growth

GA configuration

Crossover

Population

Chromosomes

Genes

Mutation

Population

Chromosomes

Genes

The reproduction cycle

GA for trading strategies

Definition of trading strategies

Trading operators

The cost/unfitness function

Trading signals

Trading strategies

Signal encoding

Test case

Data extraction

Initial population

Configuration

GA instantiation

GA execution

Tests

Advantages and risks of genetic algorithms

Summary

Chapter 11: Reinforcement Learning

Introduction

The problem

A solution – Q-learning

Terminology

Concept

Value of policy

Bellman optimality equations

Temporal difference for model-free learning

Action-value iterative update

Implementation

Software design

States and actions

Search space

Policy and action-value

The Q-learning training

Tail recursion to the rescue

Prediction

Option trading using Q-learning

Option property

Option model

Function approximation

Constrained state-transition

Putting it all together

Evaluation

Pros and cons of reinforcement learning

Learning classifier systems

Introduction to LCS

Why LCS

Terminology

Extended learning classifier systems (XCS)

XCS components

Application to portfolio management

XCS core data

XCS rules

Covering

Example of implementation

Benefits and limitations of learning classifier systems

Summary

Chapter 12: Scalable Frameworks

Overview

Scala

Controlling object creation

Parallel collections

Processing a parallel collection

Benchmark framework

Performance evaluation

Scalability with Actors

The Actor model

Partitioning

Beyond actors – reactive programming

Akka

Master-workers

Message exchange

Worker actors

The workflow controller

The master Actor

Master with routing

Distributed discrete Fourier transform

Limitations

Futures

The Actor life cycle

Blocking on futures

Handling future callbacks

Putting it all together

Apache Spark

Why Spark

Design principles

In-memory persistency

Laziness

Transforms and Actions

Shared variables

Experimenting with Spark

Deploying Spark

Using Spark shell

MLlib

RDD generation

K-means using Spark

Performance evaluation

Tuning parameters

Tests

Performance considerations

Pros and cons

0xdata Sparkling Water

Summary

Appendix: Basic Concepts

Scala programming

List of libraries

Format of code snippets

Encapsulation

Class constructor template

Companion objects versus case classes

Enumerations versus case classes

Overloading

Design template for classifiers

Data extraction

Data sources

Extraction of documents

Matrix class

Mathematics

Linear algebra

QR decomposition

LU factorization

LDL decomposition

Cholesky factorization

Singular value decomposition

Eigenvalue decomposition

Algebraic and numerical libraries

First order predicate logic

Jacobian and Hessian matrices

Summary of optimization techniques

Gradient descent methods

Quasi-Newton algorithms

Nonlinear least squares minimization

Lagrange multipliers

Overview of dynamic programming

Finances 101

Fundamental analysis

Technical analysis

Terminology

Trading signals and strategy

Price patterns

Options trading

Financial data sources

Suggested online courses

References

Index