TensorFlow is a free and opensource software library for dataflow and differentiable programming across a range of tasks and it is used for machine learning applications such as neural networks. Tensorflow has grown to become one of the widely adopted ML platforms in the world. TensorFlow offers multiple levels of abstraction so you can choose the right one for your needs. You can build and train models by using the highlevel Keras API, which makes getting started with TensorFlow and machine learning easy.
There are a number of good resources to learn Tensorflow, I would mention:
 HandsOn Machine Learning with ScikitLearn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems 2nd Edition.
It takes you from classical ML to new Tensorflow 2, to broaden your understanding about machine learning in general and more specific about Deep Learning.  Coursera TensorFlow in Practice Specialization. A Very insightful series, which helps you to discover the tools software developers use to build scalable AIpowered algorithms in TensorFlow.
 Google  Machine Learning Crash Course. A course developed to train internal employees, was updated for the new Tensorflow API.
This is the first part of the Exploring Tensorflow 2 series. In this part I’ll use Sequential and Functional Keras API to create a Deep Neural Network model and use it to predict the likelihood to survive of the Titanic passengers (a well known dataset for data science). I’ll try also to discover best NN parameters for the model using GridSearchCV / RandomizedSearchCV.
In the next blog posts I’ll explore more of the Tensorflow 2 API, like TF.Data for input pipelines, TF.Keras to create CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network) models, deployment strategies and others.
The Notebook is available on Kaggle. Kaggle is the largest data science community where notebooks, datasets are shared and you can even enrol in a ML competition.
Table of Contents:
 2. Preprocessing
 3. Build, Train and Evaluate the Model
 3.1. Tensorflow 2 & Keras
 3.2. Keras building blocks
 3.3. TF Keras model using Sequential API
 3.3.1. Model architecture
 3.3.2. Compile the Model
 3.3.3. Train the Model
 3.3.4. Predict on the Test data and submit to Kaggle
 3.4. TF Keras model using Functional API
 3.5. Hyperparameter Tuning
2. PreProcessing the data
This is a simplified version of the data preprocessing, checkout the Notebook for detailed flow. Also, it worth to explore the other notebooks for Titanic dataset published by the community.
Preprocessing refers to the transformations applied to our data before feeding it to the algorithm. Data Preprocessing is a technique that is used to convert the raw data into a clean data set.
Deep Learning cannot operate on label data directly. It requires all input variables and output variables to be numeric, threfore the categorical data will be transformed in numbers and numerical values will be scaled for better convergence.
We’ll introduce new features as well, that helps the deep learning algorithms to find better patterns, which will increase prediction accuracy.


'2.1.0'
The data has been split into two groups:
 training set (train.csv) – should be used to build your machine learning models and it has the labels
 test set (test.csv) – should be used to see how well your model performs on unseen data, as I’m using Kaggle the ground truth for each passenger is not provided


Checkout the original data, before applying feature engineering, encoding and scalling:
2.1. Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms.
Based on initial exploration of the data, we observed that the Titles from the names (like Miss, Mrs etc) could influence the performance of the model. Therefore we extract those and populate within a new column.


2.2. Fill in the missing data
Although it is possible to train algorithms with missing values, it’s highly recommended to clean the data.
Strategies to clean up dataset:
 Drop the missing values: Drop the row if a particular feature is missing
 Replace the missing values with mean/median imputation: Calculate the mean or median of the feature and replace it with the missing values. This is an approximation which can add variance to the data set. But the loss of the data can be negated by this method which yields better results compared to removal of rows and columns


2.3. Encode Categorical and Scale Continuous variables
There are two types of data:
 Categorical : Features whose values are taken from a defined set of values. For instance, days in a week : {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday} is a category because its value is always taken from this set. Another example could be the Boolean set : {True, False}
 Numerical : Features whose values are continuous or integervalued. They are represented by numbers and possess most of the properties of numbers. For instance, number of steps you walk in a day, or the speed at which you are driving your car at.
Encode Categorical Variables
We need to convert all categorical variables into numeric format. The categorical variables we will be keeping are Embarked, Sex and Title.
Scale Continuous variables
Standardization is a useful technique to transform attributes with a Gaussian distribution and differing means and standard deviations to a standard Gaussian distribution with a mean of 0 and a standard deviation of 1.
We can standardize data using scikitlearn with the StandardScaler class.




(891, 16)
3. Create the Model.
3.1. TensorFlow 2 & Keras
TensorFlow has been upgraded recently to 2.x version, most notable additions in TensorFlow 2.x comparing with 1.x:
 tf.keras  default TensorFlow’s highlevel API for building and training deep learning models.
 Eager execution  is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later.
 Distribution Strategy API  makes it easy to distribute and train models on different hardware configurations without changing the model definition.
 Standardized SavedModel file format  allows you to run your models with TensorFlow, deploy them with TensorFlow Serving, use them on mobile and embedded systems with TensorFlow Lite, and train and run in the browser or Node.js with TensorFlow.js.
This is just a small subset of features, checkout the official documentation for the latest details. Keep in mind that the framework is growing very rapidly, so there are a number of new additions in the last 2.x APIs.
Keras is a userfriendly API standard for machine learning and it is the central highlevel API used to build and train models in TensorFlow 2.x. Importantly, Keras provides several modelbuilding APIs (Sequential, Functional, and Subclassing), so you can choose the right level of abstraction for your project.
The new architecture of TensorFlow 2.x using a simplified, conceptual diagram:
3.2. Keras building blocks
Keras models are made by connecting configurable building blocks together.
In Keras, you assemble layers to build models. A model is a graph of layers. The most common type of model is a stack of layers: the tf.keras.Sequential model.
3.3. TF Keras model using Sequential API
3.3.1. Model architecture
Dense layers are the most basic neural network architecture. In this layer, all the inputs and outputs are connected to all the neurons in each layer.
This is one Neuron computation:
A neural network without an activation function is essentially just a linear regression model. The activation function does the nonlinear transformation to the input making it capable to learn and perform more complex tasks. Most known activation functions are: Sigmoid , Tanh and Relu.
It is out of the scope of this article to present the math behind the DNN, but the scope of the training process is to find the best weights that will minimize the cost function, in order to make better predictions.
The simplified process:
 Forward Propagation – given the inputs from previous layer or the input layer, each unit computes the affine function and then apply the activation function. All computed variables on each layers are stored and used by backpropagation.
 Calculate loss function which compare the final output of the model with the actual target in the training data.
 Backpropagation, it goes backwards from right to left, and propagates the error to every node, then calculate each weight’s contribution to the error, and adjust the weights accordingly using gradient descent.
TF Keras Dense Layer implements the operation: output = activation(dot(input, kernel) + bias) where:
 activation is the elementwise activation function passed as the activation argument.
 kernel is a weights matrix created by the layer.
 bias is a bias vector created by the layer.
Now, let’s create a simple model using TF Keras. Sequential model is the simplest kind of Keras model for neural networks that are just composed of a single stack of layers connected sequentially.


Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 18) 306
_________________________________________________________________
dense_1 (Dense) (None, 8) 152
_________________________________________________________________
dense_2 (Dense) (None, 1) 9
=================================================================
Total params: 467
Trainable params: 467
Nontrainable params: 0
_________________________________________________________________


(18, 8)
3.3.2. Compile the model
tf.keras.Model.compile takes three important arguments:
 optimizer: This object specifies the training procedure. Pass it optimizer instances from the tf.keras.optimizers module, such as tf.keras.optimizers.Adam or tf.keras.optimizers.SGD. If you just want to use the default parameters, you can also specify optimizers via strings, such as ‘adam’ or ‘sgd’.
 loss: The function to minimize during optimization. Common choices include mean square error (mse), categorical_crossentropy, and binary_crossentropy. Loss functions are specified by name or by passing a callable object from the tf.keras.losses module.
 metrics: Used to monitor training. These are string names or callables from the tf.keras.metrics module.
 Additionally, to make sure the model trains and evaluates eagerly, you can make sure to pass run_eagerly=True as a parameter to compile.


This is equivalent to:
model1.compile(loss="binary_crossentropy”, optimizer="sgd”, metrics=[“accuracy”])
3.3.3. Train the model
tf.keras.Model.fit takes three important arguments:
 epochs: Training is structured into epochs. An epoch is one iteration over the entire input data (this is done in smaller batches).
 batch_size: When passed NumPy data, the model slices the data into smaller batches and iterates over these batches during training. This integer specifies the size of each batch. Be aware that the last batch may be smaller if the total number of samples is not divisible by the batch size.
 validation_data: When prototyping a model, you want to easily monitor its performance on some validation data. Passing this argument—a tuple of inputs and labels—allows the model to display the loss and metrics in inference mode for the passed data, at the end of each epoch.
 validation_split: Float between 0 and 1. Same as validation_data, but the but it use a fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch.


val_acc: 80.47%


{'batch_size': 32,
'epochs': 100,
'steps': 23,
'samples': 712,
'verbose': 0,
'do_validation': True,
'metrics': ['loss', 'binary_accuracy', 'val_loss', 'val_binary_accuracy']}


3.3.4 Predict on the Test data and submit to Kaggle
If you participate in a Kaggle competion, you may want to submit the model predictions and receive the calculated score from Kaggle. This way you can verify if the model generalize well or not.


3.4. Keras Model using Functional API
Although it is possible to use Keras Sequential API, for next model as the layers are added one after another, for the sake of this exercise I’ll create the model using Functional API.
The Keras functional API is the way to go for defining complex models, such as multioutput models, directed acyclic graphs, or models with shared layers.
Comparing with previuos model:
 Elu is used as an activation function, it looks a lot like the ReLU function but it has a nonzero gradient for z < 0
 I’m using Dropout, which is one popular technique for regularization, however L1, L2 and MaxNorm can be used as well.


Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 16)] 0
_________________________________________________________________
dense_3 (Dense) (None, 36) 612
_________________________________________________________________
dropout (Dropout) (None, 36) 0
_________________________________________________________________
dense_4 (Dense) (None, 18) 666
_________________________________________________________________
dropout_1 (Dropout) (None, 18) 0
_________________________________________________________________
dense_5 (Dense) (None, 8) 152
_________________________________________________________________
dense_6 (Dense) (None, 1) 9
=================================================================
Total params: 1,439
Trainable params: 1,439
Nontrainable params: 0
_________________________________________________________________


The fit() method accepts a callbacks argument that lets you specify a list of objects that Keras will call at the start and end of training, at the start and end of each epoch, and even before and after processing each batch.
tf.keras.callbacks.EarlyStopping is usefull when you want to stop training when a monitored quantity has stopped improving. The number of epochs can be set to a large value since training will stop automatically when there is no more progress.


Train the model with callbacks:


val_acc: 86.41%




3.5. Hyperparameter Tuning
Create the model, in such way to be configurable. The dynamic model allows you to try and find out the best model hyperparameters.


In Keras there is a wrapper for using the ScikitLearn API with Keras models, that has 2 classes:
 KerasClassifier: Implementation of the scikitlearn classifier API for Keras.
 KerasRegressor: Implementation of the scikitlearn regressor API for Keras.
KerasClassifier object is a thin wrapper around the Keras model built using build_model(). The object can be used like a regular ScikitLearn regressor so we can train, evaluate and make predictions using it.






val_acc: 82.82%
Random search for hyper parameters.
In contrast to GridSearchCV, not all parameter values are tried out, but rather a fixed number of parameter settings is sampled from the specified distributions. The number of parameter settings that are tried is given by n_iter.
Activation functions:
 Relu  Applies the rectified linear unit activation function: max(x, 0)
 Selu  Scaled Exponential Linear Unit (SELU): scale * x if x > 0 and scale * alpha * (exp(x)  1) if x < 0
 Elu  Exponential linear unit: x if x > 0 and alpha * (exp(x)1) if x < 0
Optimizers:
 SGD  stochastic gradient descent, is the “classical” optimization algorithm. In SGD the gradient is computed for the network loss function with respect to each individual weight in the network. Each forward pass through the network results in a certain parameterized loss function, and use the calculated gradients with the learning rate to move the weights in whatever direction its gradient is pointing.
 Adagrad  scales down the gradient vector along the steepest dimensions
 RMSprop  accumulate only the gradients from the most recent iterations (as opposed to all the gradients since the beginning of training)
 Adam  stands for adaptive moment estimation, combines the ideas of momentum optimization and RMSProp


{'optimizer': 'Adam', 'n_neurons': 42, 'n_hidden': 4, 'activation': 'selu'}
It didn’t work out for me, as each time I run it I got different results, but I guess it worth to know that it is posible to use these valuable Scikit Learn libraries with Keras:
{‘optimizer’: ‘RMSprop’, ‘n_neurons’: 84, ‘n_hidden’: 3, ‘activation’: ‘selu’}
{‘optimizer’: ‘Adam’, ‘n_neurons’: 77, ‘n_hidden’: 4, ‘activation’: ‘elu’}
{‘optimizer’: ‘RMSprop’, ‘n_neurons’: 30, ‘n_hidden’: 1, ‘activation’: ‘relu’}
{‘optimizer’: ‘RMSprop’, ‘n_neurons’: 6, ‘n_hidden’: 4, ‘activation’: ‘selu’}
{‘optimizer’: ‘SGD’, ‘n_neurons’: 100, ‘n_hidden’: 4, ‘activation’: ‘elu’}
Assume that you’ve found better hyperparameters, it can be easily apply to your model.


Model: "sequential_33"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_116 (Dense) (None, 77) 1309
_________________________________________________________________
dropout_67 (Dropout) (None, 77) 0
_________________________________________________________________
dense_117 (Dense) (None, 25) 1950
_________________________________________________________________
dropout_68 (Dropout) (None, 25) 0
_________________________________________________________________
dense_118 (Dense) (None, 15) 390
_________________________________________________________________
dropout_69 (Dropout) (None, 15) 0
_________________________________________________________________
dense_119 (Dense) (None, 11) 176
_________________________________________________________________
dropout_70 (Dropout) (None, 11) 0
_________________________________________________________________
dense_120 (Dense) (None, 1) 12
=================================================================
Total params: 3,837
Trainable params: 3,837
Nontrainable params: 0
_________________________________________________________________
None


Using deep learning for this dataset may be overkil, as this dataset is tiny, however it was fun to play with Tensorflow 2.x and Keras API. The performance was not the primary goal of this article, but if you are looking for best performance, maybe using Ensembling/Stacking with RandomForest and GradientBoosting could work better for this dataset.
Anyway, I hope you enjoyed the rundown.