Titanic Survival Prediction using Danfo.js and TensorFlow.js

Above, you wrote an async function because loading the dataset over the internet takes a few seconds, depending on your network. Inside the async function, you pass the URL of the Titanic dataset to the read_csv function.
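For reference, that loading step looks like this (a minimal sketch; the full load_process_data function appears later in this section):

const dfd = require("danfojs-node")

async function load_process_data() {
    // read_csv fetches the remote CSV and parses it into a DataFrame
    let df = await dfd.read_csv("https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv")
    df.head().print() // preview the first few rows
}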

Next, you’ll perform some basic data pre-processing. The ctypes attribute returns the column data types:
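df.ctypes.print() // print each column name alongside its data type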

From the data types table above, you’ll notice that there are two string columns. The first is the Name column, which contains the name of each passenger. From the head of the dataset you printed above, you’ll confirm that each name includes a title. So you can extract these titles from the names, and they can serve as a new feature.

// Feature engineering: extract the title from each entry in the Name column
let title = df['Name'].apply((x) => { return x.split(".")[0] }).values

// replace the original Name column with the extracted titles
df.addColumn({ column: "Name", value: title })

In the code above, you’re calling the apply function on the Name column. The parameter to apply is a function that gets called on each element of the column; this can be any JavaScript function.

So what exactly is the function doing? It splits each name at the first period and keeps the text before it, which is the passenger’s title. You then use the result to replace the original Name column.
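For example, the split works like this (a hypothetical passenger name, formatted the way this dataset formats them, with the title first):

// everything before the first period is the title
"Mr. Owen Harris Braund".split(".")[0] // => "Mr"

When you’re done, your output becomes: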

You’ll notice we now have titles in place of names. You can easily label encode this feature:

// Label encode the Sex and Name columns
let encoder = new dfd.LabelEncoder()
let cols = ["Sex", "Name"]
cols.forEach(col => {
  encoder.fit(df[col])
  let enc_val = encoder.transform(df[col])
  df.addColumn({ column: col, value: enc_val })
})

df.head().print()

In the code cell above, you’re label encoding the Sex and Name columns: you loop over each column name, fit the encoder to the column, transform it, and finally reassign the encoded values to the DataFrame.
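To make the encoding concrete, here is a minimal sketch of what LabelEncoder does to a string column (the values and the exact integer codes are hypothetical, and this assumes transform returns a Series, as the danfo.js docs describe):

let le = new dfd.LabelEncoder()
let sf = new dfd.Series(["male", "female", "female", "male"])
le.fit(sf)
le.transform(sf).print() // integer codes, e.g. 1, 0, 0, 1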

Next, you’ll split the data, separating the features from the labels. In this task, you’re trying to predict the survival of a passenger. The Survived column is the first in the DataFrame, so you’ll use iloc to subset the remaining columns as features:
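let Xtrain, ytrain;
Xtrain = df.iloc({ columns: ["1:"] }) // every column except the first (Survived)
ytrain = df['Survived']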

Next, you’ll scale the data using MinMaxScaler. It’s important to scale your data before model training, because features with widely different value ranges can distort the optimization process.

// Scale each feature to the [0, 1] range with MinMaxScaler
let scaler = new dfd.MinMaxScaler()
scaler.fit(Xtrain)
Xtrain = scaler.transform(Xtrain)
return [Xtrain.tensor, ytrain.tensor]

In the code cell above, you first created an instance of the MinMaxScaler class, then fit it to the training data, and finally transformed the data. The output from the scaler is a DataFrame of the same shape as the input.

The full code for the load_process_data function becomes:

const dfd = require("danfojs-node")
const tf = require("@tensorflow/tfjs-node")

async function load_process_data() {
    let df = await dfd.read_csv("https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv")

    // Feature engineering: extract the title from each entry in the Name column
    let title = df['Name'].apply((x) => { return x.split(".")[0] }).values
    // replace the original Name column with the extracted titles
    df.addColumn({ column: "Name", value: title })

    // Label encode the Sex and Name columns
    let encoder = new dfd.LabelEncoder()
    let cols = ["Sex", "Name"]
    cols.forEach(col => {
        encoder.fit(df[col])
        let enc_val = encoder.transform(df[col])
        df.addColumn({ column: col, value: enc_val })
    })


    let Xtrain, ytrain;
    Xtrain = df.iloc({ columns: ["1:"] }) // every column except the first (Survived)
    ytrain = df['Survived']

    // Scale each feature to the [0, 1] range with MinMaxScaler
    let scaler = new dfd.MinMaxScaler()
    scaler.fit(Xtrain)
    Xtrain = scaler.transform(Xtrain)

    return [Xtrain.tensor, ytrain.tensor] //return the data as tensors
}

load_process_data()

Model building with TensorFlow.js

In this section, you’ll build a simple classification model using TensorFlow.js. If you’re not familiar with TensorFlow.js, you can start here.

Create a simple function called get_model. This will construct and return a model when called.

function get_model() {
    const model = tf.sequential();
    model.add(tf.layers.dense({ inputShape: [7], units: 124, activation: 'relu', kernelInitializer: 'leCunNormal' }));
    model.add(tf.layers.dense({ units: 64, activation: 'relu' }));
    model.add(tf.layers.dense({ units: 32, activation: 'relu' }));
    model.add(tf.layers.dense({ units: 1, activation: "sigmoid" }))
    model.summary();
    return model
}

In the code cell above, you’ve created a neural network with four dense layers. Note the input shape: it must match the number of feature columns (seven here). Also note the sigmoid activation function in the output layer, which is appropriate because you’re working on a binary classification problem.

Next, you’ll create a function called train:


async function train() {
    const model = get_model()
    const data = await load_process_data()
    const Xtrain = data[0]
    const ytrain = data[1]

    model.compile({
        optimizer: "rmsprop",
        loss: 'binaryCrossentropy',
        metrics: ['accuracy'],
    });
    
    console.log("Training started....")
    await model.fit(Xtrain, ytrain,{
        batchSize: 32,
        epochs: 15,
        validationSplit: 0.2,
        callbacks:{
            onEpochEnd: async(epoch, logs)=>{
                console.log(`EPOCH (${epoch + 1}): Train Accuracy: ${(logs.acc * 100).toFixed(2)}, Val Accuracy: ${(logs.val_acc * 100).toFixed(2)}\n`);
            }
        }
    });
    
}

This function calls load_process_data to retrieve the training data as tensors, and get_model to retrieve the model. Next, you compile the model by specifying an optimizer, a loss function, and a metric to report.

Next, you call the fit function on the model, passing the training data and labels (tensors) and specifying a batch size, number of epochs, validation split size, and a callback function to track training progress.

The training progress is printed to the console at the end of each epoch. Below is the full code snippet for loading data to start training your model:

const dfd = require("danfojs-node")
const tf = require("@tensorflow/tfjs-node")

async function load_process_data() {
    let df = await dfd.read_csv("https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv")

    // Feature engineering: extract the title from each entry in the Name column
    let title = df['Name'].apply((x) => { return x.split(".")[0] }).values
    // replace the original Name column with the extracted titles
    df.addColumn({ column: "Name", value: title })

    // Label encode the Sex and Name columns
    let encoder = new dfd.LabelEncoder()
    let cols = ["Sex", "Name"]
    cols.forEach(col => {
        encoder.fit(df[col])
        let enc_val = encoder.transform(df[col])
        df.addColumn({ column: col, value: enc_val })
    })


    let Xtrain, ytrain;
    Xtrain = df.iloc({ columns: ["1:"] }) // every column except the first (Survived)
    ytrain = df['Survived']

    // Scale each feature to the [0, 1] range with MinMaxScaler
    let scaler = new dfd.MinMaxScaler()
    scaler.fit(Xtrain)
    Xtrain = scaler.transform(Xtrain)

    return [Xtrain.tensor, ytrain.tensor] //return the data as tensors
}


function get_model() {
    const model = tf.sequential();
    model.add(tf.layers.dense({ inputShape: [7], units: 124, activation: 'relu', kernelInitializer: 'leCunNormal' }));
    model.add(tf.layers.dense({ units: 64, activation: 'relu' }));
    model.add(tf.layers.dense({ units: 32, activation: 'relu' }));
    model.add(tf.layers.dense({ units: 1, activation: "sigmoid" }))
    model.summary();
    return model
}

async function train() {
    const model = get_model()
    const data = await load_process_data()
    const Xtrain = data[0]
    const ytrain = data[1]

    model.compile({
        optimizer: "rmsprop",
        loss: 'binaryCrossentropy',
        metrics: ['accuracy'],
    });

    console.log("Training started....")
    await model.fit(Xtrain, ytrain,{
        batchSize: 32,
        epochs: 15,
        validationSplit: 0.2,
        callbacks:{
            onEpochEnd: async(epoch, logs)=>{
                console.log(`EPOCH (${epoch + 1}): Train Accuracy: ${(logs.acc * 100).toFixed(2)}, Val Accuracy: ${(logs.val_acc * 100).toFixed(2)}\n`);
            }
        }
    });
};

train()

In your terminal, run the script with Node (this assumes you saved the file as app.js; substitute your own filename):
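node app.js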

This runs the script and displays the training progress after each epoch, as shown below:

After 15 epochs, we’ve reached an accuracy of about 83%. This can definitely be improved, but for the sake of simplicity, we’ll stop here.

Conclusion

In this tutorial, you’ve seen how to use danfo.js with TensorFlow.js to load and process data, as well as train a neural network, all in JavaScript. This mirrors the Pandas-TensorFlow workflow in Python.

You’ll also notice that danfo.js provides an API similar to Pandas’s, so it can easily be picked up by Python developers.

As an extra task, you can try more feature engineering with danfo.js to improve the accuracy of your model; see the sketch below for one idea.
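For instance, here is a minimal sketch of one such feature, the total number of family members a passenger had aboard (this assumes the Stanford CSV’s "Siblings/Spouses Aboard" and "Parents/Children Aboard" column names, and would go inside load_process_data before scaling):

// Hypothetical extra feature: family size aboard
let familySize = df["Siblings/Spouses Aboard"].add(df["Parents/Children Aboard"])
df.addColumn({ column: "FamilySize", value: familySize.values })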

Go danfo! 😎

And that’s it! If you have questions, comments, or additions, don’t hesitate to use the comment section below.

Connect with me on Twitter.

Connect with me on LinkedIn.
