Predicting Destination - AirBnB Customers¶

The goal for this analysis is to build a neural network, using Tensorflow, that can predict which country a new user of AirBnB will make his/her first trip to. More information about the data can be found at: https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings. This data was originally from a 2016 Kaggle competition, sponsored by AirBnB. I believe that it is still an excellent learning reasource because of the opportunity to clean data, engineering features, and build a model, which will all significantly impact the quality of the predictions.

import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.cross_validation import train_test_split
from scipy import stats
pd.set_option("display.max_columns", 1000)
import math
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
from sklearn.cross_validation import train_test_split
import tensorflow as tf
from tqdm import tqdm
from sklearn.preprocessing import LabelEncoder
from sklearn import preprocessing
from sklearn.metrics import classification_report

//anaconda/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

# Load the data
countries = pd.read_csv("countries.csv")
test = pd.read_csv("test_users.csv")
train = pd.read_csv("train_users_2.csv")
sessions = pd.read_csv("sessions.csv")

First, let's have a look at the data we are working with.

test.head()

train.head()

sessions.head(10)

print(test.shape)
print(train.shape)
print(sessions.shape)

(62096, 15)
(213451, 16)
(10567737, 6)

Our target feature is 'country_destination,' which can be found in the 'train' dataframe. Given this, let's first explore the sessions data, then merge it with the train dataframe (on 'user_id'), after we are done transforming it.

Sessions¶

sessions.isnull().sum()

user_id            34496
action             79626
action_type      1126204
action_detail    1126204
device_type            0
secs_elapsed      136031
dtype: int64

#Drop rows where user_id is null because we want to tie everything back to a user.
sessions = sessions[sessions.user_id.notnull()]

sessions.isnull().sum()

user_id                0
action             79480
action_type      1122957
action_detail    1122957
device_type            0
secs_elapsed      135483
dtype: int64

# How do nulls in action relate to action_type
sessions[sessions.action.isnull()].action_type.value_counts()

message_post    79480
Name: action_type, dtype: int64

# Every action with a null value, has action_type equal to 'message_post'.
# Let's change all the null values to 'message'
sessions.loc[sessions.action.isnull(), 'action'] = 'message'

sessions.isnull().sum()

user_id                0
action                 0
action_type      1122957
action_detail    1122957
device_type            0
secs_elapsed      135483
dtype: int64

# action_type and action_detail are missing values in the same rows, this simplifies things a little.
print(sessions[sessions.action_type.isnull()].action.value_counts())
print()
print(sessions[sessions.action_detail.isnull()].action.value_counts())

show                      580485
similar_listings_v2       168457
lookup                    161422
campaigns                 104331
track_page_view            80949
index                      16682
localization_settings       5380
uptodate                    3329
signed_out_modal            1054
currencies                   292
update                       225
braintree_client_token       120
check                        119
widget                        75
phone_verification            16
satisfy                        9
disaster_action                6
track_activity                 6
Name: action, dtype: int64

show                      580485
similar_listings_v2       168457
lookup                    161422
campaigns                 104331
track_page_view            80949
index                      16682
localization_settings       5380
uptodate                    3329
signed_out_modal            1054
currencies                   292
update                       225
braintree_client_token       120
check                        119
widget                        75
phone_verification            16
satisfy                        9
disaster_action                6
track_activity                 6
Name: action, dtype: int64

To fill in the null values for action_type and action_detail, we will perform these steps:

Use the most common value relative to each user and action
Use the most common value relative to each action
Use the value 'missing'

# function that finds the most common value of a feature, specific to each user and action.
def most_common_value_by_user(merge_df, feature): 
    # Find the value counts for a feature, for each user and action.
    new_df = pd.DataFrame(merge_df.groupby(['user_id','action'])[feature].value_counts())
    # Set the index to a new feature so that it can be transformed.
    new_df['index_tuple'] = new_df.index 
    # Change the feature name to count, since it is the value count of the feature.
    new_df['count'] = new_df[feature]
    
    new_columns = ['user_id','action',feature]
    # separate the features of index_tuple (a list), into their own columns
    for n,col in enumerate(new_columns):
        new_df[col] = new_df.index_tuple.apply(lambda index_tuple: index_tuple[n])
    
    # reset index and drop index_tuple
    new_df = new_df.reset_index(drop = True)
    new_df = new_df.drop(['index_tuple'], axis = 1) 
    
    # Create a new dataframe for each user, action, and the count of the most common feature
    new_df_max = pd.DataFrame(new_df.groupby(['user_id','action'], as_index = False)['count'].max())
    # Merge dataframes to include the name of the most common feature
    new_df_max = new_df_max.merge(new_df, on = ['user_id','action','count'])
    # Drop count as it is not needed for the next step
    new_df_max = new_df_max.drop('count', axis = 1)
    
    # Merge with main dataframe (sessions)
    merge_df = merge_df.merge(new_df_max, left_on = ['user_id','action'], right_on = ['user_id','action'], how = 'left')
    
    return merge_df

sessions = most_common_value_by_user(sessions, 'action_type')
print("action_type is complete.")

sessions = most_common_value_by_user(sessions, 'action_detail')
print("action_detail is complete.")

action_type is complete.
action_detail is complete.

# Replace the nulls with the values from the features created by 'most_common_value_by_user' function.
sessions.loc[sessions.action_type_x.isnull(), 'action_type_x'] = sessions.action_type_y
sessions.loc[sessions.action_detail_x.isnull(), 'action_detail_x'] = sessions.action_detail_y

# Change the features' names to their originals and drop unnecessary features.
sessions['action_type'] = sessions.action_type_x
sessions['action_detail'] = sessions.action_detail_x
sessions = sessions.drop(['action_type_x','action_type_y','action_detail_x','action_detail_y'], axis = 1)

That helped to remove some of the nulls values. Now let's try the more general function.

sessions.isnull().sum()

user_id               0
action                0
device_type           0
secs_elapsed     167233
action_type      531386
action_detail    531386
dtype: int64

# function that finds the most common value of a feature, specific to each action.
def most_common_value_by_all_users(merge_df, feature):
    # Group by action, then find the value counts of the feature
    new_df = pd.DataFrame(merge_df.groupby('action')[feature].value_counts())
    # Set the index to a new feature so that it can be transformed.
    new_df['index_tuple'] = new_df.index 
    # Change the feature name to count, since it is the value count of the feature.
    new_df['count'] = new_df[feature]
    
    new_columns = ['action',feature]
    # separate the features of index_tuple (a list), into their own columns
    for n,col in enumerate(new_columns):
        new_df[col] = new_df.index_tuple.apply(lambda index_tuple: index_tuple[n])
    
    # reset index and drop index_tuple
    new_df = new_df.reset_index(drop = True)
    new_df = new_df.drop(['index_tuple'], axis = 1) 
    
    # Create a new dataframe for each action, and the count of the most common feature
    new_df_max = pd.DataFrame(new_df.groupby('action', as_index = False)['count'].max())
    # Merge dataframes to include the name of the most common feature
    new_df_max = new_df_max.merge(new_df, on = ['action','count'])
    # Drop count as it is not needed for next step
    new_df_max = new_df_max.drop('count', axis = 1)
    
    # Merge dataframe with main dataframe (sessions)
    merge_df = merge_df.merge(new_df_max, left_on = 'action', right_on = 'action', how = 'left')
    
    return merge_df

sessions = most_common_value_by_all_users(sessions, 'action_type')
print("action_type is complete.")
sessions = most_common_value_by_all_users(sessions, 'action_detail')
print("action_detail is complete.")

action_type is complete.
action_detail is complete.

# Replace the nulls with the values from the features created by 'most_common_value_by_all_users' function.
sessions.loc[sessions.action_type_x.isnull(), 'action_type_x'] = sessions.action_type_y
sessions.loc[sessions.action_detail_x.isnull(), 'action_detail_x'] = sessions.action_detail_y

# Change the features' names to their originals and drop the unnecessary features.
sessions['action_type'] = sessions.action_type_x
sessions['action_detail'] = sessions.action_detail_x
sessions = sessions.drop(['action_type_x','action_type_y','action_detail_x','action_detail_y'], axis = 1)

There are still some null values remaining. Let's take a look at what actions these null values are related to.

sessions.isnull().sum()

user_id               0
action                0
device_type           0
secs_elapsed     167233
action_type      415562
action_detail    415562
dtype: int64

sessions[sessions.action_type.isnull()].action.value_counts()

similar_listings_v2       168457
lookup                    161422
track_page_view            80949
uptodate                    3329
signed_out_modal            1054
braintree_client_token       120
check                        119
widget                        75
phone_verification            16
satisfy                        9
disaster_action                6
track_activity                 6
Name: action, dtype: int64

Let's take a look at the value counts for all of the actions to see how their frequency compares to others.

sessions.action.value_counts()

show                           2866444
index                           893606
search_results                  723124
personalize                     704782
search                          533887
ajax_refresh_subtotal           486414
update                          370379
similar_listings                363423
social_connections              337764
reviews                         324825
create                          225961
active                          187370
similar_listings_v2             168457
lookup                          161422
dashboard                       152515
header_userpic                  141315
collections                     124067
edit                            108999
campaigns                       104647
track_page_view                  80949
message                          79484
unavailabilities                 77985
qt2                              64585
notifications                    61946
confirm_email                    58565
requested                        57068
identity                         53550
ajax_check_dates                 52426
show_personalize                 50353
authenticate                     44323
                                ...   
reset_calendar                       2
envoy_bank_details_redirect          2
recommendation_page                  2
unsubscribe                          2
views_campaign                       2
sandy                                2
stpcv                                2
rest-of-world                        2
accept_decline                       2
tos_2014                             2
special_offer                        2
views_campaign_rules                 2
use_mobile_site                      2
preapproval                          2
confirmation                         2
desks                                1
deactivate                           1
nyan                                 1
revert_to_admin                      1
set_minimum_payout_amount            1
plaxo_cb                             1
reactivate                           1
deauthorize                          1
host_cancel                          1
wishlists                            1
acculynk_bin_check_failed            1
sldf                                 1
events                               1
update_message                       1
deactivated                          1
Name: action, dtype: int64

'similar_listings_v2', 'lookup', and 'track_page_view' are the three main features with null values. I will give each of them specific values for action_type and action_detail, otherwise I will set the value to 'missing'.

# Use these values for 'similar_listings_v2' since they are similar actions.
print(sessions[sessions.action == 'similar_listings'].action_type.value_counts())
print(sessions[sessions.action == 'similar_listings'].action_detail.value_counts())

data    363423
Name: action_type, dtype: int64
similar_listings    363423
Name: action_detail, dtype: int64

sessions.loc[sessions.action == 'similar_listings_v2', 'action_type'] = "data"
sessions.loc[sessions.action == 'similar_listings_v2', 'action_detail'] = "similar_listings"

# No other action is similar, so we'll use the same work for all three features.
sessions.loc[sessions.action == 'lookup', 'action_type'] = "lookup"
sessions.loc[sessions.action == 'lookup', 'action_detail'] = "lookup"

sessions.loc[sessions.action == 'track_page_view', 'action_type'] = "track_page_view"
sessions.loc[sessions.action == 'track_page_view', 'action_detail'] = "track_page_view"

sessions.action_type = sessions.action_type.fillna("missing")
sessions.action_detail = sessions.action_detail.fillna("missing")

All good. Now just secs_elapsed is left.

sessions.isnull().sum()

user_id               0
action                0
device_type           0
secs_elapsed     167233
action_type           0
action_detail         0
dtype: int64

To keep things simple, let's fill the nulls with the median value for each action.

# Find the median secs_elapsed for each action
median_duration = pd.DataFrame(sessions.groupby('action', as_index = False)['secs_elapsed'].median())
median_duration.head()

# Merge dataframes on action
sessions = sessions.merge(median_duration, left_on = 'action', right_on = 'action', how = 'left')
print("Merge complete.")
# if secs_elapsed is null, fill it with the median value
sessions.loc[sessions.secs_elapsed_x.isnull(), 'secs_elapsed_x'] = sessions.secs_elapsed_y
print("Nulls are filled.")
# Change column name
sessions['secs_elapsed'] = sessions.secs_elapsed_x
print("Column is created.")
# Drop unneeded columns
sessions = sessions.drop(['secs_elapsed_x','secs_elapsed_y'], axis = 1)
print("Columns are dropped.")

Merge complete.
Nulls are filled.
Column is created.
Columns are dropped.

All clean!

sessions.isnull().sum()

user_id          0
action           0
device_type      0
action_type      0
action_detail    0
secs_elapsed     0
dtype: int64

I think the best next step would be to take the information from sessions and summarize it. We will create a new dataframe, add the most important features, then join it with the train dataframe.

Sessions' Summary¶

sessions.head()

# sessions_summary is set to the number of times a user_id appears in sessions
sessions_summary = pd.DataFrame(sessions.user_id.value_counts(sort = False))
# Set action_count equal to user_id
sessions_summary['action_count'] = sessions_summary.user_id
# Set user_id equal to the index
sessions_summary['user_id'] = sessions_summary.index
# Rest the index
sessions_summary = sessions_summary.reset_index(drop = True)

Looks good, now let's add some features!

sessions_summary.head()

# user_duration is the sums of secs_elapsed for each user
user_duration = pd.DataFrame(sessions.groupby('user_id').secs_elapsed.sum())
user_duration['user_id'] = user_duration.index
# Merge dataframes
sessions_summary = sessions_summary.merge(user_duration)
# Create new feature, 'duration', to equal secs_elapsed
sessions_summary['duration'] = sessions_summary.secs_elapsed
sessions_summary = sessions_summary.drop("secs_elapsed", axis = 1)

sessions_summary.head()

# This function finds the most common value, for a specific feature, for each user.
def most_frequent_value(merge_df, feature):
    # Group by the users and find the value counts of the feature
    new_df = pd.DataFrame(sessions.groupby('user_id')[feature].value_counts())
    # The index is a tuple, and we need to seperate it, so let's create a new feature from it.
    new_df['index_tuple'] = new_df.index
    # The new columns are the features created from the tuple.
    new_columns = ['user_id',feature]
    for n,col in enumerate(new_columns):
        new_df[col] = new_df.index_tuple.apply(lambda index_tuple: index_tuple[n])
    
    # Drop the old index (the tuple index)
    new_df = new_df.reset_index(drop = True)
    # Drop the unneeded feature
    new_df = new_df.drop('index_tuple', axis = 1)
    # Select the first value for each user, its most common
    new_df = new_df.groupby('user_id').first()
    
    # Set user_id equal to the index, then reset the index
    new_df['user_id'] = new_df.index
    new_df = new_df.reset_index(drop = True)
    
    merge_df = merge_df.merge(new_df)
    
    return merge_df

# For each categorical feature in sessions, find the most common value for each user.
sessions_feature = ['action','action_type','action_detail','device_type']

for feature in sessions_feature:
    sessions_summary = most_frequent_value(sessions_summary, feature)
    print("{} is complete.".format(feature))

action is complete.
action_type is complete.
action_detail is complete.
device_type is complete.

sessions_summary.head()

# This function finds the number of unique values of a feature for each user.
def unique_features(feature, feature_name, merge_df):
    # Create a dataframe by grouping the users and the feature
    unique_feature = pd.DataFrame(sessions.groupby('user_id')[feature].unique())
    unique_feature['user_id'] = unique_feature.index
    unique_feature = unique_feature.reset_index(drop = True)
    # Create a new feature equal to the number of unique features for each user
    unique_feature[feature_name] = unique_feature[feature].map(lambda x: len(x))
    # Drop the needed feature
    unique_feature = unique_feature.drop(feature, axis = 1)
    merge_df = merge_df.merge(unique_feature, on = 'user_id')
    return merge_df

# Apply unique_features to each of the categorical features in sessions
sessions_summary = unique_features('action', 'unique_actions', sessions_summary)
print("action is complete.")
sessions_summary = unique_features('action_type', 'unique_action_types', sessions_summary)
print("action_type is complete.")
sessions_summary = unique_features('action_detail', 'unique_action_details', sessions_summary)
print("action_detail is complete.")
sessions_summary = unique_features('device_type', 'unique_device_types', sessions_summary)
print("device_type is complete.")

action is complete.
action_type is complete.
action_detail is complete.
device_type is complete.

sessions_summary.head()

# Find the maximum and minimum secs_elapsed/duration for each user in sessions.
max_durations = pd.DataFrame(sessions.groupby(['user_id'], as_index = False)['secs_elapsed'].max())
sessions_summary = sessions_summary.merge(max_durations, on = 'user_id')
sessions_summary['max_duration'] = sessions_summary.secs_elapsed
sessions_summary = sessions_summary.drop('secs_elapsed', axis = 1)

print("max_durations is complete.")

min_durations = pd.DataFrame(sessions.groupby(['user_id'], as_index = False)['secs_elapsed'].min())
sessions_summary = sessions_summary.merge(min_durations, on = 'user_id')
sessions_summary['min_duration'] = sessions_summary.secs_elapsed
sessions_summary = sessions_summary.drop('secs_elapsed', axis = 1)

print("min_durations is complete.")

max_durations is complete.
min_durations is complete.

# Find the average duration for each user
sessions_summary['avg_duration'] = sessions_summary.duration / sessions_summary.action_count

sessions_summary.head(5)

# Add new features based on the type of device that the user used most frequently.
apple_device = ['Mac Desktop','iPhone','iPdad Tablet','iPodtouch']
desktop_device = ['Mac Desktop','Windows Desktop','Chromebook','Linux Desktop']
tablet_device = ['Android App Unknown Phone/Tablet','iPad Tablet','Tablet']
mobile_device = ['Android Phone','iPhone','Windows Phone','Blackberry','Opera Phone']

device_types = {'apple_device': apple_device, 
                'desktop_device': desktop_device,
                'tablet_device': tablet_device,
                'mobile_device': mobile_device}

for device in device_types:
    sessions_summary[device] = 0
    sessions_summary.loc[sessions_summary.device_type.isin(device_types[device]), device] = 1

sessions_summary.head()

# Check if there are any null values before merging with train.
sessions_summary.isnull().sum()

user_id                  0
action_count             0
duration                 0
action                   0
action_type              0
action_detail            0
device_type              0
unique_actions           0
unique_action_types      0
unique_action_details    0
unique_device_types      0
max_duration             0
min_duration             0
avg_duration             0
tablet_device            0
mobile_device            0
desktop_device           0
apple_device             0
dtype: int64

print(sessions_summary.shape)
print(train.shape)
print(test.shape)

(135483, 18)
(213451, 16)
(62096, 15)

# Merge train and test with sessions_summary
train1 = train.merge(sessions_summary, left_on = train['id'], right_on = sessions_summary['user_id'], how = 'inner')
# train2 is equal to the users that are not in train1
train2 = train[~train.id.isin(train1.id)]
train = pd.concat([train1, train2])

test1 = test.merge(sessions_summary, left_on = test['id'], right_on = sessions_summary['user_id'], how = 'inner')
# test2 is equal to the users that are not in test1
test2 = test[~test.id.isin(test1.id)]
test = pd.concat([test1, test2])

The next step is to transform the features so that they are ready for training the neural network.

Feature Engineering¶

# Concatenate train and test because all transformations need to happen to both dataframes.
df = pd.concat([train,test])

df.head()

df.shape

(275547, 34)

df.isnull().sum()

action                     140064
action_count               140064
action_detail              140064
action_type                140064
affiliate_channel               0
affiliate_provider              0
age                        116866
apple_device               140064
avg_duration               140064
country_destination         62096
date_account_created            0
date_first_booking         186639
desktop_device             140064
device_type                140064
duration                   140064
first_affiliate_tracked      6085
first_browser                   0
first_device_type               0
gender                          0
id                              0
language                        0
max_duration               140064
min_duration               140064
mobile_device              140064
signup_app                      0
signup_flow                     0
signup_method                   0
tablet_device              140064
timestamp_first_active          0
unique_action_details      140064
unique_action_types        140064
unique_actions             140064
unique_device_types        140064
user_id                    140064
dtype: int64

# We don't need this because we already have id and it has 0 null values.
df = df.drop('user_id', axis = 1)

Since there are many users that do not appear in the sessions dataframe, all of their values are NaN. Let's sort out those nulls values first.

def missing_session_data_cat(feature):
    return df[feature].fillna("missing")

def missing_session_data_cont(feature):
    return df[feature].fillna(0)

session_features_cat = ['action','action_detail','action_type','device_type']
session_features_cont = ['action_count','apple_device','desktop_device','mobile_device','tablet_device',
                         'duration','avg_duration','max_duration','min_duration','unique_action_details',
                         'unique_action_types','unique_actions','unique_device_types']

for feature in session_features_cat:
    df[feature] = missing_session_data_cat(feature)
    
for feature in session_features_cont:
    df[feature] = missing_session_data_cont(feature)

That removed most of the null values!

df.isnull().sum()

action                          0
action_count                    0
action_detail                   0
action_type                     0
affiliate_channel               0
affiliate_provider              0
age                        116866
apple_device                    0
avg_duration                    0
country_destination         62096
date_account_created            0
date_first_booking         186639
desktop_device                  0
device_type                     0
duration                        0
first_affiliate_tracked      6085
first_browser                   0
first_device_type               0
gender                          0
id                              0
language                        0
max_duration                    0
min_duration                    0
mobile_device                   0
signup_app                      0
signup_flow                     0
signup_method                   0
tablet_device                   0
timestamp_first_active          0
unique_action_details           0
unique_action_types             0
unique_actions                  0
unique_device_types             0
dtype: int64

df.action_count.describe()

count    275547.000000
mean         39.140927
std          88.944077
min           0.000000
25%           0.000000
50%           0.000000
75%          42.000000
max        2724.000000
Name: action_count, dtype: float64

df[df.action_count > 0].action_count.describe()

count    135483.000000
mean         79.605301
std         113.439177
min           1.000000
25%          17.000000
50%          43.000000
75%          97.000000
max        2724.000000
Name: action_count, dtype: float64

plt.hist(df[df.action_count > 0].action_count)
plt.yscale('log')
plt.show()

# Group action_count into quartiles.
df['action_count_quartile'] = df.action_count.map(lambda x: 0 if x == 0 else (
                                                            1 if x <= 17 else (
                                                            2 if x <= 43 else (
                                                            3 if x <= 97 else 4))))

df[df.age.notnull()].age.describe()

count    158681.000000
mean         47.145310
std         142.629468
min           1.000000
25%          28.000000
50%          33.000000
75%          42.000000
max        2014.000000
Name: age, dtype: float64

plt.hist(df[df.age <= 100].age, bins = 80)
plt.show()

No one is 2014 years old. If anyone is older than 80, let's bring their age down to 80...I'm sure some of them wouldn't mind that.

df.loc[df.age > 80, 'age'] = 80

df[df.age.notnull()].age.describe()

count    158681.000000
mean         36.756436
std          12.770211
min           1.000000
25%          28.000000
50%          33.000000
75%          42.000000
max          80.000000
Name: age, dtype: float64

Let's see if there is a feature that is correlated with age, to help find good values for the nulls

for feature in df.columns:
    if(df[feature].dtype == float or df[feature].dtype == int):
        correlation = stats.pearsonr(df[df.age.notnull()].age, df[df.age.notnull()][feature])
        print("Correlation with {} = {}".format(feature, correlation))

Correlation with action_count = (-0.064026876831618271, 8.935019331436594e-144)
Correlation with age = (1.0, 0.0)
Correlation with apple_device = (-0.11877768421316583, 0.0)
Correlation with avg_duration = (-0.015890959356089224, 2.4440110140227963e-10)
Correlation with desktop_device = (-0.04078523014721834, 2.1151097257467199e-59)
Correlation with duration = (-0.031522514371148634, 3.5056510866694804e-36)
Correlation with max_duration = (-0.046902158494160351, 5.5907076372683492e-78)
Correlation with min_duration = (-0.0061751882510133168, 0.013898465412299924)
Correlation with mobile_device = (-0.10935875994089284, 0.0)
Correlation with signup_flow = (-0.10945378061441423, 0.0)
Correlation with tablet_device = (0.032457538742075048, 2.94315571171229e-38)
Correlation with timestamp_first_active = (-0.10000080826874851, 0.0)
Correlation with unique_action_details = (-0.080635864696282103, 4.1759774725532945e-227)
Correlation with unique_action_types = (-0.087255839917087102, 1.0418634090552499e-265)
Correlation with unique_actions = (-0.073144851623982723, 3.9161188539179863e-187)
Correlation with unique_device_types = (-0.087245472405669544, 1.2041418624458218e-265)
Correlation with action_count_quartile = (-0.095963056084712631, 3.4732814902639632e-321)

Unfortunately, nothing is really correlated with age. Since there are too many missing values for age, I'm going to set the missing values equal to the median, 33.

# Create age_group before filling in the nulls, so that the distribution is not altered.
df['age_group'] = df.age.map(lambda x: 0 if math.isnan(x) else (
                                       1 if x < 18 else (
                                       2 if x <= 33 else (
                                       3 if x <= 42 else 4))))

df.age = df.age.fillna(33)

df.age.isnull().sum()

0

plt.figure(figsize=(8,4))
plt.hist(df.duration, bins = 100)
plt.title("Duration")
plt.xlabel("Duration")
plt.ylabel("Number of Users")
plt.yscale('log')
plt.show()

plt.figure(figsize=(8,4))
plt.hist(df.avg_duration, bins = 100)
plt.title("Average Duration")
plt.xlabel("Average Duration")
plt.ylabel("Number of Users")
plt.yscale('log')
plt.show()

print(df.duration.describe())
print()
print(df.avg_duration.describe())

count    2.755470e+05
mean     7.671002e+05
std      1.570985e+06
min      0.000000e+00
25%      0.000000e+00
50%      0.000000e+00
75%      8.771662e+05
max      3.851321e+07
Name: duration, dtype: float64

count    275547.000000
mean      13423.996460
std       29962.460601
min           0.000000
25%           0.000000
50%           0.000000
75%       16553.788947
max      933452.333333
Name: avg_duration, dtype: float64

print(np.percentile(df.duration, 50))
print(np.percentile(df.duration, 51))
print(np.percentile(df.duration, 75))
print()
print(np.percentile(df.avg_duration, 50))
print(np.percentile(df.avg_duration, 51))
print(np.percentile(df.avg_duration, 75))

0.0
678.46
877166.25

0.0
346.23
16553.7889474

# Divide users into 3 equal-ish groups
df['duration_group'] = df.duration.map(lambda x: 0 if x == 0 else (
                                                 1 if x <= 877166.25 else 2))

df['avg_duration_group'] = df.avg_duration.map(lambda x: 0 if x == 0 else (
                                                         1 if x <= 16553.7889474 else 2))

print(df.duration_group.value_counts())
print()
print(df.avg_duration_group.value_counts())

0    140064
2     68887
1     66596
Name: duration_group, dtype: int64

0    140064
2     68887
1     66596
Name: avg_duration_group, dtype: int64

There are too many unknowns to try to group them with a gender. I'm going to leave OTHER because it could/probably represent(s) people who do not identify as either male/female.

df.gender.value_counts()

-unknown-    129480
FEMALE        77524
MALE          68209
OTHER           334
Name: gender, dtype: int64

df.mobile_device.value_counts()

0.0    241225
1.0     34322
Name: mobile_device, dtype: int64

df.signup_flow.value_counts()

0     206092
25     29834
12     11244
3       8822
2       6881
23      6408
24      4328
1       1047
8        315
6        301
21       197
5         36
20        14
16        11
15        10
14         4
10         2
4          1
Name: signup_flow, dtype: int64

# If signup_flow == 0, signup_flow_simple == 0
# If signup_flow > 0, signup_flow_simple == 1
df['signup_flow_simple'] = df.signup_flow.map(lambda x: 0 if x == 0 else 1)

df['signup_flow_simple'].value_counts()

0    206092
1     69455
Name: signup_flow_simple, dtype: int64

df.tablet_device.value_counts()

0.0    262566
1.0     12981
Name: tablet_device, dtype: int64

# Convert dates to datetime for manipulation
df.date_account_created = pd.to_datetime(df.date_account_created, format='%Y-%m-%d')
df.date_first_booking = pd.to_datetime(df.date_first_booking, format='%Y-%m-%d')

# Check to make sure the date range makes sense.
print(df.date_account_created.min())
print(df.date_account_created.max())
print()
print(df.date_first_booking.min())
print(df.date_first_booking.max())

2010-01-01 00:00:00
2014-09-30 00:00:00

2010-01-02 00:00:00
2015-06-29 00:00:00

# calendar contains more years of information than we need.
calendar = USFederalHolidayCalendar()
# Set holidays equal to the holidays in our date range.
holidays = calendar.holidays(start = df.date_account_created.min(), 
                             end = df.date_first_booking.max())

# us_bd contains more years of information than we need.
us_bd = CustomBusinessDay(calendar = USFederalHolidayCalendar())
# Set business_days equal to the work days in our date range.
business_days = pd.DatetimeIndex(start = df.date_account_created.min(), 
                                 end = df.date_first_booking.max(), 
                                 freq = us_bd)

# Create date features
df['year_account_created'] = df.date_account_created.dt.year
df['month_account_created'] = df.date_account_created.dt.month
df['weekday_account_created'] = df.date_account_created.dt.weekday
df['business_day_account_created'] = df.date_account_created.isin(business_days)
df['business_day_account_created'] = df.business_day_account_created.map(lambda x: 1 if x == True else 0)
df['holiday_account_created'] = df.date_account_created.isin(holidays)
df['holiday_account_created'] = df.holiday_account_created.map(lambda x: 1 if x == True else 0)

df['year_first_booking'] = df.date_first_booking.dt.year
df['month_first_booking'] = df.date_first_booking.dt.month
df['weekday_first_booking'] = df.date_first_booking.dt.weekday
df['business_day_first_booking'] = df.date_first_booking.isin(business_days)
df['business_day_first_booking'] = df.business_day_first_booking.map(lambda x: 1 if x == True else 0)
df['holiday_first_booking'] = df.date_first_booking.isin(holidays)
df['holiday_first_booking'] = df.holiday_first_booking.map(lambda x: 1 if x == True else 0)

# Drop unneeded features
df = df.drop(["date_first_booking","date_account_created"], axis = 1)

df.isnull().sum()

action                               0
action_count                         0
action_detail                        0
action_type                          0
affiliate_channel                    0
affiliate_provider                   0
age                                  0
apple_device                         0
avg_duration                         0
country_destination              62096
desktop_device                       0
device_type                          0
duration                             0
first_affiliate_tracked           6085
first_browser                        0
first_device_type                    0
gender                               0
id                                   0
language                             0
max_duration                         0
min_duration                         0
mobile_device                        0
signup_app                           0
signup_flow                          0
signup_method                        0
tablet_device                        0
timestamp_first_active               0
unique_action_details                0
unique_action_types                  0
unique_actions                       0
unique_device_types                  0
action_count_quartile                0
age_group                            0
duration_group                       0
avg_duration_group                   0
signup_flow_simple                   0
year_account_created                 0
month_account_created                0
weekday_account_created              0
business_day_account_created         0
holiday_account_created              0
year_first_booking              186639
month_first_booking             186639
weekday_first_booking           186639
business_day_first_booking           0
holiday_first_booking                0
dtype: int64

# Set nulls values equal to one less than the minimum.
# I could set the nulls to 0, but the scale would be ugly when we normalize the features.
df.year_first_booking = df.year_first_booking.fillna(min(df.year_first_booking) - 1)
df.month_first_booking = df.month_first_booking.fillna(min(df.month_first_booking) - 1)
df.weekday_first_booking += 1
df.weekday_first_booking = df.weekday_first_booking.fillna(0)

df.isnull().sum()

action                              0
action_count                        0
action_detail                       0
action_type                         0
affiliate_channel                   0
affiliate_provider                  0
age                                 0
apple_device                        0
avg_duration                        0
country_destination             62096
desktop_device                      0
device_type                         0
duration                            0
first_affiliate_tracked          6085
first_browser                       0
first_device_type                   0
gender                              0
id                                  0
language                            0
max_duration                        0
min_duration                        0
mobile_device                       0
signup_app                          0
signup_flow                         0
signup_method                       0
tablet_device                       0
timestamp_first_active              0
unique_action_details               0
unique_action_types                 0
unique_actions                      0
unique_device_types                 0
action_count_quartile               0
age_group                           0
duration_group                      0
avg_duration_group                  0
signup_flow_simple                  0
year_account_created                0
month_account_created               0
weekday_account_created             0
business_day_account_created        0
holiday_account_created             0
year_first_booking                  0
month_first_booking                 0
weekday_first_booking               0
business_day_first_booking          0
holiday_first_booking               0
dtype: int64

df.first_affiliate_tracked.value_counts()

untracked        143181
linked            62064
omg               54859
tracked-other      6655
product            2353
marketing           281
local ops            69
Name: first_affiliate_tracked, dtype: int64

For the missing values for "first_affiliate_tracked" I am going to set these equal to "untracked". Not only is this the most common value, but it makes sense that if we are missing data on these people that they would not have been tracked.

df.first_affiliate_tracked = df.first_affiliate_tracked.fillna("untracked")

df.isnull().sum()

action                              0
action_count                        0
action_detail                       0
action_type                         0
affiliate_channel                   0
affiliate_provider                  0
age                                 0
apple_device                        0
avg_duration                        0
country_destination             62096
desktop_device                      0
device_type                         0
duration                            0
first_affiliate_tracked             0
first_browser                       0
first_device_type                   0
gender                              0
id                                  0
language                            0
max_duration                        0
min_duration                        0
mobile_device                       0
signup_app                          0
signup_flow                         0
signup_method                       0
tablet_device                       0
timestamp_first_active              0
unique_action_details               0
unique_action_types                 0
unique_actions                      0
unique_device_types                 0
action_count_quartile               0
age_group                           0
duration_group                      0
avg_duration_group                  0
signup_flow_simple                  0
year_account_created                0
month_account_created               0
weekday_account_created             0
business_day_account_created        0
holiday_account_created             0
year_first_booking                  0
month_first_booking                 0
weekday_first_booking               0
business_day_first_booking          0
holiday_first_booking               0
dtype: int64

Everything is all clean (the null values in 'country_destination' belong to the testing data). Now let's explore the categorical features that might have too many values and reduce that number before we do one-hot encoding.

df.head()

df.first_browser.value_counts()

Chrome                  78671
Safari                  53302
-unknown-               44394
Firefox                 38665
Mobile Safari           29636
IE                      24744
Chrome Mobile            3186
Android Browser          1577
AOL Explorer              254
Opera                     228
Silk                      172
IE Mobile                 118
BlackBerry Browser         89
Chromium                   83
Mobile Firefox             64
Maxthon                    60
Apple Mail                 45
Sogou Explorer             43
SiteKiosk                  27
RockMelt                   24
Iron                       24
Yandex.Browser             14
IceWeasel                  14
Pale Moon                  13
CometBird                  12
SeaMonkey                  12
Camino                      9
TenFourFox                  8
Opera Mini                  8
wOSBrowser                  7
CoolNovo                    6
Avant Browser               4
Opera Mobile                4
Mozilla                     3
Flock                       2
Comodo Dragon               2
SlimBrowser                 2
OmniWeb                     2
Crazy Browser               2
TheWorld Browser            2
IceDragon                   1
Conkeror                    1
Googlebot                   1
Kindle Browser              1
IBrowse                     1
Nintendo Browser            1
Outlook 2007                1
NetNewsWire                 1
Epic                        1
PS Vita browser             1
Google Earth                1
Palm Pre web browser        1
UC Browser                  1
Arora                       1
Stainless                   1
Name: first_browser, dtype: int64

# Create a new feature for those using mobile browsers
mobile_browsers = ['Mobile Safari','Chrome Mobile','IE Mobile','Mobile Firefox','Android Browser']
df.loc[df.first_browser.isin(mobile_browsers), "first_browser"] = "Mobile"

# The cut_off is set at 0.5% of the data. If a value is not common enough, it will be grouped into something generic.
cut_off = 1378

other_browsers = []
for browser, count in df.first_browser.value_counts().iteritems():
    if count < cut_off:
        other_browsers.append(browser)
   
df.loc[df.first_browser.isin(other_browsers), "first_browser"] = "Other"

print(other_browsers)

['AOL Explorer', 'Opera', 'Silk', 'BlackBerry Browser', 'Chromium', 'Maxthon', 'Apple Mail', 'Sogou Explorer', 'SiteKiosk', 'RockMelt', 'Iron', 'Yandex.Browser', 'IceWeasel', 'Pale Moon', 'CometBird', 'SeaMonkey', 'Camino', 'TenFourFox', 'Opera Mini', 'wOSBrowser', 'CoolNovo', 'Opera Mobile', 'Avant Browser', 'Mozilla', 'Crazy Browser', 'SlimBrowser', 'TheWorld Browser', 'Flock', 'Comodo Dragon', 'OmniWeb', 'Conkeror', 'IBrowse', 'Nintendo Browser', 'Arora', 'Stainless', 'IceDragon', 'Epic', 'Googlebot', 'Outlook 2007', 'NetNewsWire', 'Google Earth', 'Palm Pre web browser', 'PS Vita browser', 'Kindle Browser', 'UC Browser']

df.first_browser.value_counts()

Chrome       78671
Safari       53302
-unknown-    44394
Firefox      38665
Mobile       34581
IE           24744
Other         1190
Name: first_browser, dtype: int64

df.language.value_counts()

en           265538
zh             2634
fr             1508
es             1174
ko             1116
de              977
it              633
ru              508
ja              345
pt              322
sv              176
nl              134
tr               92
da               75
pl               75
no               51
cs               49
el               30
th               28
hu               25
id               23
fi               20
ca                6
is                5
hr                2
-unknown-         1
Name: language, dtype: int64

I think that language might be a more important feature than some others, so I will decrease the cut off to 275, or 0.1% of the data.

other_languages = []
for language, count in df.language.value_counts().iteritems():
    if count < 275:
        other_languages.append(language)
    
print(other_languages)

df.loc[df.language.isin(other_languages), "language"] = "Other"

['sv', 'nl', 'tr', 'da', 'pl', 'no', 'cs', 'el', 'th', 'hu', 'id', 'fi', 'ca', 'is', 'hr', '-unknown-']

df.language.value_counts()

en       265538
zh         2634
fr         1508
es         1174
ko         1116
de          977
Other       792
it          633
ru          508
ja          345
pt          322
Name: language, dtype: int64

# New feature for languages that are not English.
df['not_English'] = df.language.map(lambda x: 0 if x == 'en' else 1)

df.action.value_counts()

missing                                         140064
show                                             62664
search_results                                    9979
index                                             8054
create                                            6464
dashboard                                         5549
active                                            5198
update                                            5087
search                                            4962
requested                                         3156
authenticate                                      2294
edit                                              2284
personalize                                       2119
header_userpic                                    1861
ask_question                                      1753
ajax_refresh_subtotal                             1704
lookup                                            1158
identity                                          1012
message                                            709
cancellation_policies                              704
click                                              605
track_page_view                                    561
confirm_email                                      538
qt2                                                469
ajax_photo_widget_form_iframe                      445
reviews                                            425
ajax_check_dates                                   374
notifications                                      342
calendar_tab_inner2                                286
callback                                           283
                                                 ...  
recent_reservations                                  2
airbnb_picks                                         2
11                                                   2
apply                                                2
phone_verification_number_submitted_for_sms          2
my                                                   1
phone_verification_number_submitted_for_call         1
confirmation                                         1
book                                                 1
travel_plans_previous                                1
rate                                                 1
badge                                                1
top_destinations                                     1
show_personalize                                     1
spoken_languages                                     1
concierge                                            1
new_session                                          1
place_worth                                          1
other_hosting_reviews_first                          1
acculynk_pin_pad_inactive                            1
view                                                 1
ajax_price_and_availability                          1
clickthrough                                         1
google_importer                                      1
ajax_referral_banner_type                            1
photography                                          1
review_page                                          1
salute                                               1
home_safety_landing                                  1
requirements                                         1
Name: action, dtype: int64

other_actions = []
for action, count in df.action.value_counts().iteritems():
    if count < cut_off:
        other_actions.append(action)
    
print(other_actions)

df.loc[df.action.isin(other_actions), "action"] = "Other"

['lookup', 'identity', 'message', 'cancellation_policies', 'click', 'track_page_view', 'confirm_email', 'qt2', 'ajax_photo_widget_form_iframe', 'reviews', 'ajax_check_dates', 'notifications', 'calendar_tab_inner2', 'callback', 'message_to_host_focus', 'similar_listings', 'edit_verification', 'apply_reservation', 'ajax_get_referrals_amt', 'manage_listing', 'unavailabilities', 'payment_methods', 'impressions', 'collections', 'campaigns', 'tos_confirm', 'coupon_field_focus', 'faq_category', 'travel_plans_current', 'faq', 'similar_listings_v2', 'pending', 'complete_status', 'new', 'references', 'populate_help_dropdown', 'endpoint_error', 'available', 'set_password', 'agree_terms_check', 'apply_coupon_click', 'account', 'custom_recommended_destinations', 'status', 'kba_update', 'message_to_host_change', '10', 'reviews_new', 'login', 'referrer_status', 'at_checkpoint', 'populate_from_facebook', 'signup_login', 'decision_tree', 'tell_a_friend', 'hosting_social_proof', 'position', 'create_multiple', 'listings', 'settings', 'contact_new', 'this_hosting_reviews', 'jumio_token', 'ajax_image_upload', 'terms', 'kba', 'profile_pic', 'delete', 'ajax_lwlb_contact', 'coupon_code_click', 'facebook_auto_login', 'phone_verification_error', '12', 'department', 'issue', 'itinerary', 'ajax_statsd', 'glob', 'open_graph_setting', 'forgot_password', 'authorize', 'about_us', 'connect', 'privacy', 'payout_preferences', 'social_connections', 'patch', 'signup_modal', 'localization_settings', 'read_policy_click', 'apply_code', 'this_hosting_reviews_3000', 'request_new_confirm_email', 'signed_out_modal', 'payment_instruments', 'pending_tickets', 'update_cached', 'host_summary', 'reputation', 'login_modal', 'ajax_google_translate_description', 'verify', 'other_hosting_reviews', 'office_location', 'departments', 'set_user', '15', 'recommend', 'invalid_action', 'hospitality', 'remove_dashboard_alert', 'change_currency', 'cancellation_policy_click', 'signature', 'pay', 'my_reservations', 'recommendations', 'mobile_landing_page', 'ajax_referral_banner_experiment_type', 'handle_vanity_url', 'ajax_google_translate', 'change', 'recent_reservations', 'airbnb_picks', '11', 'apply', 'phone_verification_number_submitted_for_sms', 'my', 'phone_verification_number_submitted_for_call', 'confirmation', 'book', 'travel_plans_previous', 'rate', 'badge', 'top_destinations', 'show_personalize', 'spoken_languages', 'concierge', 'new_session', 'place_worth', 'other_hosting_reviews_first', 'acculynk_pin_pad_inactive', 'view', 'ajax_price_and_availability', 'clickthrough', 'google_importer', 'ajax_referral_banner_type', 'photography', 'review_page', 'salute', 'home_safety_landing', 'requirements']

df.action.value_counts()

missing                  140064
show                      62664
Other                     12355
search_results             9979
index                      8054
create                     6464
dashboard                  5549
active                     5198
update                     5087
search                     4962
requested                  3156
authenticate               2294
edit                       2284
personalize                2119
header_userpic             1861
ask_question               1753
ajax_refresh_subtotal      1704
Name: action, dtype: int64

df.action_detail.value_counts()

missing                        140069
-unknown-                       34464
p3                              29932
view_search_results             27361
user_profile                    11608
dashboard                        4652
update_listing                   4343
header_userpic                   2460
p5                               2392
create_user                      2008
message_thread                   1777
change_trip_characteristics      1514
contact_host                     1391
edit_profile                     1135
confirm_email_link                963
wishlist_content_update           952
message_post                      756
cancellation_policies             731
track_page_view                   588
login                             560
signup                            495
create_phone_numbers              459
lookup                            405
similar_listings                  372
list_your_space                   359
p1                                307
change_contact_host_dates         255
book_it                           253
unavailable_dates                 220
listing_reviews                   213
                                ...  
view_reservations                   7
view_listing                        7
set_password_page                   7
forgot_password                     7
read_policy_click                   6
signup_modal                        5
listing_recommendations             5
listing_descriptions                5
apply_coupon_click                  4
account_privacy_settings            4
previous_trips                      4
user_tax_forms                      3
your_reservations                   3
login_modal                         3
user_listings                       3
modify_reservations                 3
cancellation_policy_click           3
admin_templates                     2
profile_reviews                     2
translations                        2
change_or_alter                     2
oauth_login                         1
user_profile_content_update         1
complete_booking                    1
modify_users                        1
friends_wishlists                   1
airbnb_picks_wishlists              1
alteration_field                    1
host_home                           1
guest_receipt                       1
Name: action_detail, dtype: int64

other_action_details = []
for action_detail, count in df.action_detail.value_counts().iteritems():
    if count < cut_off:
        other_action_details.append(action_detail)
    
print(other_action_details)

df.loc[df.action_detail.isin(other_action_details), "action_detail"] = "Other"

['edit_profile', 'confirm_email_link', 'wishlist_content_update', 'message_post', 'cancellation_policies', 'track_page_view', 'login', 'signup', 'create_phone_numbers', 'lookup', 'similar_listings', 'list_your_space', 'p1', 'change_contact_host_dates', 'book_it', 'unavailable_dates', 'listing_reviews', 'message_to_host_focus', 'manage_listing', 'user_wishlists', 'oauth_response', 'account_notification_settings', 'apply_coupon', 'login_page', 'p4', 'update_listing_description', 'message_inbox', 'profile_verifications', 'your_trips', 'your_listings', 'trip_availability', 'update_user_profile', 'notifications', 'profile_references', 'create_listing', 'signup_login_page', 'wishlist', 'user_reviews', 'pending', 'message_to_host_change', 'reservations', 'instant_book', 'request_to_book', 'set_password', 'at_checkpoint', 'listing_reviews_page', 'coupon_field_focus', 'user_social_connections', 'update_user', 'terms_and_privacy', 'account_payment_methods', 'coupon_code_click', 'account_payout_preferences', 'guest_itinerary', 'view_reservations', 'view_listing', 'set_password_page', 'forgot_password', 'read_policy_click', 'signup_modal', 'listing_recommendations', 'listing_descriptions', 'apply_coupon_click', 'account_privacy_settings', 'previous_trips', 'user_tax_forms', 'your_reservations', 'login_modal', 'user_listings', 'modify_reservations', 'cancellation_policy_click', 'admin_templates', 'profile_reviews', 'translations', 'change_or_alter', 'oauth_login', 'user_profile_content_update', 'complete_booking', 'modify_users', 'friends_wishlists', 'airbnb_picks_wishlists', 'alteration_field', 'host_home', 'guest_receipt']

df.action_detail.value_counts()

missing                        140069
-unknown-                       34464
p3                              29932
view_search_results             27361
user_profile                    11608
Other                           11576
dashboard                        4652
update_listing                   4343
header_userpic                   2460
p5                               2392
create_user                      2008
message_thread                   1777
change_trip_characteristics      1514
contact_host                     1391
Name: action_detail, dtype: int64

df.action_type.value_counts()

missing             140070
view                 77695
-unknown-            17573
click                17307
data                 14198
submit                7056
message_post           990
track_page_view        293
partner_callback       132
booking_request        122
lookup                 109
modify                   2
Name: action_type, dtype: int64

other_action_types = []
for action_type, count in df.action_type.value_counts().iteritems():
    if count < 1378:
        other_action_types.append(action_type)
    
print(other_action_types)

df.loc[df.action_type.isin(other_action_types), "action_type"] = "Other"

['message_post', 'track_page_view', 'partner_callback', 'booking_request', 'lookup', 'modify']

df.action_type.value_counts()

missing      140070
view          77695
-unknown-     17573
click         17307
data          14198
submit         7056
Other          1648
Name: action_type, dtype: int64

df.affiliate_provider.value_counts()

direct                 181270
google                  65956
other                   13036
facebook                 3996
bing                     3719
craigslist               3475
padmapper                 836
vast                      830
yahoo                     653
facebook-open-graph       566
gsp                       455
meetup                    358
email-marketing           270
naver                      66
baidu                      32
yandex                     18
wayn                        8
daum                        3
Name: affiliate_provider, dtype: int64

other_affiliate_providers = []
for affiliate_provider, count in df.affiliate_provider.value_counts().iteritems():
    if count < cut_off:
        other_affiliate_providers.append(affiliate_provider)
    
print(other_affiliate_providers)

df.loc[df.affiliate_provider.isin(other_affiliate_providers), "affiliate_provider"] = "other"

['padmapper', 'vast', 'yahoo', 'facebook-open-graph', 'gsp', 'meetup', 'email-marketing', 'naver', 'baidu', 'yandex', 'wayn', 'daum']

df.affiliate_provider.value_counts()

direct        181270
google         65956
other          17131
facebook        3996
bing            3719
craigslist      3475
Name: affiliate_provider, dtype: int64

df.device_type.value_counts()

missing                             140064
Mac Desktop                          44271
Windows Desktop                      37221
iPhone                               26571
iPad Tablet                           8879
Android Phone                         7666
-unknown-                             5801
Android App Unknown Phone/Tablet      2634
Tablet                                1468
Linux Desktop                          428
Chromebook                             374
iPodtouch                               85
Windows Phone                           56
Blackberry                              27
Opera Phone                              2
Name: device_type, dtype: int64

other_device_types = []
for device_type, count in df.device_type.value_counts().iteritems():
    if count < 1378:
        other_device_types.append(device_type)
    
print(other_device_types)

df.loc[df.device_type.isin(other_device_types), "device_type"] = "Other"

['Linux Desktop', 'Chromebook', 'iPodtouch', 'Windows Phone', 'Blackberry', 'Opera Phone']

df.device_type.value_counts()

missing                             140064
Mac Desktop                          44271
Windows Desktop                      37221
iPhone                               26571
iPad Tablet                           8879
Android Phone                         7666
-unknown-                             5801
Android App Unknown Phone/Tablet      2634
Tablet                                1468
Other                                  972
Name: device_type, dtype: int64

df.signup_method.value_counts()

basic       198222
facebook     74864
google        2438
weibo           23
Name: signup_method, dtype: int64

# Create a new dataframe for the labels
labels = pd.DataFrame(df.country_destination)
df = df.drop("country_destination", axis = 1)

labels.head()

# Drop id since it is no longer needed.
df = df.drop('id', axis = 1)

# Group all features as either continuous (cont) or categorical (cat)
cont_features = []
cat_features = []

for feature in df.columns:
    if df[feature].dtype == float or df[feature].dtype == int:
        cont_features.append(feature)
    elif df[feature].dtype == object:
        cat_features.append(feature)

# Check to ensure that we have all of the features
print(cat_features)
print()
print(cont_features)
print()
print(len(cat_features) + len(cont_features))
print(df.shape[1])

['action', 'action_detail', 'action_type', 'affiliate_channel', 'affiliate_provider', 'device_type', 'first_affiliate_tracked', 'first_browser', 'first_device_type', 'gender', 'language', 'signup_app', 'signup_method']

['action_count', 'age', 'apple_device', 'avg_duration', 'desktop_device', 'duration', 'max_duration', 'min_duration', 'mobile_device', 'signup_flow', 'tablet_device', 'timestamp_first_active', 'unique_action_details', 'unique_action_types', 'unique_actions', 'unique_device_types', 'action_count_quartile', 'age_group', 'duration_group', 'avg_duration_group', 'signup_flow_simple', 'year_account_created', 'month_account_created', 'weekday_account_created', 'business_day_account_created', 'holiday_account_created', 'year_first_booking', 'month_first_booking', 'weekday_first_booking', 'business_day_first_booking', 'holiday_first_booking', 'not_English']

45
45

# Although dates have continuous values, they should be treated as categorical features.
date_features = ['year_account_created','month_account_created','weekday_account_created',
                      'year_first_booking','month_first_booking','weekday_first_booking']
for feature in date_features:
    cont_features.remove(feature)
    cat_features.append(feature)

for feature in cat_features:
    # Create dummies of each value of a categorical feature
    dummies = pd.get_dummies(df[feature], prefix = feature, drop_first = False)
    # Drop the unneeded feature
    df = df.drop(feature, axis = 1)
    df = pd.concat([df, dummies], axis=1)
    print("{} is complete".format(feature))

action is complete
action_detail is complete
action_type is complete
affiliate_channel is complete
affiliate_provider is complete
device_type is complete
first_affiliate_tracked is complete
first_browser is complete
first_device_type is complete
gender is complete
language is complete
signup_app is complete
signup_method is complete
year_account_created is complete
month_account_created is complete
weekday_account_created is complete
year_first_booking is complete
month_first_booking is complete
weekday_first_booking is complete

min_max_scaler = preprocessing.MinMaxScaler()
# Normalize the continuous features
for feature in cont_features:
    df.loc[:,feature] = min_max_scaler.fit_transform(df[feature])

//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/utils/validation.py:429: DataConversionWarning: Data with input dtype int64 was converted to float64 by MinMaxScaler.
  warnings.warn(msg, _DataConversionWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
//anaconda/lib/python3.5/site-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)

df.head()

# Split df into training and testing data
df_train = df[:len(train)]
df_test = df[len(train):]

# Shorten labels to length of the training data
y = labels[:len(train)]

# Create dummy features for each country
y_dummies = pd.get_dummies(y, drop_first = False)
y = pd.concat([y, y_dummies], axis=1)
y = y.drop("country_destination", axis = 1)
y.head()

print(df_train.shape)
print(df_test.shape)
print(y.shape)

(213451, 186)
(62096, 186)
(213451, 12)

# Take a look to see how common each country is.
train.country_destination.value_counts()

NDF      124543
US        62376
other     10094
FR         5023
IT         2835
GB         2324
ES         2249
CA         1428
DE         1061
NL          762
AU          539
PT          217
Name: country_destination, dtype: int64

# Find the order of the features
y.columns

Index(['country_destination_AU', 'country_destination_CA',
       'country_destination_DE', 'country_destination_ES',
       'country_destination_FR', 'country_destination_GB',
       'country_destination_IT', 'country_destination_NDF',
       'country_destination_NL', 'country_destination_PT',
       'country_destination_US', 'country_destination_other'],
      dtype='object')

Due to the imbalance in the data, we are going to set the sum of each feature equal to each other. This will help the neural network to train because it won't be biased to reducing the NDF errors since that would have the greatest effect in the cost function.

y[y.columns[0]] *= len(y)/539
y[y.columns[1]] *= len(y)/1428
y[y.columns[2]] *= len(y)/1061
y[y.columns[3]] *= len(y)/2249
y[y.columns[4]] *= len(y)/5023
y[y.columns[5]] *= len(y)/2324
y[y.columns[6]] *= len(y)/2835
y[y.columns[7]] *= len(y)/124543
y[y.columns[8]] *= len(y)/762
y[y.columns[9]] *= len(y)/217
y[y.columns[10]] *= len(y)/62376
y[y.columns[11]] *= len(y)/10094

# Check the sum of each feature
totals = []
for i in range(12):
    totals.append(sum(y[y.columns[i]]))
totals

[213450.99999999805,
 213451.00000000515,
 213451.00000000218,
 213451.00000000844,
 213450.99999998632,
 213450.99999999162,
 213451.00000000955,
 213450.99999959889,
 213450.99999999884,
 213451.00000000067,
 213451.00000015536,
 213451.00000003129]

x_train, x_test, y_train, y_test = train_test_split(df_train, y, test_size = 0.2, random_state = 2)

# Tensorflow needs the data in matrices
inputX = x_train.as_matrix()
inputY = y_train.as_matrix()
inputX_test = x_test.as_matrix()
inputY_test = y_test.as_matrix()

# Number of input nodes/number of features.
input_nodes = 186

# Multiplier maintains a fixed ratio of nodes between each layer.
mulitplier = 1.33

# Number of nodes in each hidden layer
hidden_nodes1 = 50
hidden_nodes2 = round(hidden_nodes1 * mulitplier)

# Percent of nodes to keep during dropout.
pkeep = tf.placeholder(tf.float32)

# The standard deviation when setting the values for the weights.
std = 1

#input
features = tf.placeholder(tf.float32, [None, input_nodes])

#layer 1
W1 = tf.Variable(tf.truncated_normal([input_nodes, hidden_nodes1], stddev = std))
b1 = tf.Variable(tf.zeros([hidden_nodes1]))
y1 = tf.nn.sigmoid(tf.matmul(features, W1) + b1)

#layer 2
W2 = tf.Variable(tf.truncated_normal([hidden_nodes1, hidden_nodes2], stddev = std))
b2 = tf.Variable(tf.zeros([hidden_nodes2]))
y2 = tf.nn.sigmoid(tf.matmul(y1, W2) + b2)
#y2 = tf.nn.dropout(y2, pkeep)

#layer 3
W3 = tf.Variable(tf.truncated_normal([hidden_nodes2, 12], stddev = std)) 
b3 = tf.Variable(tf.zeros([12]))
y3 = tf.nn.softmax(tf.matmul(y2, W3) + b3)

#output
predictions = y3
labels = tf.placeholder(tf.float32, [None, 12])

#Parameters
training_epochs = 3000
training_dropout = 0.6 # Not using dropout led to the best results.
display_step = 10
n_samples = inputY.shape[1]
batch = tf.Variable(0)

learning_rate = tf.train.exponential_decay(
  0.05,              #Base learning rate.
  batch,             #Current index into the dataset.
  len(inputX),       #Decay step.
  0.95,              #Decay rate.
  staircase=False)

Based on the evaluation method of the Kaggle competition, we are going to check the accuracy of the top prediction and the top 5 predictions for each user.

# Determine if the predictions are correct
correct_prediction = tf.equal(tf.argmax(predictions,1), tf.argmax(labels,1))
correct_top5 = tf.nn.in_top_k(predictions, tf.argmax(labels, 1), k = 5)

# Calculate the accuracy of the predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
accuracy_top5 = tf.reduce_mean(tf.cast(correct_top5, tf.float32))

print('Accuracy function created.')

# Cross entropy
cross_entropy = -tf.reduce_sum(labels * tf.log(tf.clip_by_value(predictions,1e-10,1.0)))

# Training loss
loss = tf.reduce_mean(cross_entropy)

#We will optimize our model via AdamOptimizer
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)

Accuracy function created.

#Initialize variables and tensorflow session
init = tf.initialize_all_variables()
session = tf.Session()
session.run(init)

WARNING:tensorflow:From <ipython-input-138-f19d41dba435>:2 in <module>.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.

accuracy_summary = [] #Record accuracy values for plot
accuracy_top5_summary = [] #Record accuracy values for plot
loss_summary = [] #Record cost values for plot

test_accuracy_summary = [] #Record accuracy values for plot
test_accuracy_top5_summary = [] #Record accuracy values for plot
test_loss_summary = [] #Record cost values for plot

init = tf.initialize_all_variables()

for i in range(training_epochs):  
    session.run([optimizer], 
                feed_dict={features: inputX, 
                           labels: inputY,
                           pkeep: training_dropout})

    # Display logs per epoch step
    if (i) % display_step == 0:
        train_accuracy, train_accuracy_top5, newLoss = session.run([accuracy,accuracy_top5,loss], 
                                                                   feed_dict={features: inputX, 
                                                                              labels: inputY,
                                                                              pkeep: training_dropout})
        print ("Epoch:", i,
               "Accuracy =", "{:.6f}".format(train_accuracy), 
               "Top 5 Accuracy =", "{:.6f}".format(train_accuracy_top5),
               "Loss = ", "{:.6f}".format(newLoss))
        accuracy_summary.append(train_accuracy)
        accuracy_top5_summary.append(train_accuracy_top5)
        loss_summary.append(newLoss)
        
        test_accuracy,test_accuracy_top5,test_newLoss = session.run([accuracy,accuracy_top5,loss], 
                                                              feed_dict={features: inputX_test, 
                                                                         labels: inputY_test,
                                                                         pkeep: 1})
        print ("Epoch:", i,
               "Test-Accuracy =", "{:.6f}".format(test_accuracy), 
               "Test-Top 5 Accuracy =", "{:.6f}".format(test_accuracy_top5),
               "Test-Loss = ", "{:.6f}".format(test_newLoss))
        test_accuracy_summary.append(test_accuracy)
        test_accuracy_top5_summary.append(test_accuracy_top5)
        test_loss_summary.append(test_newLoss)

print()
print ("Optimization Finished!")
training_accuracy, training_top5_accuracy = session.run([accuracy,accuracy_top5], 
                                feed_dict={features: inputX, labels: inputY, pkeep: training_dropout})
print ("Training Accuracy=", training_accuracy)
print ("Training Top 5 Accuracy=", training_top5_accuracy)
print()
testing_accuracy, testing_top5_accuracy = session.run([accuracy,accuracy_top5], 
                                                       feed_dict={features: inputX_test, 
                                                                  labels: inputY_test,
                                                                  pkeep: 1})
print ("Testing Accuracy=", testing_accuracy)
print ("Testing Top 5 Accuracy=", testing_top5_accuracy)

WARNING:tensorflow:From <ipython-input-139-6246d5175f43>:9 in <module>.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
Epoch: 0 Accuracy = 0.165900 Top 5 Accuracy = 0.509434 Loss =  9022430.000000
Epoch: 0 Test-Accuracy = 0.170059 Test-Top 5 Accuracy = 0.512403 Test-Loss =  2246018.000000
Epoch: 10 Accuracy = 0.521393 Top 5 Accuracy = 0.650404 Loss =  5307889.000000
Epoch: 10 Test-Accuracy = 0.521538 Test-Top 5 Accuracy = 0.651964 Test-Loss =  1341002.750000
Epoch: 20 Accuracy = 0.596656 Top 5 Accuracy = 0.671984 Loss =  4771457.000000
Epoch: 20 Test-Accuracy = 0.596917 Test-Top 5 Accuracy = 0.674334 Test-Loss =  1207018.000000
Epoch: 30 Accuracy = 0.682016 Top 5 Accuracy = 0.849947 Loss =  4566464.500000
Epoch: 30 Test-Accuracy = 0.682064 Test-Top 5 Accuracy = 0.850507 Test-Loss =  1164330.750000
Epoch: 40 Accuracy = 0.718014 Top 5 Accuracy = 0.891918 Loss =  4473507.500000
Epoch: 40 Test-Accuracy = 0.718067 Test-Top 5 Accuracy = 0.889579 Test-Loss =  1148240.000000
Epoch: 50 Accuracy = 0.637122 Top 5 Accuracy = 0.859428 Loss =  4422973.000000
Epoch: 50 Test-Accuracy = 0.636973 Test-Top 5 Accuracy = 0.857347 Test-Loss =  1146496.500000
Epoch: 60 Accuracy = 0.624139 Top 5 Accuracy = 0.827108 Loss =  4371764.000000
Epoch: 60 Test-Accuracy = 0.624160 Test-Top 5 Accuracy = 0.824553 Test-Loss =  1149996.125000
Epoch: 70 Accuracy = 0.622283 Top 5 Accuracy = 0.819747 Loss =  4315948.500000
Epoch: 70 Test-Accuracy = 0.621958 Test-Top 5 Accuracy = 0.816027 Test-Loss =  1153743.000000
Epoch: 80 Accuracy = 0.616708 Top 5 Accuracy = 0.802003 Loss =  4252883.000000
Epoch: 80 Test-Accuracy = 0.616313 Test-Top 5 Accuracy = 0.796960 Test-Loss =  1160309.500000
Epoch: 90 Accuracy = 0.618828 Top 5 Accuracy = 0.809944 Loss =  4182342.250000
Epoch: 90 Test-Accuracy = 0.618514 Test-Top 5 Accuracy = 0.802933 Test-Loss =  1168843.250000
Epoch: 100 Accuracy = 0.621639 Top 5 Accuracy = 0.821270 Loss =  4105650.500000
Epoch: 100 Test-Accuracy = 0.620646 Test-Top 5 Accuracy = 0.813380 Test-Loss =  1178433.125000
Epoch: 110 Accuracy = 0.618072 Top 5 Accuracy = 0.815905 Loss =  4027599.500000
Epoch: 110 Test-Accuracy = 0.615774 Test-Top 5 Accuracy = 0.808109 Test-Loss =  1189457.125000
Epoch: 120 Accuracy = 0.618728 Top 5 Accuracy = 0.820116 Loss =  3952207.500000
Epoch: 120 Test-Accuracy = 0.616242 Test-Top 5 Accuracy = 0.811178 Test-Loss =  1201650.625000
Epoch: 130 Accuracy = 0.619079 Top 5 Accuracy = 0.828162 Loss =  3886280.000000
Epoch: 130 Test-Accuracy = 0.616968 Test-Top 5 Accuracy = 0.818931 Test-Loss =  1216675.000000
Epoch: 140 Accuracy = 0.617164 Top 5 Accuracy = 0.826780 Loss =  3831829.250000
Epoch: 140 Test-Accuracy = 0.614603 Test-Top 5 Accuracy = 0.815793 Test-Loss =  1230044.000000
Epoch: 150 Accuracy = 0.618945 Top 5 Accuracy = 0.835073 Loss =  3784665.000000
Epoch: 150 Test-Accuracy = 0.614884 Test-Top 5 Accuracy = 0.823382 Test-Loss =  1247714.000000
Epoch: 160 Accuracy = 0.618570 Top 5 Accuracy = 0.835758 Loss =  3740164.250000
Epoch: 160 Test-Accuracy = 0.613806 Test-Top 5 Accuracy = 0.823733 Test-Loss =  1259344.750000
Epoch: 170 Accuracy = 0.619548 Top 5 Accuracy = 0.841427 Loss =  3696753.000000
Epoch: 170 Test-Accuracy = 0.614064 Test-Top 5 Accuracy = 0.829847 Test-Loss =  1275993.500000
Epoch: 180 Accuracy = 0.619319 Top 5 Accuracy = 0.842211 Loss =  3655125.000000
Epoch: 180 Test-Accuracy = 0.613127 Test-Top 5 Accuracy = 0.829730 Test-Loss =  1289048.625000
Epoch: 190 Accuracy = 0.618939 Top 5 Accuracy = 0.845379 Loss =  3613836.500000
Epoch: 190 Test-Accuracy = 0.612518 Test-Top 5 Accuracy = 0.832049 Test-Loss =  1302832.875000
Epoch: 200 Accuracy = 0.619565 Top 5 Accuracy = 0.852149 Loss =  3573714.000000
Epoch: 200 Test-Accuracy = 0.613408 Test-Top 5 Accuracy = 0.838748 Test-Loss =  1317959.250000
Epoch: 210 Accuracy = 0.618295 Top 5 Accuracy = 0.849701 Loss =  3533467.500000
Epoch: 210 Test-Accuracy = 0.611112 Test-Top 5 Accuracy = 0.835797 Test-Loss =  1332102.000000
Epoch: 220 Accuracy = 0.619940 Top 5 Accuracy = 0.859575 Loss =  3494618.250000
Epoch: 220 Test-Accuracy = 0.613244 Test-Top 5 Accuracy = 0.846338 Test-Loss =  1346006.500000
Epoch: 230 Accuracy = 0.619232 Top 5 Accuracy = 0.855686 Loss =  3456234.000000
Epoch: 230 Test-Accuracy = 0.611347 Test-Top 5 Accuracy = 0.843175 Test-Loss =  1356915.125000
Epoch: 240 Accuracy = 0.619337 Top 5 Accuracy = 0.855967 Loss =  3419809.000000
Epoch: 240 Test-Accuracy = 0.611112 Test-Top 5 Accuracy = 0.843316 Test-Loss =  1371355.750000
Epoch: 250 Accuracy = 0.620186 Top 5 Accuracy = 0.861554 Loss =  3388884.500000
Epoch: 250 Test-Accuracy = 0.611511 Test-Top 5 Accuracy = 0.849336 Test-Loss =  1388860.750000
Epoch: 260 Accuracy = 0.605821 Top 5 Accuracy = 0.819120 Loss =  3545866.000000
Epoch: 260 Test-Accuracy = 0.599260 Test-Top 5 Accuracy = 0.804877 Test-Loss =  1530484.250000
Epoch: 270 Accuracy = 0.668781 Top 5 Accuracy = 0.948870 Loss =  3421295.000000
Epoch: 270 Test-Accuracy = 0.660748 Test-Top 5 Accuracy = 0.935420 Test-Loss =  1489959.000000
Epoch: 280 Accuracy = 0.609399 Top 5 Accuracy = 0.813065 Loss =  3357680.250000
Epoch: 280 Test-Accuracy = 0.600899 Test-Top 5 Accuracy = 0.798974 Test-Loss =  1430165.750000
Epoch: 290 Accuracy = 0.635617 Top 5 Accuracy = 0.907361 Loss =  3300976.500000
Epoch: 290 Test-Accuracy = 0.625893 Test-Top 5 Accuracy = 0.892553 Test-Loss =  1440139.250000
Epoch: 300 Accuracy = 0.616286 Top 5 Accuracy = 0.846791 Loss =  3264129.750000
Epoch: 300 Test-Accuracy = 0.605420 Test-Top 5 Accuracy = 0.831955 Test-Loss =  1438067.625000
Epoch: 310 Accuracy = 0.626446 Top 5 Accuracy = 0.886712 Loss =  3236772.500000
Epoch: 310 Test-Accuracy = 0.614931 Test-Top 5 Accuracy = 0.871074 Test-Loss =  1444352.750000
Epoch: 320 Accuracy = 0.622944 Top 5 Accuracy = 0.874268 Loss =  3211510.000000
Epoch: 320 Test-Accuracy = 0.611300 Test-Top 5 Accuracy = 0.857862 Test-Loss =  1459722.125000
Epoch: 330 Accuracy = 0.623729 Top 5 Accuracy = 0.877196 Loss =  3188301.750000
Epoch: 330 Test-Accuracy = 0.611394 Test-Top 5 Accuracy = 0.861423 Test-Loss =  1469061.375000
Epoch: 340 Accuracy = 0.625363 Top 5 Accuracy = 0.883855 Loss =  3166312.000000
Epoch: 340 Test-Accuracy = 0.612776 Test-Top 5 Accuracy = 0.867771 Test-Loss =  1480894.250000
Epoch: 350 Accuracy = 0.624391 Top 5 Accuracy = 0.881776 Loss =  3145008.500000
Epoch: 350 Test-Accuracy = 0.612237 Test-Top 5 Accuracy = 0.864304 Test-Loss =  1491645.125000
Epoch: 360 Accuracy = 0.625281 Top 5 Accuracy = 0.885254 Loss =  3124510.000000
Epoch: 360 Test-Accuracy = 0.612588 Test-Top 5 Accuracy = 0.867700 Test-Loss =  1503555.000000
Epoch: 370 Accuracy = 0.625796 Top 5 Accuracy = 0.887520 Loss =  3104588.500000
Epoch: 370 Test-Accuracy = 0.613103 Test-Top 5 Accuracy = 0.869668 Test-Loss =  1516310.125000
Epoch: 380 Accuracy = 0.625873 Top 5 Accuracy = 0.887743 Loss =  3085142.000000
Epoch: 380 Test-Accuracy = 0.612658 Test-Top 5 Accuracy = 0.869223 Test-Loss =  1529080.250000
Epoch: 390 Accuracy = 0.626312 Top 5 Accuracy = 0.890337 Loss =  3065810.750000
Epoch: 390 Test-Accuracy = 0.613010 Test-Top 5 Accuracy = 0.871542 Test-Loss =  1541329.000000
Epoch: 400 Accuracy = 0.626763 Top 5 Accuracy = 0.891825 Loss =  3046702.250000
Epoch: 400 Test-Accuracy = 0.612752 Test-Top 5 Accuracy = 0.872713 Test-Loss =  1553015.000000
Epoch: 410 Accuracy = 0.627044 Top 5 Accuracy = 0.893172 Loss =  3027404.000000
Epoch: 410 Test-Accuracy = 0.613455 Test-Top 5 Accuracy = 0.873814 Test-Loss =  1564915.250000
Epoch: 420 Accuracy = 0.627506 Top 5 Accuracy = 0.894460 Loss =  3008180.000000
Epoch: 420 Test-Accuracy = 0.613174 Test-Top 5 Accuracy = 0.875524 Test-Loss =  1576596.500000
Epoch: 430 Accuracy = 0.628104 Top 5 Accuracy = 0.895953 Loss =  2989394.000000
Epoch: 430 Test-Accuracy = 0.613127 Test-Top 5 Accuracy = 0.877047 Test-Loss =  1588426.000000
Epoch: 440 Accuracy = 0.628397 Top 5 Accuracy = 0.897488 Loss =  2971131.000000
Epoch: 440 Test-Accuracy = 0.613572 Test-Top 5 Accuracy = 0.878452 Test-Loss =  1599828.250000
Epoch: 450 Accuracy = 0.628455 Top 5 Accuracy = 0.900023 Loss =  2979848.000000
Epoch: 450 Test-Accuracy = 0.612916 Test-Top 5 Accuracy = 0.881052 Test-Loss =  1611988.500000
Epoch: 460 Accuracy = 0.630212 Top 5 Accuracy = 0.904287 Loss =  2936936.750000
Epoch: 460 Test-Accuracy = 0.614204 Test-Top 5 Accuracy = 0.884425 Test-Loss =  1620108.375000
Epoch: 470 Accuracy = 0.630259 Top 5 Accuracy = 0.903244 Loss =  2924608.750000
Epoch: 470 Test-Accuracy = 0.613408 Test-Top 5 Accuracy = 0.883980 Test-Loss =  1630292.500000
Epoch: 480 Accuracy = 0.629416 Top 5 Accuracy = 0.902477 Loss =  2905937.000000
Epoch: 480 Test-Accuracy = 0.612752 Test-Top 5 Accuracy = 0.882622 Test-Loss =  1641145.000000
Epoch: 490 Accuracy = 0.629913 Top 5 Accuracy = 0.903496 Loss =  2888264.750000
Epoch: 490 Test-Accuracy = 0.612822 Test-Top 5 Accuracy = 0.883114 Test-Loss =  1652784.000000
Epoch: 500 Accuracy = 0.630130 Top 5 Accuracy = 0.905101 Loss =  2872441.500000
Epoch: 500 Test-Accuracy = 0.613267 Test-Top 5 Accuracy = 0.884941 Test-Loss =  1664442.250000
Epoch: 510 Accuracy = 0.630423 Top 5 Accuracy = 0.906453 Loss =  2857422.500000
Epoch: 510 Test-Accuracy = 0.613642 Test-Top 5 Accuracy = 0.886135 Test-Loss =  1675863.750000
Epoch: 520 Accuracy = 0.630780 Top 5 Accuracy = 0.907437 Loss =  2842949.250000
Epoch: 520 Test-Accuracy = 0.613759 Test-Top 5 Accuracy = 0.887049 Test-Loss =  1687370.625000
Epoch: 530 Accuracy = 0.631008 Top 5 Accuracy = 0.908158 Loss =  2828546.500000
Epoch: 530 Test-Accuracy = 0.613876 Test-Top 5 Accuracy = 0.887634 Test-Loss =  1698580.375000
Epoch: 540 Accuracy = 0.631155 Top 5 Accuracy = 0.909809 Loss =  2814355.000000
Epoch: 540 Test-Accuracy = 0.613549 Test-Top 5 Accuracy = 0.888829 Test-Loss =  1709835.875000
Epoch: 550 Accuracy = 0.631518 Top 5 Accuracy = 0.911267 Loss =  2800691.750000
Epoch: 550 Test-Accuracy = 0.613712 Test-Top 5 Accuracy = 0.890445 Test-Loss =  1720924.250000
Epoch: 560 Accuracy = 0.631940 Top 5 Accuracy = 0.912122 Loss =  2787335.750000
Epoch: 560 Test-Accuracy = 0.614181 Test-Top 5 Accuracy = 0.891898 Test-Loss =  1731755.875000
Epoch: 570 Accuracy = 0.632197 Top 5 Accuracy = 0.913797 Loss =  2774265.750000
Epoch: 570 Test-Accuracy = 0.614368 Test-Top 5 Accuracy = 0.893045 Test-Loss =  1742361.250000
Epoch: 580 Accuracy = 0.632420 Top 5 Accuracy = 0.914775 Loss =  2761413.500000
Epoch: 580 Test-Accuracy = 0.614415 Test-Top 5 Accuracy = 0.893467 Test-Loss =  1753143.250000
Epoch: 590 Accuracy = 0.632754 Top 5 Accuracy = 0.915923 Loss =  2751627.500000
Epoch: 590 Test-Accuracy = 0.614603 Test-Top 5 Accuracy = 0.895060 Test-Loss =  1765984.125000
Epoch: 600 Accuracy = 0.632894 Top 5 Accuracy = 0.916837 Loss =  2748743.500000
Epoch: 600 Test-Accuracy = 0.614696 Test-Top 5 Accuracy = 0.894568 Test-Loss =  1775142.500000
Epoch: 610 Accuracy = 0.633445 Top 5 Accuracy = 0.918189 Loss =  2729975.000000
Epoch: 610 Test-Accuracy = 0.614649 Test-Top 5 Accuracy = 0.896044 Test-Loss =  1782635.500000
Epoch: 620 Accuracy = 0.633632 Top 5 Accuracy = 0.918839 Loss =  2715619.500000
Epoch: 620 Test-Accuracy = 0.614860 Test-Top 5 Accuracy = 0.896840 Test-Loss =  1796384.750000
Epoch: 630 Accuracy = 0.634264 Top 5 Accuracy = 0.920110 Loss =  2702567.500000
Epoch: 630 Test-Accuracy = 0.614837 Test-Top 5 Accuracy = 0.897309 Test-Loss =  1808805.250000
Epoch: 640 Accuracy = 0.634505 Top 5 Accuracy = 0.921287 Loss =  2691456.000000
Epoch: 640 Test-Accuracy = 0.615165 Test-Top 5 Accuracy = 0.898246 Test-Loss =  1822867.625000
Epoch: 650 Accuracy = 0.634598 Top 5 Accuracy = 0.922072 Loss =  2681144.750000
Epoch: 650 Test-Accuracy = 0.615235 Test-Top 5 Accuracy = 0.898925 Test-Loss =  1836016.250000
Epoch: 660 Accuracy = 0.634791 Top 5 Accuracy = 0.923208 Loss =  2670648.750000
Epoch: 660 Test-Accuracy = 0.615118 Test-Top 5 Accuracy = 0.899417 Test-Loss =  1848357.250000
Epoch: 670 Accuracy = 0.635078 Top 5 Accuracy = 0.924350 Loss =  2660500.000000
Epoch: 670 Test-Accuracy = 0.615048 Test-Top 5 Accuracy = 0.900424 Test-Loss =  1860843.750000
Epoch: 680 Accuracy = 0.635529 Top 5 Accuracy = 0.925346 Loss =  2650398.250000
Epoch: 680 Test-Accuracy = 0.614977 Test-Top 5 Accuracy = 0.901010 Test-Loss =  1874202.125000
Epoch: 690 Accuracy = 0.635623 Top 5 Accuracy = 0.926306 Loss =  2640422.250000
Epoch: 690 Test-Accuracy = 0.615048 Test-Top 5 Accuracy = 0.902743 Test-Loss =  1887367.750000
Epoch: 700 Accuracy = 0.635676 Top 5 Accuracy = 0.927266 Loss =  2630415.250000
Epoch: 700 Test-Accuracy = 0.614486 Test-Top 5 Accuracy = 0.903516 Test-Loss =  1900982.250000
Epoch: 710 Accuracy = 0.635775 Top 5 Accuracy = 0.928145 Loss =  2620514.500000
Epoch: 710 Test-Accuracy = 0.614439 Test-Top 5 Accuracy = 0.904851 Test-Loss =  1914457.500000
Epoch: 720 Accuracy = 0.636045 Top 5 Accuracy = 0.929140 Loss =  2610700.500000
Epoch: 720 Test-Accuracy = 0.613712 Test-Top 5 Accuracy = 0.906327 Test-Loss =  1928652.500000
Epoch: 730 Accuracy = 0.636121 Top 5 Accuracy = 0.929913 Loss =  2601018.000000
Epoch: 730 Test-Accuracy = 0.613619 Test-Top 5 Accuracy = 0.907803 Test-Loss =  1943249.000000
Epoch: 740 Accuracy = 0.636285 Top 5 Accuracy = 0.931225 Loss =  2597511.500000
Epoch: 740 Test-Accuracy = 0.613057 Test-Top 5 Accuracy = 0.908786 Test-Loss =  1958998.500000
Epoch: 750 Accuracy = 0.636086 Top 5 Accuracy = 0.932748 Loss =  2589274.000000
Epoch: 750 Test-Accuracy = 0.614111 Test-Top 5 Accuracy = 0.911129 Test-Loss =  1975408.000000
Epoch: 760 Accuracy = 0.636437 Top 5 Accuracy = 0.932449 Loss =  2576093.500000
Epoch: 760 Test-Accuracy = 0.612916 Test-Top 5 Accuracy = 0.911433 Test-Loss =  1986345.750000
Epoch: 770 Accuracy = 0.636706 Top 5 Accuracy = 0.933796 Loss =  2566251.750000
Epoch: 770 Test-Accuracy = 0.612963 Test-Top 5 Accuracy = 0.912253 Test-Loss =  1999974.250000
Epoch: 780 Accuracy = 0.636747 Top 5 Accuracy = 0.935026 Loss =  2556964.500000
Epoch: 780 Test-Accuracy = 0.613338 Test-Top 5 Accuracy = 0.913963 Test-Loss =  2014101.000000
Epoch: 790 Accuracy = 0.636906 Top 5 Accuracy = 0.935793 Loss =  2548164.250000
Epoch: 790 Test-Accuracy = 0.612893 Test-Top 5 Accuracy = 0.914502 Test-Loss =  2030396.875000
Epoch: 800 Accuracy = 0.636841 Top 5 Accuracy = 0.936373 Loss =  2539894.500000
Epoch: 800 Test-Accuracy = 0.612846 Test-Top 5 Accuracy = 0.915041 Test-Loss =  2041358.750000
Epoch: 810 Accuracy = 0.636964 Top 5 Accuracy = 0.937245 Loss =  2531572.500000
Epoch: 810 Test-Accuracy = 0.612822 Test-Top 5 Accuracy = 0.916071 Test-Loss =  2055304.375000
Epoch: 820 Accuracy = 0.636952 Top 5 Accuracy = 0.938253 Loss =  2523969.500000
Epoch: 820 Test-Accuracy = 0.612776 Test-Top 5 Accuracy = 0.917406 Test-Loss =  2068028.250000
Epoch: 830 Accuracy = 0.637023 Top 5 Accuracy = 0.939359 Loss =  2520860.000000
Epoch: 830 Test-Accuracy = 0.612939 Test-Top 5 Accuracy = 0.918976 Test-Loss =  2083729.500000
Epoch: 840 Accuracy = 0.637239 Top 5 Accuracy = 0.939547 Loss =  2510291.750000
Epoch: 840 Test-Accuracy = 0.613174 Test-Top 5 Accuracy = 0.918156 Test-Loss =  2093131.375000
Epoch: 850 Accuracy = 0.637403 Top 5 Accuracy = 0.940882 Loss =  2501326.250000
Epoch: 850 Test-Accuracy = 0.613221 Test-Top 5 Accuracy = 0.919796 Test-Loss =  2105081.750000
Epoch: 860 Accuracy = 0.637450 Top 5 Accuracy = 0.942041 Loss =  2493963.750000
Epoch: 860 Test-Accuracy = 0.613150 Test-Top 5 Accuracy = 0.920522 Test-Loss =  2120399.750000
Epoch: 870 Accuracy = 0.637362 Top 5 Accuracy = 0.942885 Loss =  2486478.000000
Epoch: 870 Test-Accuracy = 0.612776 Test-Top 5 Accuracy = 0.920873 Test-Loss =  2134351.750000
Epoch: 880 Accuracy = 0.637614 Top 5 Accuracy = 0.943623 Loss =  2479670.750000
Epoch: 880 Test-Accuracy = 0.612893 Test-Top 5 Accuracy = 0.921904 Test-Loss =  2146379.750000
Epoch: 890 Accuracy = 0.637749 Top 5 Accuracy = 0.943775 Loss =  2475710.500000
Epoch: 890 Test-Accuracy = 0.612494 Test-Top 5 Accuracy = 0.921904 Test-Loss =  2160966.000000
Epoch: 900 Accuracy = 0.637854 Top 5 Accuracy = 0.944934 Loss =  2464939.500000
Epoch: 900 Test-Accuracy = 0.612776 Test-Top 5 Accuracy = 0.923145 Test-Loss =  2169376.500000
Epoch: 910 Accuracy = 0.638153 Top 5 Accuracy = 0.946381 Loss =  2459572.500000
Epoch: 910 Test-Accuracy = 0.613127 Test-Top 5 Accuracy = 0.924645 Test-Loss =  2182095.500000
Epoch: 920 Accuracy = 0.638358 Top 5 Accuracy = 0.947084 Loss =  2450774.250000
Epoch: 920 Test-Accuracy = 0.612705 Test-Top 5 Accuracy = 0.924363 Test-Loss =  2194508.750000
Epoch: 930 Accuracy = 0.638575 Top 5 Accuracy = 0.947740 Loss =  2443714.500000
Epoch: 930 Test-Accuracy = 0.612541 Test-Top 5 Accuracy = 0.924949 Test-Loss =  2206397.000000
Epoch: 940 Accuracy = 0.638785 Top 5 Accuracy = 0.948091 Loss =  2439277.750000
Epoch: 940 Test-Accuracy = 0.612635 Test-Top 5 Accuracy = 0.925066 Test-Loss =  2218437.000000
Epoch: 950 Accuracy = 0.638797 Top 5 Accuracy = 0.948975 Loss =  2431477.000000
Epoch: 950 Test-Accuracy = 0.612939 Test-Top 5 Accuracy = 0.926144 Test-Loss =  2228784.000000
Epoch: 960 Accuracy = 0.638920 Top 5 Accuracy = 0.949695 Loss =  2424252.500000
Epoch: 960 Test-Accuracy = 0.612939 Test-Top 5 Accuracy = 0.926472 Test-Loss =  2247752.250000
Epoch: 970 Accuracy = 0.639189 Top 5 Accuracy = 0.950814 Loss =  2417390.000000
Epoch: 970 Test-Accuracy = 0.613314 Test-Top 5 Accuracy = 0.927198 Test-Loss =  2246546.250000
Epoch: 980 Accuracy = 0.639189 Top 5 Accuracy = 0.951042 Loss =  2409612.500000
Epoch: 980 Test-Accuracy = 0.612893 Test-Top 5 Accuracy = 0.927174 Test-Loss =  2267358.000000
Epoch: 990 Accuracy = 0.639189 Top 5 Accuracy = 0.951534 Loss =  2402521.750000
Epoch: 990 Test-Accuracy = 0.612869 Test-Top 5 Accuracy = 0.927596 Test-Loss =  2278736.250000
Epoch: 1000 Accuracy = 0.639418 Top 5 Accuracy = 0.952097 Loss =  2395399.000000
Epoch: 1000 Test-Accuracy = 0.612822 Test-Top 5 Accuracy = 0.927854 Test-Loss =  2290171.500000
Epoch: 1010 Accuracy = 0.639635 Top 5 Accuracy = 0.952770 Loss =  2389387.750000
Epoch: 1010 Test-Accuracy = 0.613291 Test-Top 5 Accuracy = 0.927877 Test-Loss =  2303861.500000
Epoch: 1020 Accuracy = 0.639582 Top 5 Accuracy = 0.952840 Loss =  2389624.000000
Epoch: 1020 Test-Accuracy = 0.613712 Test-Top 5 Accuracy = 0.928837 Test-Loss =  2322570.000000
Epoch: 1030 Accuracy = 0.639430 Top 5 Accuracy = 0.952460 Loss =  2379928.500000
Epoch: 1030 Test-Accuracy = 0.613197 Test-Top 5 Accuracy = 0.928205 Test-Loss =  2327114.000000
Epoch: 1040 Accuracy = 0.640103 Top 5 Accuracy = 0.954170 Loss =  2371105.750000
Epoch: 1040 Test-Accuracy = 0.613783 Test-Top 5 Accuracy = 0.928791 Test-Loss =  2338195.000000
Epoch: 1050 Accuracy = 0.639998 Top 5 Accuracy = 0.953578 Loss =  2362986.250000
Epoch: 1050 Test-Accuracy = 0.613150 Test-Top 5 Accuracy = 0.928322 Test-Loss =  2357343.750000
Epoch: 1060 Accuracy = 0.640326 Top 5 Accuracy = 0.954498 Loss =  2356899.250000
Epoch: 1060 Test-Accuracy = 0.613361 Test-Top 5 Accuracy = 0.929470 Test-Loss =  2363334.000000
Epoch: 1070 Accuracy = 0.640607 Top 5 Accuracy = 0.955071 Loss =  2351281.750000
Epoch: 1070 Test-Accuracy = 0.613549 Test-Top 5 Accuracy = 0.930337 Test-Loss =  2374273.500000
Epoch: 1080 Accuracy = 0.640976 Top 5 Accuracy = 0.955030 Loss =  2346180.000000
Epoch: 1080 Test-Accuracy = 0.613385 Test-Top 5 Accuracy = 0.929985 Test-Loss =  2390890.000000
Epoch: 1090 Accuracy = 0.641544 Top 5 Accuracy = 0.956126 Loss =  2338536.000000
Epoch: 1090 Test-Accuracy = 0.613923 Test-Top 5 Accuracy = 0.931648 Test-Loss =  2400878.000000
Epoch: 1100 Accuracy = 0.641725 Top 5 Accuracy = 0.956190 Loss =  2332444.000000
Epoch: 1100 Test-Accuracy = 0.614181 Test-Top 5 Accuracy = 0.931250 Test-Loss =  2420190.250000
Epoch: 1110 Accuracy = 0.641503 Top 5 Accuracy = 0.956366 Loss =  2335543.000000
Epoch: 1110 Test-Accuracy = 0.614064 Test-Top 5 Accuracy = 0.931274 Test-Loss =  2441918.500000
Epoch: 1120 Accuracy = 0.641930 Top 5 Accuracy = 0.956723 Loss =  2326694.750000
Epoch: 1120 Test-Accuracy = 0.614275 Test-Top 5 Accuracy = 0.931883 Test-Loss =  2441634.000000
Epoch: 1130 Accuracy = 0.642293 Top 5 Accuracy = 0.957484 Loss =  2318511.000000
Epoch: 1130 Test-Accuracy = 0.614626 Test-Top 5 Accuracy = 0.932492 Test-Loss =  2460612.000000
Epoch: 1140 Accuracy = 0.642340 Top 5 Accuracy = 0.957525 Loss =  2311091.750000
Epoch: 1140 Test-Accuracy = 0.614603 Test-Top 5 Accuracy = 0.931812 Test-Loss =  2470264.250000
Epoch: 1150 Accuracy = 0.642814 Top 5 Accuracy = 0.958082 Loss =  2304447.750000
Epoch: 1150 Test-Accuracy = 0.615024 Test-Top 5 Accuracy = 0.932820 Test-Loss =  2483530.250000
Epoch: 1160 Accuracy = 0.642885 Top 5 Accuracy = 0.958263 Loss =  2298872.000000
Epoch: 1160 Test-Accuracy = 0.614907 Test-Top 5 Accuracy = 0.933101 Test-Loss =  2501695.500000
Epoch: 1170 Accuracy = 0.643265 Top 5 Accuracy = 0.958837 Loss =  2293050.000000
Epoch: 1170 Test-Accuracy = 0.615259 Test-Top 5 Accuracy = 0.933944 Test-Loss =  2508715.750000
Epoch: 1180 Accuracy = 0.643576 Top 5 Accuracy = 0.959112 Loss =  2287656.750000
Epoch: 1180 Test-Accuracy = 0.615352 Test-Top 5 Accuracy = 0.934530 Test-Loss =  2521155.000000
Epoch: 1190 Accuracy = 0.643359 Top 5 Accuracy = 0.959510 Loss =  2288258.750000
Epoch: 1190 Test-Accuracy = 0.615376 Test-Top 5 Accuracy = 0.934998 Test-Loss =  2528600.000000
Epoch: 1200 Accuracy = 0.643892 Top 5 Accuracy = 0.960928 Loss =  2280958.500000
Epoch: 1200 Test-Accuracy = 0.615750 Test-Top 5 Accuracy = 0.936427 Test-Loss =  2541103.750000
Epoch: 1210 Accuracy = 0.643822 Top 5 Accuracy = 0.959792 Loss =  2273768.000000
Epoch: 1210 Test-Accuracy = 0.615469 Test-Top 5 Accuracy = 0.934951 Test-Loss =  2559193.500000
Epoch: 1220 Accuracy = 0.644179 Top 5 Accuracy = 0.960781 Loss =  2266861.000000
Epoch: 1220 Test-Accuracy = 0.615118 Test-Top 5 Accuracy = 0.936450 Test-Loss =  2569362.750000
Epoch: 1230 Accuracy = 0.644296 Top 5 Accuracy = 0.961302 Loss =  2261465.000000
Epoch: 1230 Test-Accuracy = 0.615446 Test-Top 5 Accuracy = 0.936966 Test-Loss =  2579270.250000
Epoch: 1240 Accuracy = 0.644302 Top 5 Accuracy = 0.961279 Loss =  2257407.750000
Epoch: 1240 Test-Accuracy = 0.615118 Test-Top 5 Accuracy = 0.936849 Test-Loss =  2592246.250000
Epoch: 1250 Accuracy = 0.644782 Top 5 Accuracy = 0.962292 Loss =  2251328.000000
Epoch: 1250 Test-Accuracy = 0.615165 Test-Top 5 Accuracy = 0.937879 Test-Loss =  2601963.500000
Epoch: 1260 Accuracy = 0.644823 Top 5 Accuracy = 0.962204 Loss =  2246036.500000
Epoch: 1260 Test-Accuracy = 0.615141 Test-Top 5 Accuracy = 0.938184 Test-Loss =  2619557.500000
Epoch: 1270 Accuracy = 0.644823 Top 5 Accuracy = 0.962866 Loss =  2243877.500000
Epoch: 1270 Test-Accuracy = 0.615118 Test-Top 5 Accuracy = 0.938394 Test-Loss =  2634694.500000
Epoch: 1280 Accuracy = 0.644688 Top 5 Accuracy = 0.962485 Loss =  2237495.500000
Epoch: 1280 Test-Accuracy = 0.614696 Test-Top 5 Accuracy = 0.938137 Test-Loss =  2652082.000000
Epoch: 1290 Accuracy = 0.645672 Top 5 Accuracy = 0.964582 Loss =  2232758.500000
Epoch: 1290 Test-Accuracy = 0.615376 Test-Top 5 Accuracy = 0.940432 Test-Loss =  2647461.250000
Epoch: 1300 Accuracy = 0.644624 Top 5 Accuracy = 0.963358 Loss =  2227615.500000
Epoch: 1300 Test-Accuracy = 0.614298 Test-Top 5 Accuracy = 0.939004 Test-Loss =  2678091.500000
Epoch: 1310 Accuracy = 0.645491 Top 5 Accuracy = 0.964681 Loss =  2223090.000000
Epoch: 1310 Test-Accuracy = 0.614884 Test-Top 5 Accuracy = 0.940643 Test-Loss =  2674678.000000
Epoch: 1320 Accuracy = 0.645303 Top 5 Accuracy = 0.964611 Loss =  2215818.750000
Epoch: 1320 Test-Accuracy = 0.614486 Test-Top 5 Accuracy = 0.940643 Test-Loss =  2693712.000000
Epoch: 1330 Accuracy = 0.645286 Top 5 Accuracy = 0.965162 Loss =  2213908.250000
Epoch: 1330 Test-Accuracy = 0.614720 Test-Top 5 Accuracy = 0.940643 Test-Loss =  2709291.000000
Epoch: 1340 Accuracy = 0.645789 Top 5 Accuracy = 0.965349 Loss =  2206027.250000
Epoch: 1340 Test-Accuracy = 0.614486 Test-Top 5 Accuracy = 0.940643 Test-Loss =  2718647.500000
Epoch: 1350 Accuracy = 0.645819 Top 5 Accuracy = 0.965607 Loss =  2207134.500000
Epoch: 1350 Test-Accuracy = 0.614743 Test-Top 5 Accuracy = 0.941369 Test-Loss =  2725136.250000
Epoch: 1360 Accuracy = 0.645965 Top 5 Accuracy = 0.966462 Loss =  2199840.000000
Epoch: 1360 Test-Accuracy = 0.615095 Test-Top 5 Accuracy = 0.941838 Test-Loss =  2742745.500000
Epoch: 1370 Accuracy = 0.646223 Top 5 Accuracy = 0.965812 Loss =  2193124.500000
Epoch: 1370 Test-Accuracy = 0.614509 Test-Top 5 Accuracy = 0.940924 Test-Loss =  2754396.500000
Epoch: 1380 Accuracy = 0.646691 Top 5 Accuracy = 0.967299 Loss =  2188037.250000
Epoch: 1380 Test-Accuracy = 0.615446 Test-Top 5 Accuracy = 0.942541 Test-Loss =  2756704.750000
Epoch: 1390 Accuracy = 0.646188 Top 5 Accuracy = 0.966667 Loss =  2185089.250000
Epoch: 1390 Test-Accuracy = 0.615024 Test-Top 5 Accuracy = 0.941768 Test-Loss =  2778436.250000
Epoch: 1400 Accuracy = 0.646498 Top 5 Accuracy = 0.966772 Loss =  2178445.500000
Epoch: 1400 Test-Accuracy = 0.614977 Test-Top 5 Accuracy = 0.941486 Test-Loss =  2783100.000000
Epoch: 1410 Accuracy = 0.646808 Top 5 Accuracy = 0.967065 Loss =  2177806.250000
Epoch: 1410 Test-Accuracy = 0.615422 Test-Top 5 Accuracy = 0.942658 Test-Loss =  2786615.500000
Epoch: 1420 Accuracy = 0.646756 Top 5 Accuracy = 0.967311 Loss =  2169892.500000
Epoch: 1420 Test-Accuracy = 0.615001 Test-Top 5 Accuracy = 0.942259 Test-Loss =  2803569.750000
Epoch: 1430 Accuracy = 0.646504 Top 5 Accuracy = 0.966842 Loss =  2166717.500000
Epoch: 1430 Test-Accuracy = 0.614158 Test-Top 5 Accuracy = 0.941299 Test-Loss =  2818313.500000
Epoch: 1440 Accuracy = 0.646680 Top 5 Accuracy = 0.967557 Loss =  2164130.500000
Epoch: 1440 Test-Accuracy = 0.614486 Test-Top 5 Accuracy = 0.942775 Test-Loss =  2818750.000000
Epoch: 1450 Accuracy = 0.647505 Top 5 Accuracy = 0.968945 Loss =  2158136.500000
Epoch: 1450 Test-Accuracy = 0.615305 Test-Top 5 Accuracy = 0.944040 Test-Loss =  2826777.250000
Epoch: 1460 Accuracy = 0.646791 Top 5 Accuracy = 0.967381 Loss =  2152875.250000
Epoch: 1460 Test-Accuracy = 0.614181 Test-Top 5 Accuracy = 0.941065 Test-Loss =  2849334.500000
Epoch: 1470 Accuracy = 0.647909 Top 5 Accuracy = 0.968992 Loss =  2147634.000000
Epoch: 1470 Test-Accuracy = 0.615493 Test-Top 5 Accuracy = 0.943852 Test-Loss =  2845585.500000
Epoch: 1480 Accuracy = 0.647734 Top 5 Accuracy = 0.968822 Loss =  2147202.000000
Epoch: 1480 Test-Accuracy = 0.615540 Test-Top 5 Accuracy = 0.943876 Test-Loss =  2854938.000000
Epoch: 1490 Accuracy = 0.647710 Top 5 Accuracy = 0.968851 Loss =  2140588.000000
Epoch: 1490 Test-Accuracy = 0.615376 Test-Top 5 Accuracy = 0.943407 Test-Loss =  2872957.500000
Epoch: 1500 Accuracy = 0.647775 Top 5 Accuracy = 0.968722 Loss =  2134431.250000
Epoch: 1500 Test-Accuracy = 0.615399 Test-Top 5 Accuracy = 0.943056 Test-Loss =  2882432.000000
Epoch: 1510 Accuracy = 0.647494 Top 5 Accuracy = 0.968910 Loss =  2130041.500000
Epoch: 1510 Test-Accuracy = 0.615352 Test-Top 5 Accuracy = 0.943009 Test-Loss =  2893115.750000
Epoch: 1520 Accuracy = 0.647394 Top 5 Accuracy = 0.969038 Loss =  2127296.500000
Epoch: 1520 Test-Accuracy = 0.614860 Test-Top 5 Accuracy = 0.943407 Test-Loss =  2906832.750000
Epoch: 1530 Accuracy = 0.647458 Top 5 Accuracy = 0.969302 Loss =  2123920.000000
Epoch: 1530 Test-Accuracy = 0.615118 Test-Top 5 Accuracy = 0.944180 Test-Loss =  2913661.250000
Epoch: 1540 Accuracy = 0.648589 Top 5 Accuracy = 0.971726 Loss =  2118354.000000
Epoch: 1540 Test-Accuracy = 0.616195 Test-Top 5 Accuracy = 0.946288 Test-Loss =  2914184.500000
Epoch: 1550 Accuracy = 0.649016 Top 5 Accuracy = 0.972306 Loss =  2115613.750000
Epoch: 1550 Test-Accuracy = 0.616523 Test-Top 5 Accuracy = 0.947834 Test-Loss =  2920663.000000
Epoch: 1560 Accuracy = 0.648542 Top 5 Accuracy = 0.971486 Loss =  2110375.000000
Epoch: 1560 Test-Accuracy = 0.616102 Test-Top 5 Accuracy = 0.946288 Test-Loss =  2942552.250000
Epoch: 1570 Accuracy = 0.648190 Top 5 Accuracy = 0.971082 Loss =  2105796.750000
Epoch: 1570 Test-Accuracy = 0.615095 Test-Top 5 Accuracy = 0.945328 Test-Loss =  2953548.500000
Epoch: 1580 Accuracy = 0.647212 Top 5 Accuracy = 0.970046 Loss =  2107247.750000
Epoch: 1580 Test-Accuracy = 0.614181 Test-Top 5 Accuracy = 0.943923 Test-Loss =  2982694.500000
Epoch: 1590 Accuracy = 0.650135 Top 5 Accuracy = 0.974186 Loss =  2100167.750000
Epoch: 1590 Test-Accuracy = 0.617250 Test-Top 5 Accuracy = 0.948397 Test-Loss =  2962164.000000
Epoch: 1600 Accuracy = 0.648021 Top 5 Accuracy = 0.970608 Loss =  2092607.250000
Epoch: 1600 Test-Accuracy = 0.614368 Test-Top 5 Accuracy = 0.944227 Test-Loss =  2994607.000000
Epoch: 1610 Accuracy = 0.650029 Top 5 Accuracy = 0.973343 Loss =  2088638.250000
Epoch: 1610 Test-Accuracy = 0.616242 Test-Top 5 Accuracy = 0.947483 Test-Loss =  2989836.500000
Epoch: 1620 Accuracy = 0.649151 Top 5 Accuracy = 0.971937 Loss =  2082992.000000
Epoch: 1620 Test-Accuracy = 0.614813 Test-Top 5 Accuracy = 0.945445 Test-Loss =  3008547.000000
Epoch: 1630 Accuracy = 0.649848 Top 5 Accuracy = 0.973226 Loss =  2080199.375000
Epoch: 1630 Test-Accuracy = 0.615821 Test-Top 5 Accuracy = 0.946616 Test-Loss =  3010231.250000
Epoch: 1640 Accuracy = 0.649906 Top 5 Accuracy = 0.973062 Loss =  2078709.500000
Epoch: 1640 Test-Accuracy = 0.615422 Test-Top 5 Accuracy = 0.946827 Test-Loss =  3027298.250000
Epoch: 1650 Accuracy = 0.650867 Top 5 Accuracy = 0.974543 Loss =  2075473.875000
Epoch: 1650 Test-Accuracy = 0.616922 Test-Top 5 Accuracy = 0.948397 Test-Loss =  3023039.750000
Epoch: 1660 Accuracy = 0.650205 Top 5 Accuracy = 0.972921 Loss =  2069283.750000
Epoch: 1660 Test-Accuracy = 0.615586 Test-Top 5 Accuracy = 0.946593 Test-Loss =  3046530.250000
Epoch: 1670 Accuracy = 0.650761 Top 5 Accuracy = 0.973940 Loss =  2064156.125000
Epoch: 1670 Test-Accuracy = 0.616125 Test-Top 5 Accuracy = 0.947670 Test-Loss =  3048980.250000
Epoch: 1680 Accuracy = 0.650919 Top 5 Accuracy = 0.973934 Loss =  2059496.625000
Epoch: 1680 Test-Accuracy = 0.616055 Test-Top 5 Accuracy = 0.947577 Test-Loss =  3061358.250000
Epoch: 1690 Accuracy = 0.651728 Top 5 Accuracy = 0.975281 Loss =  2064817.250000
Epoch: 1690 Test-Accuracy = 0.617460 Test-Top 5 Accuracy = 0.948912 Test-Loss =  3065214.500000
Epoch: 1700 Accuracy = 0.649818 Top 5 Accuracy = 0.973079 Loss =  2058106.250000
Epoch: 1700 Test-Accuracy = 0.615399 Test-Top 5 Accuracy = 0.946242 Test-Loss =  3085262.500000
Epoch: 1710 Accuracy = 0.651927 Top 5 Accuracy = 0.975398 Loss =  2053624.500000
Epoch: 1710 Test-Accuracy = 0.617109 Test-Top 5 Accuracy = 0.948842 Test-Loss =  3089841.750000
Epoch: 1720 Accuracy = 0.650170 Top 5 Accuracy = 0.973313 Loss =  2047960.875000
Epoch: 1720 Test-Accuracy = 0.615493 Test-Top 5 Accuracy = 0.946242 Test-Loss =  3105028.750000
Epoch: 1730 Accuracy = 0.651540 Top 5 Accuracy = 0.975182 Loss =  2042486.625000
Epoch: 1730 Test-Accuracy = 0.616523 Test-Top 5 Accuracy = 0.948280 Test-Loss =  3109925.000000
Epoch: 1740 Accuracy = 0.651054 Top 5 Accuracy = 0.974959 Loss =  2039258.500000
Epoch: 1740 Test-Accuracy = 0.616219 Test-Top 5 Accuracy = 0.947647 Test-Loss =  3123861.250000
Epoch: 1750 Accuracy = 0.650656 Top 5 Accuracy = 0.974754 Loss =  2037568.875000
Epoch: 1750 Test-Accuracy = 0.616008 Test-Top 5 Accuracy = 0.947647 Test-Loss =  3131712.750000
Epoch: 1760 Accuracy = 0.651476 Top 5 Accuracy = 0.975240 Loss =  2032484.750000
Epoch: 1760 Test-Accuracy = 0.616570 Test-Top 5 Accuracy = 0.948490 Test-Loss =  3140497.000000
Epoch: 1770 Accuracy = 0.651365 Top 5 Accuracy = 0.975627 Loss =  2029305.375000
Epoch: 1770 Test-Accuracy = 0.616477 Test-Top 5 Accuracy = 0.949053 Test-Loss =  3150976.000000
Epoch: 1780 Accuracy = 0.652120 Top 5 Accuracy = 0.977091 Loss =  2028626.250000
Epoch: 1780 Test-Accuracy = 0.616687 Test-Top 5 Accuracy = 0.950669 Test-Loss =  3153050.000000
Epoch: 1790 Accuracy = 0.651728 Top 5 Accuracy = 0.976323 Loss =  2024737.875000
Epoch: 1790 Test-Accuracy = 0.616500 Test-Top 5 Accuracy = 0.949825 Test-Loss =  3170407.250000
Epoch: 1800 Accuracy = 0.651552 Top 5 Accuracy = 0.976739 Loss =  2021782.500000
Epoch: 1800 Test-Accuracy = 0.616078 Test-Top 5 Accuracy = 0.949989 Test-Loss =  3171258.750000
Epoch: 1810 Accuracy = 0.652483 Top 5 Accuracy = 0.977706 Loss =  2017960.000000
Epoch: 1810 Test-Accuracy = 0.617132 Test-Top 5 Accuracy = 0.950903 Test-Loss =  3185601.000000
Epoch: 1820 Accuracy = 0.652026 Top 5 Accuracy = 0.977342 Loss =  2014126.750000
Epoch: 1820 Test-Accuracy = 0.616687 Test-Top 5 Accuracy = 0.950247 Test-Loss =  3195146.250000
Epoch: 1830 Accuracy = 0.650644 Top 5 Accuracy = 0.975357 Loss =  2014717.500000
Epoch: 1830 Test-Accuracy = 0.614931 Test-Top 5 Accuracy = 0.948725 Test-Loss =  3208618.000000
Epoch: 1840 Accuracy = 0.650826 Top 5 Accuracy = 0.975960 Loss =  2009982.625000
Epoch: 1840 Test-Accuracy = 0.615188 Test-Top 5 Accuracy = 0.949521 Test-Loss =  3224227.250000
Epoch: 1850 Accuracy = 0.653274 Top 5 Accuracy = 0.979550 Loss =  2013813.500000
Epoch: 1850 Test-Accuracy = 0.617999 Test-Top 5 Accuracy = 0.953035 Test-Loss =  3207965.500000
Epoch: 1860 Accuracy = 0.650984 Top 5 Accuracy = 0.976136 Loss =  2006054.125000
Epoch: 1860 Test-Accuracy = 0.615259 Test-Top 5 Accuracy = 0.949685 Test-Loss =  3237213.750000
Epoch: 1870 Accuracy = 0.652290 Top 5 Accuracy = 0.977770 Loss =  2000678.250000
Epoch: 1870 Test-Accuracy = 0.616617 Test-Top 5 Accuracy = 0.950341 Test-Loss =  3242853.000000
Epoch: 1880 Accuracy = 0.652553 Top 5 Accuracy = 0.978631 Loss =  1998126.625000
Epoch: 1880 Test-Accuracy = 0.616594 Test-Top 5 Accuracy = 0.951559 Test-Loss =  3239581.500000
Epoch: 1890 Accuracy = 0.651376 Top 5 Accuracy = 0.976950 Loss =  1994687.000000
Epoch: 1890 Test-Accuracy = 0.615657 Test-Top 5 Accuracy = 0.950130 Test-Loss =  3261173.750000
Epoch: 1900 Accuracy = 0.651160 Top 5 Accuracy = 0.976827 Loss =  1992578.500000
Epoch: 1900 Test-Accuracy = 0.615282 Test-Top 5 Accuracy = 0.949966 Test-Loss =  3263632.000000
Epoch: 1910 Accuracy = 0.652319 Top 5 Accuracy = 0.978736 Loss =  1990503.625000
Epoch: 1910 Test-Accuracy = 0.616430 Test-Top 5 Accuracy = 0.952051 Test-Loss =  3271236.500000
Epoch: 1920 Accuracy = 0.653174 Top 5 Accuracy = 0.979082 Loss =  1986442.250000
Epoch: 1920 Test-Accuracy = 0.617109 Test-Top 5 Accuracy = 0.952238 Test-Loss =  3285592.750000
Epoch: 1930 Accuracy = 0.652922 Top 5 Accuracy = 0.979029 Loss =  1985008.250000
Epoch: 1930 Test-Accuracy = 0.616781 Test-Top 5 Accuracy = 0.952519 Test-Loss =  3282521.000000
Epoch: 1940 Accuracy = 0.653525 Top 5 Accuracy = 0.979638 Loss =  1981577.750000
Epoch: 1940 Test-Accuracy = 0.617624 Test-Top 5 Accuracy = 0.953058 Test-Loss =  3295340.500000
Epoch: 1950 Accuracy = 0.653935 Top 5 Accuracy = 0.980915 Loss =  1978082.750000
Epoch: 1950 Test-Accuracy = 0.617320 Test-Top 5 Accuracy = 0.953948 Test-Loss =  3299154.000000
Epoch: 1960 Accuracy = 0.652313 Top 5 Accuracy = 0.978818 Loss =  1974552.500000
Epoch: 1960 Test-Accuracy = 0.615727 Test-Top 5 Accuracy = 0.952308 Test-Loss =  3312952.500000
Epoch: 1970 Accuracy = 0.654257 Top 5 Accuracy = 0.980973 Loss =  1976741.000000
Epoch: 1970 Test-Accuracy = 0.618163 Test-Top 5 Accuracy = 0.954112 Test-Loss =  3316082.500000
Epoch: 1980 Accuracy = 0.653028 Top 5 Accuracy = 0.979386 Loss =  1972220.500000
Epoch: 1980 Test-Accuracy = 0.617156 Test-Top 5 Accuracy = 0.952590 Test-Loss =  3329828.750000
Epoch: 1990 Accuracy = 0.654170 Top 5 Accuracy = 0.981032 Loss =  1972091.500000
Epoch: 1990 Test-Accuracy = 0.617695 Test-Top 5 Accuracy = 0.954065 Test-Loss =  3328398.500000
Epoch: 2000 Accuracy = 0.653408 Top 5 Accuracy = 0.980101 Loss =  1966708.125000
Epoch: 2000 Test-Accuracy = 0.618023 Test-Top 5 Accuracy = 0.953433 Test-Loss =  3347232.750000
Epoch: 2010 Accuracy = 0.653151 Top 5 Accuracy = 0.979539 Loss =  1961945.125000
Epoch: 2010 Test-Accuracy = 0.616922 Test-Top 5 Accuracy = 0.952402 Test-Loss =  3346944.000000
Epoch: 2020 Accuracy = 0.653356 Top 5 Accuracy = 0.980294 Loss =  1959715.500000
Epoch: 2020 Test-Accuracy = 0.617437 Test-Top 5 Accuracy = 0.952988 Test-Loss =  3361398.000000
Epoch: 2030 Accuracy = 0.653719 Top 5 Accuracy = 0.980692 Loss =  1954679.750000
Epoch: 2030 Test-Accuracy = 0.617507 Test-Top 5 Accuracy = 0.953386 Test-Loss =  3365930.500000
Epoch: 2040 Accuracy = 0.653918 Top 5 Accuracy = 0.980985 Loss =  1953237.250000
Epoch: 2040 Test-Accuracy = 0.618069 Test-Top 5 Accuracy = 0.953573 Test-Loss =  3374416.500000
Epoch: 2050 Accuracy = 0.654802 Top 5 Accuracy = 0.981963 Loss =  1949512.000000
Epoch: 2050 Test-Accuracy = 0.618561 Test-Top 5 Accuracy = 0.954862 Test-Loss =  3375240.500000
Epoch: 2060 Accuracy = 0.652992 Top 5 Accuracy = 0.979738 Loss =  1947692.750000
Epoch: 2060 Test-Accuracy = 0.617039 Test-Top 5 Accuracy = 0.952308 Test-Loss =  3400486.500000
Epoch: 2070 Accuracy = 0.654919 Top 5 Accuracy = 0.982244 Loss =  1945114.875000
Epoch: 2070 Test-Accuracy = 0.619006 Test-Top 5 Accuracy = 0.954651 Test-Loss =  3385847.500000
Epoch: 2080 Accuracy = 0.653426 Top 5 Accuracy = 0.980505 Loss =  1942137.250000
Epoch: 2080 Test-Accuracy = 0.617671 Test-Top 5 Accuracy = 0.952683 Test-Loss =  3411603.500000
Epoch: 2090 Accuracy = 0.654357 Top 5 Accuracy = 0.981658 Loss =  1938651.875000
Epoch: 2090 Test-Accuracy = 0.618116 Test-Top 5 Accuracy = 0.954159 Test-Loss =  3407766.500000
Epoch: 2100 Accuracy = 0.654146 Top 5 Accuracy = 0.981424 Loss =  1937035.000000
Epoch: 2100 Test-Accuracy = 0.617812 Test-Top 5 Accuracy = 0.953995 Test-Loss =  3423393.500000
Epoch: 2110 Accuracy = 0.654246 Top 5 Accuracy = 0.981682 Loss =  1933944.625000
Epoch: 2110 Test-Accuracy = 0.618140 Test-Top 5 Accuracy = 0.953901 Test-Loss =  3424594.750000
Epoch: 2120 Accuracy = 0.652899 Top 5 Accuracy = 0.980036 Loss =  1931825.625000
Epoch: 2120 Test-Accuracy = 0.616031 Test-Top 5 Accuracy = 0.951863 Test-Loss =  3445383.750000
Epoch: 2130 Accuracy = 0.654773 Top 5 Accuracy = 0.982642 Loss =  1931266.750000
Epoch: 2130 Test-Accuracy = 0.617765 Test-Top 5 Accuracy = 0.955611 Test-Loss =  3435830.500000
Epoch: 2140 Accuracy = 0.654363 Top 5 Accuracy = 0.981331 Loss =  1928146.750000
Epoch: 2140 Test-Accuracy = 0.617624 Test-Top 5 Accuracy = 0.953316 Test-Loss =  3453455.500000
Epoch: 2150 Accuracy = 0.654943 Top 5 Accuracy = 0.982543 Loss =  1925880.500000
Epoch: 2150 Test-Accuracy = 0.617929 Test-Top 5 Accuracy = 0.955026 Test-Loss =  3458761.500000
Epoch: 2160 Accuracy = 0.654591 Top 5 Accuracy = 0.982373 Loss =  1922320.750000
Epoch: 2160 Test-Accuracy = 0.617695 Test-Top 5 Accuracy = 0.954627 Test-Loss =  3457967.000000
Epoch: 2170 Accuracy = 0.654978 Top 5 Accuracy = 0.982262 Loss =  1918719.625000
Epoch: 2170 Test-Accuracy = 0.617905 Test-Top 5 Accuracy = 0.954370 Test-Loss =  3478985.500000
Epoch: 2180 Accuracy = 0.656571 Top 5 Accuracy = 0.983919 Loss =  1917736.500000
Epoch: 2180 Test-Accuracy = 0.619873 Test-Top 5 Accuracy = 0.956642 Test-Loss =  3475361.500000
Epoch: 2190 Accuracy = 0.654480 Top 5 Accuracy = 0.982145 Loss =  1916299.375000
Epoch: 2190 Test-Accuracy = 0.617695 Test-Top 5 Accuracy = 0.954534 Test-Loss =  3479088.500000
Epoch: 2200 Accuracy = 0.655130 Top 5 Accuracy = 0.982572 Loss =  1913632.250000
Epoch: 2200 Test-Accuracy = 0.617929 Test-Top 5 Accuracy = 0.954979 Test-Loss =  3500575.000000
Epoch: 2210 Accuracy = 0.655663 Top 5 Accuracy = 0.983123 Loss =  1914395.500000
Epoch: 2210 Test-Accuracy = 0.618655 Test-Top 5 Accuracy = 0.955752 Test-Loss =  3484969.250000
Epoch: 2220 Accuracy = 0.655774 Top 5 Accuracy = 0.983123 Loss =  1909353.875000
Epoch: 2220 Test-Accuracy = 0.618842 Test-Top 5 Accuracy = 0.955354 Test-Loss =  3513091.250000
Epoch: 2230 Accuracy = 0.656055 Top 5 Accuracy = 0.983257 Loss =  1904775.500000
Epoch: 2230 Test-Accuracy = 0.618491 Test-Top 5 Accuracy = 0.955822 Test-Loss =  3521884.500000
Epoch: 2240 Accuracy = 0.655932 Top 5 Accuracy = 0.983163 Loss =  1903761.875000
Epoch: 2240 Test-Accuracy = 0.618514 Test-Top 5 Accuracy = 0.955845 Test-Loss =  3518323.500000
Epoch: 2250 Accuracy = 0.655628 Top 5 Accuracy = 0.982584 Loss =  1901637.375000
Epoch: 2250 Test-Accuracy = 0.617976 Test-Top 5 Accuracy = 0.954721 Test-Loss =  3540861.750000
Epoch: 2260 Accuracy = 0.655552 Top 5 Accuracy = 0.982654 Loss =  1899311.125000
Epoch: 2260 Test-Accuracy = 0.617952 Test-Top 5 Accuracy = 0.954979 Test-Loss =  3534831.500000
Epoch: 2270 Accuracy = 0.655183 Top 5 Accuracy = 0.982150 Loss =  1899078.625000
Epoch: 2270 Test-Accuracy = 0.617226 Test-Top 5 Accuracy = 0.954323 Test-Loss =  3556326.500000
Epoch: 2280 Accuracy = 0.655768 Top 5 Accuracy = 0.983310 Loss =  1893951.750000
Epoch: 2280 Test-Accuracy = 0.617976 Test-Top 5 Accuracy = 0.955845 Test-Loss =  3549348.000000
Epoch: 2290 Accuracy = 0.658322 Top 5 Accuracy = 0.985160 Loss =  1893266.500000
Epoch: 2290 Test-Accuracy = 0.620295 Test-Top 5 Accuracy = 0.957181 Test-Loss =  3549977.750000
Epoch: 2300 Accuracy = 0.656401 Top 5 Accuracy = 0.983609 Loss =  1894619.375000
Epoch: 2300 Test-Accuracy = 0.618397 Test-Top 5 Accuracy = 0.955518 Test-Loss =  3584580.000000
Epoch: 2310 Accuracy = 0.655733 Top 5 Accuracy = 0.983076 Loss =  1888310.000000
Epoch: 2310 Test-Accuracy = 0.617554 Test-Top 5 Accuracy = 0.955424 Test-Loss =  3569114.000000
Epoch: 2320 Accuracy = 0.657601 Top 5 Accuracy = 0.984557 Loss =  1886903.000000
Epoch: 2320 Test-Accuracy = 0.619639 Test-Top 5 Accuracy = 0.956900 Test-Loss =  3575298.000000
Epoch: 2330 Accuracy = 0.657554 Top 5 Accuracy = 0.984458 Loss =  1886466.000000
Epoch: 2330 Test-Accuracy = 0.619873 Test-Top 5 Accuracy = 0.956337 Test-Loss =  3595119.500000
Epoch: 2340 Accuracy = 0.656418 Top 5 Accuracy = 0.983544 Loss =  1883548.875000
Epoch: 2340 Test-Accuracy = 0.618210 Test-Top 5 Accuracy = 0.955377 Test-Loss =  3594672.000000
Epoch: 2350 Accuracy = 0.655780 Top 5 Accuracy = 0.982783 Loss =  1882102.250000
Epoch: 2350 Test-Accuracy = 0.617414 Test-Top 5 Accuracy = 0.954674 Test-Loss =  3609957.750000
Epoch: 2360 Accuracy = 0.656278 Top 5 Accuracy = 0.983732 Loss =  1888402.875000
Epoch: 2360 Test-Accuracy = 0.618796 Test-Top 5 Accuracy = 0.955658 Test-Loss =  3611172.500000
Epoch: 2370 Accuracy = 0.655985 Top 5 Accuracy = 0.983562 Loss =  1877432.500000
Epoch: 2370 Test-Accuracy = 0.618163 Test-Top 5 Accuracy = 0.955330 Test-Loss =  3623707.750000
Epoch: 2380 Accuracy = 0.656395 Top 5 Accuracy = 0.983855 Loss =  1876925.250000
Epoch: 2380 Test-Accuracy = 0.618163 Test-Top 5 Accuracy = 0.955728 Test-Loss =  3617167.000000
Epoch: 2390 Accuracy = 0.658298 Top 5 Accuracy = 0.985254 Loss =  1873197.625000
Epoch: 2390 Test-Accuracy = 0.619686 Test-Top 5 Accuracy = 0.956923 Test-Loss =  3628210.750000
Epoch: 2400 Accuracy = 0.658749 Top 5 Accuracy = 0.985389 Loss =  1870515.625000
Epoch: 2400 Test-Accuracy = 0.619826 Test-Top 5 Accuracy = 0.957157 Test-Loss =  3647437.000000
Epoch: 2410 Accuracy = 0.659639 Top 5 Accuracy = 0.986010 Loss =  1868539.750000
Epoch: 2410 Test-Accuracy = 0.620529 Test-Top 5 Accuracy = 0.957626 Test-Loss =  3645456.000000
Epoch: 2420 Accuracy = 0.657150 Top 5 Accuracy = 0.984159 Loss =  1876191.500000
Epoch: 2420 Test-Accuracy = 0.618796 Test-Top 5 Accuracy = 0.956127 Test-Loss =  3644602.000000
Epoch: 2430 Accuracy = 0.655932 Top 5 Accuracy = 0.983134 Loss =  1875856.625000
Epoch: 2430 Test-Accuracy = 0.616453 Test-Top 5 Accuracy = 0.954745 Test-Loss =  3683556.500000
Epoch: 2440 Accuracy = 0.660348 Top 5 Accuracy = 0.986531 Loss =  1865729.500000
Epoch: 2440 Test-Accuracy = 0.620997 Test-Top 5 Accuracy = 0.958001 Test-Loss =  3676840.500000
Epoch: 2450 Accuracy = 0.656828 Top 5 Accuracy = 0.983480 Loss =  1860971.125000
Epoch: 2450 Test-Accuracy = 0.617460 Test-Top 5 Accuracy = 0.955166 Test-Loss =  3686473.500000
Epoch: 2460 Accuracy = 0.658304 Top 5 Accuracy = 0.985096 Loss =  1860178.000000
Epoch: 2460 Test-Accuracy = 0.618538 Test-Top 5 Accuracy = 0.956361 Test-Loss =  3677436.500000
Epoch: 2470 Accuracy = 0.657215 Top 5 Accuracy = 0.984089 Loss =  1855072.000000
Epoch: 2470 Test-Accuracy = 0.617648 Test-Top 5 Accuracy = 0.955916 Test-Loss =  3693729.250000
Epoch: 2480 Accuracy = 0.658005 Top 5 Accuracy = 0.984616 Loss =  1853965.750000
Epoch: 2480 Test-Accuracy = 0.618257 Test-Top 5 Accuracy = 0.956548 Test-Loss =  3707164.250000
Epoch: 2490 Accuracy = 0.657882 Top 5 Accuracy = 0.984551 Loss =  1851519.375000
Epoch: 2490 Test-Accuracy = 0.617835 Test-Top 5 Accuracy = 0.956150 Test-Loss =  3711633.250000
Epoch: 2500 Accuracy = 0.657168 Top 5 Accuracy = 0.984358 Loss =  1856212.250000
Epoch: 2500 Test-Accuracy = 0.617905 Test-Top 5 Accuracy = 0.955822 Test-Loss =  3701866.000000
Epoch: 2510 Accuracy = 0.653695 Top 5 Accuracy = 0.980926 Loss =  1851312.750000
Epoch: 2510 Test-Accuracy = 0.614532 Test-Top 5 Accuracy = 0.952262 Test-Loss =  3732493.000000
Epoch: 2520 Accuracy = 0.659610 Top 5 Accuracy = 0.986086 Loss =  1848748.750000
Epoch: 2520 Test-Accuracy = 0.620084 Test-Top 5 Accuracy = 0.957930 Test-Loss =  3724539.000000
Epoch: 2530 Accuracy = 0.659019 Top 5 Accuracy = 0.985717 Loss =  1846959.500000
Epoch: 2530 Test-Accuracy = 0.619686 Test-Top 5 Accuracy = 0.957743 Test-Loss =  3738445.500000
Epoch: 2540 Accuracy = 0.656032 Top 5 Accuracy = 0.982836 Loss =  1843560.625000
Epoch: 2540 Test-Accuracy = 0.616922 Test-Top 5 Accuracy = 0.954932 Test-Loss =  3743749.000000
Epoch: 2550 Accuracy = 0.663967 Top 5 Accuracy = 0.988364 Loss =  1840385.375000
Epoch: 2550 Test-Accuracy = 0.623855 Test-Top 5 Accuracy = 0.960366 Test-Loss =  3750802.000000
Epoch: 2560 Accuracy = 0.661484 Top 5 Accuracy = 0.987380 Loss =  1838597.500000
Epoch: 2560 Test-Accuracy = 0.621114 Test-Top 5 Accuracy = 0.959429 Test-Loss =  3760095.250000
Epoch: 2570 Accuracy = 0.656576 Top 5 Accuracy = 0.982970 Loss =  1837828.500000
Epoch: 2570 Test-Accuracy = 0.616430 Test-Top 5 Accuracy = 0.954815 Test-Loss =  3772278.250000
Epoch: 2580 Accuracy = 0.660330 Top 5 Accuracy = 0.986712 Loss =  1834041.125000
Epoch: 2580 Test-Accuracy = 0.619826 Test-Top 5 Accuracy = 0.958399 Test-Loss =  3771557.500000
Epoch: 2590 Accuracy = 0.660295 Top 5 Accuracy = 0.986601 Loss =  1838473.250000
Epoch: 2590 Test-Accuracy = 0.620014 Test-Top 5 Accuracy = 0.958328 Test-Loss =  3784936.000000
Epoch: 2600 Accuracy = 0.660348 Top 5 Accuracy = 0.987052 Loss =  1841798.000000
Epoch: 2600 Test-Accuracy = 0.620716 Test-Top 5 Accuracy = 0.958633 Test-Loss =  3776836.000000
Epoch: 2610 Accuracy = 0.659868 Top 5 Accuracy = 0.986425 Loss =  1834760.625000
Epoch: 2610 Test-Accuracy = 0.620037 Test-Top 5 Accuracy = 0.958352 Test-Loss =  3801035.000000
Epoch: 2620 Accuracy = 0.656325 Top 5 Accuracy = 0.982847 Loss =  1826926.250000
Epoch: 2620 Test-Accuracy = 0.617109 Test-Top 5 Accuracy = 0.955190 Test-Loss =  3795124.000000
Epoch: 2630 Accuracy = 0.655610 Top 5 Accuracy = 0.982513 Loss =  1824609.000000
Epoch: 2630 Test-Accuracy = 0.616734 Test-Top 5 Accuracy = 0.954440 Test-Loss =  3806889.500000
Epoch: 2640 Accuracy = 0.653748 Top 5 Accuracy = 0.980897 Loss =  1826868.000000
Epoch: 2640 Test-Accuracy = 0.614696 Test-Top 5 Accuracy = 0.952566 Test-Loss =  3821515.750000
Epoch: 2650 Accuracy = 0.662468 Top 5 Accuracy = 0.988007 Loss =  1821247.625000
Epoch: 2650 Test-Accuracy = 0.622098 Test-Top 5 Accuracy = 0.959898 Test-Loss =  3815637.000000
Epoch: 2660 Accuracy = 0.658099 Top 5 Accuracy = 0.985570 Loss =  1820030.750000
Epoch: 2660 Test-Accuracy = 0.618819 Test-Top 5 Accuracy = 0.957110 Test-Loss =  3817045.000000
Epoch: 2670 Accuracy = 0.657496 Top 5 Accuracy = 0.984786 Loss =  1816869.750000
Epoch: 2670 Test-Accuracy = 0.618327 Test-Top 5 Accuracy = 0.956455 Test-Loss =  3838001.500000
Epoch: 2680 Accuracy = 0.662966 Top 5 Accuracy = 0.988674 Loss =  1817508.250000
Epoch: 2680 Test-Accuracy = 0.623410 Test-Top 5 Accuracy = 0.960202 Test-Loss =  3834675.500000
Epoch: 2690 Accuracy = 0.655212 Top 5 Accuracy = 0.982332 Loss =  1814466.000000
Epoch: 2690 Test-Accuracy = 0.615399 Test-Top 5 Accuracy = 0.953831 Test-Loss =  3850601.500000
Epoch: 2700 Accuracy = 0.655487 Top 5 Accuracy = 0.982912 Loss =  1814248.250000
Epoch: 2700 Test-Accuracy = 0.616453 Test-Top 5 Accuracy = 0.954674 Test-Loss =  3845103.000000
Epoch: 2710 Accuracy = 0.662456 Top 5 Accuracy = 0.987807 Loss =  1811359.500000
Epoch: 2710 Test-Accuracy = 0.622192 Test-Top 5 Accuracy = 0.959874 Test-Loss =  3868683.250000
Epoch: 2720 Accuracy = 0.661788 Top 5 Accuracy = 0.987796 Loss =  1812505.000000
Epoch: 2720 Test-Accuracy = 0.621958 Test-Top 5 Accuracy = 0.959453 Test-Loss =  3861417.500000
Epoch: 2730 Accuracy = 0.664640 Top 5 Accuracy = 0.988393 Loss =  1805953.875000
Epoch: 2730 Test-Accuracy = 0.624019 Test-Top 5 Accuracy = 0.960811 Test-Loss =  3874309.000000
Epoch: 2740 Accuracy = 0.655944 Top 5 Accuracy = 0.983240 Loss =  1814995.500000
Epoch: 2740 Test-Accuracy = 0.616641 Test-Top 5 Accuracy = 0.955119 Test-Loss =  3865395.500000
Epoch: 2750 Accuracy = 0.660248 Top 5 Accuracy = 0.987228 Loss =  1809850.000000
Epoch: 2750 Test-Accuracy = 0.620435 Test-Top 5 Accuracy = 0.958727 Test-Loss =  3890314.500000
Epoch: 2760 Accuracy = 0.659106 Top 5 Accuracy = 0.986478 Loss =  1802453.250000
Epoch: 2760 Test-Accuracy = 0.619545 Test-Top 5 Accuracy = 0.957555 Test-Loss =  3889498.000000
Epoch: 2770 Accuracy = 0.657642 Top 5 Accuracy = 0.984874 Loss =  1799563.500000
Epoch: 2770 Test-Accuracy = 0.618093 Test-Top 5 Accuracy = 0.955822 Test-Loss =  3895853.000000
Epoch: 2780 Accuracy = 0.659669 Top 5 Accuracy = 0.986466 Loss =  1798764.125000
Epoch: 2780 Test-Accuracy = 0.619873 Test-Top 5 Accuracy = 0.957555 Test-Loss =  3902038.250000
Epoch: 2790 Accuracy = 0.657666 Top 5 Accuracy = 0.984967 Loss =  1794555.875000
Epoch: 2790 Test-Accuracy = 0.618046 Test-Top 5 Accuracy = 0.956009 Test-Loss =  3910401.250000
Epoch: 2800 Accuracy = 0.663563 Top 5 Accuracy = 0.988633 Loss =  1805250.250000
Epoch: 2800 Test-Accuracy = 0.623480 Test-Top 5 Accuracy = 0.960109 Test-Loss =  3921772.500000
Epoch: 2810 Accuracy = 0.664219 Top 5 Accuracy = 0.988165 Loss =  1793099.375000
Epoch: 2810 Test-Accuracy = 0.623504 Test-Top 5 Accuracy = 0.959898 Test-Loss =  3919615.250000
Epoch: 2820 Accuracy = 0.660459 Top 5 Accuracy = 0.987339 Loss =  1791986.500000
Epoch: 2820 Test-Accuracy = 0.620295 Test-Top 5 Accuracy = 0.958914 Test-Loss =  3931441.000000
Epoch: 2830 Accuracy = 0.658708 Top 5 Accuracy = 0.985787 Loss =  1790927.375000
Epoch: 2830 Test-Accuracy = 0.619264 Test-Top 5 Accuracy = 0.956736 Test-Loss =  3928774.750000
Epoch: 2840 Accuracy = 0.660313 Top 5 Accuracy = 0.986706 Loss =  1786639.500000
Epoch: 2840 Test-Accuracy = 0.620060 Test-Top 5 Accuracy = 0.957860 Test-Loss =  3946427.750000
Epoch: 2850 Accuracy = 0.665495 Top 5 Accuracy = 0.988393 Loss =  1790617.250000
Epoch: 2850 Test-Accuracy = 0.624769 Test-Top 5 Accuracy = 0.959453 Test-Loss =  3953658.750000
Epoch: 2860 Accuracy = 0.659106 Top 5 Accuracy = 0.985377 Loss =  1795613.125000
Epoch: 2860 Test-Accuracy = 0.619545 Test-Top 5 Accuracy = 0.956876 Test-Loss =  3938747.750000
Epoch: 2870 Accuracy = 0.649596 Top 5 Accuracy = 0.971797 Loss =  1801798.750000
Epoch: 2870 Test-Accuracy = 0.611300 Test-Top 5 Accuracy = 0.942939 Test-Loss =  3958055.000000
Epoch: 2880 Accuracy = 0.660717 Top 5 Accuracy = 0.987878 Loss =  1788744.250000
Epoch: 2880 Test-Accuracy = 0.621419 Test-Top 5 Accuracy = 0.958703 Test-Loss =  3965945.500000
Epoch: 2890 Accuracy = 0.661718 Top 5 Accuracy = 0.986607 Loss =  1781995.250000
Epoch: 2890 Test-Accuracy = 0.621700 Test-Top 5 Accuracy = 0.958118 Test-Loss =  3968165.000000
Epoch: 2900 Accuracy = 0.662737 Top 5 Accuracy = 0.987128 Loss =  1777899.000000
Epoch: 2900 Test-Accuracy = 0.622590 Test-Top 5 Accuracy = 0.958328 Test-Loss =  3974060.250000
Epoch: 2910 Accuracy = 0.661215 Top 5 Accuracy = 0.986911 Loss =  1775375.000000
Epoch: 2910 Test-Accuracy = 0.621021 Test-Top 5 Accuracy = 0.958094 Test-Loss =  3978348.500000
Epoch: 2920 Accuracy = 0.661425 Top 5 Accuracy = 0.986654 Loss =  1773545.625000
Epoch: 2920 Test-Accuracy = 0.620857 Test-Top 5 Accuracy = 0.957930 Test-Loss =  3986303.500000
Epoch: 2930 Accuracy = 0.661683 Top 5 Accuracy = 0.986759 Loss =  1771951.000000
Epoch: 2930 Test-Accuracy = 0.621068 Test-Top 5 Accuracy = 0.958047 Test-Loss =  3991109.500000
Epoch: 2940 Accuracy = 0.661215 Top 5 Accuracy = 0.986525 Loss =  1771377.125000
Epoch: 2940 Test-Accuracy = 0.620482 Test-Top 5 Accuracy = 0.957883 Test-Loss =  3999385.500000
Epoch: 2950 Accuracy = 0.662433 Top 5 Accuracy = 0.987128 Loss =  1769342.250000
Epoch: 2950 Test-Accuracy = 0.621817 Test-Top 5 Accuracy = 0.958422 Test-Loss =  4005133.750000
Epoch: 2960 Accuracy = 0.664570 Top 5 Accuracy = 0.988001 Loss =  1778581.250000
Epoch: 2960 Test-Accuracy = 0.623785 Test-Top 5 Accuracy = 0.960202 Test-Loss =  3992227.000000
Epoch: 2970 Accuracy = 0.659956 Top 5 Accuracy = 0.986367 Loss =  1768702.625000
Epoch: 2970 Test-Accuracy = 0.619662 Test-Top 5 Accuracy = 0.957532 Test-Loss =  4018618.500000
Epoch: 2980 Accuracy = 0.659060 Top 5 Accuracy = 0.985389 Loss =  1771573.250000
Epoch: 2980 Test-Accuracy = 0.619428 Test-Top 5 Accuracy = 0.957345 Test-Loss =  4012916.000000
Epoch: 2990 Accuracy = 0.652536 Top 5 Accuracy = 0.978004 Loss =  1767169.375000
Epoch: 2990 Test-Accuracy = 0.613174 Test-Top 5 Accuracy = 0.949404 Test-Loss =  4021825.000000

Optimization Finished!
Training Accuracy= 0.664746
Training Top 5 Accuracy= 0.988352

Testing Accuracy= 0.623668
Testing Top 5 Accuracy= 0.959898

testing_predictions, testing_labels = session.run([tf.argmax(predictions,1), tf.argmax(labels,1)], 
                                                  feed_dict={features: inputX_test,
                                                             labels: inputY_test,
                                                             pkeep: 1})

print(classification_report(testing_labels, testing_predictions, target_names=y.columns))

                           precision    recall  f1-score   support

   country_destination_AU       0.01      0.04      0.01       110
   country_destination_CA       0.02      0.10      0.03       290
   country_destination_DE       0.02      0.11      0.03       219
   country_destination_ES       0.03      0.17      0.05       456
   country_destination_FR       0.06      0.08      0.07       998
   country_destination_GB       0.02      0.15      0.04       449
   country_destination_IT       0.04      0.18      0.06       565
  country_destination_NDF       1.00      1.00      1.00     24984
   country_destination_NL       0.01      0.06      0.02       155
   country_destination_PT       0.00      0.02      0.01        44
   country_destination_US       0.72      0.08      0.15     12427
country_destination_other       0.12      0.11      0.12      1994

              avg / total       0.80      0.62      0.64     42691

#Plot accuracies and cost summary
f, (ax1, ax2, ax3) = plt.subplots(3, 1, sharex=True, figsize=(10,8))

ax1.plot(accuracy_summary)
ax1.plot(test_accuracy_summary)
ax1.set_title('Top 1 Accuracy')

ax2.plot(accuracy_top5_summary)
ax2.plot(test_accuracy_top5_summary)
ax2.set_title('Top 5 Accuracy')

ax3.plot(loss_summary)
ax3.plot(test_loss_summary)
ax3.set_title('Loss')

plt.xlabel('Epochs (x10)')
plt.show()

# Find the probabilities for each prediction
test_final = df_test.as_matrix()
final_probabilities = session.run(predictions, feed_dict={features: test_final,
                                                          pkeep: 1})

# Explore some of the predictions
final_probabilities[0]

array([  7.34245255e-21,   5.02517741e-30,   1.42874395e-25,
         2.40796284e-15,   4.88082264e-07,   2.60900578e-07,
         1.31230571e-09,   9.99996901e-01,   3.10756943e-27,
         3.81869185e-14,   1.90081550e-06,   4.43023225e-07], dtype=float32)

# Encode the labels for the countries
le = LabelEncoder()
fit_labels = le.fit_transform(train.country_destination) 

# Get the ids for the test data
test_getIDs = pd.read_csv("test_users.csv")
testIDs = test_getIDs['id']

ids = []  #list of ids
countries = []  #list of countries
for i in range(len(testIDs)):
    # Select the 5 countries with highest probabilities
    idx = testIDs[i]
    ids += [idx] * 5
    countries += le.inverse_transform(np.argsort(final_probabilities[i])[::-1])[:5].tolist()
    if i % 10000 == 0:
        print ("Percent complete: {}%".format(round(i / len(test),4)*100))

#Generate submission
submission = pd.DataFrame(np.column_stack((ids, countries)), columns=['id', 'country'])
submission.to_csv('submission.csv',index=False)

Percent complete: 0.0%
Percent complete: 16.1%
Percent complete: 32.21%
Percent complete: 48.309999999999995%
Percent complete: 64.42%
Percent complete: 80.52%
Percent complete: 96.61999999999999%

# Check some of the submissions
submission.head(25)

# Compare the submission's distribution to the training data's distribution.
# Given that the data was randomly split, 
# a more equal distribution should lead to better scores in the Kaggle competition.
submission.country.value_counts()

US       62096
NDF      62074
FR       61792
other    61527
IT       45328
GB       15275
ES        1363
NL         679
AU         280
DE          60
CA           3
PT           3
Name: country, dtype: int64

train.country_destination.value_counts()

NDF      124543
US        62376
other     10094
FR         5023
IT         2835
GB         2324
ES         2249
CA         1428
DE         1061
NL          762
AU          539
PT          217
Name: country_destination, dtype: int64

Summary¶

Based on Kaggle's evaluation method (https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings#evaluation), the neural network scores just under 0.86 when all of the training data is used. The winning submission scored 0.88697 and the sample submission scored 0.68411. Looking at the leaderboard and some kernels, the most common algorithm for the better scores was XGBoost. Given that the purpose of this analyis was to further my knowledge of TensorFlow (in addition to the other aspects of machine learning - i.e. feature engineering, cleaning data, etc.), I do not feel the need to use XGBoost to try to make a better prediction. I am rather pleased with this model on a whole, given its ability to accurately predict which country a user will make his/her first trip in. The 'lazy' prediction method would be to use the top and top 5 most common countries for the predictions. This would equal an accuracy score of 58.35% for the top predictions and 95.98% for the top 5 predictions. For the testing data, my top predictions scored a higher accuracy of 62.37%, as well as for the top 5 predictions, at 95.99%. My predictions are also more useful since they make use of all twelve countries, instead of just the five most common.

	user_id	action_count	duration
0	lvs98g7ggz	12	113525.5
1	9hue70lsfi	22	439490.0
2	8wqf53khcc	468	7468072.0
3	q7jew74rm9	50	363874.0
4	f2t2nbphmv	43	3340242.0

	id	date_account_created	timestamp_first_active	date_first_booking	gender	age	signup_method	language	affiliate_channel	affiliate_provider	first_affiliate_tracked	signup_app	first_device_type	first_browser
0	5uwns89zht	2014-07-01	20140701000006	NaN	FEMALE	35.0	facebook	en	direct	direct	untracked	Moweb	iPhone	Mobile Safari
1	jtl0dijy2j	2014-07-01	20140701000051	NaN	-unknown-	NaN	basic	en	direct	direct	untracked	Moweb	iPhone	Mobile Safari
2	xx0ulgorjt	2014-07-01	20140701000148	NaN	-unknown-	NaN	basic	en	direct	direct	linked	Web	Windows Desktop	Chrome
3	6c6puo6ix0	2014-07-01	20140701000215	NaN	-unknown-	NaN	basic	en	direct	direct	linked	Web	Windows Desktop	IE
4	czqhjk3yfe	2014-07-01	20140701000305	NaN	-unknown-	NaN	basic	en	direct	direct	untracked	Web	Mac Desktop	Safari

	id	date_account_created	timestamp_first_active	date_first_booking	gender	age	signup_method	signup_flow	language	affiliate_channel	affiliate_provider	first_affiliate_tracked	signup_app	first_device_type	first_browser	country_destination
0	gxn3p5htnn	2010-06-28	20090319043255	NaN	-unknown-	NaN	facebook	0	en	direct	direct	untracked	Web	Mac Desktop	Chrome	NDF
1	820tgsjxq7	2011-05-25	20090523174809	NaN	MALE	38.0	facebook	0	en	seo	google	untracked	Web	Mac Desktop	Chrome	NDF
2	4ft3gnwmtx	2010-09-28	20090609231247	2010-08-02	FEMALE	56.0	basic	3	en	direct	direct	untracked	Web	Windows Desktop	IE	US
3	bjjt8pjhuk	2011-12-05	20091031060129	2012-09-08	FEMALE	42.0	facebook	0	en	direct	direct	untracked	Web	Mac Desktop	Firefox	other
4	87mebub9p4	2010-09-14	20091208061105	2010-02-18	-unknown-	41.0	basic	0	en	direct	direct	untracked	Web	Mac Desktop	Chrome	US

	user_id	action	action_type	action_detail	device_type	secs_elapsed
0	d1mm9tcy42	lookup	NaN	NaN	Windows Desktop	319.0
1	d1mm9tcy42	search_results	click	view_search_results	Windows Desktop	67753.0
2	d1mm9tcy42	lookup	NaN	NaN	Windows Desktop	301.0
3	d1mm9tcy42	search_results	click	view_search_results	Windows Desktop	22141.0
4	d1mm9tcy42	lookup	NaN	NaN	Windows Desktop	435.0
5	d1mm9tcy42	search_results	click	view_search_results	Windows Desktop	7703.0
6	d1mm9tcy42	lookup	NaN	NaN	Windows Desktop	115.0
7	d1mm9tcy42	personalize	data	wishlist_content_update	Windows Desktop	831.0
8	d1mm9tcy42	index	view	view_search_results	Windows Desktop	20842.0
9	d1mm9tcy42	lookup	NaN	NaN	Windows Desktop	683.0

	action	action_count	action_detail	action_type	affiliate_channel	affiliate_provider	age	apple_device	avg_duration	country_destination	date_account_created	date_first_booking	desktop_device	device_type	duration	first_affiliate_tracked	first_browser	first_device_type	gender	id	language	max_duration	min_duration	mobile_device	signup_app	signup_flow	signup_method	timestamp_first_active	unique_action_details	unique_action_types	unique_actions	unique_device_types	user_id
0	show	127.0	p3	view	sem-non-brand	google	62.0	0.0	27283.503937	other	2014-01-01	2014-01-04	1.0	Windows Desktop	3465005.0	omg	Chrome	Windows Desktop	MALE	d1mm9tcy42	en	606881.0	2.0	0.0	Web	0	basic	20140101000936	10.0	7.0	17.0	2.0	d1mm9tcy42
1	show	12.0	p3	view	direct	direct	NaN	1.0	24522.125000	NDF	2014-01-01	NaN	1.0	Mac Desktop	294265.5	untracked	Firefox	Mac Desktop	-unknown-	yo8nz8bqcq	en	115983.0	36.0	0.0	Web	0	basic	20140101001558	8.0	4.0	7.0	1.0	yo8nz8bqcq
2	create	16.0	-unknown-	-unknown-	sem-brand	google	NaN	0.0	71887.156250	NDF	2014-01-01	NaN	1.0	Windows Desktop	1150194.5	omg	Firefox	Windows Desktop	-unknown-	4grx6yxeby	en	336801.0	53.0	0.0	Web	0	basic	20140101001639	8.0	6.0	13.0	2.0	4grx6yxeby
3	ajax_refresh_subtotal	160.0	view_search_results	click	direct	direct	NaN	0.0	24384.262500	NDF	2014-01-01	NaN	1.0	Windows Desktop	3901482.0	linked	Chrome	Windows Desktop	-unknown-	ncf87guaf0	en	732296.0	0.0	0.0	Web	0	basic	20140101002146	13.0	7.0	19.0	3.0	ncf87guaf0
4	index	8.0	-unknown-	-unknown-	direct	direct	NaN	1.0	2163.187500	GB	2014-01-01	2014-01-02	0.0	iPhone	17305.5	untracked	-unknown-	iPhone	-unknown-	4rvqpxoh3h	en	14750.5	21.0	1.0	iOS	25	basic	20140101002619	1.0	1.0	7.0	1.0	4rvqpxoh3h

	action_count	age	apple_device	avg_duration	desktop_device	duration	max_duration	min_duration	mobile_device	signup_flow	timestamp_first_active	unique_action_details	unique_action_types	unique_actions	unique_device_types	action_count_quartile	age_group	duration_group	avg_duration_group	signup_flow_simple	holiday_account_created	business_day_first_booking	action_ajax_refresh_subtotal	action_create	action_index	action_show	action_detail_-unknown-	action_detail_p3	action_detail_view_search_results	action_type_-unknown-	action_type_click	action_type_view	affiliate_channel_direct	affiliate_channel_sem-brand	affiliate_channel_sem-non-brand	affiliate_provider_direct	affiliate_provider_google	device_type_Mac Desktop	device_type_Windows Desktop	device_type_iPhone	first_affiliate_tracked_linked	first_affiliate_tracked_omg	first_affiliate_tracked_untracked	first_browser_-unknown-	first_browser_Chrome	first_browser_Firefox	first_device_type_Mac Desktop	first_device_type_Windows Desktop	first_device_type_iPhone	gender_-unknown-	gender_MALE	language_en	signup_app_Web	signup_app_iOS	signup_method_basic	year_account_created_2014	month_account_created_1	weekday_account_created_2	year_first_booking_2009.0	year_first_booking_2014.0	month_first_booking_0.0	month_first_booking_1.0	weekday_first_booking_0.0	weekday_first_booking_4.0	weekday_first_booking_6.0
0	0.046623	0.772152	0.0	0.029229	1.0	0.089969	0.337160	0.000039	0.0	0.0	0.983616	0.181818	0.636364	0.232877	0.333333	1.00	1.0	1.0	1.0	0.0	1.0	0.0	0.0	0.0	0.0	1.0	0.0	1.0	0.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0	1.0	1.0	1.0	0.0	1.0	1.0	1.0	1.0	0.0	1.0	0.0	1.0	0.0	0.0	1.0
1	0.004405	0.405063	1.0	0.026270	1.0	0.007641	0.064436	0.000704	0.0	0.0	0.983616	0.145455	0.363636	0.095890	0.166667	0.25	0.0	0.5	1.0	0.0	1.0	0.0	0.0	0.0	0.0	1.0	0.0	1.0	0.0	0.0	0.0	1.0	1.0	0.0	0.0	1.0	0.0	1.0	0.0	0.0	0.0	0.0	1.0	0.0	0.0	1.0	1.0	0.0	0.0	1.0	0.0	1.0	1.0	0.0	1.0	1.0	1.0	1.0	1.0	0.0	1.0	0.0	1.0	0.0	0.0
2	0.005874	0.405063	0.0	0.077012	1.0	0.029865	0.187114	0.001037	0.0	0.0	0.983616	0.145455	0.545455	0.178082	0.333333	0.25	0.0	1.0	1.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0	1.0	1.0	0.0	1.0	1.0	1.0	1.0	1.0	0.0	1.0	0.0	1.0	0.0	0.0
3	0.058737	0.405063	0.0	0.026123	1.0	0.101302	0.406836	0.000000	0.0	0.0	0.983616	0.236364	0.636364	0.260274	0.500000	1.00	0.0	1.0	1.0	0.0	1.0	0.0	1.0	0.0	0.0	0.0	0.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0	1.0	0.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0	1.0	0.0	1.0	1.0	0.0	1.0	1.0	1.0	1.0	1.0	0.0	1.0	0.0	1.0	0.0	0.0
4	0.002937	0.405063	1.0	0.002317	0.0	0.000449	0.008195	0.000411	1.0	1.0	0.983616	0.018182	0.090909	0.095890	0.166667	0.25	0.0	0.5	0.5	1.0	1.0	1.0	0.0	0.0	1.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0	0.0	1.0	0.0	0.0	1.0	1.0	0.0	0.0	0.0	0.0	1.0	1.0	0.0	1.0	0.0	1.0	1.0	1.0	1.0	1.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0

	country_destination_GB	country_destination_NDF	country_destination_other
0	0.0	0.0	1.0
1	0.0	1.0	0.0
2	0.0	1.0	0.0
3	0.0	1.0	0.0
4	1.0	0.0	0.0

	action	secs_elapsed
0	10	47320.5
1	11	72764.0
2	12	180407.0
3	15	54223.0
4	about_us	18627.5

	country_destination_GB	country_destination_NDF	country_destination_other
0	0.0	0.0	1.0
1	0.0	1.0	0.0
2	0.0	1.0	0.0
3	0.0	1.0	0.0
4	1.0	0.0	0.0

	country_destination_GB	country_destination_NDF	country_destination_other
0	0.0	0.0	1.0
1	0.0	1.0	0.0
2	0.0	1.0	0.0
3	0.0	1.0	0.0
4	1.0	0.0	0.0