In this post we start from the basic deep CTR models. I am quite fond of the Wide&Deep framing: once you get a feel for it, many later improvements can be folded into it. Wide is in charge of mining the frequent patterns that do appear in the samples, while Deep is in charge of generalizing to feature combinations that do not appear in the samples. Subsequent improvements either use a different IFC to let Deep extract feature-interaction information more effectively, or let Wide memorize the sample information better.
The code below works on dense input, which I find makes the model structure easier to follow. The models that take sparse input, together with the complete code, are here 👇
https://github.com/DSXiangLi/CTR
The earliest deep-learning attempt at click-through-rate models started from a simple MLP: embed the high-dimensional, sparse discrete features, concatenate the embeddings as the MLP input, and obtain the CTR prediction through the nonlinear transformations of several fully connected layers.
I wonder whether you have been as puzzled as I was: what exactly does this Embedding+MLP learn? Do the MLP's embeddings learn the same feature-interaction information as FM's embeddings? I recently heard a fairly convincing view from an expert; of course, keep skeptical, and discussion is welcome~
An MLP can, in principle, express both low-order and high-order information of all features, but it relies on a huge search space. With limited samples and limited parameters it usually learns only limited information. That is why we rely on feature engineering grounded in business understanding to help the MLP learn more useful feature-interaction information within the limited space. FM's vector inner product is just one way of doing second-order feature engineering, and many later improvements to the Deep side also explore how to turn feature-engineering experience into better extraction of feature interactions.
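To make the comparison concrete, here is a minimal NumPy sketch (the names and shapes are my own illustration, not from the repo) of FM-style second-order interactions: each active feature gets an embedding vector, and the interaction signal is the sum of pairwise inner products, which an Embedding+MLP would otherwise have to discover on its own inside its search space.

import numpy as np

# Hypothetical example: 4 active features, each with an 8-dim embedding v_i.
num_features, emb_dim = 4, 8
V = np.random.randn(num_features, emb_dim)

# FM second-order term: sum over i<j of <v_i, v_j>,
# computed with the usual (square-of-sum minus sum-of-squares) trick.
sum_square = np.square(V.sum(axis=0))    # (sum_i v_i)^2, element-wise
square_sum = np.square(V).sum(axis=0)    # sum_i v_i^2, element-wise
second_order = 0.5 * (sum_square - square_sum).sum()

# An Embedding+MLP instead just concatenates V.flatten() and leaves it to the
# fully connected layers to (hopefully) rediscover such interactions.
print(second_order)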
import tensorflow as tf

# EMB_CONFIGS / BUCKET_CONFIGS and the tf_estimator_model decorator come from the repo's config and helper code.

def build_features(numeric_handle):
    f_sparse = []
    f_dense = []

    # categorical features: hash bucket -> one-hot
    for col, config in EMB_CONFIGS.items():
        ind = tf.feature_column.categorical_column_with_hash_bucket(col, hash_bucket_size=config['hash_size'])
        one_hot = tf.feature_column.indicator_column(ind)
        f_sparse.append(one_hot)

    # numeric features
    if numeric_handle == 'bucketize':
        # Method 1 'bucketize': bucketize into one-hot and treat as sparse input
        for col, config in BUCKET_CONFIGS.items():
            num = tf.feature_column.numeric_column(col)
            bucket = tf.feature_column.bucketized_column(num, boundaries=config)
            f_sparse.append(bucket)
    else:
        # Method 2 'dense': keep as numeric and concatenate with the embeddings later
        for col, config in BUCKET_CONFIGS.items():
            num = tf.feature_column.numeric_column(col)
            f_dense.append(num)

    return f_sparse, f_dense


@tf_estimator_model
def model_fn(features, labels, mode, params):
    sparse_columns, dense_columns = build_features(params['numeric_handle'])

    with tf.variable_scope('EmbeddingInput'):
        embedding_input = []
        for f_sparse in sparse_columns:
            sparse_input = tf.feature_column.input_layer(features, f_sparse)

            input_dim = sparse_input.get_shape().as_list()[-1]
            init = tf.random_normal(shape=[input_dim, params['embedding_dim']])
            weight = tf.get_variable('w_{}'.format(f_sparse.name), dtype=tf.float32, initializer=init)

            embedding_input.append(tf.matmul(sparse_input, weight))

        dense = tf.concat(embedding_input, axis=1, name='embedding_concat')

        # if numeric features are treated as dense features, concatenate them with the embeddings;
        # otherwise they were bucketized and are already part of the sparse input above
        if params['numeric_handle'] == 'dense':
            numeric_input = tf.feature_column.input_layer(features, dense_columns)
            numeric_input = tf.layers.batch_normalization(numeric_input, center=True, scale=True, trainable=True,
                                                          training=(mode == tf.estimator.ModeKeys.TRAIN))
            dense = tf.concat([dense, numeric_input], axis=1, name='numeric_concat')

    with tf.variable_scope('MLP'):
        for i, unit in enumerate(params['hidden_units']):
            dense = tf.layers.dense(dense, units=unit, activation=tf.nn.relu, name='Dense_{}'.format(i))
            if mode == tf.estimator.ModeKeys.TRAIN:
                dense = tf.layers.dropout(dense, rate=params['dropout_rate'],
                                          training=(mode == tf.estimator.ModeKeys.TRAIN))

    with tf.variable_scope('output'):
        y = tf.layers.dense(dense, units=1, name='output')

    return y
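The model_fn above returns raw logits and presumably relies on the tf_estimator_model decorator to wrap them into an EstimatorSpec. Assuming that, a hedged sketch of how it could be wired up with the standard tf.estimator API; the params values, model_dir and train_input_fn are placeholders of my own, not the repo's actual training script.

# Hypothetical wiring of the decorated model_fn.
estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir='./mlp_checkpoint',
    params={
        'numeric_handle': 'dense',   # or 'bucketize'
        'embedding_dim': 16,
        'hidden_units': [64, 32],
        'dropout_rate': 0.1
    }
)

# train_input_fn is assumed to yield (features, labels) batches.
# estimator.train(input_fn=train_input_fn, steps=1000)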
Wide&Deep adds a Wide part on top of the MLP above. The authors argue that the Deep part handles generalization, i.e. generalizing to and fuzzily matching patterns that have not appeared in the samples; that is exactly the Embedding+MLP above. The Wide part handles memorization, i.e. remembering patterns that have already appeared in the samples; it is a logistic regression over the discrete features and feature crosses. Deep and Wide are trained jointly.
Put this way it may not be entirely accurate. The authors also mention in the paper that the Wide part is only the icing on the cake: it helps Deep sharpen the contribution of frequently occurring patterns to the prediction target. So Wide does not need to be a full-size model; what it needs most are the core features and cross features chosen with business judgment.
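As a rough illustration of the joint training (a minimal sketch of my own, not the paper's exact formulation or the repo's code): the two parts simply add their logits before the sigmoid, so one loss updates both sides at the same time.

import tensorflow as tf

def wide_and_deep_logit(wide_input, deep_input, hidden_units):
    # Wide: a linear, logistic-regression-style term over sparse/crossed features.
    wide_logit = tf.layers.dense(wide_input, units=1, name='wide_logit')

    # Deep: the Embedding+MLP over dense/embedded features.
    net = deep_input
    for i, unit in enumerate(hidden_units):
        net = tf.layers.dense(net, units=unit, activation=tf.nn.relu, name='deep_{}'.format(i))
    deep_logit = tf.layers.dense(net, units=1, name='deep_logit')

    # Joint training: the loss is computed on the summed logit,
    # so gradients flow back into Wide and Deep simultaneously.
    return wide_logit + deep_logit   # sigmoid(.) gives the predicted CTR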
CTR model discussions are mostly about handling sparse discrete features, so what should we do with continuous features? There are a few options (a small sketch follows the list below).
Pros and cons of discretizing continuous features

Cons

Pros
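A minimal sketch of the two handling options this post's code actually uses: bucketize into one-hot (usable on the Wide/sparse side and in crosses), or z-normalize and feed as a dense input on the Deep side. The column name, boundaries, mean and std below are illustrative placeholders, not the repo's configs.

import tensorflow as tf

# Option 1: discretize -> the continuous value becomes a one-hot bucket.
age = tf.feature_column.numeric_column('age')
age_bucket = tf.feature_column.bucketized_column(age, boundaries=[18, 25, 35, 50, 65])

# Option 2: keep it continuous -> z-normalize and concatenate with embeddings.
def znorm(mean, std):
    return lambda col: (col - mean) / std

age_dense = tf.feature_column.numeric_column('age', normalizer_fn=znorm(38.6, 13.6))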
import tensorflow as tf
from itertools import combinations

# EMB_CONFIGS / BUCKET_CONFIGS / NORM_CONFIGS come from the repo's config code.

def znorm(mean, std):
    def znorm_helper(col):
        return (col - mean) / std
    return znorm_helper


def build_features():
    f_onehot = []
    f_embedding = []
    f_numeric = []

    # categorical features
    for col, config in EMB_CONFIGS.items():
        ind = tf.feature_column.categorical_column_with_hash_bucket(col, hash_bucket_size=config['hash_size'])
        f_onehot.append(tf.feature_column.indicator_column(ind))
        f_embedding.append(tf.feature_column.embedding_column(ind, dimension=config['emb_size']))

    # numeric features: kept both as normalized numeric feature and bucketized into discrete feature
    for col, config in BUCKET_CONFIGS.items():
        num = tf.feature_column.numeric_column(col,
                                               normalizer_fn=znorm(NORM_CONFIGS[col]['mean'], NORM_CONFIGS[col]['std']))
        f_numeric.append(num)
        bucket = tf.feature_column.bucketized_column(num, boundaries=config)
        f_onehot.append(bucket)

    # crossed features
    for col1, col2 in combinations(f_onehot, 2):
        # if col is the indicator of a hashed bucket, use the raw feature name directly
        if col1.parents[0].name in EMB_CONFIGS.keys():
            col1 = col1.parents[0].name
        if col2.parents[0].name in EMB_CONFIGS.keys():
            col2 = col2.parents[0].name
        crossed = tf.feature_column.crossed_column([col1, col2], hash_bucket_size=20)
        f_onehot.append(tf.feature_column.indicator_column(crossed))

    f_dense = f_embedding + f_numeric
    #f_dense = f_embedding + f_numeric + f_onehot
    f_sparse = f_onehot
    #f_sparse = f_onehot + f_numeric

    return f_sparse, f_dense


def build_estimator(model_dir):
    sparse_feature, dense_feature = build_features()

    run_config = tf.estimator.RunConfig(
        save_summary_steps=50,
        log_step_count_steps=50,
        keep_checkpoint_max=3,
        save_checkpoints_steps=50
    )

    dnn_optimizer = tf.train.ProximalAdagradOptimizer(
        learning_rate=0.001,
        l1_regularization_strength=0.001,
        l2_regularization_strength=0.001
    )

    estimator = tf.estimator.DNNLinearCombinedClassifier(
        model_dir=model_dir,
        linear_feature_columns=sparse_feature,   # Wide: one-hot, buckets and crosses
        dnn_feature_columns=dense_feature,       # Deep: embeddings and normalized numerics
        dnn_optimizer=dnn_optimizer,
        dnn_dropout=0.1,
        batch_norm=False,
        dnn_hidden_units=[48, 32, 16],
        config=run_config
    )

    return estimator
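DNNLinearCombinedClassifier is TensorFlow's canned implementation of exactly the joint Wide+Deep training described above, so no custom model_fn is needed here. A hedged usage sketch (the model_dir and the input functions are placeholders I made up; the repo's own training loop may differ):

# Hypothetical usage; the input_fns are assumed to return (features, labels) batches.
estimator = build_estimator(model_dir='./wide_deep_checkpoint')
# estimator.train(input_fn=train_input_fn)
# estimator.evaluate(input_fn=eval_input_fn)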
https://github.com/DSXiangLi/CTR