Automating Predictive Modeling at Zynga with PySpark and Pandas UDFs

Spark Open Source Community
Building propensity models at Zynga used to be a time-intensive task that required custom data science and engineering work for every new model. We’ve built an automated model pipeline that uses PySpark and automated feature generation to remove that per-model work. The challenge we faced was that the Featuretools library we wanted to use for automated feature engineering works only on Pandas data frames, which limited the size of the data sets we could handle. Our solution is to use Pandas UDFs to scale the feature engineering process to our entire player base. We start with our full set of players, partition the data into smaller chunks that can be loaded into memory, apply the feature engineering step to each chunk, and then combine the results back into one large data set. This presentation outlines how we use Pandas UDFs in production to automate propensity modeling at Zynga. The outcome of this approach is that we now have hundreds of propensity models in production that teams can use to personalize game experiences. Instead of spending time on feature engineering and model fitting, our data scientists now spend more of their time engaging with game teams to help build new features.
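
The partition, apply, and combine steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Zynga's production code: the column names (`player_id`, `spend`), the `generate_features` function, and the output schema are hypothetical, and a simple Pandas aggregation stands in for the Featuretools step. The Spark path assumes Spark 3.0 or later, where grouped-map Pandas UDFs are exposed as `applyInPandas`. The local helper repeats the same pattern in plain Pandas so the logic can be checked without a cluster.

```python
import pandas as pd

# Hypothetical output schema; a real pipeline would list the generated features.
FEATURE_SCHEMA = (
    "player_id long, event_count long, total_spend double, mean_spend double"
)


def generate_features(pdf: pd.DataFrame) -> pd.DataFrame:
    """Build per-player features for one in-memory chunk of event rows.

    Stand-in for the Featuretools step: it only needs a Pandas data frame
    that fits in memory, which is what each partition provides.
    """
    return pdf.groupby("player_id", as_index=False).agg(
        event_count=("spend", "size"),
        total_spend=("spend", "sum"),
        mean_spend=("spend", "mean"),
    )


def run_locally(events: pd.DataFrame, num_partitions: int) -> pd.DataFrame:
    """Plain-Pandas version of the pattern: partition, apply, combine."""
    # All rows for a given player land in the same chunk.
    partition_id = events["player_id"] % num_partitions
    chunks = [generate_features(chunk) for _, chunk in events.groupby(partition_id)]
    return pd.concat(chunks, ignore_index=True)


def run_on_spark(events, num_partitions: int):
    """Same pattern on a Spark DataFrame, using a grouped-map Pandas UDF."""
    # Imported lazily so the Pandas-only helpers work without PySpark installed.
    from pyspark.sql import functions as F

    # pmod keeps the bucket non-negative even when hash() is negative.
    with_partition = events.withColumn(
        "partition_id", F.expr(f"pmod(hash(player_id), {num_partitions})")
    )
    # Spark hands each partition to generate_features as a Pandas data frame
    # and stitches the returned frames back into one Spark DataFrame.
    return with_partition.groupBy("partition_id").applyInPandas(
        generate_features, schema=FEATURE_SCHEMA
    )
```

Partitioning on a function of `player_id` keeps every row for a player in the same chunk, so per-player features computed inside a chunk match what a single pass over the full data set would produce. The number of partitions would be chosen so that each chunk fits comfortably in executor memory.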