Introduction

During the competition, my team and I developed an innovative hybrid model that combines statistical analysis with machine learning algorithms to recommend songs tailored to each user's listening history.

Method

As illustrated in the diagram, the data is divided into training and testing sets. Since users exhibit different behaviors, we treat them differently based on their historical records.

We observed that users may share playlists, so sessions that begin with the same 20 songs are likely to continue with the same five. We therefore give every session in the testing data whose first 20 songs match those of another session the same five songs as its initial prediction.
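The matching step can be sketched as a dictionary lookup keyed on the first 20 song IDs. This is only an illustration under stated assumptions: it assumes each training session is a list of at least 25 song IDs and that the five known follow-up songs come from a training session with the same prefix. The names `build_prefix_lookup` and `predict_by_prefix` are hypothetical and do not come from our competition code.

```python
from typing import Dict, List, Optional, Tuple

PREFIX_LEN = 20  # songs observed at the start of each session
TARGET_LEN = 5   # songs to predict

def build_prefix_lookup(train_sessions: List[List[int]]) -> Dict[Tuple[int, ...], List[int]]:
    """Map each 20-song prefix to the five songs that followed it (assumed layout)."""
    lookup: Dict[Tuple[int, ...], List[int]] = {}
    for session in train_sessions:
        if len(session) < PREFIX_LEN + TARGET_LEN:
            continue  # skip sessions too short to provide five targets
        prefix = tuple(session[:PREFIX_LEN])
        # Keep the first continuation seen for a prefix (an arbitrary choice here).
        lookup.setdefault(prefix, session[PREFIX_LEN:PREFIX_LEN + TARGET_LEN])
    return lookup

def predict_by_prefix(lookup: Dict[Tuple[int, ...], List[int]],
                      test_prefix: List[int]) -> Optional[List[int]]:
    """Return the stored five songs for a matching prefix, or None if there is no match."""
    return lookup.get(tuple(test_prefix[:PREFIX_LEN]))
```

Sessions for which `predict_by_prefix` returns `None` would then move on to the pattern-based analysis.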

After this step, the remaining testing sessions, whose first 20 song IDs do not match any other session, are analyzed based on their listening patterns. We assume there are two types of users: those who replay songs they have just listened to, and those who prefer to explore new songs. We therefore calculate the ratio of replayed songs for each of these sessions. If the ratio is high, we base the prediction on the session's own 20 songs: as shown in the diagram, a sequential prediction method generates predictions from how frequently each song title appears among those 20 songs. For sessions with little repetition, we apply a collaborative filtering model, which leverages other users' data to predict the session's next songs.
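The routing between the two branches can be sketched as follows. This is a minimal illustration, not our exact implementation: the replay ratio is assumed here to be the share of plays that repeat a song already heard in the session, the threshold of 0.5 is a hypothetical value, and ties in frequency are broken in favor of the most recently played song. The collaborative filtering model itself is not shown, and the sketch only returns a label for that branch.

```python
from collections import Counter
from typing import List

REPLAY_THRESHOLD = 0.5  # hypothetical cut-off, not a tuned value

def replay_ratio(songs: List[int]) -> float:
    """Share of plays that repeat a song already heard in the session (assumed definition)."""
    if not songs:
        return 0.0
    return 1.0 - len(set(songs)) / len(songs)

def predict_by_frequency(songs: List[int], k: int = 5) -> List[int]:
    """Rank the session's songs by play count; ties go to the more recently played song."""
    counts = Counter(songs)
    last_seen = {song: i for i, song in enumerate(songs)}
    ranked = sorted(counts, key=lambda s: (-counts[s], -last_seen[s]))
    return ranked[:k]  # may hold fewer than k songs if the session has few distinct songs

def choose_strategy(songs: List[int], threshold: float = REPLAY_THRESHOLD) -> str:
    """Pick the frequency branch for replay-heavy sessions, otherwise collaborative filtering."""
    if replay_ratio(songs) >= threshold:
        return "frequency"
    return "collaborative_filtering"
```

A session that returns fewer than five songs from `predict_by_frequency` would need its remaining slots filled some other way, for example by the collaborative filtering branch.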