Recommender systems (RS), which aim to match users with items of interest, play an important role in many online applications. Traditional recommendation algorithms mainly focus on learning effective preference models from historical user-item interaction data, e.g., matrix factorization . With the rapid development of Web technologies, various kinds of side information have become available in RSs . At an early stage, the context information used was usually unstructured, and its availability was limited to specific data domains or platforms.
Recently, both the research and industry communities have made increasing efforts to structure world knowledge or domain facts in a variety of data domains. One of the most typical organization forms is the knowledge base (KB) . KBs provide a general and unified way to organize and associate information entities, and have been shown to be useful in many applications. For instance, KBs have been used in recommender systems, giving rise to knowledge-aware recommender systems . To develop a knowledge-aware recommender system, a key issue is how to obtain rich and structured KB information for RS items. Overall, existing studies offer two main solutions. First, side information has been collected from the RS platform and used as contextual features [5, 6, 7, 8, 9], and some studies further construct small and simple KB-like knowledge structures [10, 11, 12]. In these cases, the number of attributes or relations is usually small, and much useful item information is likely to be missing. Second, several works propose to link RSs with private KBs [13, 14, 15]. However, the linkage results are not publicly available. We are also aware of some closely related studies [16, 17], which aim to link RS items with DBpedia entities. By comparison, our focus is on Freebase  and YAGO , which are now widely used in many natural language processing (NLP) or related domains [20, 21, 22].
To address the need for a linked data set of RSs and KBs, we present a data set that links two public KBs with recommender systems, named KB4Rec v1.0, freely available at https://github.com/RUCDM/KB4Rec. Our basic idea is to heuristically link items from RSs with entities from public large-scale KBs. On the RS side, we select three widely used data sets (i.e., MovieLens , LFM-1b  and Amazon book ) covering three different data domains, namely movie, music and book; on the KB side, we select two well-known KBs (i.e., Freebase and YAGO). We try to maximize the applicability of our linked data set by selecting very popular RS data sets and KBs. We do not share the original data sets, since they are maintained by the original researchers or publishers. These original copies are easily accessible online.
In our KB4Rec v1.0 data set, we have organized the linkage results as linked ID pairs, each of which consists of an RS item ID and a KB entity ID. All the IDs are internal values from the original data sets. Once such a linkage has been established, existing large-scale KB data can be reused for RSs. For example, the movie “Avatar” from MovieLens data set  has a corresponding entity entry in Freebase, and we are able to obtain its attribute information by retrieving all its associated relation triples in Freebase. Based on the linked data set, we first perform some qualitative analysis experiments, and then we discuss the effect of two important factors (i.e., popularity and recency) on whether an RS item can be linked to a KB entity. Finally, we compare several knowledge-aware recommendation algorithms on our linked data set.
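To make the use of linked ID pairs concrete, the following is a minimal Python sketch of how an RS item could be mapped to a KB entity and how that entity's relation triples could then be retrieved. It is an illustration under stated assumptions, not the actual KB4Rec tooling: the tab-separated layout of the linkage data, all item and entity IDs, the relation names, and the helper functions `load_linkage`, `index_triples` and `item_attributes` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical linkage data: one "RS item ID <TAB> KB entity ID" pair per line.
# The real file layout and IDs may differ; these values are placeholders.
LINKAGE_TSV = "1\tm.aaa\n2\tm.bbb\n"

# Hypothetical KB triples in (head entity, relation, tail entity) form.
TRIPLES = [
    ("m.aaa", "film.directed_by", "m.ccc"),
    ("m.aaa", "film.genre", "m.ddd"),
    ("m.bbb", "film.genre", "m.ddd"),
    ("m.zzz", "film.genre", "m.ddd"),  # entity not linked to any RS item
]


def load_linkage(handle):
    """Read linked ID pairs into a dict: RS item ID -> KB entity ID."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) == 2}


def index_triples(triples):
    """Group triples by head entity so lookups avoid scanning the whole KB."""
    by_head = defaultdict(list)
    for head, relation, tail in triples:
        by_head[head].append((relation, tail))
    return by_head


def item_attributes(item_id, linkage, by_head):
    """Return (relation, tail) pairs for the entity linked to an RS item.

    An empty list is returned when the item has no linked entity.
    """
    entity_id = linkage.get(item_id)
    if entity_id is None:
        return []
    return by_head.get(entity_id, [])


linkage = load_linkage(io.StringIO(LINKAGE_TSV))
by_head = index_triples(TRIPLES)
print(item_attributes("1", linkage, by_head))
```

Indexing triples by head entity is only one possible design choice; for a full KB dump, a database or a dedicated triple store would likely be more practical than an in-memory dictionary.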
With our linkage results and original data copies, it is easy to develop an evaluation set for knowledge-aware recommendation algorithms. We believe such a data set is beneficial to the development of knowledge-aware recommender systems.