How linked data can aid machine learning-based tasks

Michalis Mountantonakis, Yannis Tzitzikas

pp. 155-168

Discovering useful data for a given problem is of primary importance, since data scientists usually spend a lot of time discovering, collecting and preparing data before using them, e.g., for applying or testing machine learning algorithms. In this paper we propose a general method for easily discovering, creating and selecting valuable features that describe a set of entities, so that they can be leveraged in a machine learning context. We demonstrate the feasibility of this approach by introducing a tool (research prototype) called \(\mathtt{LODsyndesis}_{\mathcal{ML}}\), which is based on Linked Data technologies and which (a) automatically discovers datasets where the entities of interest occur, (b) shows the user a large number of useful features for these entities, and (c) automatically creates the selected features by sending SPARQL queries. We evaluate this approach by exploiting data from several sources, including the British National Library, to create datasets for predicting whether a book or a movie is popular or non-popular. Our evaluation uses 5-fold cross-validation, and we report comparative results for a number of different features and models. The evaluation showed that the additional features did improve the accuracy of prediction.
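As a rough illustration of step (c) and of the 5-fold evaluation mentioned above, the following minimal sketch builds a SPARQL query that fetches one property value per entity and splits item indices into five train/test folds. It is not the tool's actual implementation: the function names `build_feature_query` and `k_fold_indices`, the query shape, and the `example.org` URIs are all assumptions made for illustration, since the abstract does not give these details.

```python
from typing import List, Tuple


def build_feature_query(entity_uris: List[str], property_uri: str) -> str:
    """Build a SPARQL SELECT query returning one property value per entity.

    Hypothetical helper: the query shape is an assumption, not taken
    from the chapter.
    """
    # List the entities of interest in a VALUES clause.
    values = " ".join(f"<{uri}>" for uri in entity_uris)
    return (
        "SELECT ?entity ?value WHERE {\n"
        f"  VALUES ?entity {{ {values} }}\n"
        # OPTIONAL keeps entities that lack the property (unbound ?value).
        f"  OPTIONAL {{ ?entity <{property_uri}> ?value }}\n"
        "}"
    )


def k_fold_indices(n_items: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_items) into k (train, test) index pairs, no shuffling."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    # Fold i holds every k-th index starting at i.
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        splits.append((train, test))
    return splits


# Example usage with made-up URIs (assumptions, not real datasets).
books = ["http://example.org/book/1", "http://example.org/book/2"]
query = build_feature_query(books, "http://example.org/prop/numberOfPages")
splits = k_fold_indices(10, k=5)
```

Each returned row of such a query would supply one feature value for one entity; a model would then be trained and scored once per fold, and the scores averaged to compare feature sets.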

Publication details

DOI: 10.1007/978-3-319-67008-9_13

Full citation:

Mountantonakis, M., Tzitzikas, Y. (2017). How linked data can aid machine learning-based tasks, in J. Kamps, G. Tsakonas, Y. Manolopoulos, L. Iliadis & I. Karydis (eds.), Research and advanced technology for digital libraries, Dordrecht, Springer, pp. 155-168.
