Relational Machine Learning

Our goal is to automate the transformation of relational data (think of an SQL database) into a single table of features that can be used for classification, regression, clustering, outlier detection and similar tasks. This automation alleviates the biggest hurdle in data mining: data preprocessing.

Thanks to Hadoop, Storm and Spark, we know how to process large amounts of data, even in real time. These frameworks are also well prepared to work with complex data such as text or images. But if our data do not conform to one of the canonical forms (a single table, a graph, text or an image), we have a problem.

All canonical forms share a simple data structure: text is composed of words, images of pixels, tables of columns, graphs of nodes and edges. Relational data sit one level above: they are made of tables, graphs (tables and foreign key constraints form a graph), text and images. This complicates things so much that relational data are not considered a canonical form and have to be manually transformed into a simpler, canonical form (commonly a single table). This manual approach is:

  1. Slow (people have to eat and sleep)
  2. Error prone (people make mistakes)
  3. Inflexible (the world changes, but the transformations remain the same)
  4. Expensive (people have to eat)
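
To make the manual transformation concrete, here is a minimal sketch of flattening a two-table database into a single feature table. The schema (customers, orders), the column names and the chosen aggregates are hypothetical and picked only for illustration; this is not the tool's actual method, just the kind of hand-written query it aims to replace.

```python
import sqlite3


def build_example_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, age INTEGER);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            amount REAL
        );
        INSERT INTO customers VALUES (1, 34), (2, 51), (3, 27);
        INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.5), (3, 2, 10.0);
    """)
    return conn


def flatten_customers(conn):
    """Return one feature row per customer by aggregating the orders table."""
    query = """
        SELECT c.customer_id,
               c.age,
               COUNT(o.order_id)            AS order_count,
               COALESCE(SUM(o.amount), 0.0) AS total_amount,
               COALESCE(MAX(o.amount), 0.0) AS max_amount
        FROM customers AS c
        -- LEFT JOIN keeps customers that have no orders at all
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.age
        ORDER BY c.customer_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in flatten_customers(build_example_db()):
        print(row)
```

Every choice in this query (which tables to join, which columns to aggregate, which aggregate functions to use) was made by hand, and it has to be revisited whenever the schema changes. That is the work the list above describes.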

You cannot improve what you cannot measure. Hence the development was divided into the following measurable steps:

  1. Create a repository of relational data sets.
  2. Evaluate different approaches (in the process of publication).
  3. User testing.
  4. Push the solution into the world.

Contact Jan Motl if you are interested in testing the tool.

Copyright (c) Data Science Laboratory @ FIT CTU 2014–2016. All rights reserved.