A new big data management system called Data Civilizer aims to simplify the process of aggregating large datasets for analysis.
Data Civilizer was developed by an international team of computer scientists led by researchers from the Massachusetts Institute of Technology. The system automatically identifies connections between different data sources and enables users to perform queries across all of the available data.
Data Civilizer first analyzes every data column of every table at its disposal, and then produces a statistical summary of the data in each column, such as the frequency and range of values and words. Data Civilizer also keeps a master index of every word occurring in every table. All of the column summaries are compared against each other, and the system identifies pairs of columns that appear to have similarities.
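The profiling and comparison steps described above can be illustrated with a small sketch. This is not Data Civilizer's actual code: the toy tables, the function names, the 0.3 threshold, and the use of Jaccard overlap between distinct values as the similarity measure are all assumptions made for illustration, since the article does not say how similarity is computed.

```python
from itertools import combinations

# Hypothetical toy tables: {table_name: {column_name: [values]}}
TABLES = {
    "compounds": {
        "compound_id": ["C1", "C2", "C3"],
        "name": ["aspirin", "ibuprofen", "caffeine"],
    },
    "assays": {
        "cid": ["C1", "C2", "C4"],
        "target": ["COX1", "COX2", "A2A"],
    },
}


def profile(values):
    """Summarize one column: distinct values, row count, and value range."""
    return {
        "distinct": set(values),
        "count": len(values),
        "min": min(values),
        "max": max(values),
    }


def build_profiles(tables):
    """Profile every column of every table, keyed by (table, column)."""
    return {
        (table, column): profile(values)
        for table, columns in tables.items()
        for column, values in columns.items()
    }


def build_word_index(tables):
    """Master index: each value maps to the set of columns it occurs in."""
    index = {}
    for table, columns in tables.items():
        for column, values in columns.items():
            for value in values:
                index.setdefault(value, set()).add((table, column))
    return index


def jaccard(a, b):
    """Assumed similarity measure: shared distinct values over all distinct values."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(profiles, threshold=0.3):
    """Compare every pair of column summaries and keep those above the threshold."""
    pairs = []
    for (col_a, prof_a), (col_b, prof_b) in combinations(profiles.items(), 2):
        score = jaccard(prof_a["distinct"], prof_b["distinct"])
        if score >= threshold:
            pairs.append((col_a, col_b, score))
    return pairs


PROFILES = build_profiles(TABLES)
WORD_INDEX = build_word_index(TABLES)
PAIRS = similar_pairs(PROFILES)
```

In this toy data, only `compounds.compound_id` and `assays.cid` share values, so they are the one pair flagged as similar. Comparing every pair of columns is quadratic in the number of columns; a system working at scale would presumably need something more efficient than this brute-force loop.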
Every pair of columns is assigned a similarity score and is integrated into a map that traces the connections between individual columns and tables. When a user composes a query, Data Civilizer scans the map to find any related data, and the results of the queries can be saved as new data files.
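One way to picture the map is as a graph whose nodes are columns and whose edges carry similarity scores. The sketch below is a minimal illustration under that assumption, not the system's real data structure: the scored pairs, the table names, the 0.3 cutoff, and the breadth-first search used to find related tables are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical scored column pairs: ((table, column), (table, column), score)
SCORED_PAIRS = [
    (("compounds", "compound_id"), ("assays", "cid"), 0.5),
    (("assays", "target"), ("targets", "symbol"), 0.8),
    (("targets", "disease"), ("diseases", "name"), 0.1),
]


def build_map(scored_pairs, threshold=0.3):
    """Build an undirected graph of columns, keeping only edges above the threshold."""
    graph = defaultdict(dict)
    for col_a, col_b, score in scored_pairs:
        if score >= threshold:
            graph[col_a][col_b] = score
            graph[col_b][col_a] = score
    return graph


def related_tables(graph, start_table):
    """Breadth-first search over the map for tables linked to start_table.

    Columns of the same table are treated as connected to each other,
    so a link into any column of a table makes the whole table reachable.
    """
    columns_by_table = defaultdict(list)
    for column in graph:
        columns_by_table[column[0]].append(column)

    seen = {start_table}
    queue = deque([start_table])
    while queue:
        table = queue.popleft()
        for column in columns_by_table[table]:
            for neighbor in graph[column]:
                neighbor_table = neighbor[0]
                if neighbor_table not in seen:
                    seen.add(neighbor_table)
                    queue.append(neighbor_table)

    seen.discard(start_table)
    return sorted(seen)


COLUMN_MAP = build_map(SCORED_PAIRS)
RELATED = related_tables(COLUMN_MAP, "compounds")
```

Here a query that starts from the `compounds` table reaches `assays` directly and `targets` through `assays`, while `diseases` stays out of reach because its only link scored below the cutoff.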
Merck is currently exploring how to use the system to organize its chemical-biology datasets, which link chemical compounds, diseases, and drug targets.
From MIT News
Abstracts Copyright © 2017 Information Inc., Bethesda, Maryland, USA