Constructing

1. DVN-Collection

DVN provides quick, easy access to data from multiple sources, addressing the access problems caused by differences in storage methods and formats across data sources.

Supported data sources: government, financial institutions, operators, e-commerce providers, the Internet, vertical industries, enterprises, individuals, etc.

Supported methods of collecting data from data sources: files, databases (relational databases, NoSQL databases, distributed databases, etc.), APIs, data streams, FTP, crawlers, etc.

Supported data formats for parsing: text, JSON, XML, unstructured data, custom parsers, etc.
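The format-specific parsing described above can be pictured as a registry that maps a format name to a parser function, with custom parsers registered the same way as built-in ones. The following is only a minimal sketch under that assumption; `PARSERS`, `register_parser`, and `parse` are hypothetical names for illustration, not part of DVN.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Record = Dict[str, str]
Parser = Callable[[str], List[Record]]

# Hypothetical registry: format name -> parser function (not a DVN API).
PARSERS: Dict[str, Parser] = {}


def register_parser(fmt: str, parser: Parser) -> None:
    """Register a built-in or custom parser under a case-insensitive name."""
    PARSERS[fmt.lower()] = parser


def parse_json(payload: str) -> List[Record]:
    """Parse a JSON object, or a list of objects, into flat string records."""
    data = json.loads(payload)
    rows = data if isinstance(data, list) else [data]
    return [{str(k): str(v) for k, v in row.items()} for row in rows]


def parse_xml(payload: str) -> List[Record]:
    """Treat each child of the root element as one record."""
    root = ET.fromstring(payload)
    return [{field.tag: (field.text or "") for field in item} for item in root]


def parse_text(payload: str) -> List[Record]:
    """Treat each non-empty line as one record."""
    return [{"line": line} for line in payload.splitlines() if line.strip()]


register_parser("json", parse_json)
register_parser("xml", parse_xml)
register_parser("text", parse_text)


def parse(fmt: str, payload: str) -> List[Record]:
    """Dispatch a payload to the parser registered for its format."""
    try:
        parser = PARSERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"no parser registered for format {fmt!r}")
    return parser(payload)
```

A custom format would be added with a call such as `register_parser("myformat", my_parser)`, after which `parse("myformat", payload)` uses it like any other parser.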

2. DVN-Cleaning

DVN applies multi-level cleansing to suspicious data, error values, missing values, and abnormal values, and concatenates fields into formatted data, addressing the problems of data preprocessing and quality improvement.

The process also includes data desensitization, data consistency checks, data quality assessment, etc. In addition, virtual data governance positions are set up to attract organizations and individuals to take part in the data quality process.
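As a rough illustration of the cleansing steps named above, the sketch below rejects records with missing required fields or out-of-range numeric values and masks sensitive fields in the records it keeps. The function names, the masking rule, and the field names in the usage note are assumptions made for this example, not DVN's actual rules.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[str]]


def mask(value: str, keep: int = 3) -> str:
    """Desensitize a value: keep the first `keep` characters, star the rest."""
    return value[:keep] + "*" * max(len(value) - keep, 0)


def clean(
    records: Sequence[Record],
    required: Sequence[str],
    ranges: Dict[str, Tuple[float, float]],
    sensitive: Sequence[str],
) -> Tuple[List[Record], List[Record]]:
    """Split records into (kept, rejected) and mask sensitive fields."""
    kept: List[Record] = []
    rejected: List[Record] = []
    for rec in records:
        # Missing values: every required field must be present and non-empty.
        if any(rec.get(field) in (None, "") for field in required):
            rejected.append(rec)
            continue
        # Error and abnormal values: numeric fields must parse and be in range.
        valid = True
        for field, (low, high) in ranges.items():
            try:
                value = float(rec[field])
            except (KeyError, TypeError, ValueError):
                valid = False
                break
            if not low <= value <= high:
                valid = False
                break
        if not valid:
            rejected.append(rec)
            continue
        # Desensitization: mask sensitive fields on a copy of the record.
        out = dict(rec)
        for field in sensitive:
            if out.get(field):
                out[field] = mask(out[field])
        kept.append(out)
    return kept, rejected
```

Calling `clean(records, ["id", "age"], {"age": (0, 120)}, ["phone"])` returns the cleaned records together with the rejected ones, and the ratio `len(kept) / len(records)` could serve as one simple quality indicator for a data source.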

3. DVN-Fusion

DVN integrates multiple data sources, performing cross-data-source tag calculation and dimensionality expansion, to solve the problems of aggregating and integrating data from multiple sources.

DVN's data integration performs cross-data-source tag calculation, using the features of one data source to complement another so that the true properties of the data are described more fully. Cross-validation of multi-source data can also be performed to further assess the quality of each data source and to evaluate the integration itself. The integrated data is aggregated into a multidimensional Data Cube for later use.
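The complement-and-cross-validate idea above can be sketched as a merge of per-entity tags from two sources, counting how often the sources agree on the tags they share. This is a minimal sketch with assumed data shapes (entity ID mapped to a tag dictionary) and a hypothetical `fuse` function; the conflict rule (the primary source wins) is likewise an assumption.

```python
from typing import Dict, Optional, Tuple

Tags = Dict[str, str]
Source = Dict[str, Tags]  # entity ID -> tags (assumed shape)


def fuse(primary: Source, secondary: Source) -> Tuple[Source, Optional[float]]:
    """Merge tags per entity and report the agreement rate on shared tags.

    Returns (fused, agreement). `agreement` is None when the two sources
    share no tags, so there is nothing to cross-validate.
    """
    fused: Source = {}
    compared = 0
    agreed = 0
    for entity_id in primary.keys() | secondary.keys():
        a = primary.get(entity_id, {})
        b = secondary.get(entity_id, {})
        # Cross-validation: compare tags that both sources provide.
        for tag in a.keys() & b.keys():
            compared += 1
            if a[tag] == b[tag]:
                agreed += 1
        # Dimensionality expansion: union of tags; primary wins on conflict.
        fused[entity_id] = {**b, **a}
    agreement = agreed / compared if compared else None
    return fused, agreement
```

A low agreement rate would suggest that at least one of the two sources is unreliable for the shared tags, which is one way the cross-validation result could feed back into data quality assessment.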

4. DVN-DataApp

DVN data applications are built on data identification results, which are automatically encapsulated into different forms of data application services.

The calculated tag profile data is stored as key-value pairs and imported into a distributed database, which exposes an API for external real-time queries. The Data Cube provides computing services for external batch jobs.
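The two access paths described here, real-time key lookup and batch computation over the Data Cube, can be sketched as follows. The `TagStore` class is an in-memory stand-in for the distributed database, and `cube_counts` is a hypothetical batch aggregation; both are assumptions for illustration rather than DVN's actual interfaces.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


class TagStore:
    """In-memory stand-in for a distributed key-value database."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, entity_id: str, tags: Dict[str, str]) -> None:
        """Store the tags for one entity as a serialized value under its key."""
        self._data[entity_id] = json.dumps(tags, sort_keys=True)

    def get(self, entity_id: str) -> Optional[Dict[str, str]]:
        """Real-time query: return the tags for a key, or None if absent."""
        raw = self._data.get(entity_id)
        return None if raw is None else json.loads(raw)


def cube_counts(rows: Iterable[Dict[str, str]], dims: Tuple[str, ...]) -> Counter:
    """Batch job: count rows for each combination of the given dimensions."""
    return Counter(tuple(row.get(dim, "") for dim in dims) for row in rows)
```

In a deployment, `get` would sit behind the query API and `cube_counts` would run as a batch job over the aggregated data; here both operate on plain Python objects to keep the sketch self-contained.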