Visions: An Open-Source Library for Semantic Data

Abstract

Many common data workflows, such as loading tabular data from plain text files, data compression, and machine learning data processing, rely on semantically meaningful representations of the data’s type. Most type inference algorithms, including those used by pandas and within the tidyverse, employ rule-based heuristics tightly coupled to the machine type implementation used by the library. In practice, the semantic type and the machine type are distinct representations. For example, while real numbers between 0 and 1 are stored as floats, their semantics might instead be those of a probability. Visions is an expressive, user-configurable framework for capturing the semantic relations between data types, forming a development bedrock that supports a range of potential applications.
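The distinction between machine type and semantic type can be illustrated with a minimal sketch in plain Python. It does not use the Visions API; the helper names `machine_type`, `is_probability`, and `semantic_type` are hypothetical and exist only for this illustration.

```python
import math


def machine_type(values):
    """Return the storage-level type name shared by all values (hypothetical helper)."""
    kinds = {type(v).__name__ for v in values}
    return kinds.pop() if len(kinds) == 1 else "object"


def is_probability(values):
    """Semantic check: every value is a finite float in the closed interval [0, 1]."""
    return all(
        isinstance(v, float) and math.isfinite(v) and 0.0 <= v <= 1.0
        for v in values
    )


def semantic_type(values):
    """Refine the machine type with a semantic label when the data allows it."""
    base = machine_type(values)
    if base == "float" and is_probability(values):
        return "probability"
    return base


scores = [0.1, 0.75, 1.0]     # floats that also behave like probabilities
heights = [1.62, 1.80, 1.75]  # floats with no probability semantics

print(machine_type(scores), semantic_type(scores))
print(machine_type(heights), semantic_type(heights))
```

Both lists share the same machine type, yet only the first satisfies the stricter semantic constraint. A full framework would presumably let users declare such refinements and the relations between them rather than hard-coding them as above.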

Publication
Journal of Open Source Software