The OpenWater and PureMemory prototypes

Design Team Members: Mitchell Dickinson, Pirathayini Srikantha, Rishi Ramraj, andYichun Li

Supervisor: Professor Khuzaima Daudjee

Background

OpenWater is a distributed file management system that allows users to share data with ease and efficiency. This system is unique, offering features like contextual queries based on the adaptive learning of user interactions with the system. No additional hardware is required, as physical storage space available in local nodes of a network will be leveraged - thus saving expenses. As information is readily available, collaboration will increase, causing a direct impact on productivity.

Project description

OpenWater is an advanced file management system designed for small to medium sized businesses. This system offers a unique set of features that allow effective data collaboration - bringing corporate data to users' fingertips. While existing data management and collaboration tools like wikis and content management systems provide some desired functionality, limitations exist. Users belonging to the OpenWater system will be able to share files and perform contextual queries through a fully automated system.  Data committed to OpenWater does not require additional hardware to be stored. Instead, storage space available on local nodes is leveraged, significantly reducing administration and maintenance costs. Documents are stored redundantly to maximize data availability with fully integrated access rights to ensure that no unauthorized data access is permitted. OpenWater utilizes an adaptive learning mechanism to return relevant results for contextual searches. This algorithm intelligently analyzes queries made by users and other statistics over time to optimize relevancy of query result sets. These features do not require users to know the physical location of specific data beforehand.  OpenWater provides an integrated view of the knowledge resources along with an abstraction of complex interactions between data. This significantly increases collaboration and productivity within a company.

Design methodology

The OpenWater design has been refined iteratively and the following is the finalized design for implementation of the prototype. The prototype is codenamed PureMemory and will serve as the symposium demonstration of OpenWater. OpenWater consists of two layers: Daemon and The Alpha. Each layer is responsible for performing particular tasks executed by the user. These systems interact together to present a unified knowledge pool to its users as shown in the following figure.

diagram including all componenets of OpenWater and Pure memory

OpenWater is intended to provide a foundation on which intelligent information systems can be developed and configured. OpenWater's data storage engine is designed to install seamlessly on top of an existing network. 

Daemon

This data maintenance engine, codenamed Daemon, stores various data and contextual metadata.  Data types such as strings, long strings and binary files are supported by Daemon.  This data is stored as a distributed database on individual physical entities of the network. All data accessible through OpenWater can be thought of as a union of all content stored on local nodes of the network.  The Daemon provides the interface for interaction with the distributed database through actions that include adding new data, updating existing data, or querying the distributed database.  Relationships between data aggregated overtime are also stored.  The content of the entire distributed database system can be considered as a directed graph (digraph), on which data is represented by nodes, and data relationships are represented by directed arcs. 

The Alpha

The second component of OpenWater is codenamed The Alpha.  The goal of The Alpha is to extract understanding from the digraph so that a variety of problems can be addressed, such as: answering fuzzy questions, proposing a hypothesis, identification of risk associated with a question, and so on. Solutions to these problems can be used for applications such as returning relevant results for contextual queries performed by a user.  These problems can be addressed by extracting contextual data about the digraph.  This contextual data is maintained and updated via user interactions with the specific information system.  Consequently, the more The Alpha gets used, the more intelligent data will be stored on Daemon, and the better OpenWater will be at solving problems.

PureMemory

Various applications for the Alpha can be developed.  For the purpose of reducing the scope of the project, a research demo codenamed PureMemory is proposed as the specific implementation of OpenWater for the 2009 Systems Design Symposium.  PureMemory addresses the problem of finding a set of claims from a data pool that can support a given argument.  By limiting the final OpenWater application to serve only this problem, implementations of Daemon and The Alpha can be reduced in scope according to the requirements of PureMemory.