Virtual Data Container

VDC - A new way of visualising Data and offering them as products & services


Welcome to Virtual Data Container's Official Page

Here you can access the VDC code, as well as its latest releases.

What is a VDC?

The Virtual Data Container (referred to as "VDC" for short) represents a new way of presenting data. As the modern world evolves, data are becoming a valuable asset, and over the past decade the total amount of data generated worldwide has skyrocketed. It seems only reasonable to move on to new ways of handling data, and also of presenting and selling them as products. What if we could choose among various services and applications that suit our needs, simply presented and neatly organised inside a virtual container? What if choosing the best apps and services were as simple as choosing a product at the supermarket? The Virtual Data Container aims to become exactly that: a new way of presenting data, services and applications.

In a few words, a Virtual Data Container could be a set of data, a web service, an operating system, an application, etc., organised in a virtual container image and presented to potential buyers or users. This is the future: transforming existing operations and datasets into services and making them easily accessible to end users. A VDC is versatile, easily upgradable, and can be tailored to a specific customer's needs. Alternatively, when a VDC is structured as a concrete, preset service, potential users can readily judge whether it suits them or not.

As data can be distributed among resources both in the Cloud and on the Edge, Virtual Data Containers (VDCs) are proposed as a means of offering data in a timely and secure manner, and transparently with respect to their location and format. In more detail, a VDC:

Design & Development

The first step of an application life-cycle concerns the work performed by a data administrator (a.k.a. data provider) who, based on the managed data sources, creates a VDC JSON Schema (“Artifact” or “Blueprint”) which specifies the characteristics of a VDC. In this GitHub repository, we provide a VDC Schema creator, which we will analyse later on, so that data providers can easily introduce themselves to the VDC “network”.

Following Service-Oriented Computing principles, the visibility principle requires a description of a service to be published so that it is visible to all potential users. Consequently, the data administrator publishes the VDC Artifact. Once it is published, the developers come into play. As the information included in a VDC Blueprint-Artifact also concerns functional and non-functional aspects, a developer relies on this information to select the most suitable VDC for his/her purposes. It is worth noting that, depending on the nature of his/her needs, the developer could select different VDCs serving different purposes. Finally, the developer designs and develops the software and deploys it on the available resources, which can be located on the edge or in the cloud.

The initial deployment is the key element of the approach, as this phase requires knowing all the possible resources on which the VDC can be executed. A standard Fog environment implies that DaaS can be provided using resources belonging to both the provider and the consumer. Without loss of generality, we can assume that the provider resources are always in the cloud, while the consumer resources are always on the edge. In this way, a VDC living in the cloud has more capacity and probably lives close to the data source to which it is connected. Conversely, a VDC living on the edge has the advantage of being closer to the user, thus reducing latency when providing the requested data.

Deciding where to deploy the VDC depends on the resources required by the VDC (e.g., the resources needed to process the data before making them available to the user might not be available at the edge), the network characteristics (e.g., whether the connection at the consumer side can support a high-rate transmission), and security (e.g., not all the data can be moved to the consumer side, in which case the processing cannot be placed at the edge either).
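The three deployment criteria (resources, network, security) can be illustrated with a minimal sketch. The class names, fields, the `choose_placement` function and the bandwidth threshold are all hypothetical and are not taken from the VDC code base; the sketch also assumes that an edge deployment is preferred whenever no constraint rules it out, and that a slow consumer-side link is a reason to stay in the cloud.

```python
from dataclasses import dataclass


@dataclass
class VdcRequirements:
    """Hypothetical summary of what a VDC needs in order to run."""
    cpu_cores: float                # processing needed before data is served
    memory_mb: int
    data_may_leave_provider: bool   # security constraint on moving data


@dataclass
class EdgeSite:
    """Hypothetical description of the consumer-side (edge) resources."""
    cpu_cores: float
    memory_mb: int
    bandwidth_mbps: float           # consumer-side connection rate


def choose_placement(req, edge, min_bandwidth_mbps=10.0):
    """Return "edge" when every constraint allows it, otherwise "cloud"."""
    # Security: data that must stay with the provider cannot be processed at the edge.
    if not req.data_may_leave_provider:
        return "cloud"
    # Resources: the edge site must be able to host the required processing.
    if req.cpu_cores > edge.cpu_cores or req.memory_mb > edge.memory_mb:
        return "cloud"
    # Network: assumed threshold below which pulling data to the edge is impractical.
    if edge.bandwidth_mbps < min_bandwidth_mbps:
        return "cloud"
    return "edge"


requirements = VdcRequirements(cpu_cores=1.0, memory_mb=512, data_may_leave_provider=True)
site = EdgeSite(cpu_cores=2.0, memory_mb=1024, bandwidth_mbps=50.0)
print(choose_placement(requirements, site))
```

A real decision would likely weigh these factors rather than apply them as hard cut-offs, and could be re-evaluated at run time as the state of the execution environment changes.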

Execution


The VDC Deployment Environment can be built on top of a Kubernetes cluster, or a Docker / Docker Swarm cluster. Given a VDC Blueprint / Artifact, a Docker container is generated and deployed based on its cookbook section. Furthermore, many application developers can select the same VDC Blueprint-Artifact for their own applications, and a single application can operate with several different VDCs. When the same VDC Blueprint is adopted in different applications, each of these applications includes instances generated from the VDC Artifact; thus, they are connected to the same data sources. Thanks to the abstraction layer provided by the VDC, applications deployed through the platform can access the required data regardless of their nature and location (cloud or edge).

Because the applications to be managed are distributed, the execution environment is distributed by definition, and the devices offer different computational power, it might happen that only a subset of the modules can be installed on a specific edge device. For this reason, at deployment time, not only is the data-intensive application distributed over the cloud and edge federations, but the execution environment is also deployed and configured to support data and computation movement. The decision on where to locate both the application and the data it requires is taken at design time, but can be updated during application execution according to the detected state of the execution environment.
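As a rough illustration of turning a cookbook entry into a container deployment, the sketch below builds a `docker run` command line. The cookbook keys (`name`, `image`, `ports`, `env`), the example values and the `docker_run_command` helper are assumptions made for this example; the actual cookbook format is defined by the Blueprint schemas in the repository.

```python
import shlex


def docker_run_command(cookbook):
    """Build a `docker run` argument list from a hypothetical cookbook entry."""
    args = ["docker", "run", "-d", "--name", cookbook["name"]]
    # Publish each host port to the matching container port.
    for host_port, container_port in cookbook.get("ports", {}).items():
        args += ["-p", f"{host_port}:{container_port}"]
    # Pass configuration (e.g. the data source location) as environment variables.
    for key, value in cookbook.get("env", {}).items():
        args += ["-e", f"{key}={value}"]
    args.append(cookbook["image"])
    return args


# Hypothetical cookbook entry; none of these values come from a real Blueprint.
cookbook = {
    "name": "example-vdc",
    "image": "example/vdc:1.0",
    "ports": {8080: 80},
    "env": {"DATA_SOURCE_URL": "http://db.example:5432"},
}

print(shlex.join(docker_run_command(cookbook)))
```

The argument list can be handed to `subprocess.run` on a host with Docker installed; on a Kubernetes cluster the same information would instead be rendered into a manifest.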

How can I know which VDC best suits my needs?

For a VDC's operation to be fully understandable by potential users and buyers, each VDC has a unique JSON-formatted document, named "Blueprint" or "Artifact". This Blueprint is divided into four main sections (one can extend or reduce those sections). Each section addresses a specific group of people, so that each group can better understand the VDC's structure and way of operation. The first section is intended to provide general information to one of the company’s strategic decision makers, usually the one responsible for its vision. The second should provide all the information needed by the company executive who has primary responsibility for managing its finances and all issues of a financial nature. The third section addresses the company executive whose primary focus is on scientific and technological issues within the organization. Last but not least, the fourth section should be written with the company’s developers in mind, giving them access to all the information they need for a thorough knowledge of the VDC.

The VDC Artifact-Blueprint

As already mentioned, the original VDC Blueprint is divided into four main sections. Let us look at the sections in more detail:

First Artifact Section

Types of attributes that could be included in the JSON Blueprint from this section:
The first section of the Artifact records general information related to the VDC composition system, such as its business description, legislative compliance, licences, category and an abstract reference to its inputs and outputs. It should properly introduce the reader and potential buyer to the VDC (the service-composed software), covering all the basic and key elements. This section should not detail the specialized aspects of the VDC, since those are described in later sections of the Artifact. The first section’s main aim is to provide all the information a decision maker (such as a company’s senior officer) needs in order to consider acquiring the VDC.

Second Artifact Section

Types of attributes that could be included in the JSON Schema from this section are the following two:
The second section of the Artifact deals with the pricing of the VDC as a whole and all the financial matters regarding it. As mentioned earlier, the VDC is composed of a series of different microservices, combined under one (or more) stable APIs. Such an architecture calls for a specific pricing model. Although one model could be enough to explain the pricing dependencies to a company’s executive, it seems better to implement two kinds of pricing schemes in order to give a clearer view of the total cost of acquiring the VDC.

Third Artifact Section

There are a few attributes that could be included in the JSON Blueprint from this section:
The third Artifact section includes an abstract technical overview of the versioning, the libraries, the scalability, the limitations and the supporting databases for each service inside the VDC. Furthermore, it describes the VDC's full deployment sequence, including the dependencies between microservices and the sequence itself. In addition, this section lists the cookbooks for each service inside the VDC. Last but not least, it covers the tools used to orchestrate the Artifact, listing the deployment and orchestration tools involved. This way, a potential buyer gets the “full picture” of the VDC’s technical requirements.
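A deployment sequence can be derived from the dependencies between microservices with a topological sort. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); the service names and the `deployment_sequence` helper are hypothetical and do not come from any actual Blueprint.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service maps to the services
# that must already be running before it can start.
dependencies = {
    "api-gateway": {"query-engine", "auth"},
    "query-engine": {"database"},
    "auth": {"database"},
    "database": set(),
}


def deployment_sequence(deps):
    """Return the services in an order that respects every dependency.

    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


sequence = deployment_sequence(dependencies)
print(sequence)
```

Services with no dependency between them (here `query-engine` and `auth`) may appear in either order, so only the relative order of dependent services should be relied upon.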

Fourth Artifact Section

Types of attributes that could be included in the Artifact from this fourth and final section are the following:
The fourth Artifact section addresses a company’s developers, who are responsible for the proper functioning of all the software inside a business. They make sure that everything works according to plan and ensure the stability of every product running on a computer. This means that the team has to be on standby around the clock for the company’s software to always work properly. In the case of the VDC, the development team will have to understand in depth how it works. This section shall explain all the components of the VDC in order to give the developers a complete overview.

You can see an example of the four sections combined (in pseudo JSON structure) below:
Of course, one can extend or reduce the sections of his/her Blueprint. It is up to the provider to create the proper Blueprint-Artifact. However, the aforementioned four sections are considered as the main ones, since they include a broad variety of information regarding the VDC and its internal structure.
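To make the four-section layout more concrete, here is a hypothetical skeleton written as a Python dictionary and serialised to JSON. The section keys (`general`, `pricing`, `technical`, `developer`) and every field inside them are assumptions chosen to mirror the descriptions above; they are not the attribute names used by the schemas in this repository.

```python
import json

# Hypothetical section names, one per audience described above.
SECTIONS = ("general", "pricing", "technical", "developer")


def missing_sections(blueprint):
    """Return the main sections that a blueprint does not define."""
    return [name for name in SECTIONS if name not in blueprint]


# Placeholder values only; a real Blueprint would fill these in.
blueprint = {
    "general": {"description": "", "category": "", "licences": []},
    "pricing": {"schemes": []},
    "technical": {"services": [], "deployment_sequence": [], "cookbooks": []},
    "developer": {"components": []},
}

print(json.dumps(blueprint, indent=2))
print(missing_sections(blueprint))
```

Since providers may extend or reduce the sections, a check like `missing_sections` is best treated as a warning rather than a hard validation rule.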

Blueprint-Artifact JSON Schemas / Implementations

To better understand the Blueprint-Artifact JSON Schema as a concept, it helps to examine some examples (two in particular, with a third one coming soon). You can browse the DITAS, Generic and DataPorts (the last one still in progress) VDC Implementations to take a closer look at the vision of the VDC and its schema architecture. These schemas share many similarities, which is a positive characteristic of the Blueprint Schemas: the more similar they are, the easier it is for anyone who chooses to learn more about the VDC and its architecture. All schemas can be found under the “jsonSchemas” folder in the main repository.