Key components
Key Components of Native VRTogether Platform
3D Capture
Description: Volumetric Capture (VolCap) is a multi-RGB-D-sensor 3D capturing, streaming and recording application. The toolset is designed as a distributed system in which a number of processing units each manage and collect data from a single sensor using a headless application. A set of sensors is orchestrated by a centralized UI application, which is also the delivery point of the connected sensor streams. Volumetric Reconstruction (VolReco) uses the data provided by the capture application to first reconstruct a mesh locally and then transmit it.
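A minimal sketch of the distributed capture pattern described above: one headless worker per sensor pushes length-prefixed frames to a central collector. All names, endpoints and the framing format are illustrative assumptions, not the actual VolCap protocol.

```python
import socket
import struct

COLLECTOR = ("collector.local", 9500)   # hypothetical orchestrator/collector endpoint

def send_frame(sock: socket.socket, sensor_id: int, payload: bytes) -> None:
    """Illustrative framing: sensor id (4 bytes) + payload size (4 bytes) + payload."""
    sock.sendall(struct.pack("!II", sensor_id, len(payload)) + payload)

def run_worker(sensor_id: int, grab_frame) -> None:
    """Headless per-sensor loop: grab an RGB-D frame and forward it upstream."""
    with socket.create_connection(COLLECTOR) as sock:
        while True:
            payload = grab_frame()          # raw color + depth bytes from the sensor driver
            if payload is None:             # sensor stopped delivering frames
                break
            send_frame(sock, sensor_id, payload)
```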
Licenses:
- Capturing:
- Apache License, Version 2.0: https://www.apache.org/licenses/LICENSE-2.0.txt
- Kinect SDK v2 License: https://download.microsoft.com/download/0/D/C/0DC5308E-36A7-4DCD-B299-B01CDFC8E345/Kinect-SDK2.0-EULA_en-US.pdf
- Reconstruction:
- CUDA Eula license: http://docs.nvidia.com/cuda/eula/index.html
- Boost license: http://www.boost.org/LICENSE_1_0.txt
- MIT License (rendering): https://opensource.org/licenses/MIT
- OpenCV BSD license: https://opencv.org/license.html
- Flann BSD license: https://github.com/mariusmuja/flann/blob/master/COPYING
Links:
- Capturing: https://github.com/VCL3D/VolumetricCapture
- Reconstruction: Not available
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL5: Existing multi-view 3D capturing and reconstruction platform (with KinectV2) | TRL6: Updates to the existing multi-view 3D capturing and reconstruction platform (with KinectV2) for the VRT end-product, aimed at performance improvements | TRL7: Integrated synchronization and streaming utilities for 3D data in the new 3D capturing platform (with Intel RealSense). TRL7: New reconstruction software (with Intel RealSense) | TRL7: Integration of the new Kinect 4 Azure RGBD sensors in the new 3D capturing platform; the capturing platform will support different RGBD sensors. TRL7: Reconstruction pipeline with the new Kinect Azure sensors; performance optimizations based on the reconstruction rate, targeting real-time performance. These optimizations relate to the increased bandwidth needs and 3D data processing caused by the higher depth resolution. |
| Features | Use of 4 KinectV2 sensors with: color resolution 1280×720; depth resolution 512×424 | Use of 4 KinectV2 sensors with: color resolution 1280×720; depth resolution 512×424; skeleton tracking | Use of 4 Intel RealSense D415 sensors with: color resolution 1280×720; depth resolution 320×180 | Use of 4 Kinect Azure sensors with: color resolution 1280×720; depth resolution 320×288 |
| KPIs | Reconstruction rate: 5 fps | Reconstruction rate: 9 fps | Reconstruction rate: 22 fps | |
Simple Point Cloud Capture
Description
This component offers live capture and reconstruction of 3D point clouds using Intel RealSense D400 devices. The component can run with zero or more sensors: if no sensors are found, a synthetic point cloud is generated; if multiple sensors are found, the point clouds from each sensor are transformed and merged together.
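A minimal sketch of that capture logic, assuming each sensor provides an Nx3 point array together with a pre-computed 4x4 calibration transform (illustrative code, not the actual component):

```python
import numpy as np

def capture_point_cloud(sensors, num_synthetic=10_000):
    """Return a merged point cloud from zero or more calibrated sensors."""
    if not sensors:
        # No sensors detected: generate a synthetic cloud (points on a unit sphere here).
        pts = np.random.randn(num_synthetic, 3)
        return pts / np.linalg.norm(pts, axis=1, keepdims=True)

    merged = []
    for points, transform in sensors:          # points: Nx3, transform: 4x4 extrinsics
        homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
        merged.append((homogeneous @ transform.T)[:, :3])  # into the common coordinate frame
    return np.vstack(merged)
```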
Links:
Licenses:
To be Announced
Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 | |
---|---|---|---|---|
TRL | N/A The component was not originally planned |
TRL4: Prototype implementation using a single sensor |
TRL5: Integration to operational pipeline. Optimization of memory use | TRL7: Optimization for viewport adaptive streaming |
Features | Support for one Intel RealSense D400 sensor Synthetic pointcloud is generated if no sensors are detected |
Support for zero or more sensors. Reference Implementation for testing with 4 sensors | Pointclouds captured will be segmented into spatial tiles along with relevant metadata Optimization of the trade-off between latency, visual quality and resource consumption |
|
KPIs | Together with the compression module end to end pipeline is able to operate at 5 fps with 1 camera on commodity hardware | Together with the compression module end to end pipeline is able to operate at 15fps with 4 cameras on commodity hardware Self view latency of 300ms |
Encoding
TVM encoding & transmission
Description:
The encoding component is responsible for the compression and decompression of the transmitted data. Over the years, CERTH has experimented with a variety of techniques and algorithms for this purpose (OpenCTM, Draco, Corto), but eventually settled on Draco for performance reasons. For the transmission, CERTH uses RabbitMQ, an open-source message broker. It accepts messages from producers and delivers them to consumers, acting as a middleman that can reduce the load on, and delivery times of, web application servers.
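A minimal sketch of the producer side of such a transmission chain, using the pika RabbitMQ client; the exchange name and the already-encoded Draco payloads are illustrative assumptions, not the actual CERTH implementation:

```python
import pika

def publish_encoded_frames(frames, host="localhost", exchange="tvm_frames"):
    """Publish already-compressed TVM frames to a RabbitMQ fanout exchange."""
    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.exchange_declare(exchange=exchange, exchange_type="fanout")

    for frame_id, draco_bytes in frames:       # draco_bytes: output of the mesh encoder
        channel.basic_publish(
            exchange=exchange,
            routing_key="",                    # fanout exchanges ignore the routing key
            body=draco_bytes,
            properties=pika.BasicProperties(headers={"frame_id": frame_id}),
        )
    connection.close()
```

Consumers bind their own queues to the same exchange, so the broker decouples the capture/encoding machine from however many receivers join the session.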
Licenses:
- Encoding:
- Libjpeg API library and associated programs: IJG (Independent JPEG Group) License (https://spdx.org/licenses/IJG.html)
- TurboJPEG API library and associated programs: Modified (3-clause) BSD License (https://opensource.org/licenses/BSD-3-Clause)
- Libjpeg-turbo SIMD extensions: zlib License (https://opensource.org/licenses/Zlib)
- Draco license: Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0.txt)
- Corto license: GNU LESSER GENERAL PUBLIC LICENSE (http://www.gnu.org/licenses/lgpl-3.0.en.html)
- Transmission:
- RabbitMQ License: https://www.rabbitmq.com/mpl.html
- Boost license: http://www.boost.org/LICENSE_1_0.txt
- Apache License, Version 2.0: https://www.apache.org/licenses/LICENSE-2.0.txt
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL5: OpenCTM encoding and simple RMQ transmission in the existing multi-view 3D platform | TRL7: Integration of SoA algorithms (Corto, Draco) for mesh compression | TRL7: Integration of new mesh-compression features/updates, and optimized RMQ using efficient task management (RxCpp) to handle messages in the most appropriate way for TVM transmission | TRL7: Hybrid distribution via RMQ and DASH for TVM transmission |
| Features | Support for the OpenCTM profile 64x128x64 | Support for two new encoding profiles: libcorto and libdraco | Message handling with RxCpp | |
| KPIs | OpenCTM profile 64x128x64: encoding ~65 ms; TVM bandwidth ~5 Mbps; TVM pipeline end-to-end delay ~225 ms | libdraco profile 64x128x64: encoding ~17 ms; libcorto profile 64x128x64: encoding ~10 ms; TVM bandwidth ~2 Mbps; TVM pipeline end-to-end delay ~183 ms | libdraco profile 64x128x64: encoding ~15 ms; libcorto profile 64x128x64: encoding ~1 ms; TVM bandwidth ~3 Mbps; TVM pipeline end-to-end delay 55 ms | |
Point Cloud encoding & decoding
Description
This component offers a generic real-time dynamic point cloud codec for 3D immersive video. The component features low-delay encoding and decoding at multiple levels of detail. Geometry coding is based on octree occupancy, and attribute compression is based on existing image coding standards.
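To illustrate the octree-occupancy idea, the toy sketch below serializes a point cloud into one occupancy byte per non-empty node (depth-first); it is not the project codec, and omits attribute coding entirely:

```python
import numpy as np

def encode_occupancy(points, lo, hi, depth, out):
    """Depth-first octree traversal emitting one occupancy byte per non-empty node."""
    if depth == 0 or points.shape[0] == 0:
        return
    mid = (lo + hi) / 2.0
    # Child index per point: one bit per axis (x -> bit 0, y -> bit 1, z -> bit 2).
    idx = ((points >= mid) * np.array([1, 2, 4])).sum(axis=1)
    occupancy = 0
    children = []
    for child in range(8):
        subset = points[idx == child]
        if subset.shape[0] > 0:
            occupancy |= 1 << child
            offsets = np.array([child & 1, (child >> 1) & 1, (child >> 2) & 1])
            children.append((subset, np.where(offsets == 1, mid, lo),
                             np.where(offsets == 1, hi, mid)))
    out.append(occupancy)
    for subset, c_lo, c_hi in children:
        encode_occupancy(subset, c_lo, c_hi, depth - 1, out)

# Example: serialize a small random cloud at octree depth 6 into a byte stream.
cloud = np.random.rand(50_000, 3)
stream = []
encode_occupancy(cloud, np.zeros(3), np.ones(3), depth=6, out=stream)
payload = bytes(stream)
```

Decoding replays the same traversal: each occupancy byte tells the decoder which of the eight child cells to descend into, so geometry is reconstructed to the precision implied by the chosen octree depth.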
Licenses:
To be Announced
Links:
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL6: Reference implementation of core modules for encoding and decoding | TRL6: Ad-hoc integration of the encoder and decoder into a GStreamer-based delivery pipeline | TRL7: Integration into the operational pipeline; optimization of memory use | TRL7: Optimization for parallelized multiple encoding |
| Features | Low-delay encoding and decoding; lossy geometry coding using octree occupancy; lossy attribute coding using existing image coding standards | Integration for use with a GStreamer-based delivery pipeline | Optimization of memory use and performance for dense point clouds captured with 4 cameras | Optimization of the codec to support multiple levels of detail for network-adaptive streaming; parallelized implementation to code point clouds segmented spatially into tiles for viewport-adaptive streaming |
| KPIs | Encode times for the Dimitris-2-Zippering sequence (~320k points): octree depth 8 ~80 ms; depth 9 ~160 ms; depth 10 ~260 ms | Together with the simple capturer, the end-to-end pipeline operates at 5 fps with 1 camera on commodity hardware | Together with the simple capturer, the end-to-end pipeline operates at 15 fps with 4 cameras on commodity hardware; end-to-end latency of 400 ms | |
Orchestration & Delivery
Media/Session Orchestrator
Description:
The VRT orchestrator, as a centralised server component, is primarily in charge of managing the set of end-user profiles connected to a platform session. The orchestrator controls the various connection events in order to guarantee a unified experience and to ensure that the media pipelines are properly switched across the various connected nodes. It is also the reference point used for the synchronisation of the media pipelines. From the backend viewpoint, the orchestrator manages and supervises the transmission and delivery of all media stream pipelines.
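A minimal sketch of the session-management role described above; the data model and event names are hypothetical and do not reflect the VRT orchestrator API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Session:
    """One platform session: the set of connected user profiles."""
    session_id: str
    users: Dict[str, dict] = field(default_factory=dict)   # user_id -> profile / pipeline state

class Orchestrator:
    def __init__(self, notify: Callable[[str, dict], None]):
        self.sessions: Dict[str, Session] = {}
        self.notify = notify            # callback pushing connection events to all session nodes

    def join(self, session_id: str, user_id: str, profile: dict) -> None:
        session = self.sessions.setdefault(session_id, Session(session_id))
        session.users[user_id] = profile
        # Inform every connected node so media pipelines can be (re)wired consistently.
        self.notify(session_id, {"event": "user-joined", "user": user_id})

    def leave(self, session_id: str, user_id: str) -> None:
        session = self.sessions.get(session_id)
        if session and session.users.pop(user_id, None) is not None:
            self.notify(session_id, {"event": "user-left", "user": user_id})
```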
Licenses:
- Core: MIT
- UMTS: To be announced
- LRTS: To be announced
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL0 | TRL2 | TRL5 | TRL6/7 |
| Features | Non-existing component at the start of the project | Static platform configuration | Supervision of native pipeline cloud components; PC & TVM pipeline supervision; live presenter supervision; user session management; Socket.io audio transmission | Year 2 features, plus: interactive events transmission and supervision; virtualisation pre-packaging (?) |
| KPIs | N/A | 2 users managed statically per session; one session at a time | 4 users managed dynamically; up to 5 simultaneous sessions; 300 ms audio latency | 10 users managed dynamically; up to 10 simultaneous sessions |
RTMP Live Video Transmission - Gstreamer Unity Bridge
Description
Software component that connects GStreamer with Unity. It can play any media URI provided by GStreamer (1.x) pipelines into Unity 3D textures.
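A minimal sketch of the underlying pattern (pulling decoded RGBA frames out of a GStreamer 1.x pipeline through an appsink so they can be uploaded to a texture). It is written with the GStreamer Python bindings for illustration only; the actual bridge is a native Unity plugin, the URI is a placeholder, and the sketch assumes a video-only source:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Decode a media URI and hand RGBA frames to the application via appsink.
pipeline = Gst.parse_launch(
    "uridecodebin uri=https://example.com/stream.mpd ! videoconvert ! "
    "video/x-raw,format=RGBA ! appsink name=sink max-buffers=2 drop=true"
)
sink = pipeline.get_by_name("sink")
pipeline.set_state(Gst.State.PLAYING)

sample = sink.emit("pull-sample")          # blocks until a decoded frame is available
if sample is not None:
    buf = sample.get_buffer()
    rgba_bytes = buf.extract_dup(0, buf.get_size())
    # rgba_bytes would be uploaded to a GPU texture by the engine-side plugin.

pipeline.set_state(Gst.State.NULL)
```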
Licenses:
Open-Source
Links:
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL5 (component developed in the EU H2020 ImmersiaTV project) | Not used in the project during Year 1 | TRL5 | Not used in the project during Year 3 |
| Features | Ingest of DASH and RTMP streams (or other media URIs) from GStreamer into Unity | | Same features, with the component updated to interact with newer versions of GStreamer | |
| KPIs | Used for stored DASH streams; smooth playout; no delay tests conducted so far | | End-to-end latency for a live RTMP stream: ~1.5 s | |
(Point Cloud/Audio/Live) DASH Sender - bin2dash
Description
bin2dash is an open API that allows packaging any volumetric data in the industry-standard MP4 format and then streaming it using MPEG-DASH.
License:
Proprietary
Links:
Source Code Location: https://baltig.viaccess-orca.com:8443/VRT/nativeclient-group/EncodingEncapsulation
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL4 | TRL6 | TRL7 | TRL7 |
| Features | Integrated component (plugged into capture and encoding) | New modular structure and new metadata handling | Improved with more data types and more testing use-cases | Tiled point clouds; support for TVMs |
| KPIs | Latency: 2 frames + internal threading | Latency: 1 frame + internal threading | Latency: 1 frame (a complete frame is needed for the copy) | Latency: 1 frame (a complete frame is needed for the copy) |
(Point Cloud/Audio/Live) DASH Distributor - evanescent
Description
Evanescent is a server component that receives any kind of DASH stream and simply forwards it to other receiver clients. In this sense, Evanescent is an SFU (Selective Forwarding Unit). More technically, it receives low-latency DASH chunks and serves them back to low-latency DASH receivers using HTTP chunked transfer. Evanescent is implemented in C++ based on Signals sub-modules.
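A minimal sketch of the forwarding idea: media chunks are served back over HTTP/1.1 chunked transfer encoding as soon as they arrive from the ingest side. This is an illustrative single-client Python server, not the Signals-based C++ implementation:

```python
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

# Chunks produced by the ingest side (e.g. low-latency DASH segments) are queued here.
chunk_queue = queue.Queue()

class ForwardHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "video/mp4")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        while True:
            chunk = chunk_queue.get()          # block until the next chunk is ingested
            if chunk is None:                  # sentinel: stream finished
                self.wfile.write(b"0\r\n\r\n") # terminating zero-length chunk
                break
            # HTTP/1.1 chunk framing: hex size, CRLF, payload, CRLF.
            self.wfile.write(f"{len(chunk):x}\r\n".encode() + chunk + b"\r\n")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ForwardHandler).serve_forever()
```

Because each chunk is pushed to the receiver the moment it is available, the forwarding cost stays at roughly one HTTP chunk of latency, which matches the KPI reported for this component.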
License:
Proprietary
Links:
Source Code Location: https://git.gpac-licensing.com/alaiwans/Evanescent
Documentation: LD_LIBRARY_PATH=$DIR $DIR/evanescent.exe [--tls] [--port port_number]
Installation guide: just copy the files, see details at https://baltig.viaccess-orca.com:8443/VRT/deliverymcu-group/DeliveryMCU/blob/master/README.md.
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL0 | TRL4 | TRL5 | TRL5 |
| Features | N/A | Basic system with HTTP and HTTPS/SSL | SFU testing towards MCU | Intelligence in processing tiled content |
| KPIs | N/A | Latency: 1 HTTP chunk request | Latency: 1 HTTP chunk request | Latency: 1 HTTP chunk request with tiled content |
(Point Cloud/Audio/Live) DASH Receiver - Signals Unity Bridge (SUB)
Description
This component is the receiving end of the pipeline. It unpackages standard MPEG-DASH streams into raw streams to be used by the receiver client. It also contains a preliminary algorithm for tiling support. It has been extended to read files and RTMP streams for debugging and interoperability purposes.
License:
This component is written in C++. It depends on Motion Spell's Signals (C++, IP-owned, proprietary license), GPAC (C, LGPLv2+ license), FFmpeg (C, LGPLv2+ license) and cURL (C, MIT/X license).
Links:
Installation Guide: https://baltig.viaccess-orca.com:8443/VRT/nativeclient-group/SUB/releases. Copy the files into a folder (the SUB is a dynamic library).
Input, output, configuration: The input is an MPEG-DASH stream URL. The configuration is done automatically from the content of the stream. The output is a timed sequence of binary buffers with metadata, to be given to a decoder/renderer.
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL4 | TRL5 | TRL6 | TRL6 |
| Features | Basic DASH reception | RTMP input | Support for more streamers (not only bin2dash) | Support for tiling and TVMs |
| KPIs | Latency: 2 ISOBMFF/MP4 fragments | Latency: 1 ISOBMFF/MP4 fragment, or 1 frame (RTMP) + network delays | Latency (est.): 1 ISOBMFF/MP4 fragment, or 1 frame (RTMP) + network delays | Latency: 1 HTTP chunk request with tiled content |
(Point Cloud/Audio/Live) Live presenter
Description
The component receives a live RTMP stream from a 3D stereoscopic camera using Nginx and nginx-rtmp (both BSD-2 licensed), crops the left and right eye views to fit the presenter, and transcodes them into a final stream (using FFmpeg, which is GPL licensed) that can be fetched by the player over RTMP. The whole pipeline is low-latency and runs in a Docker container. The component also handles the DASH format as input and output.
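A minimal sketch of the crop-and-transcode step, invoking FFmpeg from Python on a side-by-side stereoscopic input. The stream URLs and crop geometry are placeholders (the real crop regions depend on where the presenter sits in the frame), and the production pipeline is driven by nginx-rtmp rather than a standalone script:

```python
import subprocess

INPUT_RTMP = "rtmp://ingest.example/live/stereo"      # placeholder ingest URL
OUTPUT_RTMP = "rtmp://origin.example/live/presenter"  # placeholder publish URL

# Assume side-by-side stereo: left eye in the left half of the frame, right eye in the right.
# Crop both eyes, stack them back side by side and publish the result over RTMP.
cmd = [
    "ffmpeg", "-i", INPUT_RTMP,
    "-filter_complex",
    "[0:v]crop=iw/2:ih:0:0[left];"
    "[0:v]crop=iw/2:ih:iw/2:0[right];"
    "[left][right]hstack=inputs=2[out]",
    "-map", "[out]", "-map", "0:a?",
    "-c:v", "libx264", "-preset", "veryfast", "-tune", "zerolatency",
    "-c:a", "aac",
    "-f", "flv", OUTPUT_RTMP,
]
subprocess.run(cmd, check=True)
```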
License:
This component uses Nginx for the reception and publishing, and FFmpeg for the transcoding.
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | N/A | TRL5 | TRL6 | TRL6 |
| Features | N/A | Overall component: RTMP input; DASH output | Overall component: RTMP input/output | Overall component: no core modifications expected; this module heavily depends on the capabilities of the capture material |
| KPIs | N/A | Latency <= 2 s | Latency <= 1 s | Latency <= 1 s |
Point Cloud - Multi Control Unit (PC-MCU)
Description
(Virtualized) holoconferencing cloud-based component that aims to reduce the end-user client's computational load and bandwidth usage by providing the following key features: fusion of different volumetric videos, Level of Detail (LoD) adjustment, and Field of View (FoV) aware delivery.
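A minimal sketch of FoV-aware LoD selection (purely illustrative geometry; the thresholds and LoD labels are placeholders, not the PC-MCU heuristics):

```python
import numpy as np

def select_lod(viewer_pos, view_dir, obj_pos, fov_deg=90.0):
    """Pick a level of detail for one volumetric object, given the viewer's pose."""
    to_obj = np.asarray(obj_pos, float) - np.asarray(viewer_pos, float)
    dist = np.linalg.norm(to_obj)
    view_dir = np.asarray(view_dir, float) / np.linalg.norm(view_dir)
    angle = np.degrees(np.arccos(np.clip(np.dot(to_obj / dist, view_dir), -1.0, 1.0)))

    if angle > fov_deg / 2:      # outside the field of view: send the coarsest version
        return "lowest"
    if dist < 1.5:               # close and visible: full detail
        return "high"
    if dist < 4.0:
        return "medium"
    return "low"

# Example: an object slightly to the right of a viewer looking down +Z, 2 m away.
print(select_lod(viewer_pos=[0, 0, 0], view_dir=[0, 0, 1], obj_pos=[0.5, 0, 2.0]))
```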
License:
To be announced
Links:
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | N/A | TRL5 | TRL6 | TRL6 |
| Features | N/A | Overall component: RTMP input; DASH output | Overall component: RTMP input/output | Overall component: no core modifications expected; this module heavily depends on the capabilities of the capture material |
| KPIs | N/A | Latency <= 2 s | Latency <= 1 s | Latency <= 1 s |
Rendering and Display
Unity Player
Description:
Unity-based player in charge of the reception, integration and presentation of all available streams and VR content for the envisioned VR-Together scenarios. It also supports various types of interaction and synchronization features.
Licenses:
- LGPL
- EULA
- Oculus SDK
Links:
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL0 (not available at the start of the project) | TRL6 | TRL7 | TRL7 |
| Features | N/A | TVM 1.0 rendering; P2P streaming | TVM 2.0 rendering; orchestrator-based streaming; stereoscopic live video stream; multi-threaded pipelines; GUB integration | SUB integration; PC pipeline; PC-MCU integration; multi-threading improvements to enable multiple decoding; interactions envisioned in Pilot 3; voice recognition |
| KPIs | N/A | Support for 2 users | Support for 4 users | Support for 5+ users |
Key Components of VRTogether Web Platform
RGBD Capture
Description:
The main aim of the RGBD Capture module is to provide a lightweight, simple-to-set-up sensing solution that is easy to deploy (e.g. in people's homes). The capture module is therefore responsible for a photo-realistic capture of the user, based on a single RGBD sensor (i.e. Kinect / RealSense). The module has an adaptable interface for connecting to different RGBD drivers in order to obtain the colour image and depth image from the hardware sensor. Currently this may include background removal and replacement of the HMD with a pre-captured representation of the user's face (HMD removal). The image is further processed and finally converted to a 2D RGB + grayscale depth image. The final image is displayed on the screen for capture in the web browser. The camera calibration is sent to the browser for geometrically correct 3D rendering using WebGL shaders. The capture is rendered as:
- a 3D self-view for self-presence, and
- a view for the other participants for shared presence,
both in real time. All modules are optimized for capturing and rendering users in real time in a 3D virtual environment.
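The geometrically correct rendering relies on standard pinhole unprojection of the depth image using the transmitted camera intrinsics. A minimal sketch of that math, in Python rather than the actual WebGL shaders and with illustrative intrinsic values:

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Turn a depth image (in metres) into a 3D point per pixel using pinhole intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)       # HxWx3 camera-space positions

# Illustrative intrinsics for a 640x576 depth image; real values come from calibration.
points = unproject_depth(np.full((576, 640), 1.2), fx=500.0, fy=500.0, cx=320.0, cy=288.0)
```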
Licenses:
- To be announced
Links:
- Capture and Transmit RGBD Data for 3D rendering (paper)
- HMD removal (video)
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | 5 | 6 | 6 | 7 |
| Features | RGB with chroma background removal; support for KinectV2; shaders for real-time reconstruction and rendering in browser-based VR | Improved background removal; added support for the RealSense sensor; 3D shaders for self-view and user rendering; sensor calibration of Oculus Rift and KinectV2 for RGBD-based HMD removal | HMD removal fully integrated; improvements in capture-conversion performance | Added support for Kinect4Azure; one executable with different modes (i.e. 2D and 3D capture mode) |
WEBRTC-VR-MCU
Description:
The WebRTC VR MCU system is capable of ingesting any number of WebRTC client inputs. The MCU synchronises and composes its inputs into a single output stream, which is then published using WebRTC. As a result, clients retrieve all relevant streams together instead of separately. This optimizes network bandwidth through more efficient routing (each client only sends its video stream to the MCU, and no longer to all other clients), as well as the clients' decoding resources (clients have a limited number of hardware decoders, which can now be dedicated to decoding all streams at once instead of just one).
Incoming WebRTC streams are demuxed into RTP streams and fed into a pluggable media pipeline. The media pipeline performs the compositing operation and outputs a single RTP stream, which is broadcast to each client using WebRTC return streams. The media pipeline of the MCU system has been designed as a containerized service so that, given enough hardware, it is horizontally scalable over multiple parallel sessions. These services are managed using Docker Swarm and the orchestrator component.
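A minimal sketch of the compositing step at the heart of such a pipeline (tiling N decoded input frames into one output frame); illustrative numpy code, not the actual RTP/media pipeline:

```python
import math
import numpy as np

def composite(frames):
    """Tile equally sized RGB frames (HxWx3) into a single grid image."""
    n = len(frames)
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    h, w, c = frames[0].shape
    canvas = np.zeros((rows * h, cols * w, c), dtype=frames[0].dtype)
    for i, frame in enumerate(frames):
        r, col = divmod(i, cols)
        canvas[r * h:(r + 1) * h, col * w:(col + 1) * w] = frame
    return canvas

# Example: four 480x640 client frames composed into one 960x1280 output frame.
out = composite([np.zeros((480, 640, 3), np.uint8) for _ in range(4)])
```

Because only the composite is encoded and returned, each client decodes one stream regardless of how many participants are in the session.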
Licenses:
- To be announced
Links:
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | 2 | 3 | 5 | 6 |
| Features | Baseline code only; no MCU (full P2P mesh) | Initial concept of component stubs | Horizontally scalable MCU; orchestration; integration with the web player | 16 users; fully containerized deployment; support for low-powered (mobile) devices |
| KPIs | Number of simultaneous users: 3 | Number of simultaneous users: 4 | Number of simultaneous users: 8 | Number of simultaneous users: 16 |
Web-based player
Description:
The entry point for the TogetherVR web client is offered by the web server back-end, which is based on Node.js, React and A-Frame. This allows any modern WebVR-enabled browser to display the VR content on a screen or on an OpenVR (https://github.com/ValveSoftware/openvr) enabled VR HMD (such as the Oculus Rift CV1 or the Valve Index). Furthermore, the client can access the image produced by the TNO RGBD Capture module, to be displayed as a self-view to the user or to be sent via WebRTC to one or more other clients. To support such multi-user connections, the client is connected to a second web server that handles the multimedia orchestration. The rendering of users in the VR environment is done via custom WebGL shaders that alpha-blend user representations into the surroundings for a natural visual representation. Audio is also captured, transmitted and played back as spatial audio with the help of the Google Resonance Audio API.
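A minimal sketch of the alpha-blending those shaders perform (per-pixel "over" compositing of the user cut-out onto the virtual surroundings); illustrative numpy code rather than the actual GLSL:

```python
import numpy as np

def blend_over(user_rgba, background_rgb):
    """Standard 'over' operator: out = alpha * user + (1 - alpha) * background."""
    alpha = user_rgba[..., 3:4].astype(np.float32) / 255.0
    user = user_rgba[..., :3].astype(np.float32)
    bg = background_rgb.astype(np.float32)
    return (alpha * user + (1.0 - alpha) * bg).astype(np.uint8)
```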
Licenses:
- To be announced
Links:
- Explanation Video
- Overview Paper
- Look & Feel of 1st 360-degree prototype (video)
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | 6 | 7 | 7 | 7 |
| Features | 360-degree only; peer-to-peer streaming | Pilot 1 content; volumetric; streaming via MCU; self-view; spatial audio; admin panel | Complete client refactor based on Redux/Reflux; Pilot 2 content; optimized MCU pipeline; use of capture (hardware) parameters for improved rendering | Integrated technical measurements; new virtual room(s) with a focus on communication; support for low-powered (mobile) devices |
| KPIs | MOS scores for 360-degree experience (5-point scale): overall experience 4.01; video quality 3.59 | MOS scores for 360-degree experience (5-point scale): overall experience 4.35; video quality 3.65 | MOS score for 3D experience (9-point scale): overall experience 6.92 | Full-system end-to-end (glass-to-glass) delays: RGB delay — p2p 396 ms, MCU 564 ms; RGBD delay — p2p 384 ms, MCU 622 ms |