Key components
Key Components of Native VRTogether Platform
3D Capture
Description: Volumetric Capture (VolCap) is a multi-RGB-D-sensor 3D capturing, streaming and recording application. The toolset is designed as a distributed system in which a number of processing units each manage and collect data from a single sensor using a headless application. A set of sensors is orchestrated by a centralized UI application, which is also the delivery point of the connected sensor streams. Volumetric Reconstruction (VolReco) uses the data provided by the capture application to first reconstruct a mesh locally and then transmit it.
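A minimal sketch of the distributed capture pattern described above: one headless worker per sensor pushes length-prefixed frames to a central collector. All names, endpoints and the framing format are illustrative assumptions, not the actual VolCap protocol.

```python
import socket
import struct

COLLECTOR = ("collector.local", 9500)   # hypothetical orchestrator/collector endpoint

def send_frame(sock: socket.socket, sensor_id: int, payload: bytes) -> None:
    """Illustrative framing: sensor id (4 bytes) + payload size (4 bytes) + payload."""
    sock.sendall(struct.pack("!II", sensor_id, len(payload)) + payload)

def run_worker(sensor_id: int, grab_frame) -> None:
    """Headless per-sensor loop: grab an RGB-D frame and forward it upstream."""
    with socket.create_connection(COLLECTOR) as sock:
        while True:
            payload = grab_frame()          # raw color + depth bytes from the sensor driver
            if payload is None:             # sensor stopped delivering frames
                break
            send_frame(sock, sensor_id, payload)
```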
Licenses:
- Capturing:
- Apache License, Version 2.0: https://www.apache.org/licenses/LICENSE-2.0.txt
- Kinect SDK v2 License: https://download.microsoft.com/download/0/D/C/0DC5308E-36A7-4DCD-B299-B01CDFC8E345/Kinect-SDK2.0-EULA_en-US.pdf
- Reconstruction:
- CUDA Eula license: http://docs.nvidia.com/cuda/eula/index.html
- Boost license: http://www.boost.org/LICENSE_1_0.txt
- MIT License (rendering): https://opensource.org/licenses/MIT
- OpenCV BSD license: https://opencv.org/license.html
- Flann BSD license: https://github.com/mariusmuja/flann/blob/master/COPYING
Links:
- Capturing: https://github.com/VCL3D/VolumetricCapture
- Reconstruction: Not available
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL5: Existing multi-view 3D capturing and reconstruction platform (with KinectV2) | TRL6: Updates to the existing multi-view 3D capturing and reconstruction platform (with KinectV2) for the VRT end-product, aimed at performance improvements | TRL7: Integrated synchronization and streaming utilities for 3D data in the new 3D capturing platform (with Intel RealSense). TRL7: New reconstruction software (with Intel RealSense) | TRL7: Integration of the new Kinect 4 Azure RGBD sensors in the new 3D capturing platform; the capturing platform will support different RGBD sensors. TRL7: Reconstruction pipeline with the new Kinect Azure sensors; performance optimizations based on the reconstruction rate, targeting real-time performance. These optimizations relate to the increased bandwidth needs and 3D data processing caused by the higher depth resolution. |
| Features | Use of 4 KinectV2 sensors with: color resolution 1280×720; depth resolution 512×424 | Use of 4 KinectV2 sensors with: color resolution 1280×720; depth resolution 512×424; skeleton tracking | Use of 4 Intel RealSense D415 sensors with: color resolution 1280×720; depth resolution 320×180 | Use of 4 Kinect Azure sensors with: color resolution 1280×720; depth resolution 320×288 |
| KPIs | Reconstruction rate: 5 fps | Reconstruction rate: 9 fps | Reconstruction rate: 22 fps | |
Simple Point Cloud Capture
Description
This component offers live capture and reconstruction of 3D point clouds using Intel RealSense D400 devices. The component can run with zero or more sensors: if no sensors are found, a synthetic point cloud is generated; if multiple sensors are found, the point clouds from each sensor are transformed and merged together.
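A minimal sketch of that capture logic, assuming each sensor provides an Nx3 point array together with a pre-computed 4x4 calibration transform (illustrative code, not the actual component):

```python
import numpy as np

def capture_point_cloud(sensors, num_synthetic=10_000):
    """Return a merged point cloud from zero or more calibrated sensors."""
    if not sensors:
        # No sensors detected: generate a synthetic cloud (points on a unit sphere here).
        pts = np.random.randn(num_synthetic, 3)
        return pts / np.linalg.norm(pts, axis=1, keepdims=True)

    merged = []
    for points, transform in sensors:          # points: Nx3, transform: 4x4 extrinsics
        homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
        merged.append((homogeneous @ transform.T)[:, :3])  # into the common coordinate frame
    return np.vstack(merged)
```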
Links:
Licenses:
To be Announced
Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 | |
---|---|---|---|---|
TRL | N/A The component was not originally planned |
TRL4: Prototype implementation using a single sensor |
TRL5: Integration to operational pipeline. Optimization of memory use | TRL7: Optimization for viewport adaptive streaming |
Features | Support for one Intel RealSense D400 sensor Synthetic pointcloud is generated if no sensors are detected |
Support for zero or more sensors. Reference Implementation for testing with 4 sensors | Pointclouds captured will be segmented into spatial tiles along with relevant metadata Optimization of the trade-off between latency, visual quality and resource consumption |
|
KPIs | Together with the compression module end to end pipeline is able to operate at 5 fps with 1 camera on commodity hardware | Together with the compression module end to end pipeline is able to operate at 15fps with 4 cameras on commodity hardware Self view latency of 300ms |
Encoding
TVM encoding & transmission
Description:
The encoding component is responsible for the compression and decompression of the transmitted data. Over the years, CERTH has experimented with a variety of techniques and algorithms for this purpose (OpenCTM, Draco, Corto), but eventually settled on Draco for performance reasons. For the transmission, CERTH uses RabbitMQ, an open-source message broker. It accepts messages from producers and delivers them to consumers, acting as a middleman that can reduce the load on, and delivery times of, web application servers.
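A minimal sketch of the producer side of such a transmission chain, using the pika RabbitMQ client; the exchange name and the already-encoded Draco payloads are illustrative assumptions, not the actual CERTH implementation:

```python
import pika

def publish_encoded_frames(frames, host="localhost", exchange="tvm_frames"):
    """Publish already-compressed TVM frames to a RabbitMQ fanout exchange."""
    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.exchange_declare(exchange=exchange, exchange_type="fanout")

    for frame_id, draco_bytes in frames:       # draco_bytes: output of the mesh encoder
        channel.basic_publish(
            exchange=exchange,
            routing_key="",                    # fanout exchanges ignore the routing key
            body=draco_bytes,
            properties=pika.BasicProperties(headers={"frame_id": frame_id}),
        )
    connection.close()
```

Consumers bind their own queues to the same exchange, so the broker decouples the capture/encoding machine from however many receivers join the session.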
Licenses:
- Encoding:
- Libjpeg API library and associated programs: IJG (Independent JPEG Group) License (https://spdx.org/licenses/IJG.html)
- TurboJPEG API library and associated programs: Modified (3-clause) BSD License (https://opensource.org/licenses/BSD-3-Clause)
- Libjpeg-turbo SIMD extensions: zlib License (https://opensource.org/licenses/Zlib)
- Draco license: Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0.txt)
- Corto license: GNU LESSER GENERAL PUBLIC LICENSE (http://www.gnu.org/licenses/lgpl-3.0.en.html)
- Transmission:
- RabbitMQ License: https://www.rabbitmq.com/mpl.html
- Boost license: http://www.boost.org/LICENSE_1_0.txt
- Apache License, Version 2.0: https://www.apache.org/licenses/LICENSE-2.0.txt
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL5: OpenCTM encoding and simple RMQ transmission in the existing multi-view 3D platform | TRL7: Integration of SoA algorithms (Corto, Draco) for mesh compression | TRL7: Integration of new mesh-compression features/updates, and optimized RMQ using efficient task management (RxCpp) to handle messages in the most appropriate way for TVM transmission | TRL7: Hybrid distribution via RMQ and DASH for TVM transmission |
| Features | Support for the OpenCTM profile 64x128x64 | Support for two new encoding profiles: libcorto and libdraco | Message handling with RxCpp | |
| KPIs | OpenCTM profile 64x128x64: encoding ~65 ms; TVM bandwidth ~5 Mbps; TVM pipeline end-to-end delay ~225 ms | libdraco profile 64x128x64: encoding ~17 ms; libcorto profile 64x128x64: encoding ~10 ms; TVM bandwidth ~2 Mbps; TVM pipeline end-to-end delay ~183 ms | libdraco profile 64x128x64: encoding ~15 ms; libcorto profile 64x128x64: encoding ~1 ms; TVM bandwidth ~3 Mbps; TVM pipeline end-to-end delay 55 ms | |
Point Cloud encoding & decoding
Description
This component offers a generic real-time dynamic point cloud codec for 3D immersive video. The component features low-delay encoding and decoding at multiple levels of detail. Geometry coding is based on octree occupancy, and attribute compression is based on existing image coding standards.
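To illustrate the octree-occupancy idea, the toy sketch below serializes a point cloud into one occupancy byte per non-empty node (depth-first); it is not the project codec, and omits attribute coding entirely:

```python
import numpy as np

def encode_occupancy(points, lo, hi, depth, out):
    """Depth-first octree traversal emitting one occupancy byte per non-empty node."""
    if depth == 0 or points.shape[0] == 0:
        return
    mid = (lo + hi) / 2.0
    # Child index per point: one bit per axis (x -> bit 0, y -> bit 1, z -> bit 2).
    idx = ((points >= mid) * np.array([1, 2, 4])).sum(axis=1)
    occupancy = 0
    children = []
    for child in range(8):
        subset = points[idx == child]
        if subset.shape[0] > 0:
            occupancy |= 1 << child
            offsets = np.array([child & 1, (child >> 1) & 1, (child >> 2) & 1])
            children.append((subset, np.where(offsets == 1, mid, lo),
                             np.where(offsets == 1, hi, mid)))
    out.append(occupancy)
    for subset, c_lo, c_hi in children:
        encode_occupancy(subset, c_lo, c_hi, depth - 1, out)

# Example: serialize a small random cloud at octree depth 6 into a byte stream.
cloud = np.random.rand(50_000, 3)
stream = []
encode_occupancy(cloud, np.zeros(3), np.ones(3), depth=6, out=stream)
payload = bytes(stream)
```

Decoding replays the same traversal: each occupancy byte tells the decoder which of the eight child cells to descend into, so geometry is reconstructed to the precision implied by the chosen octree depth.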
Licenses:
To be Announced
Links:
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL6: Reference implementation of core modules for encoding and decoding | TRL6: Ad-hoc integration of the encoder and decoder into a GStreamer-based delivery pipeline | TRL7: Integration into the operational pipeline; optimization of memory use | TRL7: Optimization for parallelized multiple encoding |
| Features | Low-delay encoding and decoding; lossy geometry coding using octree occupancy; lossy attribute coding using existing image coding standards | Integration for use with a GStreamer-based delivery pipeline | Optimization of memory use and performance for dense point clouds captured with 4 cameras | Optimization of the codec to support multiple levels of detail for network-adaptive streaming; parallelized implementation to code point clouds segmented spatially into tiles for viewport-adaptive streaming |
| KPIs | Encode times for the Dimitris-2-Zippering sequence (~320k points): octree depth 8 ~80 ms; depth 9 ~160 ms; depth 10 ~260 ms | Together with the simple capturer, the end-to-end pipeline operates at 5 fps with 1 camera on commodity hardware | Together with the simple capturer, the end-to-end pipeline operates at 15 fps with 4 cameras on commodity hardware; end-to-end latency of 400 ms | |
Orchestration & Delivery
Media/Session Orchestrator
Description:
The VRT orchestrator, as a centralised server component, is primarily in charge of managing the set of end-user profiles connected to a platform session. The orchestrator controls the various connection events in order to guarantee a unified experience and to ensure that the media pipelines are properly switched across the various connected nodes. It is also the reference point used for the synchronisation of the media pipelines. From the backend viewpoint, the orchestrator manages and supervises the transmission and delivery of all media stream pipelines.
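A minimal sketch of the session-management role described above; the data model and event names are hypothetical and do not reflect the VRT orchestrator API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Session:
    """One platform session: the set of connected user profiles."""
    session_id: str
    users: Dict[str, dict] = field(default_factory=dict)   # user_id -> profile / pipeline state

class Orchestrator:
    def __init__(self, notify: Callable[[str, dict], None]):
        self.sessions: Dict[str, Session] = {}
        self.notify = notify            # callback pushing connection events to all session nodes

    def join(self, session_id: str, user_id: str, profile: dict) -> None:
        session = self.sessions.setdefault(session_id, Session(session_id))
        session.users[user_id] = profile
        # Inform every connected node so media pipelines can be (re)wired consistently.
        self.notify(session_id, {"event": "user-joined", "user": user_id})

    def leave(self, session_id: str, user_id: str) -> None:
        session = self.sessions.get(session_id)
        if session and session.users.pop(user_id, None) is not None:
            self.notify(session_id, {"event": "user-left", "user": user_id})
```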
Licenses:
- Core: MIT
- UMTS: To be announced
- LRTS: To be announced
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL0 | TRL2 | TRL5 | TRL6/7 |
| Features | Non-existing component at the start of the project | Static platform configuration | Supervision of native pipeline cloud components; PC & TVM pipeline supervision; live presenter supervision; user session management; Socket.io audio transmission | Year 2 features, plus: interactive events transmission and supervision; virtualisation pre-packaging (?) |
| KPIs | N/A | 2 users managed statically per session; one session at a time | 4 users managed dynamically; up to 5 simultaneous sessions; 300 ms audio latency | 10 users managed dynamically; up to 10 simultaneous sessions |
RTMP Live Video Transmission - Gstreamer Unity Bridge
Description
Software component that connects GStreamer with Unity. It can play any media URI provided by GStreamer (1.x) pipelines into Unity 3D textures.
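A minimal sketch of the underlying pattern (pulling decoded RGBA frames out of a GStreamer 1.x pipeline through an appsink so they can be uploaded to a texture). It is written with the GStreamer Python bindings for illustration only; the actual bridge is a native Unity plugin, the URI is a placeholder, and the sketch assumes a video-only source:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Decode a media URI and hand RGBA frames to the application via appsink.
pipeline = Gst.parse_launch(
    "uridecodebin uri=https://example.com/stream.mpd ! videoconvert ! "
    "video/x-raw,format=RGBA ! appsink name=sink max-buffers=2 drop=true"
)
sink = pipeline.get_by_name("sink")
pipeline.set_state(Gst.State.PLAYING)

sample = sink.emit("pull-sample")          # blocks until a decoded frame is available
if sample is not None:
    buf = sample.get_buffer()
    rgba_bytes = buf.extract_dup(0, buf.get_size())
    # rgba_bytes would be uploaded to a GPU texture by the engine-side plugin.

pipeline.set_state(Gst.State.NULL)
```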
Licenses:
Open-Source
Links:
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL5 (component developed in the EU H2020 ImmersiaTV project) | Not used in the project during Year 1 | TRL5 | Not used in the project during Year 3 |
| Features | Ingest of DASH and RTMP streams (or other media URIs) from GStreamer into Unity | | Same features, with the component updated to interact with newer versions of GStreamer | |
| KPIs | Used for stored DASH streams; smooth playout; no delay tests conducted so far | | End-to-end latency for a live RTMP stream: ~1.5 s | |
(Point Cloud/Audio/Live) DASH Sender - bin2dash
Description
bin2dash is an open API that allows packaging any volumetric data in the industry-standard MP4 format and then streaming it using MPEG-DASH.
License:
Proprietary
Links:
Source Code Location: https://baltig.viaccess-orca.com:8443/VRT/nativeclient-group/EncodingEncapsulation
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL4 | TRL6 | TRL7 | TRL7 |
| Features | Integrated component (plugged into capture and encoding) | New modular structure and new metadata handling | Improved with more data types and more testing use-cases | Tiled point clouds; support for TVMs |
| KPIs | Latency: 2 frames + internal threading | Latency: 1 frame + internal threading | Latency: 1 frame (a complete frame is needed for the copy) | Latency: 1 frame (a complete frame is needed for the copy) |
(Point Cloud/Audio/Live) DASH Distributor - evanescent
Description
Evanescent is a server component that receives any kind of DASH stream and simply forwards it to other receiver clients. In this sense, Evanescent is an SFU (Selective Forwarding Unit). More technically, it receives low-latency DASH chunks and serves them back to low-latency DASH receivers using HTTP chunked transfer. Evanescent is implemented in C++ based on Signals sub-modules.
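A minimal sketch of the forwarding idea: media chunks are served back over HTTP/1.1 chunked transfer encoding as soon as they arrive from the ingest side. This is an illustrative single-client Python server, not the Signals-based C++ implementation:

```python
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

# Chunks produced by the ingest side (e.g. low-latency DASH segments) are queued here.
chunk_queue = queue.Queue()

class ForwardHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "video/mp4")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        while True:
            chunk = chunk_queue.get()          # block until the next chunk is ingested
            if chunk is None:                  # sentinel: stream finished
                self.wfile.write(b"0\r\n\r\n") # terminating zero-length chunk
                break
            # HTTP/1.1 chunk framing: hex size, CRLF, payload, CRLF.
            self.wfile.write(f"{len(chunk):x}\r\n".encode() + chunk + b"\r\n")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ForwardHandler).serve_forever()
```

Because each chunk is pushed to the receiver the moment it is available, the forwarding cost stays at roughly one HTTP chunk of latency, which matches the KPI reported for this component.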
License:
Proprietary
Links:
Source Code Location: https://git.gpac-licensing.com/alaiwans/Evanescent
Documentation: LD_LIBRARY_PATH=$DIR $DIR/evanescent.exe [--tls] [--port port_number]
Installation guide: just copy the files, see details at https://baltig.viaccess-orca.com:8443/VRT/deliverymcu-group/DeliveryMCU/blob/master/README.md.
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL0 | TRL4 | TRL5 | TRL5 |
| Features | N/A | Basic system with HTTP and HTTPS/SSL | SFU testing towards MCU | Intelligence in processing tiled content |
| KPIs | N/A | Latency: 1 HTTP chunk request | Latency: 1 HTTP chunk request | Latency: 1 HTTP chunk request with tiled content |
(Point Cloud/Audio/Live) DASH Receiver - Signals Unity Bridge (SUB)
Description
This component is the receiving end of the pipeline. It unpackages standard MPEG-DASH streams into raw streams to be used by the receiver client. It also contains a preliminary algorithm for tiling support. It has been extended to read files and RTMP streams for debugging and interoperability purposes.
License:
This component is written in C++. It depends on Motion Spell's Signals (C++, IP-owned, proprietary license), GPAC (C, LGPLv2+ license), FFmpeg (C, LGPLv2+ license) and cURL (C, MIT/X license).
Links:
Installation Guide: https://baltig.viaccess-orca.com:8443/VRT/nativeclient-group/SUB/releases. Copy the files into a folder (the SUB is a dynamic library).
Input, output, configuration: The input is an MPEG-DASH stream URL. The configuration is done automatically from the content of the stream. The output is a timed sequence of binary buffers with metadata, to be given to a decoder/renderer.
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL4 | TRL5 | TRL6 | TRL6 |
| Features | Basic DASH reception | RTMP input | Support for more streamers (not only bin2dash) | Support for tiling and TVMs |
| KPIs | Latency: 2 ISOBMFF/MP4 fragments | Latency: 1 ISOBMFF/MP4 fragment, or 1 frame (RTMP) + network delays | Latency (est.): 1 ISOBMFF/MP4 fragment, or 1 frame (RTMP) + network delays | Latency: 1 HTTP chunk request with tiled content |
(Point Cloud/Audio/Live) Live presenter
Description
The component receives a live RTMP stream from a 3D stereoscopic camera using Nginx and nginx-rtmp (both BSD-2 licensed), crops the left and right eye views to fit the presenter, and transcodes them into a final stream (using FFmpeg, which is GPL licensed) that can be fetched by the player over RTMP. The whole pipeline is low-latency and runs in a Docker container. The component also handles the DASH format as input and output.
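A minimal sketch of the crop-and-transcode step, invoking FFmpeg from Python on a side-by-side stereoscopic input. The stream URLs and crop geometry are placeholders (the real crop regions depend on where the presenter sits in the frame), and the production pipeline is driven by nginx-rtmp rather than a standalone script:

```python
import subprocess

INPUT_RTMP = "rtmp://ingest.example/live/stereo"      # placeholder ingest URL
OUTPUT_RTMP = "rtmp://origin.example/live/presenter"  # placeholder publish URL

# Assume side-by-side stereo: left eye in the left half of the frame, right eye in the right.
# Crop both eyes, stack them back side by side and publish the result over RTMP.
cmd = [
    "ffmpeg", "-i", INPUT_RTMP,
    "-filter_complex",
    "[0:v]crop=iw/2:ih:0:0[left];"
    "[0:v]crop=iw/2:ih:iw/2:0[right];"
    "[left][right]hstack=inputs=2[out]",
    "-map", "[out]", "-map", "0:a?",
    "-c:v", "libx264", "-preset", "veryfast", "-tune", "zerolatency",
    "-c:a", "aac",
    "-f", "flv", OUTPUT_RTMP,
]
subprocess.run(cmd, check=True)
```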
License:
This component uses Nginx for the reception and publishing, and FFmpeg for the transcoding.
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | N/A | TRL5 | TRL6 | TRL6 |
| Features | N/A | Overall component: RTMP input; DASH output | Overall component: RTMP input/output | Overall component: no core modifications expected; this module heavily depends on the capabilities of the capture material |
| KPIs | N/A | Latency <= 2 s | Latency <= 1 s | Latency <= 1 s |
Point Cloud - Multi Control Unit (PC-MCU)
Description
(Virtualized) holoconferencing cloud-based component that aims to reduce the end-user client's computational load and bandwidth usage by providing the following key features: fusion of different volumetric videos, Level of Detail (LoD) adjustment, and Field of View (FoV) aware delivery.
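A minimal sketch of FoV-aware LoD selection (purely illustrative geometry; the thresholds and LoD labels are placeholders, not the PC-MCU heuristics):

```python
import numpy as np

def select_lod(viewer_pos, view_dir, obj_pos, fov_deg=90.0):
    """Pick a level of detail for one volumetric object, given the viewer's pose."""
    to_obj = np.asarray(obj_pos, float) - np.asarray(viewer_pos, float)
    dist = np.linalg.norm(to_obj)
    view_dir = np.asarray(view_dir, float) / np.linalg.norm(view_dir)
    angle = np.degrees(np.arccos(np.clip(np.dot(to_obj / dist, view_dir), -1.0, 1.0)))

    if angle > fov_deg / 2:      # outside the field of view: send the coarsest version
        return "lowest"
    if dist < 1.5:               # close and visible: full detail
        return "high"
    if dist < 4.0:
        return "medium"
    return "low"

# Example: an object slightly to the right of a viewer looking down +Z, 2 m away.
print(select_lod(viewer_pos=[0, 0, 0], view_dir=[0, 0, 1], obj_pos=[0.5, 0, 2.0]))
```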
License:
To be announced
Links:
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | N/A | TRL5 | TRL6 | TRL6 |
| Features | N/A | Overall component: RTMP input; DASH output | Overall component: RTMP input/output | Overall component: no core modifications expected; this module heavily depends on the capabilities of the capture material |
| KPIs | N/A | Latency <= 2 s | Latency <= 1 s | Latency <= 1 s |
Rendering and Display
Unity Player
Description:
Unity-based player in charge of the reception, integration and presentation of all available streams and VR content for the envisioned VR-Together scenarios. It also supports various types of interaction and synchronization features.
Licenses:
- LGPL
- EULA
- Oculus SDK
Links:
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | TRL0 (not available at the start of the project) | TRL6 | TRL7 | TRL7 |
| Features | N/A | TVM 1.0 rendering; P2P streaming | TVM 2.0 rendering; orchestrator-based streaming; stereoscopic live video stream; multi-threaded pipelines; GUB integration | SUB integration; PC pipeline; PC-MCU integration; multi-threading improvements to enable multiple decoding; interactions envisioned in Pilot 3; voice recognition |
| KPIs | N/A | Support for 2 users | Support for 4 users | Support for 5+ users |
Key Components of VRTogether Web Platform
RGBD Capture
Description:
The main aim of the RGBD Capture module is to provide a lightweight, simple-to-set-up sensing solution that is easy to deploy (e.g. in people's homes). The capture module is therefore responsible for a photo-realistic capture of the user, based on a single RGBD sensor (i.e. Kinect / RealSense). The module has an adaptable interface for connecting to different RGBD drivers in order to obtain the colour image and depth image from the hardware sensor. Currently this may include background removal and replacement of the HMD with a pre-captured representation of the user's face (HMD removal). The image is further processed and finally converted to a 2D RGB + grayscale depth image. The final image is displayed on the screen for capture in the web browser. The camera calibration is sent to the browser for geometrically correct 3D rendering using WebGL shaders. The capture is rendered as:
- a 3D self-view for self-presence, and
- a view for the other participants for shared presence,
both in real time. All modules are optimized for capturing and rendering users in real time in a 3D virtual environment.
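The geometrically correct rendering relies on standard pinhole unprojection of the depth image using the transmitted camera intrinsics. A minimal sketch of that math, in Python rather than the actual WebGL shaders and with illustrative intrinsic values:

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Turn a depth image (in metres) into a 3D point per pixel using pinhole intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)       # HxWx3 camera-space positions

# Illustrative intrinsics for a 640x576 depth image; real values come from calibration.
points = unproject_depth(np.full((576, 640), 1.2), fx=500.0, fy=500.0, cx=320.0, cy=288.0)
```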
Licenses:
- To be announced
Links:
- Capture and Transmit RGBD Data for 3D rendering (paper)
- HMD removal (video)
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | 5 | 6 | 6 | 7 |
| Features | RGB with chroma background removal; support for KinectV2; shaders for real-time reconstruction and rendering in browser-based VR | Improved background removal; added support for the RealSense sensor; 3D shaders for self-view and user rendering; sensor calibration of Oculus Rift and KinectV2 for RGBD-based HMD removal | HMD removal fully integrated; improvements in capture-conversion performance | Added support for Kinect4Azure; one executable with different modes (i.e. 2D and 3D capture mode) |
WEBRTC-VR-MCU
Description:
The WebRTC VR MCU system is capable of ingesting any number of WebRTC client inputs. The MCU synchronises and composes its inputs into a single output stream, which is then published using WebRTC. As a result, clients retrieve all relevant streams together instead of separately. This optimizes network bandwidth through more efficient routing (each client only sends its video stream to the MCU, and no longer to all other clients), as well as the clients' decoding resources (clients have a limited number of hardware decoders, which can now be dedicated to decoding all streams at once instead of just one).
Incoming WebRTC streams are demuxed into RTP streams and fed into a pluggable media pipeline. The media pipeline performs the compositing operation and outputs a single RTP stream, which is broadcast to each client using WebRTC return streams. The media pipeline of the MCU system has been designed as a containerized service so that, given enough hardware, it is horizontally scalable over multiple parallel sessions. These services are managed using Docker Swarm and the orchestrator component.
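A minimal sketch of the compositing step at the heart of such a pipeline (tiling N decoded input frames into one output frame); illustrative numpy code, not the actual RTP/media pipeline:

```python
import math
import numpy as np

def composite(frames):
    """Tile equally sized RGB frames (HxWx3) into a single grid image."""
    n = len(frames)
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    h, w, c = frames[0].shape
    canvas = np.zeros((rows * h, cols * w, c), dtype=frames[0].dtype)
    for i, frame in enumerate(frames):
        r, col = divmod(i, cols)
        canvas[r * h:(r + 1) * h, col * w:(col + 1) * w] = frame
    return canvas

# Example: four 480x640 client frames composed into one 960x1280 output frame.
out = composite([np.zeros((480, 640, 3), np.uint8) for _ in range(4)])
```

Because only the composite is encoded and returned, each client decodes one stream regardless of how many participants are in the session.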
Licenses:
- To be announced
Links:
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | 2 | 3 | 5 | 6 |
| Features | Baseline code only; no MCU (full P2P mesh) | Initial concept of component stubs | Horizontally scalable MCU; orchestration; integration with the web player | 16 users; fully containerized deployment; support for low-powered (mobile) devices |
| KPIs | Number of simultaneous users: 3 | Number of simultaneous users: 4 | Number of simultaneous users: 8 | Number of simultaneous users: 16 |
Web-based player
Description:
The entry point for the TogetherVR web client is offered by the web server back-end, which is based on Node.js, React and A-Frame. This allows any modern WebVR-enabled browser to display the VR content on a screen or on an OpenVR (https://github.com/ValveSoftware/openvr) enabled VR HMD (such as the Oculus Rift CV1 or the Valve Index). Furthermore, the client can access the image produced by the TNO RGBD Capture module, to be displayed as a self-view to the user or to be sent via WebRTC to one or more other clients. To support such multi-user connections, the client is connected to a second web server that handles the multimedia orchestration. The rendering of users in the VR environment is done via custom WebGL shaders that alpha-blend user representations into the surroundings for a natural visual representation. Audio is also captured, transmitted and played back as spatial audio with the help of the Google Resonance Audio API.
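A minimal sketch of the alpha-blending those shaders perform (per-pixel "over" compositing of the user cut-out onto the virtual surroundings); illustrative numpy code rather than the actual GLSL:

```python
import numpy as np

def blend_over(user_rgba, background_rgb):
    """Standard 'over' operator: out = alpha * user + (1 - alpha) * background."""
    alpha = user_rgba[..., 3:4].astype(np.float32) / 255.0
    user = user_rgba[..., :3].astype(np.float32)
    bg = background_rgb.astype(np.float32)
    return (alpha * user + (1.0 - alpha) * bg).astype(np.uint8)
```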
Licenses:
- To be announced
Links:
- Explanation Video
- Overview Paper
- Look & Feel of 1st 360-degree prototype (video)
| | Status Year 0 | Status Year 1 | Status Year 2 | (Expected) Status Year 3 |
|---|---|---|---|---|
| TRL | 6 | 7 | 7 | 7 |
| Features | 360-degree only; peer-to-peer streaming | Pilot 1 content; volumetric; streaming via MCU; self-view; spatial audio; admin panel | Complete client refactor based on Redux/Reflux; Pilot 2 content; optimized MCU pipeline; use of capture (hardware) parameters for improved rendering | Integrated technical measurements; new virtual room(s) with a focus on communication; support for low-powered (mobile) devices |
| KPIs | MOS scores for 360-degree experience (5-point scale): overall experience 4.01; video quality 3.59 | MOS scores for 360-degree experience (5-point scale): overall experience 4.35; video quality 3.65 | MOS score for 3D experience (9-point scale): overall experience 6.92 | Full-system end-to-end (glass-to-glass) delays: RGB delay — p2p 396 ms, MCU 564 ms; RGBD delay — p2p 384 ms, MCU 622 ms |