
Key components

Key Components of Native VRTogether Platform

Capture

3D Capture

Description: Volumetric Capture (VolCap) is a multi-RGB-D-sensor 3D capturing, streaming and recording application. The toolset is designed as a distributed system in which a number of processing units each manage and collect data from a single sensor using a headless application. A set of sensors is orchestrated by a centralized UI application, which is also the delivery point of the connected sensor streams. Volumetric Reconstruction (VolReco) uses the data provided by VolCap to first create a mesh locally and eventually transmit it.
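Conceptually, the distributed layout can be sketched as a set of headless sender processes, one per sensor, pushing length-prefixed frames to the central collector. The framing, port and Python implementation below are illustrative assumptions, not the actual VolCap code.

```python
# Minimal sketch (not VolCap itself): one headless sender per sensor pushes
# length-prefixed frames; the centralised UI application collects them.
import socket
import struct

def send_frame(sock: socket.socket, frame: bytes) -> None:
    """Sensor-node side: push one length-prefixed RGB-D frame to the collector."""
    sock.sendall(struct.pack("!I", len(frame)) + frame)

def recv_frame(conn: socket.socket) -> bytes:
    """Collector side: read one length-prefixed frame from a sensor node."""
    (length,) = struct.unpack("!I", conn.recv(4))
    data = b""
    while len(data) < length:
        data += conn.recv(length - len(data))
    return data

def run_collector(port: int = 5000) -> None:
    """Centralised UI side: accept a sensor node and keep reading its frames."""
    server = socket.socket()
    server.bind(("0.0.0.0", port))
    server.listen()
    conn, addr = server.accept()
    while True:
        frame = recv_frame(conn)
        print(f"frame from {addr}: {len(frame)} bytes")  # hand off to reconstruction (VolReco) here
```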

Licenses: 

Links: 

TRL
  • Year 0: TRL5: Existing multi-view 3D capturing and reconstruction platform (with KinectV2).
  • Year 1: TRL6: Updates to the existing multi-view 3D capturing and reconstruction platform (with KinectV2) for the VRT end product, aimed at performance improvements.
  • Year 2 (expected): TRL7: Integrated synchronization and streaming utilities for 3D data in the new 3D capturing platform (with Intel RealSense). TRL7: New reconstruction software (with Intel RealSense).
  • Year 3: TRL7: Integration of the new Kinect 4 Azure RGBD sensors in the new 3D capturing platform; the capturing platform will support different RGBD sensors. TRL7: Reconstruction pipeline with the new Kinect Azure sensors. Performance optimizations based on the reconstruction rate, targeting real-time performance; these optimizations address the increased bandwidth needs and 3D data processing caused by the increased depth resolution.
Features
  • Year 0: Use of 4 KinectV2 sensors (color resolution 1280×720, depth resolution 512×424).
  • Year 1: Use of 4 KinectV2 sensors (color resolution 1280×720, depth resolution 512×424); skeleton tracking.
  • Year 2 (expected): Use of 4 Intel RealSense D415 sensors (color resolution 1280×720, depth resolution 320×180).
  • Year 3: Use of 4 Kinect Azure sensors (color resolution 1280×720, depth resolution 320×288).
KPIs
  • Year 0: Reconstruction rate: 5 fps.
  • Year 1: Reconstruction rate: 9 fps.
  • Year 2 (expected): Reconstruction rate: 22 fps.

Simple Point Cloud Capture

Description

This component offers live capture and reconstruction of 3D point clouds using Intel RealSense D400 devices. The component can run with zero or more sensors: if no sensors are found, a synthetic point cloud is generated; if multiple sensors are found, the point clouds from each sensor are transformed into a common coordinate frame and merged.
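The capture logic described above can be sketched as follows; the synthetic-cloud generator, the per-sensor extrinsic matrices and all names are illustrative assumptions, not the component's actual code.

```python
# Minimal sketch: fall back to a synthetic point cloud when no sensors are
# found, otherwise transform each sensor's cloud into a common frame and merge.
import numpy as np

def synthetic_cloud(n_points: int = 10000) -> np.ndarray:
    """Random points on a unit sphere, used when no RGB-D sensor is detected."""
    pts = np.random.normal(size=(n_points, 3))
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)

def merge_clouds(clouds: list[np.ndarray], extrinsics: list[np.ndarray]) -> np.ndarray:
    """Apply each sensor's 4x4 extrinsic matrix and concatenate the results."""
    merged = []
    for cloud, T in zip(clouds, extrinsics):
        homo = np.hstack([cloud, np.ones((cloud.shape[0], 1))])  # to homogeneous coordinates
        merged.append((homo @ T.T)[:, :3])                       # back to 3D in the common frame
    return np.vstack(merged)

def capture_frame(sensor_clouds: list[np.ndarray], extrinsics: list[np.ndarray]) -> np.ndarray:
    if not sensor_clouds:                     # zero sensors: synthetic fallback
        return synthetic_cloud()
    return merge_clouds(sensor_clouds, extrinsics)
```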

Links:

Github

Licenses:

To be Announced

TRL
  • Year 0: N/A (the component was not originally planned).
  • Year 1: TRL4: Prototype implementation using a single sensor.
  • Year 2 (expected): TRL5: Integration into the operational pipeline; optimization of memory use.
  • Year 3: TRL7: Optimization for viewport-adaptive streaming.
Features
  • Year 1: Support for one Intel RealSense D400 sensor; a synthetic point cloud is generated if no sensors are detected.
  • Year 2 (expected): Support for zero or more sensors; reference implementation for testing with 4 sensors.
  • Year 3: Captured point clouds will be segmented into spatial tiles along with the relevant metadata; optimization of the trade-off between latency, visual quality and resource consumption.
KPIs
  • Together with the compression module, the end-to-end pipeline can operate at 5 fps with 1 camera on commodity hardware.
  • Together with the compression module, the end-to-end pipeline can operate at 15 fps with 4 cameras on commodity hardware; self-view latency of 300 ms.

Encoding

TVM encoding & transmission

Description:

The encoding component is responsible for the compression and decompression of the transmitted data. Over the years, CERTH has experimented with a variety of techniques and algorithms for this purpose (OpenCTM, Draco, Corto), eventually settling on Draco for performance reasons. For transmission, CERTH uses RabbitMQ, an open-source message broker. It accepts messages from producers and delivers them to consumers, acting as a middleman that can reduce the load on, and the delivery times of, web application servers.
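A minimal sketch of this broker-based transmission pattern with RabbitMQ (through the pika Python client) is shown below; the queue name, host and payload handling are assumptions for illustration, and in the real pipeline the payload would be a Draco-compressed mesh buffer.

```python
# Minimal sketch of producer/consumer messaging via RabbitMQ using pika.
import pika

def publish_frame(compressed_mesh: bytes, queue: str = "tvm_frames") -> None:
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=queue)
    # The producer only talks to the broker; consumers pick frames up later.
    channel.basic_publish(exchange="", routing_key=queue, body=compressed_mesh)
    connection.close()

def consume_frames(queue: str = "tvm_frames") -> None:
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=queue)

    def on_frame(ch, method, properties, body):
        # body holds one compressed TVM frame; decode and hand it to the renderer here.
        print(f"received frame of {len(body)} bytes")

    channel.basic_consume(queue=queue, on_message_callback=on_frame, auto_ack=True)
    channel.start_consuming()
```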

Licenses: 

TRL
  • Year 0: TRL5: OpenCTM encoding and simple RMQ transmission in the existing multi-view 3D platform.
  • Year 1: TRL7: Integration of state-of-the-art algorithms (Corto, Draco) for mesh compression.
  • Year 2 (expected): TRL7: Integration of new mesh-compression features and updates, and optimized RMQ using efficient task management (RxCpp) to handle messages in the most appropriate way for TVM transmission.
  • Year 3: TRL7: Hybrid distribution via RMQ and DASH for TVM transmission.
Features
  • Year 0: Support for the OpenCTM profile 64x128x64.
  • Year 1: Support for two new encoding profiles: libcorto and libdraco.
  • Year 2 (expected): Message handling with RxCpp.
KPIs
  • Year 0: OpenCTM profile 64x128x64: encoding ~65 ms; TVM bandwidth ~5 Mbps; TVM pipeline end-to-end delay ~225 ms.
  • Year 1: libdraco profile 64x128x64: encoding ~17 ms; libcorto profile 64x128x64: encoding ~10 ms; TVM bandwidth ~2 Mbps; TVM pipeline end-to-end delay ~183 ms.
  • Year 2 (expected): libdraco profile 64x128x64: encoding ~15 ms; libcorto profile 64x128x64: encoding ~1 ms; TVM bandwidth ~3 Mbps; TVM pipeline end-to-end delay 55 ms.

Point Cloud encoding & decoding

Description

This component offers a generic real-time dynamic point cloud codec for 3D immersive video. The component features low-delay encoding and decoding at multiple levels of detail. Geometry coding is based on octree occupancy, and attribute compression is based on existing image coding standards.
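The geometry side of such a codec can be illustrated with a simple voxelization sketch, where the octree depth controls the level of detail (deeper octree, finer voxels, more bits). This is not the project's codec, just the underlying idea, and all names are illustrative.

```python
# Sketch of octree-depth-controlled geometry quantisation and its lossy decode step.
import numpy as np

def voxelize(points: np.ndarray, octree_depth: int,
             bbox_min: np.ndarray, bbox_max: np.ndarray) -> np.ndarray:
    """Return the unique occupied voxel indices for a point cloud."""
    resolution = 2 ** octree_depth                      # cells per axis
    extent = bbox_max - bbox_min
    # Map every point to an integer voxel coordinate in [0, resolution).
    idx = np.floor((points - bbox_min) / extent * resolution).astype(np.int64)
    idx = np.clip(idx, 0, resolution - 1)
    return np.unique(idx, axis=0)                       # occupancy: one entry per occupied voxel

def dequantize(occupied: np.ndarray, octree_depth: int,
               bbox_min: np.ndarray, bbox_max: np.ndarray) -> np.ndarray:
    """Reconstruct one point per occupied voxel (voxel centre): the lossy decode step."""
    resolution = 2 ** octree_depth
    extent = bbox_max - bbox_min
    return bbox_min + (occupied + 0.5) / resolution * extent
```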

Licenses:

To be Announced

Links:

Github

TRL
  • Year 0: TRL6: Reference implementation of the core encoding and decoding modules.
  • Year 1: TRL6: Ad-hoc integration of the encoder and decoder into a GStreamer-based delivery pipeline.
  • Year 2 (expected): TRL7: Integration into the operational pipeline; optimization of memory use.
  • Year 3: TRL7: Optimization for parallelized multiple encoding.
Features
  • Year 0: Low-delay encoding and decoding; lossy geometry coding using octree occupancy; lossy attribute coding using existing image coding standards.
  • Year 1: Integration for use with a GStreamer-based delivery pipeline.
  • Year 2 (expected): Optimization of memory use and performance for dense point clouds captured with 4 cameras.
  • Year 3: Optimization of the codec to support multiple levels of detail for network-adaptive streaming; parallelized implementation to code point clouds segmented spatially into tiles for viewport-adaptive streaming.
KPIs
  • Encode times for the Dimitris-2-Zippering sequence (~320k points): octree depth 8: ~80 ms; depth 9: ~160 ms; depth 10: ~260 ms.
  • Together with the simple capturer, the end-to-end pipeline can operate at 5 fps with 1 camera on commodity hardware.
  • Together with the simple capturer, the end-to-end pipeline can operate at 15 fps with 4 cameras on commodity hardware; end-to-end latency of 400 ms.

Orchestration & Delivery

Media/Session Orchestrator


Description:

The VRT orchestrator is a centralised server component primarily in charge of managing the set of end-user profiles connected to a platform session. The orchestrator controls the various connection events in order to guarantee a unified experience and to ensure that whole media pipelines are properly switched across the various connected nodes. It is also the reference point used for the synchronisation of the media pipelines. From the back-end viewpoint, the orchestrator manages and supervises the transmission and delivery of all of the media stream pipelines.
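A highly simplified sketch of the session-management role is given below; the class, method names and limits are illustrative assumptions (the limits mirror the KPIs listed further down), not the orchestrator's actual implementation.

```python
# Sketch of a session registry: which users belong to which session, with limits.
class SessionRegistry:
    def __init__(self, max_users_per_session: int = 10, max_sessions: int = 10):
        self.max_users_per_session = max_users_per_session
        self.max_sessions = max_sessions
        self.sessions: dict[str, set[str]] = {}

    def create_session(self, session_id: str) -> None:
        if len(self.sessions) >= self.max_sessions:
            raise RuntimeError("maximum number of simultaneous sessions reached")
        self.sessions[session_id] = set()

    def join(self, session_id: str, user_id: str) -> None:
        users = self.sessions[session_id]
        if len(users) >= self.max_users_per_session:
            raise RuntimeError("session is full")
        users.add(user_id)   # a real orchestrator would also (re)wire media pipelines here

    def leave(self, session_id: str, user_id: str) -> None:
        self.sessions[session_id].discard(user_id)
```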

Licenses: 

  • Core: MIT
  • UMTS: To be announced
  • LRTS: To be announced
TRL
  • Year 0: TRL0
  • Year 1: TRL2
  • Year 2 (expected): TRL5
  • Year 3: TRL6/7
Features
  • Year 0: Non-existing component at the start of the project.
  • Year 1: Static platform configuration.
  • Year 2 (expected): Supervision of native pipeline cloud components; supervision of the PC and TVM pipelines; live presenter supervision; user session management; Socket.io audio transmission.
  • Year 3: Year 2 features, plus transmission and supervision of interactive events; virtualisation pre-packaging (?).
KPIs
  • Year 0: N/A
  • Year 1: 2 users managed statically per session; one session at a time.
  • Year 2 (expected): 4 users managed dynamically; up to 5 sessions can run simultaneously; 300 ms audio latency.
  • Year 3: 10 users managed dynamically; up to 10 sessions can run simultaneously.

RTMP Live Video Transmission - Gstreamer Unity Bridge

Description

A software component that connects GStreamer with Unity. It can play any media URI provided by GStreamer (1.x) pipelines into Unity 3D textures.

Licenses:

Open-Source

Links:

Github

TRL
  • Year 0: TRL5 (component developed in the EU H2020 ImmersiaTV project).
  • Year 1: Not used in the project during Year 1.
  • Year 2 (expected): TRL5
  • Year 3: Not used in the project during Year 3.
Features
  • Year 0: Ingest of DASH and RTMP streams (or other media URIs) from GStreamer into Unity.
  • Year 2 (expected): Same features, with the component updated to interact with newer versions of GStreamer.
KPIs
  • Year 0: Used for stored DASH streams; smooth playout; no delay tests conducted so far.
  • Year 2 (expected): End-to-end latency for a live RTMP stream: ~1.5 s.

(Point Cloud/Audio/Live) DASH Sender - bin2dash

Description

bin2dash is an open API for packaging any volumetric data in the industry-standard MP4 container and then streaming it using MPEG-DASH.
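Purely as an illustration of the packaging idea (bin2dash itself is a native API), the sketch below writes each volumetric frame as a numbered DASH segment and emits a minimal MPD with a SegmentTemplate; the attribute values, file names and one-second segment duration are assumptions, not bin2dash's actual output.

```python
# Sketch only: numbered segment files plus a minimal MPD pointing at them.
from pathlib import Path

MPD_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT{duration}S" minBufferTime="PT1S"
     profiles="urn:mpeg:dash:profile:isoff-live:2011">
  <Period>
    <AdaptationSet mimeType="application/octet-stream">
      <Representation id="pointcloud" bandwidth="{bandwidth}">
        <SegmentTemplate media="seg-$Number$.m4s" initialization="init.mp4"
                         duration="1" startNumber="1"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
"""

def package(frames: list[bytes], out_dir: str, bandwidth: int = 20_000_000) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "init.mp4").write_bytes(b"")                 # placeholder initialisation segment
    for i, frame in enumerate(frames, start=1):
        (out / f"seg-{i}.m4s").write_bytes(frame)       # one segment per frame (1 s each here)
    (out / "manifest.mpd").write_text(
        MPD_TEMPLATE.format(duration=len(frames), bandwidth=bandwidth))
```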

License:

Proprietary

Links:

Source Code Location: https://baltig.viaccess-orca.com:8443/VRT/nativeclient-group/EncodingEncapsulation

TRL
  • Year 0: TRL4
  • Year 1: TRL6
  • Year 2 (expected): TRL7
  • Year 3: TRL7
Features
  • Year 0: Integrated component (plugged with capture and encoding).
  • Year 1: New modular structure and new metadata handling.
  • Year 2 (expected): Improved with more data types and more testing use cases.
  • Year 3: Tiled point clouds; support for TVMs.
KPIs
  • Year 0: Latency: 2 frames + internal threading.
  • Year 1: Latency: 1 frame + internal threading.
  • Year 2 (expected): Latency: 1 frame (a complete frame is needed for the copy).
  • Year 3: Latency: 1 frame (a complete frame is needed for the copy).

(Point Cloud/Audio/Live) DASH Distributor - evanescent

Description

Evanescent is a server component that receives any kind of DASH stream and simply forwards it to other receiver clients; in this sense, Evanescent is an SFU (Selective Forwarding Unit). More technically, it receives low-latency DASH chunks and serves them back to a low-latency DASH receiver using HTTP chunked transfers. Evanescent is implemented in C++ and based on Signals sub-modules.
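The forwarding idea can be sketched as a small HTTP service that buffers incoming chunks and replays them to receivers with chunked transfer encoding. This Python sketch only illustrates the SFU concept, not Evanescent itself; the port, buffer size and framing are assumptions.

```python
# Sketch: accept pushed chunks via POST, serve them back via chunked GET responses.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from collections import deque
import threading

chunks = deque(maxlen=64)      # most recent media chunks, newest last
lock = threading.Lock()

class ForwardHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"            # needed for chunked transfer encoding

    def do_POST(self):                        # a packager pushes one chunk per request
        length = int(self.headers.get("Content-Length", 0))
        with lock:
            chunks.append(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):                         # a receiver pulls whatever is buffered
        self.send_response(200)
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        with lock:
            pending = list(chunks)
        for chunk in pending:
            # Each chunk is framed as <hex length>\r\n<data>\r\n (RFC 7230).
            self.wfile.write(f"{len(chunk):X}\r\n".encode() + chunk + b"\r\n")
        self.wfile.write(b"0\r\n\r\n")        # zero-length chunk ends the response

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 9000), ForwardHandler).serve_forever()
```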

License:

Proprietary

Links:

Source Code Location: https://git.gpac-licensing.com/alaiwans/Evanescent

Documentation: LD_LIBRARY_PATH=$DIR $DIR/evanescent.exe [--tls] [--port port_number]

Installation guide: just copy the files; see details at https://baltig.viaccess-orca.com:8443/VRT/deliverymcu-group/DeliveryMCU/blob/master/README.md.

TRL
  • Year 0: TRL0
  • Year 1: TRL4
  • Year 2 (expected): TRL5
  • Year 3: TRL5
Features
  • Year 0: N/A
  • Year 1: Basic system with HTTP and HTTPS/SSL.
  • Year 2 (expected): SFU testing toward an MCU.
  • Year 3: Intelligence in processing tiled content.
KPIs
  • Year 0: N/A
  • Year 1: Latency: 1 HTTP chunk request.
  • Year 2 (expected): Latency: 1 HTTP chunk request.
  • Year 3: Latency: 1 HTTP chunk request with tiled content.

(Point Cloud/Audio/Live) DASH Receiver - Signals Unity Bridge (SUB)

Description

This component is the receiving end of the pipeline. It unpackages any standard MPEG-DASH stream into raw streams to be used by the receiving client. It also contains a preliminary algorithm for tiling support. It has been extended to read files and RTMP streams for debugging and interoperability purposes.

License:

This component is written in C++. It depends on Motion Spell's Signals (C++, IP-owned, proprietary license), GPAC (C, LGPLv2+ license), FFmpeg (C, LGPLv2+ license) and cURL (C, MIT/X license).

Links:

Installation Guide: https://baltig.viaccess-orca.com:8443/VRT/nativeclient-group/SUB/releases. Copy the files in a folder (the SUB is a dynamic library).

Input, output, configuration: the input is an MPEG-DASH stream URL. The configuration is done automatically from the content of the stream. The output is a timed sequence of binary buffers with metadata, to be given to a decoder/renderer.

TRL
  • Year 0: TRL4
  • Year 1: TRL5
  • Year 2 (expected): TRL6
  • Year 3: TRL6
Features
  • Year 0: Basic DASH reception.
  • Year 1: RTMP input.
  • Year 2 (expected): Support for more streamers (e.g. not only bin2dash).
  • Year 3: Support for tiling and TVMs.
KPIs
  • Year 0: Latency: 2 ISOBMFF/MP4 fragments.
  • Year 1: Latency: 1 ISOBMFF/MP4 fragment, or 1 frame (RTMP) plus network delays.
  • Year 2 (expected): Latency (estimated): 1 ISOBMFF/MP4 fragment, or 1 frame (RTMP) plus network delays.
  • Year 3: Latency: 1 HTTP chunk request with tiled content.

(Point Cloud/Audio/Live) Live presenter

Description

The component receives a live RTMP stream from a 3D stereoscopic camera using nginx and nginx-rtmp (both BSD-2 licensed), crops the left- and right-eye views to fit the presenter, and transcodes them into a final stream (using FFmpeg, which is GPL-licensed) that can be fetched by the player over RTMP. The whole pipeline is low-latency and runs in a Docker container. This component also handles the DASH format as input and output.
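The crop-and-transcode step can be sketched by driving FFmpeg from a small script. The RTMP URLs and the side-by-side crop geometry below are assumptions (they depend on the actual stereoscopic camera), and the real component runs this inside the nginx-rtmp/Docker setup described above.

```python
# Sketch: split a side-by-side stereo feed, crop each eye, re-assemble, re-publish.
import subprocess

def run_presenter_pipeline(in_url: str = "rtmp://localhost/live/camera",
                           out_url: str = "rtmp://localhost/live/presenter") -> None:
    filter_graph = (
        "[0:v]split=2[l][r];"              # duplicate the input for the two crops
        "[l]crop=iw/2:ih:0:0[left];"       # left eye: left half of the frame
        "[r]crop=iw/2:ih:iw/2:0[right];"   # right eye: right half of the frame
        "[left][right]hstack=inputs=2[out]"
    )
    subprocess.run([
        "ffmpeg", "-i", in_url,
        "-filter_complex", filter_graph, "-map", "[out]",
        "-c:v", "libx264", "-preset", "veryfast", "-tune", "zerolatency",
        "-f", "flv", out_url,
    ], check=True)
```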

License:

This component uses nginx for reception and publishing, and FFmpeg for transcoding.

TRL
  • Year 0: N/A
  • Year 1: TRL5
  • Year 2 (expected): TRL6
  • Year 3: TRL6
Features
  • Year 0: N/A
  • Year 1: Overall component: RTMP input; DASH output.
  • Year 2 (expected): Overall component: RTMP input/output.
  • Year 3: Overall component: no core modifications expected; this module heavily depends on the capabilities of the capture material.
KPIs
  • Year 0: N/A
  • Year 1: Latency <= 2 s
  • Year 2 (expected): Latency <= 1 s
  • Year 3: Latency <= 1 s

Point Cloud - Multi Control Unit (PC-MCU)

Description

A (virtualized) cloud-based holoconferencing component that aims to reduce the end-user client's computational load and bandwidth usage by providing the following key features: fusion of different volumetric videos, Level of Detail (LoD) adjustment and Field of View (FoV) aware delivery.
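The two per-receiver decisions, LoD selection and FoV-aware delivery, can be illustrated as follows; the distance thresholds and the FoV test are illustrative assumptions, not the PC-MCU's actual heuristics.

```python
# Sketch: pick a Level of Detail from viewing distance, and skip content outside the FoV.
import math

LOD_BY_DISTANCE = [(1.5, "high"), (4.0, "medium"), (float("inf"), "low")]  # metres -> LoD

def select_lod(distance_m: float) -> str:
    for max_distance, lod in LOD_BY_DISTANCE:
        if distance_m <= max_distance:
            return lod
    return "low"

def in_field_of_view(view_dir, to_object, fov_deg: float = 110.0) -> bool:
    """True if the object direction lies within the receiver's horizontal FoV."""
    dot = sum(a * b for a, b in zip(view_dir, to_object))
    norm = math.hypot(*view_dir) * math.hypot(*to_object)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    return angle <= fov_deg / 2
```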

License:

To be announced

Links:

TRL
  • Year 0: N/A
  • Year 1: TRL5
  • Year 2 (expected): TRL6
  • Year 3: TRL6
Features
  • Year 0: N/A
  • Year 1: Overall component: RTMP input; DASH output.
  • Year 2 (expected): Overall component: RTMP input/output.
  • Year 3: Overall component: no core modifications expected; this module heavily depends on the capabilities of the capture material.
KPIs
  • Year 0: N/A
  • Year 1: Latency <= 2s
  • Year 2 (expected): Latency <= 1s
  • Year 3: Latency <= 1s

Rendering and Display

Unity Player


Description:

A Unity-based player in charge of the reception, integration and presentation of all available streams and VR content for the envisioned VR-Together scenarios. It also supports various types of interaction and synchronization features.

Licenses: 

Links: 

TRL
  • Year 0: TRL0 (not available at the start of the project)
  • Year 1: TRL6
  • Year 2 (expected): TRL7
  • Year 3: TRL7
Features
  • Year 0: N/A
  • Year 1:
    – TVM 1.0 rendering
    – P2P streaming
  • Year 2 (expected):
    – TVM 2.0 rendering
    – Orchestrator-based streaming
    – Stereoscopic live video stream
    – Multi-threaded pipelines
    – GUB integration
    – SUB integration
    – PC pipeline
  • Year 3:
    – PC-MCU integration
    – Multi-threading improvements to enable multiple decoding
    – Interactions envisioned in pilot 3
    – Voice recognition
KPIs
  • Year 0: N/A
  • Year 1: Support for 2 users.
  • Year 2 (expected): Support for 4 users.
  • Year 3: Support for 5+ users.

Key Components of VRTogether Web Platform

RGBD Capture

Description:

The main aim of the RGBD Capture module is to provide a lightweight, simple-to-set-up sensing solution that is easy to deploy (e.g. in people's homes). The capture module is thus responsible for a photo-realistic capture of the user based on a single RGBD sensor (i.e. Kinect / RealSense). The module has an adaptable interface for connecting to different RGBD drivers in order to obtain the colour image and depth image from the hardware sensor. As of now this may include foreground/background removal and replacement of the HMD with a pre-captured representation of the user's face (HMD removal). The image is further processed and finally converted to a 2D RGB + grayscale depth image. The final image is displayed on the screen for capture in the web browser. The camera calibration is sent to the browser for geometrically correct 3D rendering using WebGL shaders (a small sketch of this packing and back-projection is shown after the list below). The capture is rendered as:

  • 3D self-view for self-presence and 
  • view for the other participants for shared presence 

all in real time. All modules are optimized for capturing and rendering users in real time in a 3D virtual environment.
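A minimal sketch of the packing and back-projection mentioned above follows; it is not the TNO module, and the depth scaling, intrinsics handling and function names are assumptions.

```python
# Sketch: pack colour + 8-bit grayscale depth side by side, and back-project
# depth pixels to 3D using the camera intrinsics (fx, fy, cx, cy), as the
# browser-side WebGL shader does per fragment.
import numpy as np

def pack_rgbd(color: np.ndarray, depth_m: np.ndarray, max_depth_m: float = 5.0) -> np.ndarray:
    """color: HxWx3 uint8, depth_m: HxW float (metres) -> Hx(2W)x3 image [color | grey depth]."""
    depth_u8 = np.clip(depth_m / max_depth_m * 255.0, 0, 255).astype(np.uint8)
    depth_rgb = np.repeat(depth_u8[:, :, None], 3, axis=2)      # grayscale depth as RGB
    return np.hstack([color, depth_rgb])

def backproject(depth_m: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Turn a depth image into a 3D point per pixel (camera space)."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth_m
    y = (v - cy) / fy * depth_m
    return np.stack([x, y, depth_m], axis=-1)                   # HxWx3 points
```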

Licenses: 

  • To be announced

Links: 

TRL
  • Year 0: TRL5
  • Year 1: TRL6
  • Year 2 (expected): TRL6
  • Year 3: TRL7
Features
  • Year 0:
    – RGB with chroma background removal
    – Support for KinectV2
    – Shaders for real-time reconstruction and rendering in browser-based VR
  • Year 1:
    – Improved background removal
    – Added support for the RealSense sensor
    – 3D shaders for self-view and user rendering
    – Sensor calibration of Oculus Rift and KinectV2 for RGBD-based HMD removal
  • Year 2 (expected):
    – HMD removal fully integrated
    – Improvements in capture-conversion performance
  • Year 3:
    – Added support for Kinect4Azure
    – One executable with different modes (i.e. 2D and 3D capture mode)

WEBRTC-VR-MCU

Description:

The WebRTC VR MCU system can ingest any number of WebRTC client inputs. The MCU synchronises and composes its inputs into a single output stream, which is then published using WebRTC. As a result, clients retrieve all relevant streams together instead of separately. This optimizes network bandwidth through more efficient routing (each client only sends its video stream to the MCU, and no longer to all other clients), as well as the clients' decoding resources (clients have a limited number of hardware decoders, which can now be dedicated to decoding all streams instead of just one).

Incoming WebRTC streams are demuxed into RTP streams and fed into a pluggable media pipeline. The media pipeline performs the compositing operation and outputs a single RTP stream, which is broadcast to each client using WebRTC return streams. The media pipeline of the MCU system has been designed as a containerized service so that, given enough hardware, it is horizontally scalable over multiple parallel sessions. These services are managed using Docker Swarm and the orchestrator component.
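The compositing idea alone can be sketched as tiling the decoded frames into one output frame; the grid layout and names below are illustrative assumptions, while the real MCU performs this on RTP streams inside its containerized media pipeline.

```python
# Sketch: tile N decoded frames of equal size into a single output frame that
# would then be re-encoded and returned to every client.
import math
import numpy as np

def composite(frames: list[np.ndarray]) -> np.ndarray:
    """frames: list of HxWx3 uint8 arrays -> single grid-tiled uint8 frame."""
    n = len(frames)
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    h, w, _ = frames[0].shape
    out = np.zeros((rows * h, cols * w, 3), dtype=np.uint8)
    for i, frame in enumerate(frames):
        r, c = divmod(i, cols)
        out[r * h:(r + 1) * h, c * w:(c + 1) * w] = frame
    return out
```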

Licenses: 

  • To be announced

Links: 

TRL
  • Year 0: TRL2
  • Year 1: TRL3
  • Year 2 (expected): TRL5
  • Year 3: TRL6
Features
  • Year 0:
    – Only baseline code
    – No MCU (full P2P mesh)
  • Year 1:
    – Initial concept of component stubs
  • Year 2 (expected):
    – Horizontally scalable MCU
    – Orchestration
    – Integration with the web player
  • Year 3:
    – Number of users: 16
    – Fully containerized deployment
    – Support for low-powered (mobile) devices
KPIs
  • Year 0: 3 simultaneous users
  • Year 1: 4 simultaneous users
  • Year 2 (expected): 8 simultaneous users
  • Year 3: 16 simultaneous users

Web-based player

Description:

The entry point for the TogetherVR web client is offered by the web server back end, which is based on Node.js, React and A-Frame. This allows any modern WebVR-enabled browser to display the VR content on a screen or on an OpenVR (https://github.com/ValveSoftware/openvr) enabled VR HMD (among others, the Oculus Rift CV1 or the Valve Index). Furthermore, the client can access the image produced by the TNO RGBD Capture module, either displayed as a self-view to the user or sent via WebRTC to one or multiple other clients. To support such a multi-user connection, the client is connected to a second web server that handles all the multimedia orchestration. The rendering of users in the VR environment is done via custom WebGL shaders that alpha-blend user representations into the surroundings for a natural visual representation. Audio is also captured, transmitted and played back as spatial audio with the help of the Google Resonance APIs.
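The alpha blending performed by those shaders can be illustrated with a small sketch; numpy stands in here for the per-fragment GPU shader, and the array shapes and names are assumptions.

```python
# Sketch of per-pixel alpha blending of a captured user layer over the virtual background.
import numpy as np

def alpha_blend(user_rgba: np.ndarray, background_rgb: np.ndarray) -> np.ndarray:
    """user_rgba: HxWx4 float in [0,1]; background_rgb: HxWx3 float in [0,1]."""
    alpha = user_rgba[:, :, 3:4]
    return user_rgba[:, :, :3] * alpha + background_rgb * (1.0 - alpha)
```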

Licenses: 

  • To be announced

Links: 

TRL
  • Year 0: TRL6
  • Year 1: TRL7
  • Year 2 (expected): TRL7
  • Year 3: TRL7
Features
  • Year 0:
    – 360-degree only
    – Peer-to-peer streaming
    – Pilot 1 content
  • Year 1:
    – Volumetric content
    – Streaming via MCU
    – Self-view
    – Spatial audio
    – Admin panel
  • Year 2 (expected):
    – Complete client refactor based on Redux/Reflux
    – Pilot 2 content
    – Optimized MCU pipeline
    – Use of capture (hardware) parameters for improved rendering
    – Integrated technical measurements
  • Year 3:
    – New virtual room(s) with a focus on communication
    – Support for low-powered (mobile) devices
KPIs
  • Year 0: MOS scores for the 360-degree experience (5-point scale): overall experience 4.01; video quality 3.59.
  • Year 1: MOS scores for the 360-degree experience (5-point scale): overall experience 4.35; video quality 3.65.
  • Year 2 (expected): MOS score for the 3D experience (9-point scale): overall experience 6.92.
  • Year 3: Full-system end-to-end (glass-to-glass) delays: RGB delay: P2P 396 ms, MCU 564 ms; RGBD delay: P2P 384 ms, MCU 622 ms.