Interactive Remote Path-Tracing Using the HPC Cluster at HLRS

170717 DKoerner Artikel header

Using the HPC cluster like a traditional render farm is straightforward: each frame is rendered on a single compute node and the result is written to disk. However, this use does not really play to the main strength of the HPC cluster, which is the high-throughput, low-latency interconnect between its nodes.

One of the missions of the Media Solution Center is to find and help develop applications in media production that specifically play to the strengths of the HPC cluster. In the domain of rendering, one such application is interactive path-tracing. Here, instead of rendering a single image per node, one image is rendered on many nodes, and low latency is critical for smooth and satisfying user interaction. In addition, the user will not be on-site at the HLRS, but off-site at their own workplace. Therefore, the application backend is distributed across many cluster nodes and connected over the internet to a frontend, such as a web application.

This video shows the prototype we developed in order to study this application scenario:


Screen recording of the interactive path-tracing prototype in action. Here we used 30 nodes of the HPC cluster, which amount to 720 CPU cores.


In the following, we outline some interesting aspects of the prototype implementation. The complete source code can be found on GitHub:

Rendering backend

The backend is the part of the application that runs on the compute nodes of the cluster. It receives and applies scene updates from the frontend and continuously produces progressively better images of the current state of the scene. The updates reflect changes the user wants to make to the scene, such as a change in camera angle or a scene parameter, or the creation or removal of objects.
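The behaviour described above (refine the image continuously, start over when the scene changes) can be sketched as a running average that is reset on every update. This is only an illustration under simplifying assumptions, not the prototype's or OSPRay's code: `ProgressiveImage`, `run_passes`, the dictionary-based scene and the `render_pass` callback are all hypothetical names, and an image is reduced to a flat list of grey values.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProgressiveImage:
    """Running per-pixel average over all passes since the last reset."""

    num_pixels: int
    passes: int = 0
    pixels: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # A scene update invalidates everything accumulated so far.
        self.passes = 0
        self.pixels = [0.0] * self.num_pixels

    def add_pass(self, samples: list) -> None:
        # Incremental mean: avg_n = avg_(n-1) + (x_n - avg_(n-1)) / n
        self.passes += 1
        for i, sample in enumerate(samples):
            self.pixels[i] += (sample - self.pixels[i]) / self.passes


def run_passes(
    image: ProgressiveImage,
    updates: deque,
    scene: dict,
    render_pass: Callable[[dict], list],
    num_passes: int,
) -> None:
    """Render num_passes passes, applying queued scene updates first."""
    for _ in range(num_passes):
        if updates:
            # Apply every pending (key, value) update, then restart accumulation.
            while updates:
                key, value = updates.popleft()
                scene[key] = value
            image.reset()
        image.add_pass(render_pass(scene))
```

In a real renderer each pass would be a noisy path-traced estimate, so the average converges as passes accumulate; the image sent to the frontend is simply the current state of the average.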

An important question here is the choice of rendering software. Developing a custom renderer for our application was not an option, so we set out to look for an existing renderer that makes the best use of the fast interconnect. In addition, we wanted the renderer to be open source, which would allow us to make changes if our application required them.

After looking at options such as Cycles, LuxRender, Appleseed and others, we finally settled on OSPRay, a renderer developed by a team at Intel. The main reason for our decision was that OSPRay has been designed from the ground up to run on HPC clusters. In particular, it uses the Message Passing Interface (MPI) for inter-node communication and can therefore work with the Aries interconnect used by the cluster at HLRS, something no other open-source renderer could offer.

One drawback is that the renderer is primarily designed for applications in scientific visualization (SCIVIS). However, a current trend in SCIVIS is to use physically based rendering to produce more appealing and more readable images. Consequently, the renderer has physically based lights, cameras and materials, and therefore has all the basic building blocks in place for light transport simulation. This was good enough for our prototype, but we would advise against using OSPRay for any type of production rendering.

Another important aspect is the compression of the image data sent to the frontend. We use the most basic approach and send out JPEG-compressed images. If network latency or distance becomes an issue, more advanced approaches such as video streaming can be considered.


The communication layer connects the backend with the frontend. Images are transferred constantly from the backend to the frontend, while scene updates are sent from the frontend to the backend only when there is user interaction. This posed a challenge during development, because no direct communication is possible between the compute nodes and the outside world. The restriction exists for security reasons, and also because traditional high-performance computing applications do not require intervention from a remote user while a simulation runs on the cluster. For our application, this is different.

In our prototype, we solved this problem by implementing a small proxy application that runs within the HLRS network but outside the cluster. From there, the proxy can connect to the compute node and the frontend at the same time (using an SSH tunnel). As soon as the link is established, it simply relays messages between the two in both directions.

A useful side effect of having a proxy is that it can bridge between different communication protocols. For our browser frontend we use WebSockets, while the backend expects a plain TCP connection. The proxy therefore holds a WebSocket connection to the frontend and a standard TCP connection to the backend (for which we use ZeroMQ). It would also be possible to have a TCP connection to a standalone client application as well.


Figure 1: Outline of the communication setup for our prototype
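The relaying role of the proxy can be illustrated with a small bidirectional forwarder. Note the assumptions: the actual proxy speaks WebSockets on one side and ZeroMQ on the other, whereas this sketch uses plain TCP on both sides so that it only needs Python's standard `asyncio` module. The function names (`pump`, `relay`, `serve_proxy`) and the choice to listen on `127.0.0.1` are hypothetical.

```python
import asyncio


async def pump(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until the source side closes."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        # When one side goes away, close the connection to the other side.
        writer.close()


async def relay(
    frontend_reader: asyncio.StreamReader,
    frontend_writer: asyncio.StreamWriter,
    backend_host: str,
    backend_port: int,
) -> None:
    """Connect to the backend and forward traffic in both directions."""
    backend_reader, backend_writer = await asyncio.open_connection(
        backend_host, backend_port
    )
    await asyncio.gather(
        pump(frontend_reader, backend_writer),  # scene updates, frontend to backend
        pump(backend_reader, frontend_writer),  # images, backend to frontend
    )


async def serve_proxy(listen_port: int, backend_host: str, backend_port: int):
    """Start listening; every incoming frontend gets its own backend link."""

    async def on_client(reader, writer):
        await relay(reader, writer, backend_host, backend_port)

    return await asyncio.start_server(on_client, "127.0.0.1", listen_port)
```

Because the proxy only copies bytes and never interprets them, the same structure would apply if one of the two plain TCP endpoints were replaced by a WebSocket or ZeroMQ endpoint; only the read and write calls on that side would change.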



The frontend is a JavaScript web application that runs in the browser as part of an HTML page. It constantly receives images from the backend and displays them. User interactions, such as parameter edits or changes in camera angle, are packed into commands, which are encoded in binary for easy transfer over the wire.
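To make the idea of binary commands concrete, here is a minimal sketch of packing and unpacking a camera update. The wire layout is an assumption made for this example and is not the prototype's actual format: one command-id byte followed by six little-endian 32-bit floats (eye position and look-at target). The names `CMD_SET_CAMERA`, `encode_set_camera` and `decode_command` are hypothetical. The example is written in Python for brevity; a browser frontend would do the equivalent with a `DataView` over an `ArrayBuffer`.

```python
import struct

# Hypothetical wire format, assumed for illustration only:
#   1 byte   command id
#   6 floats eye x, y, z and target x, y, z (little-endian, 32 bit each)
CMD_SET_CAMERA = 1
_CAMERA_FORMAT = "<B6f"


def encode_set_camera(eye, target) -> bytes:
    """Pack a camera update into a fixed-size 25-byte message."""
    return struct.pack(_CAMERA_FORMAT, CMD_SET_CAMERA, *eye, *target)


def decode_command(message: bytes) -> dict:
    """Decode a message on the receiving side by dispatching on its first byte."""
    command_id = message[0]
    if command_id == CMD_SET_CAMERA:
        values = struct.unpack(_CAMERA_FORMAT, message)
        return {
            "command": "set_camera",
            "eye": values[1:4],
            "target": values[4:7],
        }
    raise ValueError(f"unknown command id {command_id}")
```

A fixed binary layout keeps each interaction message small (25 bytes here), which matters less for bandwidth than for keeping parsing on the backend trivial.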

The question of how to represent a scene and its changes in a distributed application poses some interesting challenges for scene APIs, which are programming interfaces that abstract away scene representation and storage. Modern scene APIs, such as Pixar’s Universal Scene Description or the Nodal Scene Interface (NSI) from 3Delight, will be able to accommodate distributed applications like ours. For simplicity, we chose to implement our own small API, which is very reminiscent of NSI.


Figure 2: Scene API in interactive distributed applications
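One way to picture a small command-based scene API in a distributed setting is a scene that is modified only through a handful of serializable commands, so the same command stream can be applied on the frontend and replayed on the backend. The sketch below is an assumption about what such an API could look like, loosely following the create / set-attribute / connect style of NSI; it is neither the prototype's API nor NSI itself, and the `Scene` class, its method names and the `"root"` handle are hypothetical.

```python
class Scene:
    """Scene graph that is only ever changed through simple commands."""

    def __init__(self) -> None:
        # handle -> {"type": node type, "attributes": attribute dict}
        self.nodes = {"root": {"type": "root", "attributes": {}}}
        # set of (source handle, destination handle) pairs
        self.connections = set()

    def create(self, handle: str, node_type: str) -> None:
        self.nodes[handle] = {"type": node_type, "attributes": {}}

    def delete(self, handle: str) -> None:
        del self.nodes[handle]
        # Drop every connection that touches the deleted node.
        self.connections = {c for c in self.connections if handle not in c}

    def set_attribute(self, handle: str, attributes: dict) -> None:
        self.nodes[handle]["attributes"].update(attributes)

    def connect(self, source: str, destination: str) -> None:
        self.connections.add((source, destination))

    def apply(self, command: tuple) -> None:
        """Apply one serializable command of the form (name, *arguments)."""
        name, *args = command
        handlers = {
            "create": self.create,
            "delete": self.delete,
            "set_attribute": self.set_attribute,
            "connect": self.connect,
        }
        handlers[name](*args)
```

Because every change is a plain tuple, the frontend can apply a command locally and send the same tuple over the wire; a backend that applies the commands in the same order ends up with an identical scene.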



Our prototype implementation shows that remote interactive applications on the cluster are feasible. However, running the prototype currently requires setting up the SSH tunnel and manually starting the backend and proxy applications from a terminal. In the future, we would like to make this process more straightforward and convenient by providing a platform that gives easy access to the required infrastructure functionality while hiding its complexity.

Another unresolved question is how to schedule interactive sessions through a scheduling system that was designed around non-interactive batch processing. Requesting 30 nodes on the cluster involves a significant wait time on average, which is not acceptable for interactive applications.



Complete source code:


Pixar’s Universal Scene Description

Nodal Scene Interface from 3Delight


