Synthetic Ground Truth Generation for Testing, Technology Evaluation and Verification (SyntTEV)

Today, many computing devices are used to visually detect objects, people and activities. Their quality and performance depend on limited datasets created and annotated by error-prone and expensive manual work. However, high quality on complex detection tasks requires extensive datasets with errorless annotations. To resolve this dilemma, we created a system for the automatic generation of synthetic ground truth data that enables the learning of complex detection tasks as well as testing, verification and evaluation.


INTRODUCTION
Today, many systems such as mobile phones 1 and autonomous driving vehicles 2 use computers to detect humans. Industrial quality inspection devices as well as assistance systems for disabled [1] or elderly people also use computers to detect humans and to interpret their behaviour [2] in order to improve the quality of Human-Computer Interaction. These abilities, their correctness and their quality depend heavily on datasets that show the expected behaviour in many different facets and at sufficient scale [3]. However, common real-world datasets contain only a limited number of facets, such as a small number of perspectives, a small variety of object surface textures, few repetitions of similar person activities, or identical resolutions. Creating a new real-world dataset, or extending an existing one, is expensive, time consuming and requires error-prone human annotation work.
Based on the experience of previous work with synthetic data for analysing different software and hardware systems [4,5], we created a system for the automatic synthesis of humanoids, objects, scene environments and activities that produces scenarios of arbitrary structure, forming new datasets with exact and well-defined ground truth. Due to the programmable nature of the system, shown in Fig. 1, the variability and complexity of the facets are almost unlimited and restricted only by the available storage and synthesis time. In addition, the datasets may contain human behaviour that is hard to observe, dangerous activities and even accidents; datasets containing such content provide the capability to train assistance systems to detect these occurrences.
On the other hand, the system makes it possible to perform extensive quality checks, usability tests and performance evaluations on existing systems.

SYSTEM ARCHITECTURE AND TECHNICAL REALISATION
To generate the synthetic ground truth, we use the 3D modelling tool MakeHuman 3 to create templates of photorealistic humanoids with different definitions of age, gender, height, width, muscularity, hairstyle and clothing. The anatomical structure of a human is approximated by 163 bone-like elements, as shown in Fig. 2, permitting natural movements. The value of each definition is set depending on the intended field of application, the test cases, or by random selection; for instance, dark clothing is chosen when testing the limitations of person detection systems in dark environments.
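The selection of definition values can be sketched as a sampling step over the template parameters. The following minimal sketch is illustrative only: the definition names, value ranges and the override mechanism are hypothetical and do not correspond to MakeHuman's actual scripting API.

```python
import random

# Hypothetical definition space; names and ranges are illustrative only.
DEFINITIONS = {
    "age":         lambda: random.randint(5, 90),
    "gender":      lambda: random.choice(["female", "male"]),
    "height_cm":   lambda: random.uniform(150.0, 200.0),
    "muscularity": lambda: random.uniform(0.0, 1.0),
    "hairstyle":   lambda: random.choice(["short", "long", "bald"]),
    "clothing":    lambda: random.choice(["dark", "bright", "patterned"]),
}

def sample_humanoid(overrides=None):
    """Draw one humanoid template; overrides pin values for targeted tests."""
    template = {name: draw() for name, draw in DEFINITIONS.items()}
    if overrides:
        template.update(overrides)  # e.g. force dark clothing
    return template

# Targeted test case: person detection in dark environments.
dark_subject = sample_humanoid({"clothing": "dark"})
```

Pinning individual definitions while randomising the rest is how a test designer can sweep one facet, such as clothing brightness, without losing variability in the others.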
The definitions of motion are based on motion captures of humans and are modified and combined in well-defined ways to form the requested activity.
With the professional, open-source 3D computer graphics software Blender 4 we define the scenes and set up the environment properties such as light sources, as well as cameras with their resolution and field of view, as shown in Fig. 3.
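A camera defined by resolution and field of view determines the intrinsics of an ideal pinhole model. The following sketch shows this relation under the usual assumptions for a synthetic camera (square pixels, principal point at the image centre); it is a generic derivation, not Blender's API.

```python
import math

def intrinsics(width_px, height_px, horizontal_fov_deg):
    """Pinhole intrinsics from image resolution and horizontal field of view.

    Assumes square pixels and a principal point at the image centre,
    as is typical for an ideal synthetic camera.
    """
    fx = (width_px / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    fy = fx                                 # square pixels
    cx, cy = width_px / 2.0, height_px / 2.0
    return fx, fy, cx, cy

# Example: a Full-HD camera with a 90 degree horizontal field of view.
fx, fy, cx, cy = intrinsics(1920, 1080, 90.0)
```

These intrinsics are exactly known for a synthetic camera, which is one reason the ground truth of a rendered scenario can be stated without measurement error.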
The humanoids, their activities and the scene are combined into the test scenario, which is rendered by Blender, as in Fig. 4. By modifying the inputs or their combination, a nearly infinite number of different test cases can be synthesized, overcoming the limitations of real-world datasets. At the same time, a self-developed Python-based program extracts the exact and correct data from Blender's internal storage to obtain the ground truth of the scenario. Therefore, no further annotation is needed.
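Conceptually, the extraction step maps the known 3D positions of the skeleton elements into pixel coordinates of each camera. A minimal sketch of this projection, assuming joints are already expressed in camera coordinates with the optical axis along +Z, is:

```python
def project(joint_cam, fx, fy, cx, cy):
    """Project a joint given in camera coordinates (Z forward) to pixels.

    Returns None for points behind the camera, which are not visible.
    """
    X, Y, Z = joint_cam
    if Z <= 0:
        return None
    u = cx + fx * X / Z
    v = cy + fy * Y / Z
    return (u, v)

# A joint on the optical axis projects to the principal point.
centre = project((0.0, 0.0, 2.0), 960.0, 960.0, 960.0, 540.0)
offset = project((1.0, 0.0, 2.0), 960.0, 960.0, 960.0, 540.0)
```

Because the scene graph holds exact joint positions and camera poses, these projected annotations are errorless by construction, in contrast to manually labelled real-world images.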
The synthesised scenario can be used to examine visual systems such as the OpenPose 5 pose detection system, as shown in Fig. 5. Comparing the ground truth with their results makes it possible to test the applicability of a system, to verify its correctness and to evaluate its performance.
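One common way to quantify such a comparison is a PCK-style score: the fraction of keypoints that a detector places within a pixel threshold of the ground truth. The sketch below assumes keypoints are simple (u, v) pairs in a fixed order and that missing detections are given as None; it is a generic metric, not the paper's specific evaluation code.

```python
import math

def pck(ground_truth, detections, threshold_px):
    """Fraction of keypoints detected within threshold_px of ground truth.

    Keypoints the detector missed (None) count as incorrect.
    """
    correct = 0
    for gt, det in zip(ground_truth, detections):
        if det is None:
            continue
        if math.dist(gt, det) <= threshold_px:
            correct += 1
    return correct / len(ground_truth)

gt  = [(100.0, 100.0), (200.0, 150.0), (300.0, 200.0)]
det = [(102.0, 101.0), (240.0, 150.0), None]
score = pck(gt, det, threshold_px=5.0)
```

Because the ground truth is exact, such scores measure only the detector's error and are not confounded by annotation noise.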

SYSTEM APPLICATION AND EXPLORATORY ALGORITHM PROCESSING
To explore the usefulness of the presented system, we synthesised a scenario based on a model of our laboratory and a humanoid performing a beckoning pose. The images of all ten sensors are captured and processed by OpenPose. The results are embedded into the images as an overlay containing coloured lines at the detected positions, as shown in Fig. 6.
As can clearly be seen, the humanoid is detected well from all sides, but not from above. The detected nodes show good correlation with the ground truth of the scenario.
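The per-view outcome can be summarised by thresholding the detector's keypoint confidence scores per camera. The sketch below is illustrative only: the view names, score values and threshold are hypothetical, not taken from the experiment.

```python
def detected_views(view_confidences, min_mean_confidence=0.3):
    """Classify each camera view as detected or missed.

    view_confidences maps a view name to the per-keypoint confidence
    scores reported by the pose detector; a view counts as detected if
    its mean confidence reaches the (illustrative) threshold.
    """
    return {
        view: (sum(scores) / len(scores)) >= min_mean_confidence
        for view, scores in view_confidences.items()
    }

# Hypothetical scores mirroring the observed pattern: side views work,
# the top-down view does not.
result = detected_views({
    "front": [0.90, 0.80, 0.85],
    "side":  [0.70, 0.60, 0.75],
    "top":   [0.10, 0.00, 0.05],
})
```

Aggregating such per-view verdicts over many synthesised scenarios would turn the qualitative observation into a measurable coverage statistic per camera placement.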

SUMMARY & FUTURE WORK
We created a system for the generation of datasets based on well-defined synthetic scenarios. We presented the structure of our system with objects, humanoids and activities, as well as a short empirical exploration demonstrating the usefulness and potential of the system and the synthesised datasets. Future developments will increase the number of fundamental building blocks to simplify the definitions, and will include the synthesis of sound propagation.