Enhancing Global Collaboration Through Network-empowered Live Performance

James C. Oliverio, Digital Worlds Institute, University of Florida, Florida 32611, USA, james@digitalworlds.ufl.edu
Angelos Barmpoutis, Digital Worlds Institute, University of Florida, Florida 32611, USA, angelos@digitalworlds.ufl.edu
Chad Juehring, Digital Worlds Institute, University of Florida, Florida 32611, USA, chad.juehring@gmail.com
Anton Yudin, Digital Worlds Institute, University of Florida, Florida 32611, USA, antonyyudin@gmail.com


INTRODUCTION
Research and development of real-time arts performance systems has been underway at the University of Florida Digital Worlds Institute since 2001. Significant attributes of this research include the successful facilitation of synchronous global-scale performing arts events, the evolution of process and practice for arts and engineering collaborations between multi-point performance sites across the high-speed network, and the development and utilization of a unique toolkit of techniques and technologies.
Examples of global-scale networked performances profiled herein include: the synchronous musical union of ethnic performers located in seven cities across five continents for "In Common: Time" at SIGGRAPH 2005; a quartet of modern dancers located in four remote cities across Asia and North America, motion-captured and mapped into a single shared Cartesian coordinate space while performing on virtual percussion instruments with 3D audio in "Same Space Same Time" (2010); and the integration of multiple remote audiences providing feedback on their mobile devices (aggregated, visualized and displayed to the performers in real time) during a multi-continental performance featuring network-attached Kinect devices driving synchronous representations of the distributed performers through a gaming engine for "Icons of Innovation" at IDMAA 2012.
In addition to developing the methodologies necessary to integrate various traditional and emergent technologies into these multi-faceted real-time performance systems, a number of novel techniques and collaborative relationships have resulted from this work. Using the aforementioned distributed performances as exemplars, we will provide insight into the aesthetic, procedural, technological, and logistic considerations inherent in working with artists, engineers, and media producers across multiple time zones, cultures, and sub-nets. After more than a decade of work in this emergent area, important outcomes include both personal and institutional relationships that have developed and blossomed at a global scale, as well as a much broader cross-cultural understanding.
We have learned a considerable number of lessons that can help optimize the strategic planning and implementation of distributed performing arts events. We herein offer not only background and recommendations for those interested in working in this space, but also examples of the specific tools, techniques and technologies we have developed and integrated into the design and production of network-empowered live performance.

While early 21st-century audiences will understandably carry the aesthetic predispositions of late 20th-century media culture for some time to come, it is hoped that an appreciation for both the technological and artistic achievement necessary to create network-empowered live performance will develop internationally in the near future. We offer this overview of our work in hopes of helping to inform the emergence of said new aesthetic.

EARLY WORK: "DANCING BEYOND BOUNDARIES" AND "NON DIVISI"
The first real-time network-distributed performance undertaken by the University of Florida (UF) Digital Worlds Institute occurred in 2001 at the global Supercomputing Conference in Denver, Colorado. A team of choreographers, dancers, musicians, network engineers, and video producers located across North and South America came together over the Internet2 to create, rehearse and perform a new work entitled "Dancing Beyond Boundaries" at the conference site over a four-day period (Oliverio et al. 2002). Percussionists were located in Brazil, with instrumentalists in Florida, two large groups of dancers performing synchronously in Minnesota and Florida, and dance soloists appearing before the conference audiences in Denver. The live performance elements were brought together via Access Grid video technologies. The response from the computer scientists and network engineers at the Supercomputing conference was positive and enthusiastic, and the awards committee felt compelled to create a special award to acknowledge the work; the "Dancing Beyond Boundaries" (DBB) team was honoured with a novel prize for "Most Creative and Courageous Use of the High-Speed Network".
Many members of this original collaborative team subsequently continued their joint efforts, and a second major international collaboration ensued, this time adding new partners in Asia from the Korea Advanced Institute of Science and Technology (KAIST) and in South America from the RED Universitaria Nacional (REUNA) in Santiago, Chile. The resultant work was called "Non Divisi", taken from an orchestral music term meaning "not divided".
A new tool created at the UF Digital Worlds Institute, dubbed "The NetroNome™", was used to keep the live musicians in Chile and Florida together whilst synchronously performing a metrically complex composition despite being separated by a geographic distance of over 4,000 miles (Oliverio et al. 2008). As a result of this tri-continental collaboration, professors and students from three previously unaligned institutions formed both personal and institutional relationships, resulting in additional international performances, cross-cultural understanding and reciprocal faculty and student visits between the partnering institutions.

Empowering Technologies: DBB
The connectivity required to successfully join the multiple geographically-remote participants of DBB was powered by an early open-source video conferencing tool known as the Access Grid, running across the Internet2 (Simco 2002). Each of the participating locations was set up to allow multiple streams of live video and audio to be fed into the Grid. At the epicentre of the distributed live performance (the 2001 global Supercomputing Conference in Denver) all of the disparate streams were combined into a projected visual display that framed the live dancers on the conference floor.

Empowering Technologies: "Non Divisi"
The primary collaborative platform that facilitated "Non Divisi" was also the Access Grid video conferencing environment, with multiple cameras located at each of the institutional locations. Multichannel audio was sourced and mixed in Chile and at UF. The NetroNome™ system provided a latency-adjusted click that served as the "common conductor" between the geographically remote musicians. Since the network latency differed for each of the three locations, an essential feature of the NetroNome™ software design was the ability to vary each master conductor stream individually. The NetroNome™ was a collaborative development between software engineer Andy Quay, composer James Oliverio and Joella Wilson.
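The idea of a latency-adjusted click can be illustrated with a minimal sketch. This is not the NetroNome™ implementation, whose internals are not described here; it assumes that a one-way latency to each site is known and stable, and the site latencies, function names and tempo below are hypothetical values chosen only for illustration. Each click stream is held back by the difference between its own latency and that of the slowest link, so that every site hears the same beat at the same moment.

```python
from typing import Dict, List


def click_send_offsets(one_way_latency_ms: Dict[str, float]) -> Dict[str, float]:
    """Extra delay (ms) to apply to each site's click stream.

    The slowest link gets no extra delay; faster links are held back by the
    difference between the slowest latency and their own.
    """
    slowest = max(one_way_latency_ms.values())
    return {site: slowest - latency for site, latency in one_way_latency_ms.items()}


def click_schedule(bpm: float, beats: int, offset_ms: float) -> List[float]:
    """Send times (ms from start) for a steady click at the given tempo."""
    beat_ms = 60_000.0 / bpm
    return [offset_ms + i * beat_ms for i in range(beats)]


# Hypothetical one-way latencies, for illustration only.
latencies = {"Florida": 5.0, "Chile": 90.0, "Korea": 140.0}
offsets = click_send_offsets(latencies)

# Arrival time of the first click at each site: send offset plus network latency.
arrivals = {site: offsets[site] + latencies[site] for site in latencies}
```

With these illustrative numbers every first click arrives 140 ms after the start, regardless of site. A score with changing time signatures would replace the fixed tempo in `click_schedule` with beat times read from the score.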
The conductor stream itself was generated from a MIDI file of the musical score, permitting the complexities of multiple time-signature changes to be acknowledged by all of the musicians across North and South America. Multiple percussionists in Chile provided the rhythmic underpinnings for a musical ensemble at the University of Florida consisting of marimba, flute, bass clarinet, bass guitar and additional percussion. The composite live audio mix then motivated dancers in Korea and at the UF Digital Worlds Institute to collaborate in the choreographic, rehearsal and performance aspects of the work.
"Hands Across the Ocean" (HAO) depicted a student in a traditional (boring) classroom setting who falls asleep during a particularly pedantic lecture on aspects of diverse musical harmonic performance practices. As a part of the student's "daydream" she actually visits ethnic musicians from each of the cultures being discussed in the lecture. But instead of simply hearing "about" each musical culture, she is able to interact with the diverse musicians in real time, learning about their instruments and melodic traditions directly from the respective practitioners. She is able to do this (even without knowing their respective languages) because of the gesture-based and aural communications afforded by the real-time video and audio system. When the student is startled out of her reverie by the instructor asking her what she learned about the harmonic characteristics of various pieces of world music (whilst apparently asleep), the student stands and gives an impressively lucid account of each of the music cultures she visited. The work concludes with said student actually leading a multi-continental live musical performance demonstrating that it is essential to have multiple "voices" to actually make "harmony".
As a result of this demonstration, the sponsors of the event in Doncaster successfully made their case to the higher education authorities. Digital Worlds was also awarded the Peoria Prize for Creativity for producing the event (Discovery Worlds 2005).
In the summer of that same year, an expanded international team collaborated across multiple time zones to create "In Common: Time" (ICT). During the breaks at the globally-connected rehearsals, participants (and even their younger siblings) continued to dialogue and learn more about their counterparts' culture and traditions, which then led to subsequent expanded relationships and collaborations after the event.

Empowering Technologies: "HAO" & "ICT"
Both HAO and ICT made extensive use of an updated version of the NetroNome™ to synchronize instrumental and vocal performances of on-camera participants. Enhanced grid software provided by IOCOM (n.d.) enabled as many as two dozen simultaneous video feeds to be assembled and plotted in a common graphical visual environment displayed to the live audiences.
Vancouver, British Columbia (iDMAa, 2010). Four dancers, located respectively in Tokyo, New York, Florida, and Vancouver, were each motion-captured locally, mapped onto color-coded avatars in real time, and then composited together into a shared virtual coordinate space over the Internet. In addition to a multi-continental choreographic process, the dancers were also asked to perform on virtual percussion instruments located above each of their heads at their respective locations. Audiences at IDMAA 2010 witnessed both the individual video streams of the live dancers onscreen and the composite virtual performance space in which the dancers (despite being separated by the Pacific Ocean and the width of North America) performed responsively to each other with multi-channel spatialized audio emanating from each of the respective locations. Onscreen particle systems were generated when one dancer's limb intersected with another's, allowing the audience to see these "virtual intersections" taking place.
SSST began with a series of duets coupling the Tokyo-based dancer with the Florida-based dancer, followed by a Vancouver-New York interaction.
Beyond the notion of each dancer performing on her own virtual percussion instrument, the performers were actually able to reach into their partner's space and trigger the partner's musical instrument as well as their own.
At the conclusion of SSST, even though the performers are physically distributed across the planet, all of the dancers manoeuvre themselves into the exact same position in the shared coordinate space, literally appearing to become "one" avatar through a confluence currently possible only in shared "virtual" space.

Empowering Technologies: SSST
SSST used a novel assemblage of typically unrelated devices and technologies. In each of the four locations, an Organic Motion (n.d.) wireless, markerless motion capture system was used to ingest the modern dance movement. The mocap data was then fed into a customized network-attached environment that visualized the movement as colour-coded avatars. The four different moving avatars were mapped simultaneously into a shared Cartesian coordinate environment. Dancers at each location saw themselves "embodied" with their remote partners and, due to the low latency of the system, the performers were able to synchronize and align their movements in much the same way they would typically do on a traditional stage.
From a technical point of view, one of the novel aspects of this system (dubbed "Manifold" by programmer Anton Yudin) is the "openness" of the system. As noted in Figure 5, different parts of the system can be executed on markedly different computers, operating systems and platforms, yet they manipulate the same "spatial" data assembled by the central server. This allows creative applications beyond simply displaying the 3D representation of the motion data; the audio module of the system also uses the 3D data and collision detection module, but never displays the data visually. The sound module has a different purpose: to translate events happening in 3D space to sound commands (OSC messages). This was made possible through a common protocol employed to enable all systems to communicate with each other. The protocol is transported over HTTP, which makes it even more "accessible". In fact, the protocol allows either live or pre-recorded motions to be played back (or streamed to a client) and thus entire hybrid performances to be motion-recorded.
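The role of the sound module can be sketched as follows. This is an illustration only and not the Manifold code: it assumes that each virtual percussion instrument can be modelled as a trigger sphere in the shared coordinate space, and the instrument name, joint names, coordinates and the `/percussion/.../hit` address pattern are all hypothetical. The function only builds OSC-style (address, arguments) pairs; encoding them and sending them to a sound engine is left out.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Instrument:
    name: str      # hypothetical label, used in the message address
    center: Point  # position in the shared coordinate space
    radius: float  # radius of the trigger sphere


def detect_hits(
    skeletons: Dict[str, Dict[str, Point]],
    instruments: List[Instrument],
) -> List[Tuple[str, List[str]]]:
    """Return an OSC-style (address, arguments) pair for every joint
    that lies inside an instrument's trigger sphere."""
    messages = []
    for dancer, joints in skeletons.items():
        for joint_name, position in joints.items():
            for inst in instruments:
                if math.dist(position, inst.center) <= inst.radius:
                    messages.append(
                        (f"/percussion/{inst.name}/hit", [dancer, joint_name])
                    )
    return messages


# Illustrative data: the Tokyo dancer reaches into the Florida dancer's instrument.
instruments = [Instrument("drum_florida", (0.0, 2.0, 0.0), 0.2)]
skeletons = {
    "tokyo": {"right_hand": (0.0, 2.1, 0.0), "left_hand": (1.0, 1.0, 0.0)},
    "florida": {"right_hand": (0.5, 1.2, 0.0)},
}
messages = detect_hits(skeletons, instruments)
```

Because every dancer's joints are tested against every instrument in the same space, a performer can trigger a partner's instrument as readily as her own, which matches the behaviour described for SSST.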

Figure 5: System components for live motion capture across four networked locations mapped into a common Cartesian area for "Same Space Same Time" (2010)

"ICONS OF INNOVATION"
Extending the use of multiple network-attached motion capture systems from the higher-end Organic Motion system to a more generally available (and affordable) consumer-level motion sensor, "Icons of Innovation" demonstrated another confluence of typically disparate technological and cultural components. As premiered at the IDMAA 2012 conference at the New World School of the Arts (NWSA) in Miami, "Icons" united two troupes of geographically separated dancers, each captured by a Microsoft Kinect device (Wikipedia 2015), through both network-based video and real-time avatar representations. A variety of iconic inventors and innovators from throughout world history were modelled into 3D avatars, and their subsequent motions were driven by the dancers, fed into the Unreal game engine, and then joined over the network across the performance locations at NWSA and the UF Digital Worlds Institute. Two additional audiences were located in Salt Lake City, Utah and in Santiago, Chile.
Audiences at all four locations were then given the opportunity not only to watch the live performances, but also to provide their real-time feedback on specific aspects of the presentation before or after they were to occur. For example, the audiences would be given a one-minute voting period whenever they saw a symbol appear on the main screen. During the interval they would be presented with up to four different choices that they could select via a browser on their smartphones. A novel voting system was created and employed to then instantly tabulate the feedback from all four audiences, display the results to both audiences and performers, and thus determine the contents or direction the next scenario in the performance would take.
In a specific example, the distributed audience was asked if they would prefer to see either Steve Jobs or Albert Einstein portrayed in an ensuing scene to meet Gutenberg, da Vinci, and Edison. Once the polling had closed and the voting was tabulated and displayed, the performers would quickly acquire appropriate props and interact in movement vocabularies previously determined for each of the potentially chosen historic icons. At the conclusion of the performance, the distributed audience was once again asked to give their real-time feedback, this time for their favourite iconic figure from the entire production.

Empowering Technologies: "ICONS of INNOVATION"
The audience feedback and display system, created by Chad Juehring at the Digital Worlds Institute, was designed to be accessible from any modern smartphone with an Internet connection. It provided a simple and intuitive user interface for the audiences (again, distributed across four cities in North and South America) to provide real-time feedback throughout the performance. Each of the four separate audiences had an identical set of real-time choices between scenarios, and once the polling had closed every location could see a visualization of the results by location and in summary.
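The tabulation step (results by location and in summary) can be sketched in a few lines. This is not the system built for the performance; the function names, the sample votes and the alphabetical tie-break rule are assumptions made only to give a concrete, runnable illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tabulate(votes: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, Counter], Counter]:
    """Count (location, choice) votes per location and across all locations."""
    by_location: Dict[str, Counter] = {}
    summary: Counter = Counter()
    for location, choice in votes:
        by_location.setdefault(location, Counter())[choice] += 1
        summary[choice] += 1
    return by_location, summary


def winning_choice(summary: Counter) -> str:
    """Choice with the most votes; ties are broken alphabetically (an assumption)."""
    return min(summary, key=lambda choice: (-summary[choice], choice))


# Illustrative votes only; not data from the actual performance.
votes = [
    ("Miami", "Steve Jobs"),
    ("Miami", "Albert Einstein"),
    ("Gainesville", "Albert Einstein"),
    ("Salt Lake City", "Steve Jobs"),
    ("Santiago", "Albert Einstein"),
]
by_location, summary = tabulate(votes)
next_scene = winning_choice(summary)
```

The per-location counters could feed a display of results by city, while the summary counter decides which scenario the performers play next.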
Nearly every person with a smartphone in each of the distributed audiences was able to readily access the audience system and use it multiple times during the performances.
The Kinect is a consumer-level device originally marketed as a peripheral device for natural user interaction with Microsoft's game console XBOX.
The Kinect device contains multiple sensors for tracking the actions of the users including a conventional RGB camera, a depth sensor, an array of microphones for voice identification and speech recognition, and an accelerometer for tracking the orientation of the floor.For the purposes of our performances we utilized the depth sensor of Kinect, which produces depth frame sequences using structured light generated by an infrared projector and monitored by an infrared camera, both of which are internal components of the Kinect hardware.
The resolution of the depth sensor was 320 × 240 pixels at 30 frames per second, and it was calibrated to record depth in the range from 0.8 m to 4.0 m, which corresponds to an adequate space for two artists to perform within the field of view. The input sequence of depth frames was processed in real time using the skeleton fitting algorithm provided in the Microsoft Kinect SDK, which estimated the positions and orientations of 20 major joints in the body of the users (in our application, performing artists). While the Kinect was originally intended to be installed in front of a TV set in a typical living room setting, our performance application did not provide this physical configuration.
To overcome some of the technical limitations of the technology (such as limited resolution and field of view) the algorithm was optimized to track up to two users at a time. Furthermore, to enhance the robustness of the estimated skeletons received from the dancers, the algorithm used various a priori conditions, such as the assumption that the user is always facing the camera. The Kinect sensor was connected via USB2 to a 64-bit computer with an Intel Core i5 CPU at 2.30 GHz and 4 GB RAM. The computer processed the skeleton streams received from the Kinect sensor and streamed the position of each skeleton and the orientations of the joints to a server using the Unreal game engine. The Unreal engine received all skeleton data from the server and was then used to visualize animated avatars in real time within a shared 3D virtual environment.
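To make the streamed data concrete, the sketch below packages one skeleton (a position plus 20 joint orientations, for one of at most two tracked users) into a message and restores it on the receiving side. The wire format actually used between the capture computer and the server is not specified above; JSON, the field names and the use of quaternions for orientations are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple

JOINT_COUNT = 20       # joints estimated per skeleton, as stated in the text
MAX_TRACKED_USERS = 2  # the tracking was limited to two users at a time

Quaternion = Tuple[float, float, float, float]


@dataclass
class SkeletonFrame:
    site: str                               # hypothetical site label
    user_index: int                         # 0 or 1
    position: Tuple[float, float, float]    # skeleton position
    joint_orientations: List[Quaternion]    # one per joint (quaternions assumed)

    def __post_init__(self) -> None:
        if not 0 <= self.user_index < MAX_TRACKED_USERS:
            raise ValueError("user_index must be 0 or 1")
        if len(self.joint_orientations) != JOINT_COUNT:
            raise ValueError(f"expected {JOINT_COUNT} joint orientations")


def encode(frame: SkeletonFrame) -> bytes:
    """Serialize a frame to UTF-8 JSON (an assumed wire format)."""
    return json.dumps(asdict(frame)).encode("utf-8")


def decode(payload: bytes) -> SkeletonFrame:
    """Rebuild a frame from its JSON form, restoring tuples."""
    data = json.loads(payload.decode("utf-8"))
    return SkeletonFrame(
        site=data["site"],
        user_index=data["user_index"],
        position=tuple(data["position"]),
        joint_orientations=[tuple(q) for q in data["joint_orientations"]],
    )


# Illustrative frame: a dancer 2.5 units from the sensor, all joints unrotated.
identity = (0.0, 0.0, 0.0, 1.0)
frame = SkeletonFrame("Miami", 0, (0.1, 0.9, 2.5), [identity] * JOINT_COUNT)
restored = decode(encode(frame))
```

At 30 frames per second such messages are small, so a compact binary format would matter less for bandwidth than for parsing speed on the receiving engine.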

Figure 7: The schematic of two geographically-displaced Kinect systems (in Miami and in Gainesville, Florida) joined over the network to allow four dancers' movements to be simultaneously captured, animated and joined in shared virtual space

CONCLUSIONS
At the core of our work is the belief that human creativity can be significantly enhanced with diverse influences and collaborative opportunities. By using the potential of the global Internet as not only a communications channel but also as an actual creative tool, we are empowered to seek and potentially achieve a much broader cross-cultural understanding. This can now come about through real-time interaction and the nurturing of personal and institutional relationships at the regional, national and international levels.
While contemporary culture certainly acknowledges (and uses) Internet-based technologies on a daily basis, it seems that the aesthetic dimension of said usage may often be limited to downloading music and video while returning countless "selfies" into the global repository of social media. With attendance at traditional live performance events often dwindling (and with it the attendant potential to appreciate the value of traditional theaters' constituent art forms) our work seeks to revitalize the immediacy of real-time dance, music and drama with the new means at our disposal. This is not to say that we should abandon traditional performance venues in favour of screen-based connectivity. On the contrary, we can create a resurgence in live performance by connecting traditional venues together to create network-enhanced performance spaces, real-time collaboration and interaction spaces into which diverse audiences can get to know not only the artists, but each other.
In joining international artists, engineers, scientists, media producers and audiences together in network-empowered performances, we are ultimately creating new and accessible platforms for global collaboration. While our initial work over the past decade has been undertaken primarily in research universities and academic conferences, the rapid proliferation of available bandwidth and connectivity offers an unprecedented opportunity to use the Internet to both connect and create.

Figure 1: Locations of the artists and engineers joined by the Internet2 for "Dancing Beyond Boundaries" (2001)

Figure 2: Detailed System components for each location connected by the Internet2 for Non Divisi (2003)

Figure 3: Real-time panoramic screen capture from "In Common: Time" (SIGGRAPH 2005). Performers (positioned left to right) in Korea, Australia, Los Angeles, Florida, Chile, and the UK are performing synchronously with the NetroNome™

Figure 6: Components of the "Icons of Innovation" audience interaction system, allowing real-time feedback via smartphones across four geographically-separated locations in North and South America

To capture the movements of the two dance troupes (one at NWSA in Miami and the other at the Digital Worlds Institute at the University of Florida) we used the Kinect sensor by Microsoft.