This is probably old news, but I was looking for introductory white papers on computer vision and came across this, which I found fun and interesting, so I'll risk sharing it:
Vocalist Jaap Blonk performing the Messa di Voce interactive software by Golan Levin and Zachary Lieberman (2003).
From the article (linked below): "Messa di Voce, created by this article's author in collaboration with Zachary Lieberman, uses whole-body vision-based interactions similar to Krueger's, but combines them with speech analysis and situates them within a kind of projection-based augmented reality. In this audiovisual performance, the speech, shouts and songs produced by two abstract vocalists are visualized and augmented in real-time by synthetic graphics. To accomplish this, a computer uses a set of vision algorithms to track the locations of the performers' heads; this computer also analyzes the audio signals coming from the performers' microphones. In response, the system displays various kinds of visualizations on a projection screen located just behind the performers; these visualizations are synthesized in ways which are tightly coupled to the sounds being spoken and sung. With the help of the head-tracking system, moreover, these visualizations are projected such that they appear to emerge directly from the performers' mouths" [Levin and Lieberman].
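The article doesn't publish the actual Messa di Voce code, but the two building blocks it describes — locating a performer's head in a segmented camera frame, and extracting an audio level to drive the graphics — can be sketched in a few lines. This is purely my own illustrative toy (the function names, the topmost-pixel heuristic, and the RMS measure are my assumptions, not the authors' implementation, which would use far more robust vision and audio analysis):

```python
import numpy as np

def find_head(mask):
    """Return (row, col) of the topmost foreground pixel in a binary
    silhouette mask -- a crude stand-in for the head-tracking step
    (assumes the performer is upright, so the head is the highest point)."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None  # no performer in frame
    i = np.argmin(rows)  # smallest row index = highest point in the image
    return int(rows[i]), int(cols[i])

def audio_level(samples):
    """RMS amplitude of an audio buffer -- a simple loudness measure
    that could scale the size of the projected visualization."""
    s = np.asarray(samples, dtype=np.float64)
    return float(np.sqrt(np.mean(s * s)))

# Toy frame: an 8x8 mask with a 2-pixel-wide "body"; its topmost
# foreground pixel plays the role of the tracked head position.
mask = np.zeros((8, 8), dtype=bool)
mask[2:8, 3:5] = True
head = find_head(mask)  # (2, 3)

# Toy audio buffers: a louder signal yields a larger level, so the
# graphics emerging at `head` would be drawn bigger.
loud = audio_level([0.5, -0.5, 0.5, -0.5])    # 0.5
quiet = audio_level([0.1, -0.1, 0.1, -0.1])   # 0.1
```

In the real piece these two streams are coupled: the visualization is synthesized from the live audio and then projected so that it appears to emerge from the tracked head position.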
This article: http://www.flong.com/writings/texts/essay_cvad.html
More on the Audiovisual Performance & Installation: http://www.tmema.org/messa/messa.html
Messa di Voce on YouTube: http://www.youtube.com/results?search_query=messa+di+voce&search=Search