Transcendence: Behind the scenes

February 10, 2022

The “Transcendence” project was an exploration of applying technology to personal data in the form of face portrait images and recorded voice samples. This post describes some of the tech used to build it.

Data analysis

The two data sets (one of images, one of audio recordings) were processed using two libraries: face-api.js and pitchfinder, respectively.

It turns out both operations could take a noticeable amount of time to complete (on the order of tens of seconds), given large enough inputs. In Chrome and Firefox this caused no visible problems, but in Safari (as usual…) the UI would freeze and the pretty spinning loader would not actually be shown. (Presumably because the browser doesn’t repaint while the main thread is busy with the long-running task?)

To get around this I turned to Web Workers, trying to shift all that heavy computing off the main thread. The audio processing was a smooth ride. The face detection, not so much: getting face-api.js to work in a Web Worker seemed to be an impossible task (or at least too time-consuming for this simple proof of concept), despite claims of working examples. However, the basic (blocking) implementation is shown below.

With support for multiple uploads, a set of images can be analysed to detect any faces they contain and to extract information about each face: its identified features (face landmarks), estimated age, and gender:
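A minimal sketch of what such a blocking analysis could look like with face-api.js. The helper names (loadModels, summariseFace, analyseImages) and the '/models' path are assumptions made for illustration, and the library is passed in as a parameter so the snippet does not depend on how it is loaded. This is not necessarily how the project itself wired things up.

```javascript
// Sketch only: `faceapi` is the face-api.js module or global, passed in.

// Load the three models this analysis needs.
// The '/models' location is a hypothetical path.
async function loadModels(faceapi, modelUrl = '/models') {
  await Promise.all([
    faceapi.nets.ssdMobilenetv1.loadFromUri(modelUrl),
    faceapi.nets.faceLandmark68Net.loadFromUri(modelUrl),
    faceapi.nets.ageGenderNet.loadFromUri(modelUrl),
  ]);
}

// Reduce one detection result to the fields of interest.
function summariseFace(result) {
  return {
    age: Math.round(result.age),
    gender: result.gender,
    genderProbability: result.genderProbability,
    landmarkCount: result.landmarks.positions.length,
  };
}

// Analyse the images one after another. This runs on the main thread,
// so the UI may freeze while it is busy.
async function analyseImages(faceapi, images) {
  const summaries = [];
  for (const image of images) {
    const results = await faceapi
      .detectAllFaces(image)
      .withFaceLandmarks()
      .withAgeAndGender();
    summaries.push(results.map(summariseFace));
  }
  return summaries;
}
```

In a browser, the images could come from a file input that accepts multiple files, with each file converted to an image element (for example via faceapi.bufferToImage) before being passed to analyseImages.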

Similarly, a pitch detection algorithm was applied to the audio recordings; it samples the audio data and returns the detected frequencies:
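To illustrate the idea, here is a rough sketch that splits a recording into frames and runs a detector on each one. The detector is passed in as a function that takes a Float32Array and returns a frequency in Hz or null, which is the shape of the detectors pitchfinder creates (for example Pitchfinder.YIN({ sampleRate })). To keep the snippet self-contained, a naive autocorrelation detector is used as a stand-in; it is not pitchfinder's algorithm, and the frame size and frequency limits are assumptions.

```javascript
// Stand-in detector (NOT pitchfinder): plain autocorrelation over one frame.
// Returns the estimated fundamental frequency in Hz, or null for silence.
function autocorrelationDetector({ sampleRate = 44100, minHz = 60, maxHz = 500 } = {}) {
  const minLag = Math.ceil(sampleRate / maxHz);
  const maxLag = Math.floor(sampleRate / minHz);
  return function detectPitch(frame) {
    const span = frame.length - maxLag; // samples compared at every lag
    if (span <= 0) return null;
    const corr = new Float64Array(maxLag + 1);
    let best = 0;
    for (let lag = minLag; lag <= maxLag; lag++) {
      let sum = 0;
      for (let i = 0; i < span; i++) sum += frame[i] * frame[i + lag];
      corr[lag] = sum;
      if (sum > best) best = sum;
    }
    if (best <= 0) return null;
    // Pick the first local peak close to the maximum, which avoids
    // reporting a multiple of the true period (an octave error).
    for (let lag = minLag + 1; lag < maxLag; lag++) {
      const isPeak = corr[lag] >= corr[lag - 1] && corr[lag] >= corr[lag + 1];
      if (isPeak && corr[lag] >= 0.9 * best) return sampleRate / lag;
    }
    return null;
  };
}

// Split the samples into frames and collect one frequency per voiced frame.
function detectFrequencies(detectPitch, samples, frameSize = 2048) {
  const frequencies = [];
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    const frequency = detectPitch(samples.subarray(start, start + frameSize));
    if (frequency !== null) frequencies.push(frequency);
  }
  return frequencies;
}
```

In a browser, the samples would typically come from decoding the recording with the Web Audio API (decodeAudioData, then getChannelData(0)).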

Web Worker code

Note: The results are printed in the JavaScript console.
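For the audio processing, moving the work into a Web Worker could look roughly like the sketch below. It shows both sides in one listing: a message handler for the worker file and a small promise wrapper for the main thread. The names createMessageHandler, analyseInWorker, analyse, and the file analyse.js are hypothetical, as is the message format with an id field.

```javascript
// --- worker side -----------------------------------------------------------
// The handler is built separately from the Worker wiring so it can also run
// outside a browser. `analyse` is a hypothetical function that maps a
// Float32Array of samples to an array of frequencies.
function createMessageHandler(analyse, post) {
  return function onMessage(event) {
    const { id, samples } = event.data;
    try {
      post({ id, frequencies: analyse(samples) });
    } catch (error) {
      post({ id, error: String(error) });
    }
  };
}

// Only wire things up when actually running inside a Web Worker.
if (typeof WorkerGlobalScope !== 'undefined') {
  importScripts('analyse.js'); // hypothetical script that defines analyse()
  self.onmessage = createMessageHandler(self.analyse, (message) =>
    self.postMessage(message)
  );
}

// --- main thread side ------------------------------------------------------
// Send one recording to the worker and resolve when the matching reply
// arrives, leaving the main thread free to repaint in the meantime.
function analyseInWorker(worker, samples, id = Date.now()) {
  return new Promise((resolve, reject) => {
    const onMessage = (event) => {
      if (event.data.id !== id) return; // reply to a different request
      worker.removeEventListener('message', onMessage);
      if (event.data.error) reject(new Error(event.data.error));
      else resolve(event.data.frequencies);
    };
    worker.addEventListener('message', onMessage);
    worker.postMessage({ id, samples });
  });
}
```

On the page, the two halves would live in separate files, and a call such as analyseInWorker(new Worker('worker.js'), samples).then(console.log) would print the result to the console once the worker is done.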

Data visualization

The extracted data was visualized using Chart.js. Some of the noteworthy features utilized:

Using a gradient as line color (used for visualizing the binary gender classifications).
The annotation plugin for drawing horizontal lines and boxes (used, for example, to show the average male and female voice ranges).
The boxplot plugin for showing the five-number summary for a set of values (used for visualizing the voice frequency data).
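The three features above can be sketched as small helpers. This is an illustration under assumptions, not the project's actual chart code: the function names, colours, and option values are made up, the annotation options follow the chartjs-plugin-annotation format for Chart.js 3 and later, and the quantile method shown (linear interpolation) may differ from the one a boxplot plugin uses.

```javascript
// 1) Gradient as line colour: a scriptable `borderColor` that blends two
//    colours from the bottom to the top of the chart area.
function gradientBorderColor(context) {
  const { ctx, chartArea } = context.chart;
  if (!chartArea) return undefined; // chart has not been laid out yet
  const gradient = ctx.createLinearGradient(0, chartArea.bottom, 0, chartArea.top);
  gradient.addColorStop(0, 'teal');
  gradient.addColorStop(1, 'orange');
  return gradient;
}

// 2) Annotation plugin options: one box per named range plus a horizontal
//    line. All values are supplied by the caller.
function voiceRangeAnnotations(ranges, averageHz) {
  const annotations = {};
  for (const [name, { min, max, color }] of Object.entries(ranges)) {
    annotations[name] = {
      type: 'box',
      yMin: min,
      yMax: max,
      backgroundColor: color,
      borderWidth: 0,
    };
  }
  annotations.average = {
    type: 'line',
    yMin: averageHz,
    yMax: averageHz,
    borderColor: 'black',
    borderWidth: 1,
  };
  return { plugins: { annotation: { annotations } } };
}

// 3) Five-number summary (min, Q1, median, Q3, max), computed with linear
//    interpolation between the sorted values.
function fiveNumberSummary(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const quantile = (p) => {
    const position = (sorted.length - 1) * p;
    const lower = Math.floor(position);
    const upper = Math.ceil(position);
    return sorted[lower] + (sorted[upper] - sorted[lower]) * (position - lower);
  };
  return {
    min: sorted[0],
    q1: quantile(0.25),
    median: quantile(0.5),
    q3: quantile(0.75),
    max: sorted[sorted.length - 1],
  };
}
```

In a chart config, gradientBorderColor would be set as a dataset's borderColor and the object returned by voiceRangeAnnotations passed as (part of) options, with the annotation plugin registered via Chart.register. A boxplot plugin would normally compute the summary itself; fiveNumberSummary only shows what ends up being drawn.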

Text-to-speech

The poem “This is not my voice” was rendered using Azure Text to Speech.

It has a nice variety of voices and offers a graphical web interface for editing the audio content (while also supporting raw SSML).
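For readers unfamiliar with SSML, the sketch below shows how a few lines of text might be wrapped in the kind of markup the service accepts. The helper names, the voice name, the speaking rate, and the pause length are placeholders chosen for illustration; they are not the settings used for the poem.

```javascript
// Escape characters that are special in XML text and attribute values.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Wrap lines of text in SSML, with a pause between consecutive lines.
// The default voice, rate, and pause are placeholder values.
function buildSsml(lines, { voice = 'en-US-JennyNeural', rate = '-10%', pauseMs = 600 } = {}) {
  const body = lines.map(escapeXml).join(`<break time="${pauseMs}ms"/>`);
  return (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">' +
    `<voice name="${escapeXml(voice)}">` +
    `<prosody rate="${escapeXml(rate)}">${body}</prosody>` +
    '</voice></speak>'
  );
}
```

Calling buildSsml(['First line', 'Second line']) returns a single SSML string that could be pasted into the raw SSML editor or sent to the service.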