How Engineers Build A Comedy Club - Part III: Videos and Subtitles
Foreword. Welcome to the third installment of the series! These articles present how my friends and I are building Chinese Comedy at Silicon Valley, a stand-up comedy club in Cupertino, CA, from a software engineer’s perspective.
In the last post, I went over the data pipeline behind our comedian roster. Another dataset we update frequently is the collection of video clips we record during our open-mic events. Let’s take a look at our hardware setup first!
Streaming. For those who can’t make it to the house, we livestream our open-mic events for free. YouTube only recently (around Nov 2021) relaxed its subscriber requirement for mobile live streaming from 1,000 to 50, so we plan to migrate to a phone-based streaming setup soon. For the time being, we stream from a webcam.
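If you’re curious what a webcam pipeline looks like in code, here is a minimal sketch of pushing a webcam feed to YouTube’s RTMP ingest with ffmpeg driven from Python. The device paths, bitrates, and stream key are placeholders, not our exact production settings.

```python
# Minimal sketch: push a webcam feed to YouTube Live via ffmpeg.
# Assumptions: ffmpeg is installed, we are on Linux (v4l2/alsa inputs),
# and YOUTUBE_STREAM_KEY holds a real stream key.
import os
import subprocess

stream_key = os.environ["YOUTUBE_STREAM_KEY"]  # placeholder

cmd = [
    "ffmpeg",
    "-f", "v4l2", "-i", "/dev/video0",   # webcam video
    "-f", "alsa", "-i", "default",       # microphone audio
    "-c:v", "libx264", "-preset", "veryfast",
    "-b:v", "2500k", "-g", "60",         # keyframe every 2s at 30fps
    "-c:a", "aac", "-b:a", "128k",
    "-f", "flv",
    f"rtmp://a.rtmp.youtube.com/live2/{stream_key}",
]
subprocess.run(cmd, check=True)
```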
Recording. While performing, some comedians tend to move around. To avoid the hassle of panning and zooming manually, we opted for a DJI Pocket 2 as our main camera. Its gimbal keeps the comedian in the shot by rotating the camera through remarkably wide angles. It’s basically a set-it-and-forget-it process: the cameraperson only needs to re-select a face to track when performers switch positions.
Editing. Back home, our cameraperson splits the recorded video by performer. The segments are then uploaded to YouTube as drafts. From there, another volunteer picks which videos to release the following week, leaving a mark in each title. Another staff member then downloads the marked videos, generates subtitles with pyTranscriber (occasionally bursting into laughter because the performance is that funny), and uploads the SRT files back to YouTube. Yet another of us creates cover images, edits titles, and releases the videos.
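To give a flavor of the subtitle step, here is a minimal sketch that sends a clip’s audio to the same Google Speech Recognition API that pyTranscriber wraps (via the speech_recognition package) and writes a single-cue SRT. Real subtitles need silence-based chunking to get per-line timestamps; the file names and the zh-CN language code are assumptions for illustration.

```python
# Sketch: transcribe one clip and emit a one-cue SRT file.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("alice.wav") as source:   # placeholder mono WAV
    audio = recognizer.record(source)
    duration = source.DURATION              # clip length in seconds

# Assumes Mandarin performances; swap the language code as needed.
text = recognizer.recognize_google(audio, language="zh-CN")

def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm that SRT expects."""
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

with open("alice.srt", "w", encoding="utf-8") as f:
    f.write(f"1\n{srt_timestamp(0)} --> {srt_timestamp(duration)}\n{text}\n")
```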
Laughter detection. Behind the scenes (no pun intended), pyTranscriber sends the speech audio to the Google Speech Recognition API. That is not the only cool trick we can pull off with audio tracks, though: we are experimenting with automated laughter detection, a machine-learning model that extracts clips of laughter from a given audio stream. Our goal is to quantitatively measure how many times a given performance has cracked up its listeners. From there, we can evaluate, fairly and objectively, how entertaining each comedian is.
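Since our detector is still an experiment, here is only a naive sketch of the idea: flag sustained high-energy windows in the audience audio as candidate laughs. A real system would run a trained classifier (for example, on MFCC features) over each window instead of a plain energy threshold; the file name, threshold, and minimum duration below are arbitrary choices.

```python
# Naive sketch: find sustained loud bursts as candidate laughs.
import librosa
import numpy as np

y, sr = librosa.load("alice.wav", sr=16000, mono=True)

hop = 512
rms = librosa.feature.rms(y=y, frame_length=2048, hop_length=hop)[0]
times = librosa.frames_to_time(np.arange(len(rms)), sr=sr, hop_length=hop)

threshold = rms.mean() + 2 * rms.std()  # "loud" relative to this clip
loud = rms > threshold

# Merge consecutive loud frames into segments and keep ones long
# enough to plausibly be laughter (>0.5s here, an arbitrary cutoff).
segments, start = [], None
for t, flag in zip(times, loud):
    if flag and start is None:
        start = t
    elif not flag and start is not None:
        if t - start > 0.5:
            segments.append((start, t))
        start = None
if start is not None and times[-1] - start > 0.5:
    segments.append((start, times[-1]))

print(f"{len(segments)} candidate laughs:", segments)
```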
Putting data to use. The laughter statistics help us tailor a development plan for each performer. Our club strives to promote outstanding comedians: we can arrange solo performances for them or book them tours at other clubs. On the other hand, we spare no effort in growing our less experienced amateurs. For example, we host virtual workshops every Wednesday night, which all first-timers to that week’s open-mic (held on Friday) are required to attend. Improving the quality of our performances is, we believe, our responsibility to our audiences.
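As a toy illustration of how such statistics could feed into those decisions, one could normalize laugh counts to laughs per minute so that sets of different lengths stay comparable. The names and numbers below are made up.

```python
# Toy example: rank performers by laughs per minute.
sets = {
    # performer: (laugh_count, set_length_minutes), hypothetical data
    "alice": (14, 5.0),
    "bob": (9, 4.0),
}

ranked = sorted(sets.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
for who, (laughs, minutes) in ranked:
    print(f"{who}: {laughs / minutes:.1f} laughs/min")
```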
As always, thanks for reading my article! I hope you enjoyed hearing about how we record, process, and make use of our videos. In the next installment, I will touch upon the various documents that we have compiled and abide by. Stay tuned!