I'm building a cross-platform web app where audio is generated on the fly on the server and live-streamed to a browser client, probably via the HTML5 audio element. In the browser, I'll have JavaScript-driven animations that must precisely sync with the played audio. "Precise" means that the audio and animation must be within a second of each other, and hopefully within 250ms (think lip-syncing). For various reasons, I can't do the audio and animation on the server and live-stream the resulting video.
Ideally, there would be little or no latency between audio generation on the server and audio playback in the browser, but my understanding is that latency will be difficult to control and probably in the 3-7 second range (browser-, environment-, network- and phase-of-the-moon-dependent). I can handle that, though, if I can precisely measure the actual latency on the fly so that my browser JavaScript knows when to present the proper animated frame.
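Measuring the latency between generation on the server and playback in the browser means comparing timestamps taken on two different clocks, so the offset between those clocks has to be estimated first. The following is a minimal sketch of the usual NTP-style estimate. It assumes the server can expose a small endpoint that reports when it received a request and when it replied, and that client times come from `Date.now()` or `performance.timeOrigin + performance.now()`. The function names are hypothetical.

```javascript
// Hypothetical helpers (not from any library): estimate how far the server
// clock is ahead of the browser clock from one request/response exchange.
//   t0 = client time when the request was sent
//   t1 = server time when the request arrived
//   t2 = server time when the response was sent
//   t3 = client time when the response arrived
// Assumes the network delay is about the same in both directions.
function estimateClockOffset(t0, t1, t2, t3) {
  return {
    offset: ((t1 - t0) + (t2 - t3)) / 2, // serverClock - clientClock
    roundTrip: (t3 - t0) - (t2 - t1)     // time spent on the network
  };
}

// Of several exchanges, trust the one with the shortest round trip:
// it leaves the least room for asymmetric delay.
function bestOffset(samples) {
  return samples.reduce((best, s) => (s.roundTrip < best.roundTrip ? s : best)).offset;
}

// Convert a server timestamp (e.g. "this chunk was generated at T") to the
// client clock so it can be compared with the time the chunk is heard.
function serverToClientTime(serverMs, offset) {
  return serverMs - offset;
}
```

Once the offset is known, the end-to-end latency for a chunk is the client time at which it is actually heard minus `serverToClientTime(generatedAtMs, offset)`. Finding out *when* it is heard remains the hard part.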
So, I need to precisely measure the latency between handing audio to the streaming server (Icecast?) and the audio coming out of the speakers on the client computer. Some blue-sky possibilities:
- Add metadata to the audio stream, and parse it from the playing audio (I understand this isn't possible using the standard audio element)
- Add brief periods of pure silence to the audio, and then detect them in the browser (can audio elements yield the actual audio samples?)
- Query the server and the browser as to the various buffer depths
- Decode the streamed audio in JavaScript and then grab the metadata
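The silence-marker idea needs access to samples, which the plain `<audio>` element does not expose. A common route is to feed the element into the Web Audio API with `AudioContext.createMediaElementSource()` and an `AnalyserNode`, then scan the sample blocks. Note that cross-origin media needs CORS for this; otherwise the node outputs silence.

The following is a minimal sketch of the scanning step only. It assumes mono float samples. The threshold and minimum duration are illustrative guesses, not tuned values, and the function name is hypothetical.

```javascript
// Hypothetical sketch: find runs of near-silence in a block of PCM samples.
// `samples` is assumed to be mono float PCM in [-1, 1], e.g. what an
// AnalyserNode's getFloatTimeDomainData() writes into a Float32Array.
// Lossy codecs rarely reproduce exact digital silence, hence the threshold.
function findSilentRuns(samples, sampleRate, { threshold = 1e-4, minDurationMs = 20 } = {}) {
  const minLength = Math.ceil((minDurationMs * sampleRate) / 1000);
  const runs = [];
  let start = -1; // index where the current silent run began, or -1
  // iterate one past the end so a run that reaches the end is flushed
  for (let i = 0; i <= samples.length; i++) {
    const silent = i < samples.length && Math.abs(samples[i]) <= threshold;
    if (silent && start === -1) {
      start = i;
    } else if (!silent && start !== -1) {
      if (i - start >= minLength) {
        // report the run in seconds from the start of this block
        runs.push({ start: start / sampleRate, end: i / sampleRate });
      }
      start = -1;
    }
  }
  return runs;
}
```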
Any thoughts as to how I could do this?
Use the timeupdate event of the <audio> element, which fires roughly three to four times per second (the exact rate is browser-dependent), to drive precise animations while the media streams, by checking the element's .currentTime. Animations or transitions can then be started or stopped up to several times per second.
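Because timeupdate is coarse, one common refinement is to remember the `currentTime` reported at each event together with `performance.now()`, and then extrapolate between events inside a `requestAnimationFrame` loop. The timestamp that `requestAnimationFrame` passes to its callback is on the same time base as `performance.now()`. The following is a minimal sketch; the helper names are hypothetical.

```javascript
// Hypothetical sketch: timeupdate only fires every ~250 ms, so between
// events extrapolate the playhead from a high-resolution clock.
// Clock values are passed in (performance.now() in a page) so the logic
// can be exercised without a browser.
function createPlayheadEstimator() {
  let anchorMediaTime = 0; // media time (s) at the last sync
  let anchorClockMs = 0;   // clock time (ms) at the last sync
  let rate = 0;            // 0 while paused or buffering

  return {
    // call from timeupdate, playing, pause and waiting handlers
    sync(mediaTime, clockMs, isPlaying, playbackRate = 1) {
      anchorMediaTime = mediaTime;
      anchorClockMs = clockMs;
      rate = isPlaying ? playbackRate : 0;
    },
    // estimated media time (s) at clock time `clockMs`
    estimate(clockMs) {
      return anchorMediaTime + ((clockMs - anchorClockMs) / 1000) * rate;
    }
  };
}

// In a page (not executed here):
// const est = createPlayheadEstimator();
// let playing = false;
// audio.addEventListener("playing", () => { playing = true; });
// audio.addEventListener("waiting", () => { playing = false; });
// audio.addEventListener("pause", () => { playing = false; });
// ["timeupdate", "playing", "waiting", "pause"].forEach(type =>
//   audio.addEventListener(type, () =>
//     est.sync(audio.currentTime, performance.now(), playing, audio.playbackRate)));
// requestAnimationFrame(function draw(t) {
//   render(est.estimate(t)); // `render` is your own drawing code
//   requestAnimationFrame(draw);
// });
```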
If the browser supports it, you can use fetch() to request the audio resource and, in .then(), return response.body.getReader(), which returns a reader for the ReadableStream of the response body. Create a new MediaSource object and set the .src of the <audio> element (or of new Audio()) to an object URL for the MediaSource. Append the first stream chunk, obtained from the .then() chained to .read(), to a sourceBuffer of the MediaSource with .mode set to "sequence". Then append the remaining chunks at the sourceBuffer's updateend events.
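`appendBuffer()` throws an `InvalidStateError` if it is called while the SourceBuffer is still updating, which is why each read is chained to updateend. An alternative sketch, useful if you want reading and appending decoupled (for example, when chunks can arrive faster than they are appended), is a small queue that flushes on updateend. The class name is hypothetical. It only relies on `updating`, `appendBuffer()` and `addEventListener()`, so it can be tested outside a browser.

```javascript
// Hypothetical sketch: serialize appends to a SourceBuffer.
// Chunks pushed while the buffer is updating wait in `pending`
// and are appended one at a time on each "updateend".
class AppendQueue {
  constructor(sourceBuffer) {
    this.sourceBuffer = sourceBuffer;
    this.pending = [];
    sourceBuffer.addEventListener("updateend", () => this.flush());
  }
  push(chunk) {
    this.pending.push(chunk);
    this.flush();
  }
  flush() {
    // append only when the previous append has finished
    if (!this.sourceBuffer.updating && this.pending.length > 0) {
      this.sourceBuffer.appendBuffer(this.pending.shift());
    }
  }
}

// Usage in a page (not executed here):
// const queue = new AppendQueue(mediaSource.addSourceBuffer("audio/mpeg"));
// reader.read().then(function pump({ done, value }) {
//   if (done) return;
//   queue.push(value);
//   return reader.read().then(pump);
// });
```

Before calling `mediaSource.endOfStream()` with this design, wait until the queue is empty and `updating` is false; otherwise `endOfStream()` throws.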
If fetch() with response.body.getReader() is not available in the browser, you can still use the timeupdate or progress event of the <audio> element to check .currentTime, and start or stop animations or transitions at the required second of playback.
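The progress and timeupdate events are also a convenient place to read `audio.buffered`. The amount buffered ahead of `currentTime` is roughly the part of the latency that sits in the browser, one of the buffer depths the question asks about. The following is a minimal sketch with hypothetical function names; anything with the TimeRanges shape works as input.

```javascript
// Hypothetical sketch: seconds of media buffered ahead of the playhead.
// `buffered` is TimeRanges-shaped (like audio.buffered): a `length` plus
// start(i)/end(i), with ranges sorted and non-overlapping.
function secondsBufferedAhead(buffered, currentTime) {
  for (let i = 0; i < buffered.length; i++) {
    if (buffered.start(i) <= currentTime && currentTime <= buffered.end(i)) {
      return buffered.end(i) - currentTime;
    }
  }
  return 0; // playhead is not inside any buffered range
}

// Helper to build a TimeRanges-like object from [start, end] pairs.
function makeRanges(pairs) {
  return {
    length: pairs.length,
    start: (i) => pairs[i][0],
    end: (i) => pairs[i][1]
  };
}

// In a page (not executed here):
// audio.addEventListener("progress", () =>
//   console.log("buffered ahead (s):",
//     secondsBufferedAhead(audio.buffered, audio.currentTime)));
```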
Use the canplay event of the <audio> element to start playback once the MediaSource has buffered enough data to proceed.
To perform precise animations, you can use an object whose property names are the .currentTime values (in seconds) of the <audio> element at which an animation should occur, and whose values are the CSS property values to apply to the element being animated.
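The example below matches `Math.round(currentTime)` exactly against the keys of that object. This works because timeupdate fires several times per second, but after a seek into the middle of an interval the previous value stays in effect until the next key is hit. A slightly more robust sketch, assuming each cue should stay active until the next one, looks up the latest cue at or before the current time with a binary search. `createCueLookup` is a hypothetical name.

```javascript
// Hypothetical sketch: map a playback time to the most recent cue.
// `cues` is shaped like the `colors` object in the example:
// keys are times in seconds, values are whatever the cue should apply.
function createCueLookup(cues) {
  const times = Object.keys(cues).map(Number).sort((a, b) => a - b);
  return function lookup(currentTime) {
    let lo = 0;
    let hi = times.length - 1;
    let found = -1;
    // binary search for the last cue time <= currentTime
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (times[mid] <= currentTime) {
        found = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return found === -1 ? undefined : cues[times[found]];
  };
}

// Usage inside a timeupdate handler (not executed here):
// const colorAt = createCueLookup(colors);
// body.style.backgroundColor = colorAt(audio.currentTime);
```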
In the JavaScript below, the background color of body changes every twenty seconds, beginning at 0, and a text overlay fades in and out every sixty seconds until playback has concluded.
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
<style>
body {
width: 90vw;
height: 90vh;
background: #000;
transition: background 1s;
}
span {
font-family: Georgia;
font-size: 36px;
opacity: 0;
}
</style>
</head>
<body>
<audio controls></audio>
<br>
<span></span>
<script type="text/javascript">
window.onload = function() {
var url = "/path/to/audio";
// given 240 seconds total duration of audio
// 240/12 = 20
// properties correspond to `<audio>` `.currentTime`,
// values correspond to color to set at element
var colors = {
0: "red",
20: "blue",
40: "green",
60: "yellow",
80: "orange",
100: "purple",
120: "violet",
140: "brown",
160: "tan",
180: "gold",
200: "sienna",
220: "skyblue"
};
var body = document.querySelector("body");
var mediaSource = new MediaSource;
var audio = document.querySelector("audio");
var span = document.querySelector("span");
var color = window.getComputedStyle(body)
.getPropertyValue("background-color");
//console.log(mediaSource.readyState); // closed
var mimecodec = "audio/mpeg";
audio.oncanplay = function() {
this.play();
}
audio.ontimeupdate = function() {
// 240/12 = 20
var curr = Math.round(this.currentTime);
if (colors.hasOwnProperty(curr)) {
// set `color` to `colors[curr]`
color = colors[curr]
}
// animate `<span>` every 60 seconds
if (curr % 60 === 0 && span.innerHTML === "") {
var t = curr / 60;
span.innerHTML = t + " minute" + (t === 1 ? "" : "s")
+ " of " + Math.round(this.duration / 60)
+ " minutes of audio";
span.animate([{
opacity: 0
}, {
opacity: 1
}, {
opacity: 0
}], {
duration: 2500,
iterations: 1
})
.onfinish = function() {
span.innerHTML = ""
}
}
// change `background-color` of `body` every 20 seconds
body.style.backgroundColor = color;
console.log("current time:", curr
, "current background color:", color
, "duration:", this.duration);
}
// set `<audio>` `.src` to `mediaSource`
audio.src = URL.createObjectURL(mediaSource);
mediaSource.addEventListener("sourceopen", sourceOpen);
function sourceOpen(event) {
// if the media type is supported by `mediaSource`
// fetch resource, begin stream read,
// append stream to `sourceBuffer`
if (MediaSource.isTypeSupported(mimecodec)) {
var sourceBuffer = mediaSource.addSourceBuffer(mimecodec);
// set `sourceBuffer` `.mode` to `"sequence"`
sourceBuffer.mode = "sequence";
fetch(url)
// get a reader for the `ReadableStream` of `response.body`
.then(response => response.body.getReader())
.then(reader => {
var processStream = (data) => {
if (data.done) {
// every chunk has been appended; this read was issued from
// `updateend` (or was the first read), so `sourceBuffer` is not
// updating and it is safe to signal end of stream here
mediaSource.endOfStream();
console.log(mediaSource.readyState); // "ended"
return;
}
// append chunk of stream to `sourceBuffer`
sourceBuffer.appendBuffer(data.value);
}
// at `sourceBuffer` `updateend` call `reader.read()`
// to read the next chunk of the stream and append it to
// `sourceBuffer`
sourceBuffer.addEventListener("updateend", function() {
reader.read().then(processStream);
});
// start processing stream
reader.read().then(processStream);
})
.catch(err => console.error(err));
}
// if `mimecodec` is not supported by `MediaSource`
else {
alert(mimecodec + " not supported");
}
};
}
</script>
</body>
</html>
Demo on plnkr: http://plnkr.co/edit/fIm1Qp?p=preview
There is no way for you to measure latency directly, but every audio element fires events that help. 'playing' fires when playback starts or resumes after a pause or buffering. 'waiting' fires when playback has stopped because the next data is not yet available. 'stalled' fires when the browser is trying to fetch data but it is not arriving. So what you can do is manipulate your animation based on these events.
So pause the animation when stalled or waiting fires, then resume it when playing fires again.
I also advise you to check other events that might affect your flow ('error', for example, would be important for you).
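Here is a minimal sketch of that event handling. It assumes the animation exposes its own start/stop hooks; `gateAnimationOnMediaEvents` is a hypothetical name. The sketch stops the animation on waiting rather than stalled, because stalled only signals that network data is not arriving. Per the HTML specification, playback can continue from the buffer in the meantime.

```javascript
// Hypothetical sketch: start/stop animations from media element events.
// `media` is any EventTarget (the <audio> element in a page);
// `onChange(running, eventType)` is called only when the state flips.
function gateAnimationOnMediaEvents(media, onChange) {
  let running = false;
  const setRunning = (next, type) => {
    if (next !== running) {
      running = next;
      onChange(running, type);
    }
  };
  // "playing": playback started or resumed after a pause or buffering
  media.addEventListener("playing", () => setRunning(true, "playing"));
  // "waiting": playback stopped for lack of data; the others stop it too
  for (const type of ["waiting", "pause", "ended", "error"]) {
    media.addEventListener(type, () => setRunning(false, type));
  }
  // "stalled": the network is not delivering data, but buffered audio may
  // still be playing, so only log it here
  media.addEventListener("stalled", () => console.warn("stream stalled"));
  return { isRunning: () => running };
}

// In a page (not executed here), with `animation` a Web Animations object:
// gateAnimationOnMediaEvents(document.querySelector("audio"), (running) => {
//   if (running) animation.play(); else animation.pause();
// });
```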
https://developer.mozilla.org/en-US/docs/Web/API/HTMLAudioElement
What I would try is to first create a timestamp with performance.now(), process the data, and record it into a Blob with the new MediaRecorder API.
The recorder will ask the user for access to their audio input (microphone). This can be a problem for your app, but it looks mandatory in order to get the real latency.
Once this is done, there are many ways to measure the actual latency between generation and rendering; basically, you detect a sound event.
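One way to detect that sound event is cross-correlation: slide the marker along the recording and keep the position with the largest dot product. This assumes a known marker (for example a short click) is emitted at a time you note on the same clock as the recording start, and that the recording has been decoded to PCM (for example with `AudioContext.decodeAudioData()`).

The following is a brute-force sketch with hypothetical function names. The measured value also includes the microphone's own input latency.

```javascript
// Hypothetical sketch: find where a known marker (e.g. a short click) occurs
// in a recording by brute-force cross-correlation. Both arrays are assumed
// to be mono float PCM at the same sample rate.
function findMarkerOffset(recorded, marker) {
  let bestIndex = -1;
  let bestScore = -Infinity;
  for (let i = 0; i + marker.length <= recorded.length; i++) {
    let score = 0; // dot product of the marker with this window
    for (let j = 0; j < marker.length; j++) {
      score += recorded[i + j] * marker[j];
    }
    if (score > bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  }
  return bestIndex; // -1 if the recording is shorter than the marker
}

// Latency from "marker emitted" to "marker heard". recordStartMs and
// markerEmittedMs must come from the same clock (e.g. performance.now()).
function latencyMs(recordStartMs, offsetSamples, sampleRate, markerEmittedMs) {
  const heardAtMs = recordStartMs + (offsetSamples / sampleRate) * 1000;
  return heardAtMs - markerEmittedMs;
}
```

The brute-force loop costs O(N·M); for long recordings, an FFT-based correlation is faster. A normalized correlation is also less easily fooled by loud passages than the raw dot product used here.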
For further reference and examples:
Recorder demo
https://github.com/mdn/web-dictaphone/
https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder_API/Using_the_MediaRecorder_API