Making WebRTC Simple with conversat.io

WebRTC is awesome, but it’s a bit unapproachable. Last week, my colleagues and I at &yet released a couple of tools we hope will help make it more tinkerable and pose a real risk of actually being useful.

As a demo of these tools, we very quickly built a simple product called conversat.io that lets you create free, multi-user video calls with no account and no plugins, just by going to a URL in a modern browser. Anyone who visits that same URL joins the call.

[Screenshot: conversat.io]

The purpose of conversat.io is twofold. First, it’s a useful communication tool. Our team uses And Bang for tasks and group chat, so being able to drop a link to a video conversation “room” into our team chat that people can join is super useful. Second, it’s a demo of the SimpleWebRTC.js library and the little signaling server that runs it, signalmaster.

(Both SimpleWebRTC and signalmaster are open sourced on GitHub and MIT licensed. Help us make them better!)

Quick note on browser support

WebRTC currently only works in Chrome stable and Firefox Nightlies (with the media.peerconnection.enabled preference enabled in about:config).

Hopefully we’ll see much broader browser support soon. I’m particularly excited about having WebRTC available on smartphones and tablets.

Approachability and adoption

I firmly believe that widespread adoption of new web technologies is directly correlated to how easy they are to play with. When I was a new JS developer, it was jQuery’s approachability that made me feel empowered to build cool stuff.

My falling in love with JavaScript all started with doing this with jQuery:

$('#demo').slideDown();

And then seeing the element move on my screen. I knew nothing. But as cheesy as it sounds, this simple thing left me feeling empowered to build more interesting things.

Socket.io did the same thing for people wanting to build apps that pushed data from the server to the client:

// server:
client.emit("something", {
    some: "data" 
});
// client:
socket = io.connect();
socket.on("something", function (data) {
    // here's my data!
    console.log(data);
});

Rather than having to figure out how to set up long-polling, BOSH, and XMPP in order to get data pushed out to the browser, I could now just send messages to the browser. In fact, if I didn’t want to, I didn’t even have to think about serializing and de-serializing. I could now just pass simple JavaScript objects seamlessly back and forth between the client and server.

I’ve heard some “hardcore” devs complain that tools like this lead to too many poorly made tools and too many “wannabe” developers who don’t know what they’re doing. That’s garbage.

Approachable tools that make developers feel empowered to build cool stuff are the reason the web is as successful and vibrant as it is.

Tools like this are the gateway drug for getting us hooked on building things on these types of technologies. They introduce the concept and help us think about what could be built. Whether or not we ultimately end up building the final app with the tool whose simplicity introduced it to us is irrelevant.

The potential of WebRTC

I’m convinced WebRTC has the potential to have a huge impact on how we communicate. It already has for our team at &yet. Sure, we already used stuff like Skype, FaceTime, and Google Hangouts. But the simplicity and convenience of just opening a URL in a browser and instantly being in a conversation is powerful.

Once this technology is broadly available and on mobile devices, it’s nothing short of a game changer for communications.

Challenges

There are definitely quite a few hurdles that get in the way of just playing with WebRTC: complexity and browser differences in instantiating peer connections, generating and processing signaling messages, and attaching media streams to video elements.

Even at the point you have those things, you still need a way to let two users find each other and have a mechanism for each user to send the proper signaling messages directly to the other user or users that they want to connect to.

SimpleWebRTC.js is our answer to the client-side complexities. It abstracts away API differences between Firefox and Chrome.
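To give a feel for the kind of difference being smoothed over, here is a minimal sketch of picking whichever prefixed peer connection constructor a browser exposes. This is not SimpleWebRTC’s actual code: pickPrefixed is a hypothetical helper, and the chromeLike and firefoxLike objects are made-up stand-ins for a real window so the snippet runs anywhere.

```javascript
// Hypothetical helper: return the first property in `names` that `scope` defines.
function pickPrefixed(scope, names) {
    for (var i = 0; i < names.length; i++) {
        if (typeof scope[names[i]] !== 'undefined') {
            return scope[names[i]];
        }
    }
    return null;
}

// Candidate names: unprefixed first, then the WebKit and Mozilla prefixes.
var PEER_CONNECTION_NAMES = [
    'RTCPeerConnection',
    'webkitRTCPeerConnection',
    'mozRTCPeerConnection'
];

// Stand-ins for `window` in two different browsers (assumed for illustration).
var chromeLike = { webkitRTCPeerConnection: function ChromePC() {} };
var firefoxLike = { mozRTCPeerConnection: function FirefoxPC() {} };

var ChromePeer = pickPrefixed(chromeLike, PEER_CONNECTION_NAMES);
var FirefoxPeer = pickPrefixed(firefoxLike, PEER_CONNECTION_NAMES);
var NoPeer = pickPrefixed({}, PEER_CONNECTION_NAMES);
```

In a real page you would pass window as the scope. Returning null when nothing matches lets the caller show a friendly “your browser doesn’t support this yet” message instead of throwing.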

Using SimpleWebRTC

At its simplest, you just need to include the SimpleWebRTC.js script and provide a container for your local video and a container for the remote video(s), like this:

<!DOCTYPE html>
<html>
    <head>
        <script src="http://simplewebrtc.com/latest.js"></script> 
    </head>
    <body>
        <div id="localVideo"></div>
        <div id="remoteVideos"></div>
    </body>
</html>

Then you just init a webrtc object and tell it which containers to use:

var webrtc = new WebRTC({
    // the id of (or actual element) to hold "our" video
    localVideoEl: 'localVideo',
 
    // the id of or actual element that will hold remote videos
    remoteVideosEl: 'remoteVideos',
 
    // immediately ask for camera access
    autoRequestMedia: true
});

At this point, if you run the code above, you’ll see your video turn on and render in the container you gave it.

The next step is to actually specify who you want to connect to.

For simplicity and maximum “tinkerability” we do this by asking that both users who want to connect to each other join the same “room”, which basically means: call “join” with the same string.

So, for demonstration purposes we’ll just tell our webrtc to join a certain room once it’s ready (meaning it’s connected to the signaling server). We do this like so:

// we have to wait until it's ready
webrtc.on('readyToCall', function () {
    // you can name it anything
    webrtc.joinRoom('your awesome room name');
});

Once a user has done this, he/she is ready and waiting for someone to join.
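Since everyone who calls join with the same string ends up in the same room, one natural choice is to derive that string from the page URL, so that sharing a link shares the room. The sketch below shows one way that could look. It is an assumption for illustration, not how conversat.io is necessarily implemented, and roomFromPath is a hypothetical helper.

```javascript
// Hypothetical helper: turn a URL path such as "/my-team" into a room name.
function roomFromPath(pathname) {
    // Drop leading and trailing slashes, then undo URL encoding.
    var trimmed = pathname.replace(/^\/+|\/+$/g, '');
    return trimmed ? decodeURIComponent(trimmed) : null;
}

// In a browser, use the current path once the library says it is ready.
// Guarded so the snippet also runs where `location` or `webrtc` do not exist.
if (typeof location !== 'undefined' && typeof webrtc !== 'undefined') {
    webrtc.on('readyToCall', function () {
        var room = roomFromPath(location.pathname);
        if (room) {
            webrtc.joinRoom(room);
        }
    });
}
```

Returning null for an empty path keeps the home page from silently dropping every visitor into one shared room.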

If you want to test this locally, you can either open it in Firefox and Chrome or in two tabs within Chrome. (Firefox doesn’t yet let two tabs both access local media.)

At this point, you should automatically be connected and be having a lively (probably very echo-y!) conversation with yourself.

If you happen to be me, it’d look like this:

[Screenshot: henrik in conversat.io]

The signaling server

The example above will connect to a sandbox signaling server we keep running to make it easy to mess around with this stuff.

We aim to keep it available for people to use to play with SimpleWebRTC, but it’s definitely not meant for production use and we may kill it or restart it at any time.

If you want to actually build an app that depends on it, you can either run one yourself or, if you’d rather not mess with it, we can host one, keep it up to date, and help scale it for you. The code for that server is on GitHub.

You can just pass a URL to a different signaling server as part of your config by passing a “url” option when initiating your webrtc object.
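As a rough sketch, pointing the earlier example at your own server could look like the following. The “url” option name comes from the description above; buildConfig is a hypothetical helper, and http://localhost:8888 is only a placeholder address for wherever your signalmaster instance runs.

```javascript
// Hypothetical helper: build the options object, adding a custom signaling URL.
function buildConfig(signalingUrl) {
    var config = {
        localVideoEl: 'localVideo',
        remoteVideosEl: 'remoteVideos',
        autoRequestMedia: true
    };
    // Only set "url" when one is given; otherwise the sandbox server is used.
    if (signalingUrl) {
        config.url = signalingUrl;
    }
    return config;
}

// Placeholder address (an assumption): replace with your own server.
var config = buildConfig('http://localhost:8888');

// WebRTC is the constructor from the script include shown earlier.
// Guarded so the snippet also runs outside a browser.
if (typeof WebRTC !== 'undefined') {
    var webrtc = new WebRTC(config);
}
```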

So, what’s it actually doing under the hood?

It’s not too bad, really. You can read the full source of the client library here: https://github.com/HenrikJoreteg/SimpleWebRTC/blob/master/simplewebrtc.js and the signaling server here: https://github.com/andyet/signalmaster/blob/master/server.js

The process of starting a video call in conversat.io looks something like this:

  1. Establish connection to the signaling server. It does this with socket.io and connects to our sandbox signaling server at: http://signaling.simplewebrtc.com:8888

  2. Request access to local video camera by calling browser-prefixed getUserMedia.

  3. Create or get local video element and attach the stream that we get from getUserMedia to the video element.

    firefox:

    element.mozSrcObject = stream; element.play();

    webkit:

    element.autoplay = true;
    element.src = webkitURL.createObjectURL(stream);

  4. Call joinRoom, which sends a socket.io message to the signaling server telling it the name of the room it wants to connect to. The signaling server will either create the room if it doesn’t exist or join it if it does. All I mean by “room” is that the particular socket.io session ID is grouped by that room name so we can broadcast messages about people joining/leaving that room to only the clients connected to that room.

  5. Now we play an awesome rocket lander game that @fritzy wrote while we wait for someone to join us.

  6. When someone else joins the same “room” we broadcast that to the other connected users and we create a Conversation object that we’ve defined which wraps the browser’s peerConnection. The peer connection represents, as you’d probably guess, the connection between you and another person.

  7. The signaling server broadcasts the new socket.io session ID to each user in the room and each user’s client creates a Conversation object for every other user in the room.

  8. At this point we have a mechanism of knowing who to connect to and how to send direct messages to each of their sessions.

  9. Now we use the peerConnection to create an “offer” and store our local offer and set it in our peer connection as the local description. This contains information about how another client can reach and talk to our browser.

    peerConnection.createOffer();

    We then send this over our socket.io connection to the other people in the room.

  10. When a client receives an offer we add it to our peer connection:

    var remote = new RTCSessionDescription(message.payload);
    peerConnection.setRemoteDescription(remote);

    and generate an answer by calling peerConnection.createAnswer() and send that back to the person we got the offer from.

  11. When the answer is received we set it as the remote description. Then we create and send ICE candidates in much the same way. This will negotiate our connection and connect us.

  12. If that process is successful we’ll get an onaddstream event from our peer connection and we can then create a video element and attach that stream to it. At this point the video call should be in progress.
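The “room” bookkeeping from steps 4, 6, and 7 can be sketched as a small in-memory registry that groups session IDs by room name. This is only an illustration of the idea, not signalmaster’s actual code; createRoomRegistry and its method names are hypothetical.

```javascript
// Hypothetical in-memory registry: room name -> list of session IDs.
function createRoomRegistry() {
    // No prototype, so room names like "constructor" behave like any other key.
    var rooms = Object.create(null);

    return {
        // Add a session to a room, creating the room if it doesn't exist.
        // Returns the other session IDs in the room, i.e. who to connect to.
        join: function (roomName, sessionId) {
            if (!rooms[roomName]) {
                rooms[roomName] = [];
            }
            var others = rooms[roomName].filter(function (id) {
                return id !== sessionId;
            });
            if (rooms[roomName].indexOf(sessionId) === -1) {
                rooms[roomName].push(sessionId);
            }
            return others;
        },

        // Remove a session and drop the room once it is empty.
        leave: function (roomName, sessionId) {
            var remaining = (rooms[roomName] || []).filter(function (id) {
                return id !== sessionId;
            });
            if (remaining.length) {
                rooms[roomName] = remaining;
            } else {
                delete rooms[roomName];
            }
        },

        // Copy of the current member list, so callers can't mutate the registry.
        members: function (roomName) {
            return (rooms[roomName] || []).slice();
        }
    };
}

var registry = createRoomRegistry();
var seenByFirst = registry.join('lobby', 'session-a');
var seenBySecond = registry.join('lobby', 'session-b');
```

Here the first user to join sees an empty list and simply waits, while the second gets back the first user’s session ID. That list is what tells each client which peers to create a connection for and where to send offers.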

If you wish to dig into it further, send pull requests and file issues on the SimpleWebRTC project on GitHub.

The road ahead

This is just a start. Help us make this stuff better!

There’s a lot more we’d like to see with this:

  1. Making the signaling piece more pluggable (so you can use whatever you want).
  2. Adding support for pausing and resuming video/audio.
  3. It’d be great to be able to figure out who’s talking and emit an event to other connected users when that changes.
  4. Better control over handling/rejecting incoming requests.
  5. Setting max connections, perhaps determined based on HTML5 connection APIs?

Hit me up on Twitter (@henrikjoreteg) if you do something cool with this stuff or run into issues or just want to talk about it. I’d love to hear from you.

Keep building awesome stuff, you amazing web people! Go go gadget Internet!

View full post on Mozilla Hacks – the Web developer blog
