Multimedia Computing Project 2

Bolo - A Simple Audioconference

Due date: (about March 14th)



Overview

An audioconference allows people to talk to each other from computers connected across a network. Although networked computers have been able to handle audio well for over 10 years, the explosive growth of the World-Wide Web has fueled interest in the Internet and, with it, interest in Internet telephony.

For this project, you are to write a basic audioconference, named Bolo, and explore how some basic system parameters affect the quality of the audio stream. Bolo will incorporate speech detection, taken directly from your project 1, to avoid sending unnecessary silent packets onto the network.

Bolo can have a minimal user interface, but needs to support some command line parameters (or a basic menu interface) to allow system parameters to be varied. You are free to add any additional features you would like.
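
One way to expose these parameters is a small getopt() wrapper. The sketch below is only an illustration: the option letters (-h, -p, -t, -i, -l, -s), the struct bolo_opts record, the function name parse_opts, and the default values are all assumptions, not part of the assignment.

```c
#define _POSIX_C_SOURCE 200809L  /* for getopt() under strict -std modes */
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical settings record; field names are illustrative only. */
struct bolo_opts {
    const char *host;   /* remote host name or address */
    const char *port;   /* remote port, kept as a string for getaddrinfo() */
    int use_udp;        /* 1 = UDP, 0 = TCP */
    int interval_ms;    /* sample interval in milliseconds */
    int loss_percent;   /* percentage of received packets to drop */
    int detect_speech;  /* 1 = do not send silent packets */
};

/* Fill *o from argv. Returns 0 on success, -1 on a bad or missing option. */
int parse_opts(int argc, char *argv[], struct bolo_opts *o)
{
    int c;

    /* Assumed defaults; pick whatever suits your implementation. */
    o->host = NULL;
    o->port = "5000";
    o->use_udp = 1;
    o->interval_ms = 40;
    o->loss_percent = 0;
    o->detect_speech = 0;

    optind = 1;  /* restart scanning so the function can be called again */
    while ((c = getopt(argc, argv, "h:p:ti:l:s")) != -1) {
        switch (c) {
        case 'h': o->host = optarg; break;
        case 'p': o->port = optarg; break;
        case 't': o->use_udp = 0; break;
        case 'i': o->interval_ms = atoi(optarg); break;
        case 'l': o->loss_percent = atoi(optarg); break;
        case 's': o->detect_speech = 1; break;
        default:  return -1;  /* unknown option or missing argument */
        }
    }
    if (o->host == NULL)
        return -1;
    if (o->interval_ms < 1 || o->interval_ms > 1000)
        return -1;  /* intervals are limited to one second */
    if (o->loss_percent < 0 || o->loss_percent > 100)
        return -1;
    return 0;
}
```

With these assumed flags, an invocation such as "bolo -h somehost -t -i 20 -l 5" would select TCP, a 20 ms sample interval, and 5% loss.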


Details

Bolo must run on Linux. You will have to get it working on two Linux boxes, actually.

Bolo will use standard Internet sockets to make connections between the audioconference processes. From any Internet host, a user should be able to connect to another user on any other Internet host, so you need a way to specify the hosts at run-time. You may wish to make the port numbers to which they connect dynamic, too, but that is optional.

Bolo must support both TCP and UDP sockets. You can have a default connection type, but there should be a way the user can specify the other socket type when Bolo starts.
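
As a rough sketch of how the host and the socket type can both be chosen at run-time, the function below resolves a host with getaddrinfo() and connects with either SOCK_STREAM (TCP) or SOCK_DGRAM (UDP). The name bolo_connect and its parameters are hypothetical, and only the connecting side is shown; the accepting side (bind(), plus listen() and accept() for TCP) is left out.

```c
#define _POSIX_C_SOURCE 200809L  /* for getaddrinfo() under strict -std modes */
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Connect to host:port. use_udp selects SOCK_DGRAM, otherwise SOCK_STREAM.
 * Returns a connected socket descriptor, or -1 on failure. */
int bolo_connect(const char *host, const char *port, int use_udp)
{
    struct addrinfo hints, *res, *rp;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;  /* accept IPv4 or IPv6 */
    hints.ai_socktype = use_udp ? SOCK_DGRAM : SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each returned address until one works. */
    for (rp = res; rp != NULL; rp = rp->ai_next) {
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0)
            break;  /* success */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

For UDP, connect() exchanges no packets; it only records the default peer, so the same send() and recv() calls can be used for both socket types.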

Bolo should support a variety of sample intervals. Typical audioconferences sample the audio device every 20, 40 or 60 ms in order to keep latency low. You may choose one of these for the default, but must then provide a means to specify alternate sample intervals (up to a second) when Bolo starts. Running Bolo at larger sample intervals will give you an indication of how latency makes interactive communication difficult.
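
The sample interval determines how many bytes are read from the audio device, and sent, at a time. A small helper makes the arithmetic explicit. The function name is hypothetical, and the audio format used in the note below (8000 Hz, 8-bit, mono) is an assumption for illustration; substitute whatever format your project 1 code used.

```c
#include <stddef.h>

/* Number of audio bytes produced in one sample interval.
 * rate_hz: samples per second; bytes_per_sample: 1 for 8-bit, 2 for 16-bit;
 * channels: 1 for mono, 2 for stereo; interval_ms: sample interval. */
size_t bytes_per_interval(unsigned rate_hz, unsigned bytes_per_sample,
                          unsigned channels, unsigned interval_ms)
{
    return (size_t)rate_hz * bytes_per_sample * channels * interval_ms / 1000;
}
```

Under the assumed format, a 20 ms interval is 160 bytes, 40 ms is 320 bytes, and a full second is 8000 bytes.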

Bolo can enable basic speech detection if indicated at run-time. Since your sampling interval will likely be much smaller than the sample interval you used in project 1, searching backward for a zero-level crossing rate for 250ms is not practical. Thus, you can detect speech based on energy levels only. You can tune your speech detection threshold to work well in your environment. See computeEnergy.c and getThresh.c for one such example.
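
A minimal sketch of energy-only detection follows. It is not the code from computeEnergy.c or getThresh.c; it is an independent illustration that assumes 8-bit unsigned samples, where silence sits at the midpoint value 128, and uses the average sample magnitude as the energy measure. The function names and the threshold value are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Average magnitude of one audio chunk.
 * Assumes 8-bit unsigned PCM, where silence is the midpoint value 128. */
double chunk_energy(const unsigned char *buf, size_t n)
{
    unsigned long sum = 0;
    size_t i;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        sum += (unsigned long)abs((int)buf[i] - 128);
    return (double)sum / (double)n;
}

/* Returns 1 if the chunk's energy exceeds the threshold, 0 otherwise.
 * The threshold must be tuned for your microphone and room. */
int is_speech(const unsigned char *buf, size_t n, double threshold)
{
    return chunk_energy(buf, n) > threshold;
}
```

The sender would call is_speech() on each chunk it reads and skip the send when the result is 0.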

In order to evaluate how Internet packet loss affects audio, Bolo must be able to randomly drop packets it receives. Loss should be applied at the packet level, at a rate that can be specified when Bolo starts.
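
One simple way to do this, sketched below, is to draw a random number for each received packet and discard the packet when the number falls under the configured rate. The function name should_drop and the choice of rand() are assumptions; any uniform random source would do.

```c
#include <stdlib.h>

/* Returns 1 if the packet just received should be discarded, 0 otherwise.
 * loss_percent is the desired drop rate, from 0 to 100. */
int should_drop(int loss_percent)
{
    if (loss_percent <= 0)
        return 0;  /* never drop */
    if (loss_percent >= 100)
        return 1;  /* always drop */
    return (rand() % 100) < loss_percent;
}
```

The receiver would call should_drop() once per packet, after reading it from the socket and before playing it. Seeding with srand() at startup gives a different loss pattern on each run.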


Hints

There are many different architectural solutions you can have for your Bolo. Rather than impose one, here is some sample code showing system calls that you may find helpful. Some must be used, while others may be used depending upon your implementation:

All of the above sample calls work in Linux but may work in other environments, especially Unix environments, as well.

Use the man command to find out additional information on the system calls used.

While Bolo must eventually be evaluated with users at the console of two Linux boxes, you can and maybe should plan on doing much of the code development remotely. To do this, you can 'pre-record' some speech into several files and design Bolo to read audio either from the sound card or from a file. The difference between reading from a file and reading from the sound card is only in the initial open() call and some initialization of the sound device. Similarly, you can have Bolo write to either the sound device or an output file. This "offline" design can enable you to develop the systems part of your code remotely, without needing to be on the console of two machines until the final evaluation.
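
The file-or-device choice described above can be isolated in one helper, so the rest of the program only ever sees a file descriptor. This is a sketch under assumptions: the function name is hypothetical, and it distinguishes the two cases by checking whether the path is a character device, which is how sound devices normally appear on Linux. The device initialization itself is not shown.

```c
#define _POSIX_C_SOURCE 200809L  /* for fstat() under strict -std modes */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open an audio source for reading.
 * On success returns a descriptor and sets *is_device to 1 when path is a
 * character device (e.g. a sound card) or 0 when it is anything else
 * (e.g. a pre-recorded file). Returns -1 on failure. */
int open_audio_source(const char *path, int *is_device)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    *is_device = S_ISCHR(st.st_mode) ? 1 : 0;
    return fd;
}
```

When is_device comes back as 1, the caller would go on to configure the sound device; either way, the main loop can then read() fixed-size chunks from the descriptor in the same manner.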

(And in case some of you are wondering, the word Bolo comes from Hindi and is the imperative "speak", or more generally is the word for communication.)

Lastly, be warned. If you do not pause between sends (via usleep() or some other means), you can flood the network. Here is an excerpt from an email about one such incident:

From: "Charles R. Anderson" (cra@WPI.EDU)
To: netops@WPI.EDU
cc: Mark Claypool (claypool@WPI.EDU), (system@cs.wpi.edu)
Subject: Multicasting
Date: Thu, 12 Apr 2001 22:51:36 -0400 (EDT)

The recent 30-60 second network outages in Fuller Labs were caused by 10
megaBytes/sec of traffic being broadcast by a multicasting assignment gone
wrong in FL A21 (csacit6.WPI.EDU).  This effected a Denial of Service
(DoS) to the WPI mail and DNS name servers, CS servers, and the WPI
network in general.  I have spoken to the students involved and have
warned them to throttle their network traffic.

Prof. Claypool, it would be appreciated if you could warn your students to
limit the amount of traffic they send to multicast addresses.  Within a
subnet (Fuller Labs in this case), a multicast acts as a Layer 2
broadcast, which has the effect of a DoS attack.  Perhaps alternative
network arrangements need to be made for these assignments,
since programming mistakes end up causing too much grief.

Thanks.

Charles R. Anderson                     cra@wpi.edu
Network Engineer                        (508) 831-6110
Computing and Communications Center     (508) 831-5115
Worcester Polytechnic Institute     Fax (508) 831-5483
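
To avoid the kind of flood described in the message above, each send can be followed by a pause of one sample interval. The sketch below uses nanosleep() rather than usleep() because it accepts a full one-second interval without special handling; the function names pace_ms and send_paced are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict -std modes */
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Sleep for interval_ms milliseconds. Returns 0 on success, -1 on error. */
int pace_ms(long interval_ms)
{
    struct timespec ts;

    ts.tv_sec = interval_ms / 1000;
    ts.tv_nsec = (interval_ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL);
}

/* Write count chunks of chunk bytes each from buf to fd, pausing
 * interval_ms after every write. Returns count, or -1 on error. */
int send_paced(int fd, const unsigned char *buf, size_t chunk,
               int count, long interval_ms)
{
    for (int i = 0; i < count; i++) {
        if (write(fd, buf + (size_t)i * chunk, chunk) != (ssize_t)chunk)
            return -1;
        if (pace_ms(interval_ms) != 0)
            return -1;
    }
    return count;
}
```

When audio is read live from the sound device, the read itself blocks until one interval of samples is available, which paces the sender naturally. The explicit pause matters most when reading from a pre-recorded file, where reads return immediately.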


Evaluation

You should first evaluate Bolo by conducting "pilot studies" with your group members (or just grab a friend if you don't have a group). Use Bolo for normal conversation, varying the system parameters. Record a perceptual quality score (1-100) for a 1-2 minute conversation for each type of connection. The cases you should specifically examine (individually, not in combination) are:

You may provide additional cases if you wish.

Second, you must evaluate your Bolo by running a simple user study. You should find 1-3 people outside of your group and have them use Bolo under loss rates of 0%, 5%, and 20%. You should record a perceptual quality score (scale 1-100) and get some subjective comments ("sounded good", "was hard to talk", etc.).


Questions

When you are done with your project, provide brief answers to the following questions:

  1. What is a major detriment of using a TCP connection on a lossy network without speech detection?
  2. Does having a TCP vs. UDP connection matter on a LAN? Why or why not?
  3. Would a 480ms end-to-end network latency with a 20ms sample interval have the same impact on perceptual quality as a 500ms sample interval on a LAN? Why or why not?

Hand In

You must turn in:

Tar up (with gzip) your files, for example:

    mkdir proj2
    cp * proj2    # copy all the files you are submitting into the proj2 directory
    tar -czf proj2.tgz proj2

then attach proj2.tgz to an email with "cs525_proj2" as the subject.



Send all project questions to Mark Claypool.