How many of us have used Net2Phone to contact relatives overseas? It’s an interesting concept: cheap telephone calls over the internet. Only there’s a problem. It’s not going to last, or if it does, it won’t be as cheap, and if it remains cheap enough, it won’t be good enough. The reason is what my professor calls his "law": "Man works at constant annoyance". To explain the law, consider an example. Suppose that it takes 1 hour to go from point A to point B on a highway. Now suppose that a newer, larger highway is constructed in parallel. For a while we will be able to make the trip in 30 minutes. The news will travel, and as a result more people will start making the journey because "it only takes 30 minutes now". A time will come when the same trip takes 1.5 hours. At that point people will streamline their trips, the traffic will settle back to the 1 hour per trip level, and it will stay there. So humans tolerate a certain level of annoyance, hence my professor’s law. In other words, we humans expand our needs to fill any given improvement or opportunity.
Now replace the highway with internet bandwidth and the traffic with data packets, and you will see where I am going with all this. It’s true that over the years there has been a huge increase in available bandwidth worldwide; what we don’t see is that consumption is fast outpacing the expansion. For example, 10 years ago, when the internet was introduced in academia, the average email size was 1 KB; now it is about 1 MB, with all the fancy picture attachments and backgrounds we send (not to mention junk mail). So, as the law of annoyance tells us, throwing more bandwidth (read: faster networks) at the problem won’t solve it, as we will continue to come up with new ways to waste it. What we have to do is find a way to do voice over the existing bandwidth.
Time to delve a little into the technical nitty-gritty of things, with apologies to the non-technical (read: socially capable) segment of society. There are two kinds of networks in the world today: the telephone net and the data net. These two networks use two very different schemes for dividing bandwidth, a process called multiplexing. Voice nets use TDM (time division multiplexing), where every node (a telephone in use) is assigned a fixed time slice on the wire. Data nets use statistical multiplexing, where every node (a computer) contends for time on the wire and writes huge amounts of data in bursts when it gets access. Each scheme is very good for its own use, but fails when applied to the other. Let’s go deeper. We define the "burstiness" of a data source as the maximum rate at which it can transmit divided by the average rate at which it actually transmits. Common sense tells us that the burstiness of voice is 2: the maximum rate is 1 (a party speaking continuously), and the average rate is 0.5 (in a two-way conversation each party speaks about half the time and listens for the other half). Computers, on the other hand, are very bursty devices; they will sit quiet for a long time and then suddenly spit out large amounts of data. Experiments show that a figure of 100 for the burstiness of a computer is a good average. So it turns out that TDM is a good scheme for a steady source such as a telephone: it transmits regularly, so there is regularly something to put on the wire. Statistical multiplexing, on the other hand, is good for computers: they don’t ask for, and aren’t given, the network until they want it, and then they put a lot of data on it. What happens when we try to do a "mix and match"?
Let’s see.
If we use TDM for data nets, the time slices, which are in a fixed order and of constant duration, are given to every computer in turn. Most of the time a computer has nothing to transmit in its time slice, and when it does have a lot of data to transmit, it doesn’t have the time slice. The result: slow communication. What happens in the other case, when voice is transmitted over statistical multiplexing? Here I ask you to make a leap of faith and trust me. The factor by which more traffic fits into the same bandwidth when statistical multiplexing is used instead of TDM (called the statistical gain, or Gs) is simply the burstiness of the source (think about it for a while, it will make sense), thus Gs = peak_rate/avg_rate. So the statistical gain for voice would be only 2, that is, twice as much voice data in the same amount of bandwidth. Even this can’t be achieved, because on data nets every transmission carries an overhead: each packet has "headers" which specify where the data needs to go. So the actual gain comes out to be negligible. I can already hear the tech-savvy crowd screaming "COMPRESSION". Yes, if voice is sent as digital data, we can compress it, but when we compress voice we move it closer to a steady stream, which reduces its burstiness. So the reduction we get in data size is offset by the reduction in statistical gain; in other words, we are chasing our tail. In the end, we will need the same amount of infrastructure for digital voice communication as we need for telephony today, which means the same amount of fiber-optic cable will need to traverse the oceans.
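For readers who would rather see numbers than take the leap of faith, the argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this sketch, and the 40-byte header figure (roughly the minimum IPv4, UDP and RTP headers added together) and the payload sizes are my assumptions, not figures from the discussion above.

```python
# Sketch of the statistical-gain argument.
# ASSUMPTIONS (not from the article): 40 bytes of headers per packet and
# the voice payload sizes tried below. Function names are hypothetical.

def burstiness(peak_rate: float, avg_rate: float) -> float:
    """Burstiness = peak transmission rate / average transmission rate."""
    return peak_rate / avg_rate


def effective_gain(gs: float, payload_bytes: int, header_bytes: int) -> float:
    """Statistical gain left over after paying per-packet header overhead.

    Only payload_bytes out of every (payload_bytes + header_bytes) sent
    is actual voice, so the ideal gain is scaled by that fraction.
    """
    payload_fraction = payload_bytes / (payload_bytes + header_bytes)
    return gs * payload_fraction


# Figures taken from the text: voice has burstiness 2, a computer about 100.
voice_gs = burstiness(peak_rate=1.0, avg_rate=0.5)
data_gs = burstiness(peak_rate=100.0, avg_rate=1.0)

HEADER_BYTES = 40  # assumed per-packet header size

for payload in (20, 40, 160):  # assumed voice payload sizes in bytes
    gain = effective_gain(voice_gs, payload, HEADER_BYTES)
    print(f"payload {payload:>3} bytes -> effective voice gain {gain:.2f}")
```

Under these assumptions, a 40-byte voice payload gives an effective gain of exactly 1, meaning no gain at all over TDM, and smaller payloads do worse. Larger payloads recover some of the gain, but only by holding more speech back before sending, which adds delay to the conversation.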
So are we destined to have low-quality, delayed and broken conversations over Net2Phone? The answer, happily, is no. When you call someone on the telephone network, an end-to-end connection is made between your phone and your callee’s phone: the path is defined, and the bandwidth is reserved for your conversation. On an IP network (the internet, for example) a connection is NOT made. "Now wait a minute, what happens when I connect to a website? Or send an email or CONNECT to MSN?" you ask. The answer is simple: it’s not a real connection, it’s only simulated as one by your software and the server’s. The links in the middle (the routers on the internet backbone) are completely unaware of your "connection". No path is defined for your data packets, no bandwidth is reserved, and no "quality of service" is guaranteed. A note to the network programmers: a TCP socket is not a "connection" in the true sense of the word; it is only simulated by the TCP stacks at the two ends. The solution is simple: develop a new IP with QoS (quality of service). What QoS hopes to do is provide levels of priority for types of data and route the more important ones first. It will also predefine a path (a route to the target) and guarantee bandwidth between two machines on the net. In essence, it will establish a true connection between the machines. What will be the consequence of all this? Put together the paragraph about statistical gain and the fact that telephone nets do this already, and the answer is there: telephone-quality VoIP will cost exactly the same as the telephone does today! Why? Because the IP service providers will charge you for the guaranteed bandwidth and the high-priority routing through their routers. A share of the extra revenue will go to all the router operators in the middle (the internet core), who will also guarantee your priority and reserve bandwidth for you. So your voice data will get there ahead of my web data, but you will pay a price for it.
And it will come out to be the same price that you pay for telephone calls today.
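The "route the more important ones first" idea can be pictured with a toy router queue in Python. This is a minimal sketch, not how any real router or QoS standard works: the class name, the traffic classes and their priority numbers are all hypothetical, and it shows only the priority part, not the path setup or the bandwidth reservation.

```python
import heapq
from itertools import count

# Hypothetical traffic classes for illustration; lower number = served first.
PRIORITY = {"voice": 0, "web": 1, "email": 2}


class PriorityRouterQueue:
    """Toy router queue that forwards higher-priority packets first."""

    def __init__(self):
        self._heap = []
        # Arrival counter breaks ties, so packets of the same class
        # leave in the order they arrived.
        self._arrival = count()

    def enqueue(self, kind, payload):
        entry = (PRIORITY[kind], next(self._arrival), kind, payload)
        heapq.heappush(self._heap, entry)

    def dequeue(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


queue = PriorityRouterQueue()
queue.enqueue("web", "GET /index.html")
queue.enqueue("email", "message 1")
queue.enqueue("voice", "voice frame 1")
queue.enqueue("voice", "voice frame 2")

# Drain the queue: both voice frames come out before the web and email packets.
order = [queue.dequeue() for _ in range(len(queue))]
for kind, payload in order:
    print(kind, payload)
```

Even though the voice frames arrived last, they leave first. That head-of-the-line treatment is exactly what a provider would be charging for.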
So why will VoIP take over telephone nets in the distant future? The answer is not technology, but management. It makes good business sense for large providers to integrate their data and voice operations: maintenance overhead is lower, since a single team can handle both sectors. VoIP will happen, provided that QoS is available in the next-generation internet.

