DoxPara Research
12-Jan-1998 / Dan Kaminsky
Vectorcast
A Proposal Regarding the Efficient Distribution of Data On High Bandwidth Networks
Preface
This was my Systems Programming Term Project a while back. I wrote all the conceptual stuff, and came up with the original idea, while my group and I collaborated on the sample implementation. The implementation is...who knows where. This is what I have. Comments to me at effugas@best.com. Oh, most of this text was written in December of 1997, but some minor updates have been done as of January 1998.

Introduction: Why Do We Network?
Think of a network, any network, in fact, every network. Ignore the how, the when, the who, the whatever, just think of the why. Why do we seek to connect one computer to another? It is not an inexpensive investment for any institution, so how do we justify the substantial TCO (Total Cost of Ownership) that computer networking incurs? We can derive our justification by examining computers as an extension of those who utilize them: The purpose of civilization is to uncouple existence from the means necessary to sustain existence, i.e. I do not need to know how to farm in order to eat, nor do I need to know how to fabricate a CPU in order to take advantage of one. Computers are networked according to the same logic: Since computers cannot self-fabricate all that is necessary for them to be most effective, they must be able to go elsewhere for what they require. Thus, computer networks are quite simply about getting information from point A to point B.

Unicast: The Old Standby
The major model utilized for file distribution on the global Internet, as well as on most smaller networks, is the unicast model: One server sends a requested file to the client who requested it. This process is repeated for each additional client that requests the file. The more client requests, the more copies of the file the original server must send. If an error is detected in the transmission, such as packets received out of order, the server automatically retransmits packets to compensate for that client's error. To use a more human analogy, it's like being a teacher who uses one on one sessions to educate students. This has the advantage that the teacher can quickly deduce what concepts the student failed to grasp and teach accordingly. While this model works for teaching a few students, it fails if the teacher becomes responsible for hundreds of students simultaneously.

Multicast: Heir Apparent?
A newer model that is emerging is the multicast model. In the multicast model the server sends out the data only once, and the network takes care of sending the file to everybody who requires it. The multicast model radically simplifies batch file transmission for the server but demands substantial modifications in network infrastructure, as routers must be reprogrammed to send the same packet to multiple locations. It also demands a massive change in client behavior because the required information is no longer available whenever the client desires it. Instead, the client must wait until every other client is scheduled to receive. Furthermore, individual error checking is much harder to implement because the server is built to send the same message out to everyone. To extend our analogy, the teacher, unable to handle the load of hundreds of students demanding individual attention, requires that the students come to a classroom at a specified hour and listen to the lecture. If the students aren't there at the specified time, they don't get the knowledge they want. The problem is, since the classroom is so large and the teacher wants to teach to the group as a whole, a single student's lack of understanding is much more difficult to handle.

Reality: Neither Paradigm Suffices
Neither of these distribution methods effectively fulfills the design requirement of getting needed information from point A to point B. The unicast model makes the server a slave to the masses and the multicast model makes the masses a slave to the server. There must be something better, and I propose that there is: Vectorcast.

Vectorcast: An Analogy
For students, one of the most effective strategies for comprehending lessons is asking a fellow student for assistance. If students waited to ask the teacher every time they didn't understand a concept, they would spend more time waiting for the teacher to be free than learning. Therefore, the student asks classmates a question, one of whom might know the answer. The required information has thus been transferred to the student, not from the original source, but from a fellow student. This is the heart of Vectorcast.

Sharing The Load
Vectorcast is based on the simple idea that, presuming there isn't a considerable barrier between "vectors", a location that was once the destination of information should automatically become a source. Suppose you have two computers in a lab. If computer A spends an hour downloading an application from Japan, and computer B then decides to retrieve that file, it should be able to download it from computer A. It is not rational to utilize transoceanic links when there is a computer on the local subnet that has already retrieved the desired information. Vectorcast says that not only should the computer seize what it needs from the closest possible source, but also that once it obtains the data, it becomes a possible source. Hence, if a file is really popular, by definition, it is available on a large number of computers, and thus a lot of systems will be available to send that file out to the remaining hordes that still haven't gotten the file.
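
To make the two-computers-in-a-lab example concrete, here is a rough sketch in Python. It is only an illustration, not the sample implementation mentioned in the preface. The names `ORIGIN` and `fetch_all`, and the idea of labeling each client with a subnet string, are assumptions made for this sketch; "closest" is reduced to "on the same subnet".

```python
from collections import defaultdict

ORIGIN = "origin"  # hypothetical label for the distant original server


def fetch_all(requests):
    """Decide where each client gets the file from.

    requests: list of (client, subnet) pairs in arrival order.
    Returns a list of (client, source) pairs.
    """
    holders_by_subnet = defaultdict(list)  # subnet -> clients that have the file
    transfers = []
    for client, subnet in requests:
        local_holders = holders_by_subnet[subnet]
        # Prefer a machine on the local subnet; fall back to the origin.
        source = local_holders[0] if local_holders else ORIGIN
        transfers.append((client, source))
        # Once it has the file, the destination becomes a possible source.
        local_holders.append(client)
    return transfers


requests = [("A", "lab"), ("B", "lab"), ("C", "lab"), ("D", "dorm"), ("E", "dorm")]
transfers = fetch_all(requests)
origin_load = sum(1 for _, source in transfers if source == ORIGIN)
```

In this toy run only A and D go to the origin; B and C are served by A, and E by D. Under plain unicast the origin would have served all five requests.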

Order From Chaos
Another issue for Vectorcast is how each computer knows which system has which files. Each computer could query every computer it knows and request a listing, but if every machine does this, the network traffic grows roughly as the square of the number of machines. Vectorcasting thus depends on a director: a single machine that tells all the rest of the machines where to get what they're looking for. The machines are still going to a single source to fulfill requests, but the director doesn't repeatedly distribute massive files. Rather, it tracks who has which huge files and redirects the requesting vector to the closest available vector with the requested data. It is a paradigm shift from a model in which a powerful central computer provides information to the weak ones, to an egalitarian scenario where everybody serves everybody and the only purpose of the "central" computer is to guide and list rather than to actually distribute.
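
A director of this kind could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the `Director` class, its `register` and `locate` methods, and the host and file names are all hypothetical, and "closest" is approximated as "same subnet first, then any known holder, then the original server".

```python
class Director:
    """Tracks which hosts hold which files; never serves the files itself."""

    def __init__(self, origin):
        self.origin = origin   # fallback source: the original server
        self.holders = {}      # filename -> set of (host, subnet)

    def register(self, filename, host, subnet):
        """Record that a host has finished retrieving a file."""
        self.holders.setdefault(filename, set()).add((host, subnet))

    def locate(self, filename, client_subnet):
        """Return the host a client should download the file from."""
        candidates = self.holders.get(filename, set())
        # Assumption: a holder on the client's own subnet counts as closest.
        same_subnet = sorted(h for h, s in candidates if s == client_subnet)
        if same_subnet:
            return same_subnet[0]
        # Otherwise any known holder will do; else go back to the origin.
        anywhere = sorted(h for h, _ in candidates)
        return anywhere[0] if anywhere else self.origin


director = Director(origin="origin.example")
director.register("app.zip", "A", "lab")
from_lab = director.locate("app.zip", "lab")
from_dorm = director.locate("app.zip", "dorm")
unknown = director.locate("other.zip", "lab")
```

Note that the director only ever hands out a host name. The bulk transfer itself happens between the two vectors.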

A major alternate purpose for the director is verification--how does the client know that what it is receiving is authentic? Checksum archives hosted on directors or trusted director-networks are the solution to this. While this increases the load somewhat, it is far preferable to serve a 128-bit checksum than an 8MB file. As a side bonus, the download of the 8MB file can be tracked even though it's being sent from an alternate location, thus solving the cached hits quandary--the net (AOL in particular) cannot run without caching, but optional reporting of cached hits inevitably creates unlogged downloads.
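
The verification step might look something like the sketch below. The `ChecksumArchive` class and its methods are hypothetical names for this illustration. MD5 is used only because it produces a 128-bit digest, matching the size mentioned above; it is no longer considered collision resistant, so a present-day version would likely use a stronger hash such as SHA-256. Counting lookups is one assumed way the director could track downloads that were actually served by another vector.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """128-bit MD5 digest as 32 hex characters (illustrative choice only)."""
    return hashlib.md5(data).hexdigest()


class ChecksumArchive:
    """Hypothetical checksum store hosted on a director."""

    def __init__(self):
        self._known = {}          # filename -> trusted digest
        self.lookups = Counter()  # filename -> number of verification requests

    def publish(self, filename, data):
        """Record the trusted digest of the authentic file."""
        self._known[filename] = digest(data)

    def verify(self, filename, data):
        """Check data received from any vector against the trusted digest."""
        self.lookups[filename] += 1  # the download is logged even if served elsewhere
        expected = self._known.get(filename)
        return expected is not None and digest(data) == expected


archive = ChecksumArchive()
archive.publish("app.zip", b"original bytes")
good = archive.verify("app.zip", b"original bytes")
bad = archive.verify("app.zip", b"tampered bytes")
```

Here `good` is true and `bad` is false, and the archive has logged two lookups for the file even though it never sent the file itself.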

Conclusion
Vectorcast does seem to me to be the only solution to the growing demand for bandwidth. Infinitely scalable, self-tuning (the less popular a file becomes, the fewer people keep it around), and judicious in its use of network resources, it should eventually overtake all other non-interactive data transmission methodologies, and possibly some interactive ones.

I am willing to work with those implementing Vectorcast applications. Email me.
