I am the host of “.NET Rocks!”, an Internet audio talk show for .NET developers online at www.dotnetrocks.com and msdn.microsoft.com/dotnetrocks. My co-host Richard Campbell and I interview the movers and shakers in the .NET community. We now have over 160 shows archived online, and we publish a new show every Monday morning. For more history of the show check out the May/June 2004 issue of CoDe Magazine, in which the first column appeared. In each issue, I like to highlight some of my favorite moments from a recent show.
In show #157 we spoke with David Smith, a Microsoft Student Ambassador at Michigan State University, and a C# MVP who is very interested in distributed systems. We talked about BitTorrent, IPv6, and BTSharp, a set of components that he has written in C# 2.0 that provide a full BitTorrent implementation. This was an interesting discussion not just because of the programming involved, but because of how protocols like BitTorrent change the nature of the Internet, and could bring about a new era in publishing. BTSharp is available at http://shrinkster.com/a38
Carl Franklin: Are you still the only .NET implementation of BitTorrent that is available?
David Smith: Actually [on] my Web site I have had a lot of people posting some links to their own projects. It turns out that a lot of people have been working on it, kind of in the woodwork, and some people actually have just ported over the current implementation of BitTorrent that exists in Python, but it really doesn’t take advantage of all of the great things that C# has to offer and it’s not very fun to program against. One of them that exists is BitTorrent.Net and that’s the most common one that some other developers have been playing with, but I have got a lot of positive feedback from BTSharp and people want me to keep working on it.
Carl Franklin: Well, let’s talk about what it is first. Let’s tell everybody what BitTorrent is. They [may] have seen it, they have heard of it, probably in the context of some nefarious activities, but let’s just clear the air and tell everybody what it is.
David Smith: Well, BitTorrent is an algorithm for distributing information and this can be your photos to your families, this can be a photo to your friends, some music that you want to share with somebody else, some music that you made or even a show that you recorded that you wanted to distribute to people online.
Carl Franklin: Well, anything digital, basically.
David Smith: Yeah, absolutely anything.
Richard Campbell: And wasn’t the beginning of all this distributing versions of Linux?
David Smith: Yes, absolutely. Those Linux kernels are really, really big and those images are really, really big, and they didn’t want to have to pay for the server costs to keep those downloads going. So they thought they would push that burden off to the client and they could have each person who is downloading the file help to share that same file.
Carl Franklin: Now David, there have been a lot of peer-to-peer programs out there. You’ve heard of Kazaa, and Napster did a similar thing. There are a lot of programs out there that people use to share. And let’s face it, they’re sharing copyrighted material: movies, MP3s, [etc…], but what’s different about BitTorrent from the rest of these? I mean, BitTorrent has got a lot more notice in the community because it’s so good. What makes it different than all these Gnutella things and all these other peer-to-peer protocols?
David Smith: I think it really has a lot to do with the user base. So the people [who] are using Kazaa and LimeWire, I mean you get those programs because you want to download music. BitTorrent has always been a much more open program from the start. It’s had a much more open audience, and the majority of the people that have been using BitTorrent have been using it for legitimate causes and good reasons, so BitTorrent has just got a better reputation.
Carl Franklin: Okay now, hold on now, because from where I sit BitTorrent has a pretty bad reputation because it’s being used extensively to shuttle around complete copies of DVDs and movies and all this stuff. The BitTorrent traffic that you see, that by the way makes up the number one use of bandwidth on the Internet, surpassing e-mail, isn’t because they are passing around legitimate copies of Linux, right?
David Smith: Yeah, absolutely. The inventor of BitTorrent, Bram Cohen, actually has been publicizing the fact [on] his Web site [that] he will not allow pirated materials to be searched for. It really takes a public voice to stand up and say, “We don’t want this technology to be used in order to abuse copyright.” It takes responsibility and it takes responsible people.
Carl Franklin: I will be the first to stand up for it. I am a content producer. It is in my best interest to get my content distributed as widely as possible for free, but that’s not the model that a lot of people have been using. But you’re right, I think BitTorrent has that reputation because it’s so good, it’s used for downloading and passing around large files, and it really is fast. Can you explain how BitTorrent works and why it’s as good as it is?
David Smith: Yes, absolutely. It’s a very simple algorithm, actually. There exists a tracker, which could be an HTTP server or Web server somewhere on the Internet. And this Web server’s only job is to keep track of the IP addresses of the people who currently are downloading this file. Then somebody who wants to publish a file goes ahead and makes a meta file and publishes it on some part of the Internet…
Carl Franklin: And this is the actual .torrent file, right?
David Smith: Yes, this is the .torrent file that has the file size, it has the file name, it has the directory structure of whatever you’re downloading, all the things that are needed to initialize the saving of this file to your disk. Then you download the .torrent file and your BitTorrent client will parse it. It will then start the files and initialize the file sizes for everything. Then you go to your tracker, and when you first hit that tracker with your GET request, it will give you a list of IP addresses of everybody that’s sharing the file.
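To make the parsing step concrete, here is a minimal sketch of reading the metadata David lists. It assumes the standard bencoded layout of a .torrent file (an `announce` tracker URL plus an `info` dictionary with `name`, `length`, `piece length`, and `pieces`). The sample bytes, the tracker URL, and the file name are invented for illustration, and the `pieces` hash string is left empty to keep the sample short. This is not how BTSharp does it.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {pos}")


# Hypothetical single-file torrent: a 1 MB file split into 512K pieces.
SAMPLE = (b"d8:announce31:http://tracker.example/announce"
          b"4:infod6:lengthi1048576e4:name8:show.mp3"
          b"12:piece lengthi524288e6:pieces0:ee")

meta, _ = bdecode(SAMPLE)
info = meta[b"info"]
# Ceiling division: the last piece may be shorter than the others.
num_pieces = -(-info[b"length"] // info[b"piece length"])
print(meta[b"announce"].decode(), info[b"name"].decode(), num_pieces)
```

With the file name, total length, and piece length in hand, a client knows how much disk space to reserve and how many pieces to ask for before it ever contacts the tracker.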
Carl Franklin: Everybody that’s also connected to the tracker, in other words.
David Smith: When you actually connect to the tracker, it’s just like hitting a Web page. It’s like going to Google and hitting that Web page and then you are done with it. Then you have all those IP addresses, and your connection to the tracker ends. And then you initialize TCP connections to 4 to 7 people at a time, and you send requests to those people trying to get different pieces of the file. It’s all distributed and it’s very random, and random is the best part of the algorithm.
Carl Franklin: Right. So in other words, let’s say if there is a hundred pieces in this file, and they are really like 512K chunks, right? So let’s say if there’s a hundred pieces and there’s five other people trying to download at the same time. I download piece 1, 49 and 74 and somebody else downloads 5, 56 and 82 then the tracker knows that we have those pieces?
David Smith: No. There’s a common misconception about BitTorrent [around] how much information the tracker keeps track of. So the most important thing to remember about the tracker’s role in this algorithm is that it only keeps the IP addresses and that’s it.
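The tracker's limited role can be sketched in a few lines. This is a hypothetical in-memory model, not a real HTTP tracker: the `Tracker` class and its `announce` method are names made up for illustration, and storing a port next to each IP address is an assumption. Note what is absent: there is no record of which pieces anyone holds.

```python
class Tracker:
    """Toy tracker: remembers who is in each swarm and nothing else."""

    def __init__(self):
        self._swarms = {}  # torrent id -> set of (ip, port) pairs

    def announce(self, torrent_id, ip, port):
        """Register the caller and return the other peers already known."""
        peers = self._swarms.setdefault(torrent_id, set())
        others = sorted(peers - {(ip, port)})
        peers.add((ip, port))
        return others


tracker = Tracker()
first = tracker.announce("show157", "10.0.0.1", 6881)   # nobody else yet
second = tracker.announce("show157", "10.0.0.2", 6881)  # learns about the first peer
print(first, second)
```

After the reply, the client is done with the tracker for the moment. Everything about pieces is negotiated directly between peers.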
Carl Franklin: So you connect simultaneously to all the different peers on different threads and you say “What pieces do you [have]?” and “Here is what I have.” and you do an exchange. Is that how it works?
David Smith: That’s exactly it. The first message you send to your peer when you connect to them is, first a hello message, and then a piece message. And the piece message says, “These are the pieces that I have.” So once you both receive those messages from each other, if you’re not interested in what that peer has to offer you, you can disconnect the connection as soon as you know that. But if you are interested then you can make a request for the piece and start sending away.
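The exchange David describes comes down to comparing two sets of piece numbers. The sketch below is an illustration under assumptions: the function names are invented, pieces are modelled as plain Python sets rather than a wire format, and the random choice stands in for whatever selection policy a real client uses. (The published protocol specification calls the opening messages a handshake and a bitfield, and the later per-piece announcement a have message.)

```python
import random


def interesting_pieces(mine: set, theirs: set) -> set:
    """Pieces the peer advertises that we do not have yet."""
    return theirs - mine


def choose_request(mine: set, theirs: set, rng=random):
    """Pick one piece to request, or None if the peer has nothing we need."""
    wanted = sorted(interesting_pieces(mine, theirs))
    if not wanted:
        return None  # not interested: the caller may drop the connection
    return rng.choice(wanted)


# Carl's example: I hold pieces 1, 49 and 74; the other peer holds 5, 56 and 82.
mine, theirs = {1, 49, 74}, {5, 56, 82}
print(sorted(interesting_pieces(mine, theirs)))   # [5, 56, 82]
print(choose_request(mine, {1, 49}))              # None
```

Because each side runs the same comparison on the other's list, both peers can decide within one round trip whether the connection is worth keeping.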
Carl Franklin: And when you get a new piece from another peer do you notify all the other peers that you have that piece or do they have to poll for you?
David Smith: There’s a message that gets sent out, it’s called a piece message.
Carl Franklin: And it’s not something they have to poll for? You stay connected on that port in other words?
David Smith: Yup.
Richard Campbell: Now how [does] this whole process get started? I mean, it presumes that the peers have pieces, there’s got to be a starting point.
David Smith: Exactly, so the very first person that will ever start a torrent is called a seeder. This seeder will start the torrent. He will stay connected until at least one other person has the file or at least, until the file has been distributed once. So if three people connect, as soon as I start seeding, then I have to distribute that file at least across those three peers before I log out, before I shut down my computer. And then those three peers can end up reconstructing the file between them.
Carl Franklin: And is it worth it having more seeders than one, or is one sufficient?
David Smith: It’s a very common problem that having one person on a 56 K connection is not going to be able to start up a torrent very quickly. The ramp up time is very efficient but it’s still going to be at least a while.
Richard Campbell: And the algorithm’s smart enough to make sure that it’s shipping each piece of the file once before it starts reshipping to other folks?
David Smith: So that actually depends on the client.
Richard Campbell: Oh yeah? Some clients are that bright?
David Smith: Right. So in my client yes, it is that smart. In Bram’s client I would expect it to be that smart.
Carl Franklin: But you don’t know?
David Smith: I don’t know.
Richard Campbell: But either way someone behind a slow connection like that, still has to hold the whole file up at least once, even though it maybe has multiple sources, it’s still got to get up there.
David Smith: Yes.
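One way a seeding client could make sure every piece goes out once before any piece is repeated is to count how often each piece has been sent and always favor the least-sent one. The sketch below is a hypothetical policy written for illustration. It is not taken from BTSharp or from Bram Cohen's client, and the `Seeder` class and its method are invented names.

```python
class Seeder:
    """Toy seeder that favors pieces it has uploaded the fewest times."""

    def __init__(self, num_pieces):
        self.sent = {piece: 0 for piece in range(num_pieces)}

    def next_piece(self, peer_has: set):
        """Return the least-sent piece the peer lacks, or None if it has all."""
        candidates = [p for p in self.sent if p not in peer_has]
        if not candidates:
            return None
        piece = min(candidates, key=lambda p: (self.sent[p], p))
        self.sent[piece] += 1
        return piece


# Six pieces, three peers taking turns: six uploads put one full copy in the swarm.
seeder = Seeder(6)
peers = [set(), set(), set()]
for turn in range(6):
    peer = peers[turn % 3]
    peer.add(seeder.next_piece(peer))

print(peers)  # [{0, 3}, {1, 4}, {2, 5}]
```

After those six uploads the seeder has sent the file exactly once, yet the three peers hold every piece between them and can finish by trading with each other, which is the situation David describes for a seeder that wants to log off.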
Carl Franklin: But you can see what’s happening here, is that as more clients get connected and they start downloading all these pieces from these other clients, obviously there is some efficiency stuff in there to break up the downloading of all these pieces over all these different clients. If one client has all the pieces and there are other clients out there that also have the same pieces, isn’t it going to download simultaneously from all the clients rather than just wait for one to send everything?
David Smith: A better way of looking at it is: I am looking for pieces 1 to 100 and there are 100 peers out there. If I wanted to download all those 100 pieces from one person, it would take a really long time. If I want to download one piece from each of the 100 peers out there, that will take a much shorter time because everybody else’s upload rate will be maybe 1 kilobyte a second rather than just one person’s upload rate being 200 kilobytes per second because that will be capped.
Carl Franklin: So the net effect is, if we could boil it down to this, the more people are downloading simultaneously the faster everybody gets the file.
David Smith: Absolutely.
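A back-of-the-envelope model shows why the swarm scales so differently from a single server. The sketch below is an idealized estimate, not a measurement. It ignores download limits and protocol overhead and assumes every downloader contributes its full upload rate. The 200 kilobytes per second seed rate echoes David's figure, while the 50 MB file size and the 30 kilobytes per second peer upload rate are made-up numbers.

```python
def server_only_seconds(size_kb, downloaders, server_up_kb_per_s):
    """One server splits its upload capacity across every downloader."""
    return size_kb * downloaders / server_up_kb_per_s


def swarm_seconds(size_kb, downloaders, seed_up_kb_per_s, peer_up_kb_per_s):
    """Idealized lower bound when every downloader also uploads."""
    total_up = seed_up_kb_per_s + downloaders * peer_up_kb_per_s
    # The seed must push out one full copy, and the swarm as a whole
    # must upload one copy per downloader.
    return max(size_kb / seed_up_kb_per_s, size_kb * downloaders / total_up)


SIZE_KB = 51200  # hypothetical 50 MB episode
for n in (10, 100):
    print(n, server_only_seconds(SIZE_KB, n, 200), swarm_seconds(SIZE_KB, n, 200, 30))
# 10 2560.0 1024.0
# 100 25600.0 1600.0
```

In this model, ten times the audience makes the single server ten times slower, while the swarm's time rises only from about 17 to about 27 minutes, because each newcomer brings upload capacity along with demand.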
Richard Campbell: Which is a total reverse of what we are used to. I mean, the normal model here is, I put a new file out on my server and everybody goes through it at once and I drop to my knees.
Carl Franklin: Yeah we have known about that for a long time.
Richard Campbell: Sure. But now we are going the other way. I put that file out there and the more people get involved the better it goes!
Carl Franklin: Yeah. And if that doesn’t excite you, people, then… you are not excitable. (laughs) That’s a fundamental shift in the way the bandwidth is used. More is better. I mean this is a way to… on demand, I mean this is the key: if you can combine the on-demandness of a download with the power of this multiple person downloading, then you have got something. Like, for example, broadcasts that are going to happen at a certain time, you know what I mean? As soon as that file is ready, all the clients get notified and are interested in it and then they are all downloading at the same time. And it takes very little time for everybody to get it. So this is why we think, and I am sure you do too David, that podcasting and BitTorrent go hand in hand. And the software isn’t quite there yet, it’s still a little geeky and tweaky, but it’s getting there. Because we have got thousands of subscribers to our podcast, when they all of a sudden see that there is a new file out there, they’re all going to get notified within the same 20 minutes to half hour, and then they’re all going to try to download it. Well, if they’re downloading through BitTorrent that’s great because we want them all to download at the same time. So, we can increase the speed at which they all get it.
David Smith: Yeah absolutely. Bram Cohen actually said on his Web site when he first made BitTorrent. He said, “Why did we make BitTorrent?” And his reason was, because he wanted to enable freedom of expression and I couldn’t say it better myself; like, when I read that I was like, yeah that’s exactly perfect. Because what BitTorrent does, is it allows more people to publish files to as many people as they would like to. However many people want to download the files, they can download it without paying for server costs, without having to know anything about the Internet or how to set up the server, and it’s just perfect for freedom and enabling podcasting.
Carl Franklin: Yeah it’s freedom of information.
Richard Campbell: A key technological issue here is this recognition that in broadband at least, your download speed and your upload speed are totally independent of each other. So while you’re downloading this file you are able to upload it without impacting your download.
This conversation continues online at http://shrinkster.com/ar8