Erlang/OTP Forums

Erlang questions mailing list ~ Rough thought on a P2P package distribution model for Erlang

jeffm
Posted: Wed Sep 14, 2011 2:01 am
In my previous email I said that I thought a P2P package distribution
system would be a good idea. This was because it eliminates the single
points of failure that come with relying on the future of websites.
There are a number of problems with using a P2P model. Chief among
these are how to get packages into the system and how to know that
these packages are trustworthy.

With that in mind, here are some rough thoughts on a P2P module
repository for Erlang:

Publisher: the person who maintains the package. Typically the author
of the module being published.
Node: a server which is a member of the P2P module repository system.
Indexer: a person who creates an index of packages that they say meet
some criteria, i.e., they vouch for the packages.
Administrator: the person who looks after a Node.

The process would work something like this:

1) Someone writes a wonderful module, the one everyone has been
waiting for.
2) Either the original author or someone on their behalf packages it up.
3) The Publisher then makes this publicly available on a website or
through git/mercurial/etc.
4) The Publisher notifies one or more Indexers.
5) Each Indexer checks that the package meets their criteria.
6) The Indexer then injects the package into the P2P distribution system
along with an updated, signed, versioned index file. This index file
lists which packages the Indexer has verified and the cryptographic
hash for each package.
7) The Administrators of other Nodes select which Indexers they wish to
follow and keep a copy of each Indexer's public key (obtained out of
band).
8) The Nodes then replicate the index file of each Indexer of interest
and the packages listed by those index files.
9) These Nodes then make this information available over FTP/HTTP/P2P or
other means to other Nodes and end developers.
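
The index-and-replicate part of the process above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the proposal does not specify an index format, so the JSON layout and the names `build_index` and `verify_package` are hypothetical. Verifying the Indexer's signature on the index is deliberately left out, since it needs a public-key library outside the standard library; only the per-package hash check is shown.

```python
import hashlib
import json


def sha256_hex(data):
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_index(indexer, version, packages):
    """Indexer side: list each vetted package together with its hash.

    packages maps a package file name to its raw bytes. The field names
    used here are assumptions made for this sketch.
    """
    index = {
        "indexer": indexer,
        "version": version,
        "packages": {name: sha256_hex(blob) for name, blob in packages.items()},
    }
    return json.dumps(index, sort_keys=True)


def verify_package(index_json, name, blob):
    """Node side: accept a package only if its hash matches the index."""
    index = json.loads(index_json)
    expected = index["packages"].get(name)
    return expected is not None and expected == sha256_hex(blob)


if __name__ == "__main__":
    index = build_index("example-indexer", 1, {"mymod-1.0.tar.gz": b"package bytes"})
    print(verify_package(index, "mymod-1.0.tar.gz", b"package bytes"))   # True
    print(verify_package(index, "mymod-1.0.tar.gz", b"tampered bytes"))  # False
```

In a real system the Node would first check the index file's signature against the Indexer's public key, and only then trust the hashes it lists.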

Using an Indexer has a couple of advantages:
1) It eliminates the need for everyone to have certificates. This makes
the system cleaner to use and lowers the barrier to entry for package
maintainers, allowing them to submit their work without distraction.
2) It maintains a concept similar to existing repositories, with which
people are familiar. This makes it easy for people to bring up and
maintain additional Nodes. It also means that the number of people who
have to wade through all the packages out there is reduced to the
Indexers. You simply select the Indexer whose package criteria
reflect your own.

This is separate from what packages are and how Erlang handles
dependencies. This is merely a distribution model.

Excuse the broad description; I merely intend this to give people ideas.

Jeff.

_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions
Guest
Posted: Fri Sep 16, 2011 6:51 pm
How is this different from the already-solved problem of peer-to-peer authenticated file distribution?


Tracker-based systems like BitTorrent, and fully peer-to-peer systems like Freenet, have been around for a very long time and solve all of those problems, with different trade-offs for performance, security, susceptibility, etc.


Sincerely,


jw

--
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable.



jeffm
Posted: Tue Sep 20, 2011 11:43 pm
I'm not saying it is any different. I'm just outlining a distribution
system for packages which is an alternative to the more traditional
tiered FTP or HTTP site mirrors. Even in the sense I'm outlining, it's
not that novel. One Linux distribution, rubyx, used a similar model.
Their package distribution program was called "white water". The
project appears to be dead, as the website seems to be unresponsive.

As you point out, these systems "have been around for a very long time",
so why aren't we making better use of these technologies for things like
this?

Jeff.

On 17/09/11 4:51 AM, Jon Watte wrote:
> How is this different from the already-solved problem of peer-to-peer
> authenticated file distribution?
>
> Tracker-based systems like bittorrent, and fully peer-to-peer systems
> like freenet have been around for a very long time, and solve all of
> those problems, with different trade-offs for performance, security,
> susceptibility, etc.
>
> Sincerely,
>
> jw
>

Guest
Posted: Wed Sep 21, 2011 12:26 pm
On Wed, Sep 21, 2011 at 01:43, jm <jeffm@ghostgun.com> wrote:

> As you point out these systems "have been around for a very long time" so
> why aren't we making better use of these technologies for things like this?

Mainly because there is no need to do it.

Data distribution systems like BitTorrent have one specific, distinct
advantage: they can transfer data at high bandwidths even if the
initial source is highly constrained in its upstream bandwidth.
Upstream bandwidth scales with demand in a BT network. Hence, the
applicability of BitTorrent (and similar) protocols hinges on a need to
overcome an upstream bandwidth constraint. Example: You are Facebook and
need a 1 gigabyte image distributed quickly to 50,000 nodes from a
single deploy machine.
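
To put rough numbers on that example, here is a small Python sketch using the standard textbook lower bounds on distribution time for client-server versus peer-to-peer delivery. The bandwidth figures are assumptions picked for illustration, not measurements of any real deployment, and the function names are hypothetical.

```python
def client_server_time(file_bits, n_peers, server_up, peer_down):
    """Lower bound (seconds) when the single source uploads every copy itself."""
    return max(n_peers * file_bits / server_up, file_bits / peer_down)


def p2p_time(file_bits, n_peers, server_up, peer_up, peer_down):
    """Lower bound (seconds) when every peer also contributes its upstream."""
    total_up = server_up + n_peers * peer_up
    return max(
        file_bits / server_up,          # the source must send the file once
        file_bits / peer_down,          # each peer must receive the whole file
        n_peers * file_bits / total_up  # all copies must fit through total upstream
    )


if __name__ == "__main__":
    file_bits = 8e9    # 1 gigabyte image
    n_peers = 50_000   # nodes to deploy to
    server_up = 1e9    # assumed: 1 Gbit/s upstream on the deploy machine
    peer_up = 1e8      # assumed: 100 Mbit/s upstream per node
    peer_down = 1e9    # assumed: 1 Gbit/s downstream per node

    cs = client_server_time(file_bits, n_peers, server_up, peer_down)
    p2p = p2p_time(file_bits, n_peers, server_up, peer_up, peer_down)
    print(f"client-server: {cs / 3600:.1f} hours")  # about 111 hours
    print(f"p2p:           {p2p:.0f} seconds")      # about 80 seconds
```

Under these assumed figures the single source is the bottleneck by several orders of magnitude, which is the situation where a BitTorrent-like protocol pays off.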

But as soon as there is demand for a package distribution, and the
demand is high enough, mirrors form, and mirrors have ample upstream
bandwidth available, far more than what is needed. Thus, the
additional complexity of adding BitTorrent into the mix isn't needed.

Add to that the fact that HTTP transport is well known and simple. You
could say it is a local optimum which is currently good enough. The
proof is Content Delivery Networks (CDNs), which do not use BitTorrent
to distribute content.


--
J.
jeffm
Posted: Fri Sep 23, 2011 4:35 am
You assume speed is the only reason to use P2P. I'm more concerned with
maintaining up-to-date package repositories, redundancy, and security.
Mirrors that use ftp/http/rsync/etc. have a tendency to fall out of date
if not well maintained. These mirrors also require that up-to-date lists
be maintained so that users (human or machine) can find active ones, and
how do you distribute those lists? CDNs do this through things like
redirection and dynamic DNS records. The method used behind the
scenes to distribute content would vary from provider to provider. P2P
is just one possible technology (and that is all I'm claiming). Most
likely, these providers use a mix of technologies or a hybrid approach
for internal content distribution.

For example,
http://goanna.cs.rmit.edu.au/~xiaodong/mbc/Theses/jaison-minorThesis.pdf
mentions P2P as one technology.

Pages 340 and 341 of the following paper also have tables which list
uses of P2P; one sub-table is devoted to "Content Publishing and Storage
Systems":
A Survey of Peer-to-Peer Content Distribution Technologies
Stephanos Androutsellis-Theotokis and Diomidis Spinellis
Athens University of Economics and Business
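
Returning to the stale-mirror concern above: one way a versioned index could help is sketched below in Python. It assumes each mirror or node can report the version of the index it currently holds; the `mirrors` mapping, the host names and the `fresh_mirrors` function are hypothetical and only illustrate the idea.

```python
def fresh_mirrors(mirrors, max_lag=0):
    """Return the mirrors whose index version is within max_lag of the newest.

    mirrors maps a mirror name to the index version it currently serves.
    """
    if not mirrors:
        return []
    newest = max(mirrors.values())
    return sorted(
        name for name, version in mirrors.items()
        if newest - version <= max_lag
    )


if __name__ == "__main__":
    mirrors = {
        "mirror-a.example": 42,
        "mirror-b.example": 42,
        "mirror-c.example": 37,  # has fallen behind
    }
    print(fresh_mirrors(mirrors))  # ['mirror-a.example', 'mirror-b.example']
```

A client could use a check like this to skip out-of-date mirrors without relying on a separately maintained mirror list.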

Jeff.

On 21/09/11 10:25 PM, Jesper Louis Andersen wrote:
> Mainly because there is no need to do it.
>
> Data distribution systems like BitTorrent have one specific, distinct
> advantage: they can transfer data at high bandwidths even if the
> initial source is highly constrained in its upstream bandwidth.
> Upstream bandwidth scales with demand in a BT network. Hence, the
> applicability of BitTorrent (and similar) protocols hinges on a need to
> overcome an upstream bandwidth constraint. Example: You are Facebook and
> need a 1 gigabyte image distributed quickly to 50,000 nodes from a
> single deploy machine.
>
> But as soon as there is demand for a package distribution, and the
> demand is high enough, mirrors form, and mirrors have ample upstream
> bandwidth available, far more than what is needed. Thus, the
> additional complexity of adding BitTorrent into the mix isn't needed.
>
> Add to that the fact that HTTP transport is well known and simple. You
> could say it is a local optimum which is currently good enough. The
> proof is Content Delivery Networks (CDNs), which do not use BitTorrent
> to distribute content.
>
>

Thomas Lindgren
Posted: Fri Sep 23, 2011 9:11 am
Twitter uses p2p to deploy their services:
Guest
Posted: Sat Sep 24, 2011 5:41 pm
> why aren't we making better use of these technologies for things like this?

Because, for most applications, it's not a solution that fits the problems people actually have.
If you want high bandwidth, reliability, security and distribution, you sign up with Akamai, or you put your stuff in an S3 bucket, or you stick a Riak store in your datacenter, and 99% of the time this solves the problem with a lot less effort and complication than a P2P system.

Also, the difference between doing this only within a zone of trust (distributing software to servers you control) and doing it on an insecure network (end-user peer-to-peer systems) is pretty significant!
rsync trees for pushing updates to a large cluster of hosts are pretty easy to set up if you have SSH keys for all the target hosts and the hosts can just assume that the data they get is not tainted. (We do this to deploy web code to a cluster of many hundreds of servers, for example.)

In general, it's not how "smart" something is that matters to adoption, but how well it solves particular problems that many users have at some particular point in time.
 
Sincerely,
 
jw

