Erlang Mailing Lists

<  Erlang questions mailing list  ~  Concurrent processes on multi-core platforms with lots of ch

mattevans
Posted: Mon Nov 30, 2009 5:28 pm
Hi,

I've been running messaging tests on R13B02, using both 8 core Intel and 8 core CAVIUM processors. The tests involve two or more processes that do nothing more than sit in a loop exchanging messages as fast as they can. These tests are, of course, not realistic (as in real applications do more than sit in a tight loop sending messages), so my findings will likely not apply to a real deployment.
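A minimal sketch of the kind of tight-loop messaging test described above, for readers who want to reproduce it. The module name `pingpong` and its functions are assumptions for illustration, not the code that was actually run:

```erlang
-module(pingpong).
-export([run/1]).

%% Bounce N ping/pong round trips between the caller and an echo
%% process, and return the elapsed wall-clock time in microseconds.
run(N) when is_integer(N), N > 0 ->
    Echo = spawn(fun echo/0),
    {Micros, ok} = timer:tc(fun() -> ping(Echo, N) end),
    Echo ! stop,
    Micros.

%% Send one ping, wait for the matching pong, repeat N times.
ping(_Echo, 0) ->
    ok;
ping(Echo, N) ->
    Echo ! {ping, self()},
    receive
        pong -> ping(Echo, N - 1)
    end.

%% Reply to every ping until told to stop.
echo() ->
    receive
        {ping, From} ->
            From ! pong,
            echo();
        stop ->
            ok
    end.
```

Calling `pingpong:run(1000000)` with different numbers of schedulers online (for example via `erl +S 1` versus the default) is one way to compare same-core and cross-core messaging cost.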

First the good news: when running tests that do more than just message passing, the SMP features of R13B02 are leaps and bounds ahead of R12B05, which I was running previously. What I have noticed, however, is that in a pure messaging test (lots of messages, in a tight loop) we appear to run into caching issues when messages are sent between processes that happen to be scheduled on different cores. This got me thinking about a future enhancement to the Erlang VM: process affinity.

In this mode two or more processes that have a lot of IPC chatter would be associated into a group and executed on the same core. If the scheduler needed to move one process to another core - they would all be relocated.

Although this grouping of processes could be done automatically by the VM I believe the decision making overhead would be too great, and it would likely make some poor choices as to what processes should be grouped together. Rather I would leave it to the developer to make these decisions, perhaps with a library similar to pg2.

For example, library process affinity (paf) could have the functions:

paf:create(Name, [Opts]) -> ok | {error, Reason}
paf:join(Name, Pid, [Opts]) -> ok | {error, Reason}
paf:leave(Name, Pid) -> ok
paf:members(Name) -> MemberList

An affinity group would be created with options for specifying the maximum size of the group (to ensure we don't have all processes on one core), a default membership time within a group (to ensure we don't unnecessarily keep a process in the group when there is no longer a need) and maybe an option to allow the group to be split over different cores if the group size reaches a certain threshold.

A process would join the group with paf:join/3, and would be a member for the default duration (with options here to override the settings specified in paf:create). If the group is full the request is rejected (or maybe queued). After a period of time the process is removed from the group and a message {paf_leave, Pid} is sent to the process that issued the paf:join command. If needed the process could be re-joined at that time with another paf:join call.
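To make the proposed semantics concrete, here is a rough sketch of the bookkeeping only. The module name `paf_sketch` and the map-based group representation are assumptions; nothing in it influences the scheduler, and the timed membership with its `{paf_leave, Pid}` notification is left out:

```erlang
-module(paf_sketch).
-export([new/1, join/2, leave/2, members/1]).

%% Hypothetical bookkeeping for one affinity group. A group is a map
%% holding the maximum size and the current member list.
new(MaxSize) when is_integer(MaxSize), MaxSize > 0 ->
    #{max => MaxSize, members => []}.

%% Joining twice is a no-op; joining a full group is rejected.
join(Pid, #{max := Max, members := Ms} = Group) when is_pid(Pid) ->
    case lists:member(Pid, Ms) of
        true ->
            {ok, Group};
        false when length(Ms) >= Max ->
            {error, group_full};
        false ->
            {ok, Group#{members := [Pid | Ms]}}
    end.

%% Leaving a group you are not in is harmless.
leave(Pid, #{members := Ms} = Group) ->
    Group#{members := lists:delete(Pid, Ms)}.

members(#{members := Ms}) ->
    Ms.
```

A real implementation would keep this state in a registered process or in the VM itself, and could arm the membership timeout with `erlang:send_after/3`.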

Any takers? R14B01 perhaps ;)

Thanks

Matt


Guest
Posted: Mon Nov 30, 2009 6:37 pm
Hi Matt,

I think the "optimal" number of affinity groups and process bindings to
these groups would pretty much depend on the number of schedulers, and
in turn on the number of cores. Probably also depends on the CPU
architecture. Doesn't this road lead to giving up the high-levelness of
Erlang, where we produce hardware-dependent code? I don't want to end up
writing inline BEAM code in the future... :)
Also, having such a rigid grouping is not a good idea as it completely
ignores the workload of the member processes. You might force a bunch of
heavy-weight processes onto the same core, losing more than you gain.

Anyway, in my opinion it would be a better approach if there was no
grouping at all: we would only define the affinity of a process to
another. This would result in a directed, fully-connected graph on all
processes. Default edge label could be 1, so you only need to increase
it when needed, something like process_flag(affinity, {Pid, 42}).
You could also define some sort of workload-coefficient, put it on the
vertices, and use a force-based alignment algorithm to assign processes
to schedulers :)
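The pairwise idea can be sketched as a weighted edge map with a default label of 1. The module `affinity_graph` and its functions are hypothetical; `process_flag(affinity, ...)` does not exist, so this only models the data a scheduler would consult. Any term can stand in for a pid:

```erlang
-module(affinity_graph).
-export([new/0, set/4, weight/3, strongest_peer/3]).

%% Directed edges are stored as {From, To} => Weight.
%% Edges that were never set have the default weight 1.
new() ->
    #{}.

set(From, To, Weight, Graph) when is_integer(Weight), Weight >= 1 ->
    Graph#{{From, To} => Weight}.

weight(From, To, Graph) ->
    maps:get({From, To}, Graph, 1).

%% Among the candidates, return the one From has the highest affinity
%% to. On ties the earliest candidate in the list wins.
strongest_peer(From, [First | Rest], Graph) ->
    lists:foldl(
      fun(Peer, Best) ->
              case weight(From, Peer, Graph) > weight(From, Best, Graph) of
                  true -> Peer;
                  false -> Best
              end
      end, First, Rest).
```

Because the graph is directed, setting the weight from `a` to `b` leaves the weight from `b` to `a` at its default.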

Regards,
Zoltan.

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

mattevans
Posted: Mon Nov 30, 2009 8:54 pm
Hi,

I actually agree with both you and Zoltan. I absolutely don't want the developer to have any understanding of CPU architecture. But, at the same time I'm not 100% convinced that the runtime should know what an application is doing or needs to do (other than in the way it does today).

This is one reason why I thought of something along the lines of a process group (Zoltan's idea has merits too). By that I mean the developer knows that processes A, B, C and D are going to be involved in a large amount of IPC for the next few seconds (or longer, or shorter). The developer shouldn't care whether it is single-core or multi-core. All they are saying to the runtime is "please ensure that these processes are on the same scheduler for the next X microseconds since they are going to be communicating a lot". The runtime could still ignore that request if it thought things would run faster on different cores.

I think my point is that you are correct that developers should not care what the CPU topology is, but at the same time you don't want the VM making bad choices either.

Interesting discussion....

________________________________
From: Wallentin Dahlberg [mailto:wallentin.dahlberg@gmail.com]
Sent: Monday, November 30, 2009 2:46 PM
To: Evans, Matthew
Cc: erlang-questions@erlang.org
Subject: Re: [erlang-questions] Concurrent processes on multi-core platforms with lots of chatter

2009/11/30 Evans, Matthew <mevans@verivue.com>:
> First the good news: When running tests that do more than just message passing the SMP features of R13B02 are leaps and bounds over R12B05 that I was running previously. What I have however noticed is that in a pure messaging test (lots of messages, in a tight loop) we appear to run into caching issues where messages are sent between processes that happen to be scheduled on different cores. This got me into thinking about a future enhancement to the Erlang VM: Process affinity.
>
> In this mode two or more processes that have a lot of IPC chatter would be associated into a group and executed on the same core. If the scheduler needed to move one process to another core - they would all be relocated.
>
> Although this grouping of processes could be done automatically by the VM I believe the decision making overhead would be too great, and it would likely make some poor choices as to what processes should be grouped together. Rather I would leave it to the developer to make these decisions, perhaps with a library similar to pg2.

Process/scheduler affinity has been discussed at length before and there are several schools of thought on the matter.

It is true that accounting and statistics gathering in the runtime has some overhead. However, this should be handled by the scheduler. The user (the Erlang developer) might not have all the information at hand, and if he doesn't he has to collect it, which also costs. It would be easier for the system to schedule the process right. The scheduler still has to collect other information and make decisions based on memory models, memory distance, number of processing units, process-to-process message affinity, load balancing and other characteristics. It would be reasonably easy to model a scheduler algorithm after these characteristics, and we can make it much more dynamic in the runtime system.

Ideally the user shouldn't be concerned about the hardware he is running on. That's my take on the situation anyway.

Regards,
Bj
mattevans
Posted: Tue Dec 01, 2009 3:47 pm
I'll second that. Although I generally agree with the direction Erlang has taken of not exposing the underlying architecture to the developer, we must realize that there is a segment of people who do care. Providing abstractions to map processes onto a specific core would provide benefits to those people. We have already gone in that direction to a small degree with the cpu_topology options (both with the +sct VM invocation argument, and with the erlang:system_flag/2 function).
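For reference, what the emulator currently believes about the machine can be read back with `erlang:system_info/1`. A small read-only sketch follows; the module name `sched_info` is an assumption, and the values returned depend entirely on the host and the emulator flags:

```erlang
-module(sched_info).
-export([summary/0]).

%% Collect the scheduler and topology settings the running emulator
%% reports. Nothing is modified.
summary() ->
    #{schedulers        => erlang:system_info(schedulers),
      schedulers_online => erlang:system_info(schedulers_online),
      cpu_topology      => erlang:system_info(cpu_topology),
      bind_type         => erlang:system_info(scheduler_bind_type),
      bindings          => erlang:system_info(scheduler_bindings)}.
```

On platforms where the topology cannot be detected, `cpu_topology` is reported as `undefined`.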

________________________________
From: Alex Arnon [mailto:alex.arnon@gmail.com]
Sent: Tuesday, December 01, 2009 4:23 AM
To: Jayson Vantuyl
Cc: Robert Virding; Evans, Matthew; erlang-questions@erlang.org
Subject: Re: [erlang-questions] Concurrent processes on multi-core platforms with lots of chatter

+1
And then some :)

On Tue, Dec 1, 2009 at 5:54 AM, Jayson Vantuyl <kagato@souja.net> wrote:
Off the top of my head, I would expect this to be a process_flag.

Something like: process_flag(scheduler_affinity, term()). Possibly with a generic group specified by an atom like undefined. This feels more functional than the proposed paf module, and has the benefit of being data-centric.

The reason I would use a term (and then group by the hash of the term) is because it gives an elegant way to group processes by an arbitrary (possibly application specific) key. Imagine if, for example, Mnesia grouped processes by a transaction ID, or if CouchDB grouped them by socket connection, etc. By not specifying it as an atom or an integer, it lets you just use whatever is appropriate for the application.
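The "group by the hash of the term" step could look roughly like this. The module `affinity_key` is hypothetical and there is no `scheduler_affinity` process flag; the sketch only shows how an arbitrary key maps to one of N slots with the existing `erlang:phash2/2`:

```erlang
-module(affinity_key).
-export([slot/1, slot/2]).

%% Map any term to a slot in 1..schedulers_online.
slot(Key) ->
    slot(Key, erlang:system_info(schedulers_online)).

%% Map any term to a slot in 1..Slots. erlang:phash2/2 returns a value
%% in 0..Slots-1, so equal keys always land in the same slot.
slot(Key, Slots) when is_integer(Slots), Slots > 0 ->
    erlang:phash2(Key, Slots) + 1.
```

Every process that tags itself with, say, `{txn, 17}` would then resolve to the same slot without any registry lookup.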

I'm not too keen on reusing process groups primarily because group leaders are used for some really common stuff like IO, which shouldn't affect affinity at all.

If we want to be really crazy, we could provide the ability to specify something like a MatchSpec to map a process group to a processor. Call it a SchedSpec. This has the added bonus that you could have multiple handlers that would match in order without the full-blown load of a gen_event or an arbitrary fun. This might also provide the beginnings of more powerful prioritization than the existing process_flag(priority) we have now.

Currently, the Use Case that people seem to be concerned with is ensuring locality of execution. However, some people might also want to use it to provide dedicated cores to things like system processing. I have no idea how this would fit with things like the AIO threads, but I'm pretty sure that HPC could benefit from, for example, dedicating 1 scheduler to system management tasks, 1 core to IO, and 6 cores to computation. This is a higher bar, but it's important nonetheless.

Of course, this would have the user thinking about the underlying CPU topology (which I agree is bad). However, this is simply unavoidable in HPC, so it's best that we accept it. Let me state this emphatically: if we try to make Erlang "smart" about scheduling, what is going to happen is that HPC people will dig down, figure out what it's doing wrong, then come back with complaints. We will never be able to make it work right for everyone without exposing these same tunables (but likely with a crappier interface). It's better to give them powerful hooks to customize the scheduler, with smart default behavior for everyone else.

The reason I like the process_flag(scheduler_affinity) / SchedSpec option is that it can easily start out with just the process_flag, and add something like SchedSpecs later, without having to change the API (or particularly the default behavior). Basically, you get three groups of users:

* Normal People: They don't use affinity, although pieces of the system might. (effectively implemented already)
* Locality Users: They use affinity for locality using the convenient process_flag interface. (easily done with additional process_flag)
* HPC: They use affinity, and plug in SchedSpecs that are custom to their deployment. (can be provided when demanded without breaking first two groups)

On Nov 30, 2009, at 6:49 PM, Robert Virding wrote:

> Another solution would be to use the existing process groups as these are
> not really used very much today. A process group is defined as all the
> processes which have the same group leader. It is possible to change group
> leader. Maybe the VM could try to migrate processes to the same core as
> their group leader.
>
> One problem today is that afaik the VM does not keep track of groups as
> such, it would have to do this to be able to load balance efficiently.
>
> Robert
>
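To illustrate the point that the VM keeps no per-group index, here is a sketch that recovers a "process group" the only way available today: by scanning every local process and comparing group leaders. The module name `gl_groups` is an assumption:

```erlang
-module(gl_groups).
-export([same_group/1]).

%% Return all local processes whose group leader is Leader.
%% This walks the whole process table, which is why it would be too
%% slow for a load balancer to call on every scheduling decision.
%% Processes that exit during the scan yield 'undefined' and are skipped.
same_group(Leader) when is_pid(Leader) ->
    [P || P <- erlang:processes(),
          erlang:process_info(P, group_leader) =:= {group_leader, Leader}].
```

For example, `gl_groups:same_group(group_leader())` in the shell lists the shell process and anything it has spawned that has not changed its leader with `group_leader/2`.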

--
Jayson Vantuyl
kagato@souja.net
Guest
Posted: Tue Dec 01, 2009 5:18 pm
Moving back to the original problem... Your processes need to do a lot
of chatter, which means the tasks run within these processes need to
have a lot of shared data. If that's the case, why don't you "migrate"
the tasks onto one process?

When a process of yours would create such an affinity group, it could
very well say to the other process: "Hey, you are going to have too many
requests for me. Instead of messaging me, run this and that task".
Instead of leaving the group, it could say "Okay, just ask me if you
need anything from now on.". This is doable in Erlang without any
changes to the VM, and my guess is it has the same effect on performance
as the affinity groups would have.
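One way to read "run this and that task" is to ship the work to the process that owns the data, so a burst of small requests becomes a single round trip. A minimal sketch under that assumption; the module `task_owner` and its message format are made up for illustration:

```erlang
-module(task_owner).
-export([start/1, run/2, stop/1]).

%% Start a process that owns State.
start(State) ->
    spawn(fun() -> loop(State) end).

%% Run Fun inside the owner process. Fun receives the owner's state
%% and must return {Reply, NewState}. The caller blocks for Reply and
%% exits if the owner dies first.
run(Owner, Fun) when is_pid(Owner), is_function(Fun, 1) ->
    Ref = erlang:monitor(process, Owner),
    Owner ! {run, self(), Ref, Fun},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, Owner, Reason} ->
            exit({owner_down, Reason})
    end.

stop(Owner) ->
    Owner ! stop,
    ok.

loop(State) ->
    receive
        {run, From, Ref, Fun} ->
            {Reply, NewState} = Fun(State),
            From ! {Ref, Reply},
            loop(NewState);
        stop ->
            ok
    end.
```

The trade-off is that a long-running fun blocks every other client of the owner, so this suits short tasks over shared data.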

Regards,
Zoltan.

mattevans
Posted: Tue Dec 01, 2009 8:20 pm
Hi,

The application in question needs to talk to low-level C/C++ drivers. The idea is to create a "pool" of Erlang processes that act as an interface to a linked-in driver (or maybe even use NIFs). We expect to have on the order of 10,000 Erlang processes requiring access to these drivers at any time, and for obvious reasons a pool of processes providing serialized access to the drivers makes a little more sense than every Erlang process accessing the driver directly.
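A rough sketch of such a pool, assuming a fixed number of workers and a plain fun standing in for the real driver call. The module `driver_pool` and its API are hypothetical:

```erlang
-module(driver_pool).
-export([start/2, call/2, stop/1]).

%% Start N workers. Handler is a fun of one argument that stands in
%% for the real driver or NIF call. The pool is a tuple of worker pids.
start(N, Handler) when is_integer(N), N > 0, is_function(Handler, 1) ->
    list_to_tuple([spawn(fun() -> worker(Handler) end)
                   || _ <- lists:seq(1, N)]).

%% Pick a worker by hashing the caller's pid, so a given client always
%% talks to the same worker, then wait for that worker's reply.
call(Pool, Request) when is_tuple(Pool) ->
    Index = erlang:phash2(self(), tuple_size(Pool)) + 1,
    Worker = element(Index, Pool),
    Ref = erlang:monitor(process, Worker),
    Worker ! {call, self(), Ref, Request},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, Worker, Reason} ->
            exit({worker_down, Reason})
    end.

stop(Pool) ->
    [W ! stop || W <- tuple_to_list(Pool)],
    ok.

%% Each worker handles one request at a time, which is what gives
%% serialized access to whatever Handler wraps.
worker(Handler) ->
    receive
        {call, From, Ref, Request} ->
            From ! {Ref, Handler(Request)},
            worker(Handler);
        stop ->
            ok
    end.
```

Each client and its worker form exactly the kind of chatty pair the affinity proposals in this thread are meant to keep on one core.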

Now I don't imagine that Erlang IPC is going to be the "lowest hanging fruit" with regard to performance here - far from it. But there is an internal debate as to how much code should be C/C++ vs Erlang. Anything that can show the Erlang side is efficient (I know it is - for this highly concurrent application it has proven far superior to any C++ code that has been written) will help.

It's not critical that this is addressed immediately. But certainly if it was a feature on the radar of the OTP team I think it would go a long way to calming some nerves over here.

Thanks

Matt

-----Original Message-----
From: Zoltan Lajos Kis [mailto:kiszl@tmit.bme.hu]
Sent: Tuesday, December 01, 2009 12:14 PM
To: Evans, Matthew
Cc: erlang-questions@erlang.org
Subject: Re: [erlang-questions] Concurrent processes on multi-core platforms with lots of chatter

Moving back to the original problem... Your processes need to do a lot
of chatter, which means the tasks run within these processes need to
have a lot of shared data. If that's the case, why don't you "migrate"
the tasks onto one process?

When a process of yours would create such an affinity group, it could
very well say to the other process: "Hey, you are going to have too many
requests for me. Instead of messaging me, run this and that task".
Instead of leaving the group, it could say "Okay, just ask me if you
need anything from now on.". This is doable in Erlang without any
changes to the VM, and my guess is it has the same effect on performance
as the affinity groups would have.

Regards,
Zoltan.

Evans, Matthew wrote:
> I'll second that. Although I generally agree with the direction Erlang has taken of not exposing the underlying architecture to the developer, we must realize that there is a segment of people who do care. Providing abstractions to map processes onto a specific core would provide benefits to those people. We have already gone in that direction to a small degree with the cpu_topology options (both with the -sct VM invocation arguments, and with the erlang:system_flag/2 function).
>
> ________________________________
> From: Alex Arnon [mailto:alex.arnon@gmail.com]
> Sent: Tuesday, December 01, 2009 4:23 AM
> To: Jayson Vantuyl
> Cc: Robert Virding; Evans, Matthew; erlang-questions@erlang.org
> Subject: Re: [erlang-questions] Concurrent processes on multi-core platforms with lots of chatter
>
> +1
> And then some Smile
>
> On Tue, Dec 1, 2009 at 5:54 AM, Jayson Vantuyl <kagato@souja.net<mailto:kagato@souja.net>> wrote:
> Off the top of my head, I would expect this to be a process_flag.
>
> Something like: process_flag(scheduler_affinity, term()). Possibly with a generic group specified by an atom like undefined. This feels more functional than the proposed paf module, and has the benefit of being data-centric.
>
> The reason I would use a term (and then group by the hash of the term) is because it gives an elegant way to group processes by an arbitrary (possibly application specific) key. Imagine if, for example, Mnesia grouped processes by a transaction ID, or if CouchDB grouped them by socket connection, etc. By not specifying it as an atom or an integer, it lets you just use whatever is appropriate for the application.
>
> I'm not too keen on reusing process groups primarily because group leaders are used for some really common stuff like IO, which shouldn't affect affinity at all.
>
> If we want to be really crazy, we could provide the ability to specify something like a MatchSpec to map a process group to a processor. Call it a SchedSpec. This has the added bonus that you could have multiple handlers that would match in order without having the full blown load of a gen_event or arbitrary fun. This might also provide the beginnings of more powerful prioritization than the existing process_flag(priority) we have now.
>
> Currently, the Use Case that people seem to be concerned with is ensuring locality of execution. However, some people might also want to use it to provide dedicated cores to things like system processing. I have no idea how this would fit with things like the AIO threads, but I'm pretty sure that HPC could benefit from, for example, dedicating 1 scheduler to system management tasks, 1 core to IO, and 6 cores to computation. This is a higher bar, but it's important nonetheless.
>
> Of course, this would have the user thinking about the underlying CPU topology (which I agree is bad). However, this is simply unavoidable in HPC, so it's best that we accept it. Let me state this emphatically: if we try to make Erlang "smart" about scheduling, HPC people will dig down, figure out what it's doing wrong, then come back with complaints. We will never be able to make it work right for everyone without exposing these same tunables (but likely with a crappier interface). It's better to give them powerful hooks to customize the scheduler, with smart default behavior for everyone else.
>
> The reason I like the process_flag(scheduler_affinity) / SchedSpec option is that it can easily start out with just the process_flag and add something like SchedSpecs later, without having to change the API (or, in particular, the default behavior). Basically, you get three groups of users:
>
> * Normal People: They don't use affinity, although pieces of the system might. (effectively implemented already)
> * Locality Users: They use affinity for locality using the convenient process_flag interface. (easily done with additional process_flag)
> * HPC: They use affinity, and plug in SchedSpecs that are custom to their deployment. (can be provided when demanded without breaking first two groups)
>
> On Nov 30, 2009, at 6:49 PM, Robert Virding wrote:
>
>
>> Another solution would be to use the existing process groups as these are
>> not really used very much today. A process group is defined as all the
>> processes which have the same group leader. It is possible to change group
>> leader. Maybe the VM could try to migrate processes to the same core as
>> their group leader.
>>
>> One problem today is that afaik the VM does not keep track of groups as
>> such, it would have to do this to be able to load balance efficiently.
>>
>> Robert
>>
>> 2009/11/30 Evans, Matthew <mevans@verivue.com>
>>
>>
>>> Hi,
>>>
>>> I've been running messaging tests on R13B02, using both 8 core Intel and 8
>>> core CAVIUM processors. The tests involve two or more processes that do
>>> nothing more than sit in a loop exchanging messages as fast as they can.
>>> These tests are, of course, not realistic (as in real applications do more
>>> than sit in a tight loop sending messages), so my findings will likely not
>>> apply to a real deployment.
>>>
>>> First the good news: When running tests that do more than just message
>>> passing the SMP features of R13B02 are leaps and bounds over R12B05 that I
>>> was running previously. What I have however noticed is that in a pure
>>> messaging test (lots of messages, in a tight loop) we appear to run into
>>> caching issues where messages are sent between processes that happen to be
>>> scheduled on different cores. This got me into thinking about a future
>>> enhancement to the Erlang VM: Process affinity.
>>>
>>> In this mode two or more processes that have a lot of IPC chatter would be
>>> associated into a group and executed on the same core. If the scheduler
>>> needed to move one process to another core - they would all be relocated.
>>>
>>> Although this grouping of processes could be done automatically by the VM I
>>> believe the decision making overhead would be too great, and it would likely
>>> make some poor choices as to what processes should be grouped together.
>>> Rather I would leave it to the developer to make these decisions, perhaps
>>> with a library similar to pg2.
>>>
>>> For example, library process affinity (paf) could have the functions:
>>>
>>> paf:create(Name,[Opts]) -> ok, {error, Reason}
>>> paf:join(Name,Pid,[Opts]) -> ok, {error, Reason}
>>> paf:leave(Name,Pid) -> ok
>>> paf:members(Name) -> MemberList
>>>
>>> An affinity group would be created with options for specifying the maximum
>>> size of the group (to ensure we don't have all processes on one core), a
>>> default membership time within a group (to ensure we don't unnecessarily
>>> keep a process in the group when there is no longer a need) and maybe an
>>> option to allow the group to be split over different cores if the group size
>>> reaches a certain threshold.
>>>
>>> A process would join the group with paf:join/3, and would be a member for
>>> the default duration (with options here to override the settings specified
>>> in paf:create). If the group is full the request is rejected (or maybe
>>> queued). After a period of time the process is removed from the group and a
>>> message {paf_leave, Pid} is sent to the process that issued the paf:join
>>> command. If needed the process could be re-joined at that time with another
>>> paf:join call.
>>>
>>> Any takers? R14B01 perhaps ;)
>>>
>>> Thanks
>>>
>>> Matt
>>>
>>>
>
>
> --
> Jayson Vantuyl
> kagato@souja.net
>
>
>
>
>
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org
>
>
>
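The "group by the hash of the term" idea quoted above can be illustrated with a small sketch. Note that process_flag(scheduler_affinity, Term) was only a proposal in this thread, and the module and function names below (affinity_sketch, scheduler_for) are hypothetical. The only real calls used are erlang:phash2/2 and erlang:system_info(schedulers). The sketch just shows how an arbitrary key could be mapped to a scheduler number; it does not place any process on any scheduler.

```erlang
-module(affinity_sketch).
-export([scheduler_for/1, scheduler_for/2]).

%% Hypothetical helper: nothing like this ships with OTP. It only shows how
%% an arbitrary affinity term could be hashed onto a scheduler number.

%% Map Term onto one of the schedulers available in the running VM.
scheduler_for(Term) ->
    scheduler_for(Term, erlang:system_info(schedulers)).

%% erlang:phash2(Term, Range) returns an integer in 0..Range-1, so adding 1
%% gives a scheduler number in 1..Schedulers.
scheduler_for(Term, Schedulers)
  when is_integer(Schedulers), Schedulers > 0 ->
    erlang:phash2(Term, Schedulers) + 1.
```

Any term works as the key, for example affinity_sketch:scheduler_for({txn, 42}, 8) for a transaction ID, and equal terms always map to the same number. A plain hash like this ignores load, so a real implementation would presumably still need the scheduler to rebalance whole groups.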


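For comparison, the membership side of the proposed paf API (create, join, leave, members, with a maximum group size) could be sketched as plain bookkeeping. This is a hypothetical module, paf_sketch, not an existing library. The max_size option name and the error reasons are assumptions, the group state is passed explicitly instead of living in a server process, and the membership timeout with its {paf_leave, Pid} message is left out. It tracks membership only and has no effect on scheduling.

```erlang
-module(paf_sketch).
-export([new/0, create/3, join/3, leave/3, members/2]).

%% Hypothetical bookkeeping for the proposed paf API.
%% Groups is a map of Name => {MaxSize, Members}.

new() ->
    #{}.

%% Create a group. Opts may contain {max_size, N} (assumed option name);
%% without it the group is unbounded.
create(Name, Opts, Groups) ->
    case maps:is_key(Name, Groups) of
        true ->
            {{error, already_exists}, Groups};
        false ->
            Max = proplists:get_value(max_size, Opts, infinity),
            {ok, Groups#{Name => {Max, []}}}
    end.

%% Add Pid to the group, rejecting the request when the group is full.
%% In Erlang term order any integer is smaller than any atom, so the
%% guard below is never true when Max is the atom infinity.
join(Name, Pid, Groups) ->
    case maps:find(Name, Groups) of
        error ->
            {{error, no_such_group}, Groups};
        {ok, {Max, Members}} ->
            case lists:member(Pid, Members) of
                true ->
                    {ok, Groups};
                false when length(Members) >= Max ->
                    {{error, group_full}, Groups};
                false ->
                    {ok, Groups#{Name => {Max, [Pid | Members]}}}
            end
    end.

%% Remove Pid from the group; unknown groups and non-members are ignored.
leave(Name, Pid, Groups) ->
    case maps:find(Name, Groups) of
        error ->
            {ok, Groups};
        {ok, {Max, Members}} ->
            {ok, Groups#{Name => {Max, lists:delete(Pid, Members)}}}
    end.

%% List the current members of a group ([] for an unknown group).
members(Name, Groups) ->
    case maps:find(Name, Groups) of
        error -> [];
        {ok, {_Max, Members}} -> Members
    end.
```

Each call returns the result together with the updated group map, for example {ok, G1} = paf_sketch:create(chatty, [{max_size, 2}], paf_sketch:new()).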