Guest
Posted: Mon Sep 03, 2007 10:05 pm
> Date: Mon, 3 Sep 2007 18:43:08 +0800
> From: "Hugh Perkins" <hughperkins@gmail.com>
> Subject: Re: [erlang-questions] Intel Quad CPUs
>
> Almost on topic, what could be the benefits and the challenges of
> getting Erlang working on an nVidia Tesla card?
>
> These cards have between 8 and 128 cores, depending on how you look at
> it. (16 multiprocessors, each running warps of up to 32 threads, every
> 2 clock cycles).
I've been working with CUDA on NVIDIA 8800s, which use the same core
technology as NVIDIA Tesla.
I don't believe Erlang is a good fit for the underlying hardware.
Erlang processes map nicely onto independent instruction streams:
they are control-stream oriented, and not data-stream oriented (as
the 8800 is).
An NVIDIA 8800GTX has 128 ALUs, but executes only 16 instruction
streams concurrently, 1 per 'multi-processor' (8 ALUs per multi-processor).
Threads and warps are an abstraction on top of the hardware created
by the CUDA language and run-time to make it practical to extract the
majority of the hardware performance.
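The ALU arithmetic above, and the way threads are grouped into warps, can be written out as a short Python sketch. It uses only figures quoted in this thread (16 multi-processors, 8 ALUs each, warps of 32 threads). The thread-to-warp mapping assumes a one-dimensional thread block, and `passes_per_warp_instruction` is a simplified model (each ALU handles one thread's instruction per pass), not documented hardware timing. All function names are hypothetical.

```python
# Figures quoted in this thread for the 8800GTX / Tesla parts.
MULTIPROCESSORS = 16
ALUS_PER_MULTIPROCESSOR = 8
WARP_SIZE = 32


def total_alus() -> int:
    """Total ALUs: each of the 16 instruction streams drives 8 ALUs."""
    return MULTIPROCESSORS * ALUS_PER_MULTIPROCESSOR


def warp_and_lane(thread_index: int) -> tuple[int, int]:
    """Map a thread's index in a 1-D block to (warp number, lane in warp)."""
    return divmod(thread_index, WARP_SIZE)


def passes_per_warp_instruction() -> int:
    """Simplified model: 32 threads share 8 ALUs, so one warp
    instruction needs 32 / 8 passes through the ALUs."""
    return WARP_SIZE // ALUS_PER_MULTIPROCESSOR


if __name__ == "__main__":
    print(total_alus())
    print(warp_and_lane(70))
    print(passes_per_warp_instruction())
```

So the 128 'cores' are really 16 instruction streams that each drive 8 ALUs, and a warp is the software unit that keeps those 8 ALUs busy.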
IMHO, getting the full 8800 performance relies on aligning with a
few key hardware constraints and mechanisms:
1. All of the code on a multi-processor executes in lock-step; either
all of the ALUs execute the same instruction, or a subset are idle
(and you want to maximise the number of ALUs performing work).
2. Memory loads for the ALUs must be to sequential addresses to get
the full chip-to-memory bandwidth.
3. 'Shared' memory access should be sequential, and used a lot, to
reach its 1TB/second bandwidth.
4. 'Texture memory' access should be 'spatially localised' to exploit
caching.
5. 'Constant memory' access should be synchronised to exploit caching.
6. Computation 'blocks' should fit into the register set of a
multi-processor.
7. Limited external communications: the 8800 relies on the host for
network and disk IO, and the 'pipe' isn't as 'fat' as one might like
(though fatter than Gigabit Ethernet).
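Three of the constraints above (1, 2 and 6) can be illustrated with simple counting models. The Python below is a sketch, not a description of the real hardware: the equal-length-paths assumption, the 64-byte segment size and the 8192-register figure are assumptions chosen for illustration, and the function names are hypothetical. The actual coalescing and register-allocation rules are more detailed than this.

```python
# Simplified counting models for constraints 1, 2 and 6 above.

# Assumption: G80-class register file (compute capability 1.0/1.1).
REGISTERS_PER_MULTIPROCESSOR = 8192


def simd_utilisation(lane_paths: list[str]) -> float:
    """Constraint 1 (lock-step). Each entry names the code path one lane
    takes. Distinct paths run one after another with the other lanes
    idle, so if every path has the same length (an assumption) the
    useful fraction of ALU slots is 1 / (number of distinct paths)."""
    return 1.0 / len(set(lane_paths))


def memory_segments(addresses: list[int], segment_bytes: int = 64) -> int:
    """Constraint 2 (sequential loads). Count the distinct aligned
    segments a group of threads touches; in this model each segment
    costs one memory transaction."""
    return len({address // segment_bytes for address in addresses})


def blocks_fitting_in_registers(threads_per_block: int,
                                registers_per_thread: int) -> int:
    """Constraint 6. Whole blocks that fit in one multi-processor's
    register file (0 means a single block does not fit). Ignores
    allocation granularity and the other per-multi-processor limits."""
    return REGISTERS_PER_MULTIPROCESSOR // (threads_per_block * registers_per_thread)


if __name__ == "__main__":
    # 32 lanes on one path vs. a different "process" in every lane.
    print(simd_utilisation(["p0"] * 32))
    print(simd_utilisation([f"p{i}" for i in range(32)]))
    # 16 threads reading consecutive 4-byte words vs. words 64 bytes apart.
    print(memory_segments([4 * i for i in range(16)]))
    print(memory_segments([64 * i for i in range(16)]))
    print(blocks_fitting_in_registers(256, 16))
    print(blocks_fitting_in_registers(256, 40))
```

The second `simd_utilisation` call models one independent Erlang process per lane: only 1/32 of the issue slots do useful work, which is why one process per multi-processor is the more realistic mapping.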
These are pretty low-level concepts, and I do not see any
software-level (vanilla) Erlang support which could exploit these
properties (though we could, of course, add features like
data-parallel Haskell's to Erlang).
So you could run Erlang with one process per multi-processor, ignore 7
of the 8 ALUs, and ignore some of the hardware 'features'. I have no
problem with that; it's just worth thinking through. I would hope
that each 8800GTX run in this way would be within +/- 3x of the
performance of an Intel quad core (Intel runs at about 2x the ALU
clock of a GTX, each core of a Quad Core has multiple ALUs which can
be exploited by ILP in hardware, and the Quad Core's caching is
managed below the instruction level, so it would 'just work').
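The '+/- 3x' estimate can be reproduced as a back-of-envelope calculation. This Python sketch uses only the ratios stated above (16 instruction streams with one busy ALU each, against four cores at roughly twice the clock). The per-core ILP factor is an assumption left as a parameter, the function name is hypothetical, and the model ignores memory, caching and the missing processor documentation.

```python
def gtx_vs_quad_core(intel_ilp: float,
                     gtx_streams: int = 16,
                     intel_cores: int = 4,
                     intel_clock_ratio: float = 2.0) -> float:
    """Rough instruction throughput of an 8800GTX running one Erlang
    process per multi-processor, relative to an Intel quad core.

    GTX side: 16 instruction streams, one busy ALU each, clock = 1.0.
    Intel side: 4 cores at ~2x that clock, each retiring `intel_ilp`
    instructions per cycle (an assumed ILP factor)."""
    gtx = gtx_streams * 1.0
    intel = intel_cores * intel_clock_ratio * intel_ilp
    return gtx / intel


if __name__ == "__main__":
    for ilp in (1.0, 2.0, 3.0):
        # A ratio above 1 favours the GTX, below 1 favours the quad core.
        print(ilp, round(gtx_vs_quad_core(ilp), 2))
```

With an assumed ILP of 1 to 3 instructions per cycle per Intel core, the ratio runs from 2.0 down to about 0.67, inside the +/- 3x band.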
IMHO, running Erlang processes on an NVIDIA 8800/Tesla could be done
(while ignoring much of the specialised hardware), but NVIDIA haven't
released full processor details, so we'd be restricted to programming
in CUDA or the PTX pseudo-assembler, with only partial information. So
it would likely be noticeably slower than an Intel Quad Core for
vanilla Erlang code.
Having said all of that, I might be interested in having a crack at it!
IMHO, a much more interesting chip for Erlang is the Tilera TILE64:
http://www.tilera.com/products/processors.php
This seems like a very good fit: lots (64) of independent processors,
with a very high-speed mesh interconnect. If anyone would like to buy
me a TILExpress-64 card (http://www.tilera.com/products/boards.php),
I'd be very happy to investigate putting Erlang on it.
Garry
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions
Post received from mailing list
Multiverse
Posted: Fri Mar 07, 2008 7:14 am
I've checked their site and they aren't selling it there, nor is it sold on Newegg. Do you know where I can buy it? Depending on the price, I might take you up on the offer and send the card to you, if you can figure out whether Erlang is a good fit for it. Otherwise I'll have to wait 5 years for Intel to release their 80-core CPU. (They demonstrated the CPU a few months ago and said it will be available to the public within that 5-year window, and with the advances in silicon nanophotonics that IBM has made, CPUs with hundreds to thousands of cores will be available within the next 5-10 years.)
All times are GMT