Saturday, March 24, 2007
Is it not ironic? I work at a supercomputing center, yet it turns out that the world's first petaflop/s computer will be housed right here in my living room in Lelystad. Well, at least part of that machine will be. Expect the first petaflop/s to be reached this weekend, thanks to the PS/3 implementation of the Folding@home code. Something amazing has happened here: a handful of PS/3s (currently 20K of them) have dwarfed the contribution of the 200K regular PCs. In fact, if all the PCs running Folding@home were switched off, the computational capacity of the Folding@home initiative would hardly be affected. I do have to keep an eye on my energy bill, though, as the PS/3 consumes 200 watts. At roughly 20 euro cents per kilowatt-hour, it costs me approximately 1 euro per day. The image in this blog entry shows the villin protein, which my PS/3 is currently simulating. One last note: the special-purpose computer MDGRAPE-3 claims to have already reached the petaflop/s milestone.
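A quick check of that "1 euro per day", assuming the PS/3 draws its full 200 W around the clock: 200 W x 24 h = 4.8 kWh per day, which at 0.20 euro per kWh comes to 0.96 euro. The helper below is hypothetical and simply encodes that arithmetic.

```c
/* Hypothetical helper: cost in euros of running a device continuously
   at a constant power draw (assumes the draw never drops below 'watts'). */
double energy_cost_eur(double watts, double hours, double eur_per_kwh)
{
    double kwh = watts * hours / 1000.0;  /* watt-hours -> kilowatt-hours */
    return kwh * eur_per_kwh;
}
```

For example, `energy_cost_eur(200.0, 24.0, 0.20)` yields roughly 0.96, so "approximately 1 euro per day" holds as long as the console runs flat out.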
Friday, March 2, 2007
I have made some progress with my ray tracing endeavor on the SPU. The ray tracer now renders coloured triangles, as you can see in the image on the right. It is still unshaded, but using 6 SPUs it renders a 304-triangle model at a resolution of 480x480 pixels at 19.5 fps. I use no spatial subdivision, so every ray that hits the model's bounding sphere is tested against all triangles. The triangles are tested 8 at a time, though: every operation is performed with two SIMD instructions that each process four 32-bit floats. I opted to test 8 at a time, as opposed to the more natural 4 at a time, because this gives me more independent operations, which can be pipelined properly. I have also implemented an algorithm that tests 8 spheres for intersection in a single go. Spatial subdivision will be a challenge, as I would somehow have to form batches of 8 triangles to test. I really need to find some extra performance; the ray tracer of Jacco Bikker shows what is possible. Also, how on earth am I going to implement CSG in a SIMD-friendly manner without much branching? I need to study the algorithm of S. D. Roth. Unfortunately, his paper is not online.
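To make the 8-at-a-time idea concrete, here is a minimal sketch in portable C rather than SPU intrinsics. It assumes the standard Möller-Trumbore ray/triangle test (the post does not say which test the renderer actually uses) and a hypothetical structure-of-arrays layout, `TriBatch8`, in which each row of 8 floats would map onto two 128-bit registers. Every lane is evaluated, and the hit conditions are combined with bitwise `&` followed by a select instead of branching per triangle; `intersect8` and `nearest8` are illustrative names, not part of any SDK.

```c
#include <math.h>   /* INFINITY */

#define BATCH 8     /* triangles per batch: two 4-float SIMD registers per row */

/* Hypothetical structure-of-arrays layout: component k of a vector for
   triangle i lives at [k][i], so lanes line up across the 8 triangles. */
typedef struct {
    float v0[3][BATCH];  /* first vertex    */
    float e1[3][BATCH];  /* edge v1 - v0    */
    float e2[3][BATCH];  /* edge v2 - v0    */
} TriBatch8;

/* Moller-Trumbore test of one ray against 8 triangles at once.
   Writes the hit distance t per lane, or INFINITY on a miss.
   On an SPU, each arithmetic line would become two independent
   4-wide instructions, one per half of the batch. */
void intersect8(const TriBatch8 *tb, const float o[3], const float d[3],
                float t_out[BATCH])
{
    const float eps = 1e-7f;
    for (int i = 0; i < BATCH; ++i) {
        float e1x = tb->e1[0][i], e1y = tb->e1[1][i], e1z = tb->e1[2][i];
        float e2x = tb->e2[0][i], e2y = tb->e2[1][i], e2z = tb->e2[2][i];

        /* p = d x e2 */
        float px = d[1] * e2z - d[2] * e2y;
        float py = d[2] * e2x - d[0] * e2z;
        float pz = d[0] * e2y - d[1] * e2x;

        float det = e1x * px + e1y * py + e1z * pz;
        float inv = 1.0f / det;  /* inf when the ray is parallel; masked below */

        /* s = o - v0, first barycentric coordinate u */
        float sx = o[0] - tb->v0[0][i];
        float sy = o[1] - tb->v0[1][i];
        float sz = o[2] - tb->v0[2][i];
        float u = (sx * px + sy * py + sz * pz) * inv;

        /* q = s x e1, second barycentric coordinate v and distance t */
        float qx = sy * e1z - sz * e1y;
        float qy = sz * e1x - sx * e1z;
        float qz = sx * e1y - sy * e1x;
        float v = (d[0] * qx + d[1] * qy + d[2] * qz) * inv;
        float t = (e2x * qx + e2y * qy + e2z * qz) * inv;

        /* Bitwise & avoids short-circuit branches; the final select
           mirrors a compare-and-select on SIMD hardware. */
        int hit = ((det > eps) | (det < -eps)) & (u >= 0.0f) & (v >= 0.0f)
                & (u + v <= 1.0f) & (t > eps);
        t_out[i] = hit ? t : INFINITY;
    }
}

/* Reduce the 8 lane results to the nearest hit (INFINITY if none). */
float nearest8(const float t[BATCH])
{
    float best = INFINITY;
    for (int i = 0; i < BATCH; ++i)
        best = (t[i] < best) ? t[i] : best;
    return best;
}
```

Usage: fill a `TriBatch8` with 8 triangles (a partial batch can be padded with degenerate triangles with zero-length edges, which this test always reports as misses), call `intersect8` for each ray, then reduce the lanes with `nearest8`. Keeping the two halves of the batch independent is what gives the pipeline more work to overlap, which is the motivation given above for choosing 8 over 4.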