IonPower phase 5!
Progress! I got IonPower past the point PPCBC ran aground at -- it can now jump in and out of Baseline and Ion code on PowerPC without crashing or asserting. That's already worth celebrating, but as the judge who gave me the restraining order on behalf of Scarlett Johansson remarked, I always have to push it. So I tried our iterative π calculator again and really gave it a workout by forcing 3 million iterations. Just to be totally unfair, I've compared the utterly unoptimized IonPower (in full Ion mode) against the fully optimized PPCBC (Baseline) in the forthcoming TenFourFox 31.6. Here we go (Quad G5, Highest Performance mode):
% /usr/bin/time /Applications/TenFourFoxG5.app/Contents/MacOS/js --no-ion -e 'var pi=4,top=4,bot=3,minus = true;next(pi,top,bot,minus,3000000);function next(pi,top,bot,minus,num){for(var i=0;i<num;i++){pi += (minus == true)?-(top/bot):(top/bot);minus = !minus;bot+=2;}print(pi);}'
3.1415929869229293
0.48 real 0.44 user 0.03 sys
% /usr/bin/time ../../../obj-ff-dbg/dist/bin/js --ion-offthread-compile=off -e 'var pi=4,top=4,bot=3,minus = true;next(pi,top,bot,minus,3000000);function next(pi,top,bot,minus,num){for(var i=0;i<num;i++){pi += (minus == true)?-(top/bot):(top/bot);minus = !minus;bot+=2;}print(pi);}'
3.1415929869229293
0.37 real 0.21 user 0.16 sys
No, that's not a typo. The unoptimized IonPower, even in its primitive state, is 23 percent faster than PPCBC on this test, largely due to its superior use of floating point. The gap gets even wider when we do 30 million iterations:
% /usr/bin/time /Applications/TenFourFoxG5.app/Contents/MacOS/js --no-ion -e 'var pi=4,top=4,bot=3,minus = true;next(pi,top,bot,minus,30000000);function next(pi,top,bot,minus,num){for(var i=0;i<num;i++){pi += (minus == true)?-(top/bot):(top/bot);minus = !minus;bot+=2;}print(pi);}'
3.1415926869232984
4.20 real 4.15 user 0.03 sys
% /usr/bin/time ../../../obj-ff-dbg/dist/bin/js --ion-offthread-compile=off -e 'var pi=4,top=4,bot=3,minus = true;next(pi,top,bot,minus,30000000);function next(pi,top,bot,minus,num){for(var i=0;i<num;i++){pi += (minus == true)?-(top/bot):(top/bot);minus = !minus;bot+=2;}print(pi);}'
3.1415926869232984
1.55 real 1.38 user 0.16 sys
That's 63 percent faster. And I haven't even gotten to fun things like leveraging the G5's square root instruction (the G3 and G4 versions will use David Kilbridge's software square root from JaegerMonkey), parallel compilation on the additional cores, or even working on some of the low-hanging fruit with branch optimization, and on top of all that IonPower is still running all its debugging code and sanity checks. I think this qualifies as IonPower phase 5 (basic operations), so now the final summit will be getting the test suite to pass in both sequential and parallel modes. When it does, it's time for TenFourFox 38!
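For anyone checking my arithmetic: the 23 and 63 percent figures above are reductions in wall-clock ("real") time as reported by /usr/bin/time. Here is a minimal sketch of that calculation. The helper names percentFaster and speedup are made up for illustration, and console.log assumes something like Node.js (the js shell in the runs above uses print instead).

```javascript
// Percent reduction in wall-clock time going from the baseline run
// to the candidate run. (Hypothetical helper, not part of any shell.)
function percentFaster(baselineSecs, candidateSecs) {
  return (baselineSecs - candidateSecs) / baselineSecs * 100;
}

// The same comparison expressed as a ratio: how many times quicker.
function speedup(baselineSecs, candidateSecs) {
  return baselineSecs / candidateSecs;
}

// "real" times taken from the runs in this post.
console.log(percentFaster(0.48, 0.37).toFixed(1)); // 3 million iterations: 22.9
console.log(percentFaster(4.20, 1.55).toFixed(1)); // 30 million iterations: 63.1
console.log(speedup(4.20, 1.55).toFixed(2));       // 2.71
```

Put as a ratio, the 30 million iteration run finishes roughly 2.7 times quicker, which is the same result as "63 percent faster" stated another way.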
By the way, for Ben's amusement, how does it compare to our old, beloved and heavily souped-up JaegerMonkey implementation? (17.0.11 was our fastest version here; 19-22 had various gradual degradations in performance due to Mozilla's Ion development screwing around with methodjit.)
% /usr/bin/time /Applications/TenFourFoxG5-17.0.11.app/Contents/MacOS/js -m -n -e 'var pi=4,top=4,bot=3,minus = true;next(pi,top,bot,minus,30000000);function next(pi,top,bot,minus,num){for(var i=0;i<num;i++){pi += (minus == true)?-(top/bot):(top/bot);minus = !minus;bot+=2;}print(pi);}'
3.1415926869232984
4.15 real 4.11 user 0.02 sys
Yup. I'm that awesome. Now I'm gonna sit back and go play some well-deserved Bioshock Infinite on the Xbox 360 (tri-core PowerPC, thank you very much, and I look forward to cracking the firmware one of these days) while the G5 is finishing the 31.6 release candidates overnight. They should be ready for testing tomorrow, so watch this space.
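If you'd rather not squint at the one-liner, here is the same π test unrolled into something readable. It is only a sketch: the loop body matches the command lines above, but I've changed it to return the value instead of calling print inside the function, and console.log assumes a Node.js-style environment rather than the js shell.

```javascript
// Alternating series: pi = 4 - 4/3 + 4/5 - 4/7 + ...
// pi starts at the first term (4); each pass adds or subtracts top/bot,
// flips the sign flag, and moves to the next odd denominator.
function next(pi, top, bot, minus, num) {
  for (var i = 0; i < num; i++) {
    pi += (minus == true) ? -(top / bot) : (top / bot);
    minus = !minus;
    bot += 2;
  }
  return pi; // the original one-liner calls print(pi) here instead
}

// Same starting values as the benchmark runs, 3 million iterations.
var result = next(4, 4, 3, true, 3000000);
console.log(result); // the shells in this post printed 3.1415929869229293
```

The series converges slowly (the error after N iterations is on the order of 1/N), which is exactly what makes it a handy floating point workout: lots of divides and adds in a tight loop, and very little else.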