Tuesday, December 12, 2006

HKN Underground Guide evaluations

The Underground Guide student evaluations are now open. Visit them here for 6.035.

Please complete your evaluation before 11:59 PM on December 22.

Quiz 3 Graded

Quiz 3 has been graded. You will get your quiz back tomorrow after class. The average (and median) was 68. The standard deviation was 14.

Small bug in data parallelization library (libderby.a)

I fixed a small bug in the data-parallelization library that prevented one group's code from working. Since no other groups emailed me, I'm assuming the problem was confined to this one group. The fix is installed in the same location as the old file, so you don't have to do anything. Let me know if you experience problems.

Monday, December 11, 2006

Parallelization differences across machines

First off, with the new benchmarking library, you can benchmark on chocura.

I have noticed big differences between the parallelization speedups of silver, chocura, and tyner. For a small 4-thread test program, silver gets about a 2.9x speedup over sequential code (executed on silver). Tyner gets a 1.8x speedup over sequential code (executed on tyner). Chocura gets a 3.7x speedup for the same code over sequential execution on chocura.
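To compare the machines on the same footing, it can help to turn each speedup into a parallel efficiency (speedup divided by thread count). The sketch below just does that arithmetic for the three numbers above. The function names are made up for illustration and are not part of any course library.

```python
# Illustrative helpers only; these names are hypothetical, not a course API.

def speedup(sequential_usecs, parallel_usecs):
    """Speedup = sequential time / parallel time, both measured on the same machine."""
    return sequential_usecs / parallel_usecs


def parallel_efficiency(speedup_factor, threads):
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup_factor / threads


THREADS = 4  # the test program in the post used 4 threads

# Speedups reported in the post, each relative to sequential code on the same machine.
reported = {"silver": 2.9, "tyner": 1.8, "chocura": 3.7}

for machine, factor in reported.items():
    eff = parallel_efficiency(factor, THREADS)
    print(f"{machine}: {factor:.1f}x speedup, {eff:.1%} of ideal")
```

Note that each speedup is relative to sequential code on the same machine, so these numbers say nothing about which machine is faster in absolute terms.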

Based on my experiences over the last couple of days, I'm going to use chocura for the derby. It seems to be the most stable.

Friday, December 08, 2006

New benchmarking library for the derby!

After noticing that the current benchmarking infrastructure produces some inconsistencies when executing multi-threaded benchmarks, I have implemented a new benchmarking library for the derby. You can use this library now for benchmarking.

Don't worry, the implementation decisions that you arrived at using the current (old) benchmarking infrastructure are still valid. I have only noticed inconsistencies when running multiple threads in the current system.

You can continue to use the current (old) library (lib6035.a) but note that it has problems with multi-threaded benchmarks.

The new library is named libderby.a and is located in /u/mgordon/6035/lib64. The new library interface is similar to the old interface. There are two calls: start_caliper() and end_caliper(). Wrap these around the code you would like to benchmark. The only difference from the old library is that nothing is benchmarked unless you define a caliper. As before, you can place the calls in a callout in your decaf source code.

Using the new library, the assemble command is simpler, for example:

gcc4 emboss.s -pthread -lderby -L/u/mgordon/6035/lib64 -o emboss

No need for the PAPI library from before. When you execute your code (assuming that you have added calls to start_caliper() and end_caliper()), a brief message will print out, for example:

$ emboss
Timer: 276864 usecs
$

This tells you how many usecs (microseconds) it took for the code wrapped in the timer to execute. We will use this library during the derby to determine the ranking. The calls to start_caliper and end_caliper will be in the derby program when it is distributed on Monday.
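If you want to collect timings from several runs, a small script can pull the number out of the "Timer: N usecs" line. This is only a sketch: it assumes the output line looks exactly like the example above, and the helper names (parse_timer_usecs, median_usecs) are hypothetical, not something the library provides.

```python
import re
import statistics

# Assumes the line format shown in the post: "Timer: 276864 usecs".
TIMER_RE = re.compile(r"^Timer:\s*(\d+)\s*usecs\s*$", re.MULTILINE)


def parse_timer_usecs(output):
    """Return the microsecond count from one program run's output text."""
    match = TIMER_RE.search(output)
    if match is None:
        raise ValueError("no 'Timer: N usecs' line found")
    return int(match.group(1))


def median_usecs(outputs):
    """Median timing over the captured outputs of several runs."""
    return statistics.median(parse_timer_usecs(text) for text in outputs)


# The single sample line from the post; capture your own runs in practice.
sample = "Timer: 276864 usecs\n"
usecs = parse_timer_usecs(sample)
print(f"{usecs} usecs = {usecs / 1000:.3f} ms")
```

Taking the median over a handful of runs is one way to damp run-to-run noise; the derby ranking itself is determined by the library's own output, as described above.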

Sorry about this change so late in the game, but it should not be too much of a hassle to switch to the new library if you are harnessing data parallelism. I just want to make sure that the derby results are as accurate as possible.

Tuesday, December 05, 2006

Advice for Optimizer final write-up

Here are some points to think about:
  • Use the provided programs to substantiate your implementation decisions. Benchmark the provided programs on the target architecture. Hand-implement the transformation first. The target architecture is complex to say the least. Don't waste time with ineffectual transformations.
  • Cover all of the transformations discussed in class, at the very least qualitatively, given the benchmark programs and the target architecture.
  • I would like to see an analysis of each implemented optimization (you can group optimizations if you feel they are symbiotic).
  • Discuss the reasons for your benchmarking results given your knowledge of the target architecture (look at my last recitation).
  • Describe your full optimizations option and the ideas/experiments behind the implementation.
  • Analyze your generated assembly. Look for non-traditional peephole optimizations.