Archive for the ‘Programming’ Category

GSoC Wrap-Up Review


This year’s summer has just come to and end, and so has the Summer of Code. It’s time to go over what I did while working on a performance testing framework for DUNE.

First, I wrote a Python program that measures the run time and resource consumption of an external program. It stores quite a lot of data, but the most useful are definitely time spent, top memory consumption, and computer parameters such as number of CPUs. The tool can then output this data into a temporary log file, store into a sqlite database, or upload it to a central server. Furthermore, it can generate nice graphical representations of the data in the form of HTML pages with javascript graphs. It is written in a very modular way, so while there is a script to tie it all together, each of the described actions can be done separately. This also minimizes external dependencies, so if the user doesn’t have the Python SQLite module installed, the database part is skipped.

Then, with help from my mentors, I tied the measurement tool into the DUNE build system, or rather, both of DUNE’s currently available build systems, autotools and cmake. This allows you to set minimal configuration, and then just run “make perftest” from the build directory, and all performance tests are performed, measured, stored, uplodaded and visualized. By using the build system directly, we can get information about the compiler used and its flags. This is important if you ever want to compare compilation with different compiler options. When run this way, the tool separately measures both the compilation time and run time. The compilation time may get quite long with a lot of templates, unnecessary includes and in general large code files, so such a test comes in handy for identifying compilation bottlenecks.

For displaying the results, I used Twitter Bootstrap, Dygraphs and Table.js, so the generated pages look quite nice. Graphs are interactive, and some table columns can be filtered for easier browsing. Some examples are shown here.

Results of a single text run

Results of a single text run


Graphical representation of results of repeated runs of the same test

Finally, I added a server component, implemented as a number of CGI scripts in Python. One of these endpoints receives uploaded log files, while another stores them into a ‘proper’, PostgreSQL database. This two-step process is used, so that processing can be done in batches and separately from uploading. For example, one could easily upload the data in some other way, like the secure copying with SSH. The current uploading setup is not completely open, as it requires a username and password, but these are stored in plaintext on the server.

With a server that accepts data from multiple computers, I could add some additional views. For example, there is a page that identifies outliers with adjustable tolerance. Outliers are data points with considerable deviation from the mean, which in our cases means unusually long run or compile times.

Server overview of all collected data

Server overview of all collected data

Results of a single run on a server

Results of a single run on a server

Aggregated results of a single test on a server

Aggregated results of a single test on a server

A page for finding outliers on a server

A page for finding outliers on a server

All in all, I would say my summer project was a success. It started a little slow, at first with a two-week pause because of a summer school I attended, and then when had to finish my master’s thesis a month before I expected. However, despite not always following the set schedule, I tried really hard to complete everything I set out in the timeline. This project had several different components, from the DUNE libraries in C++, the two build systems, Python, database programming and websites, so I had to learn some new things over the summer. For this I am grateful, and I would like to thank the DUNE developers again for giving me this chance.

I’m sorry I can’t attend the DUNE developer meeting this week, even though the developers invited me and even offered to cover my expenses. I’m giving a talk on the optics of liquid crystals at a conference in Kranjska Gora that happens to take place the exact same three days. However, I can say I enjoyed working on this project, and can only hope that my contributions will help others.

Thank you!


Dune mid-term review


Last week was the mid-term review for the GSoC. Because of this, I spend more time polishing and completing existing things than adding new ones. The biggest was documentation, I added docstrings to all the functions I’ve written since the start of coding. This should hopefully make it easier for everyone else to see what I did, but more importantly to extend it in the future.

This was, however, not all. Dune-perftest now has a couple (= two) example programs, written in C++ using the DUNE libraries. One is mostly empty and basically just measures the time needed for MpiHelper initialization, while the other works with matrices. Such programs will be used for monitoring the performance of DUNE itself. In order to build these C++ programs, I had to use the DUNE build system, based on autotools. I probably spent far more time than I should have on this one. As mostly a KDE developer, I am only used to CMake. I know that DUNE already supports CMake, and if I understand it correctly a complete move is planned, but at the moment I will include both.

There are no new screenshots, because graphically nothing has change since the last post. The actual generation of templates is somewhat improved, and the page (and graph) only shows data for the same command. I’m pretty happy with how both Bootstrap and Dygraphs turned out, I will probably redesign the page a little, but the graphs look good enough to me. However, I will add more information, starting with the memory footprint.

Now that the first half is over, I have to start planning ahead. My short-term goals are more automation and some statistics. More automation means you should be able to test multiple programs with one command. A couple more example C++ would help a lot for testing this. I will also make it possible to define both compile and run commands and have those associated with the same program. DUNE is mostly a template library, and these can often cause very long compile times. Once testing is automatic enough, there will be more data, so a need for meaningful statistics will arise. These can be basic enough, identifying outliers and general trends will be my first priorities.


Dune performance visualization – first graphs

This week I managed to put together all the separate parts of measuring, storing and visualizing program performance. Now, there is a single Python command that runs an external executable, measuring its time and memory consumption, stores it first in a log file and then in a SQL database, and finally produces an HTML report with a graph. A sample output can be seen on the following screenshot.

First visual results of performance testing

First visual results of performance testing

The document formatting is courtesy of Twitter Bootstrap, while the graphs are made with JavaScript using the free library Dygraphs. Of course I plan to add more data to them, not just how long a program takes vs. when it was run. There is also no filtering yet, the two measurements with noticeable higher durations were actually with a slightly different test program.

Instructions for running the test are included in the code repository in the README file. Neither Bootstrap nor Dygraphs are included in the repository, and they both have to be in a specific location to work. Apart from that, you just have to run “” a couple of times (so that you have more than 1 point on the graph), and you already can see results similar to the ones above.

This week in Dune performance testing

I started my project of bringing performance measuring to DUNE almost a month ago. Unfortunately I was attending a physics summer school in Cambridge for two weeks, so I didn’t have any results to write about yet. Now I managed to put together the first week of actual work.

So far, it is possible to measure the running time of any external command, as well as some other data like memory consumption and CPU utilization. These measurements, together with information about the host computer, are then stored in a temporary log file. Plain text log file are not very useful for comparisons and finding trends, so I started on a kind of a toolchain. A measurement is first stored in a log file, then a second program reads the contents of the file and stores them into a SQL database, and finally a third script read the values from the database and outputs an HTML file with tables and charts.

The separation into three separate Python programs/modules is done so that only the first part has to be run locally. A user could thus measure the performance of DUNE and his own programs without installing a bunch of dependencies, which are needed for database operations and visualization.

So far, the first part (measurement) pretty much works. I only say “pretty much” because we will probably decide to add more measured data later. The second part (database) is a little behind, because I want to first decide on the data entry format and at least most of the measured fields. These are details such as whether to store maximum or average RAM usage, or maybe both. Otherwise, interfacing with a SQLite3 database is pretty straightforward and I don’t anticipate any troubles here. I have only just started on the third, visualization part. This one is the most flexible (and the most fun), so it’s hard to tell how long it will take. I created a couple of HTML template files, and am now adding the programmatic part of reading from the DB and displaying the data.

First week of DUNE

I’m not able to do much work while in Cambridge, but I do manage to blog about it. In the week before I came here, I started working on DUNE as part of this year’s Summer of Code. With my mentors, we decided it’s best to use Python for a utility that will measure and report the compilation and running time of DUNE programs. In the future, it will report more that just time, such as memory consumption and I/O, but I’m only starting now.

Basically, now I have a Python script that runs /usr/bin/time on a specified command and extracts the relevant information. The next step, which is only partially done, is to store this data in a log file. A separate script will then read the log files, store the data in a database and display it in a HTML file. However, this will have to wait for as long as I’m here in England.

[GSoC] Getting started

Today is the start of coding for Google Summer of Code 2013. This year I was accepted by DUNE, a Distributed and Unified Numerics Environment. It is a toolbox for solving partial differential equations, and as a student of Computational Physics I have some experience with that.

My project, however, won’t be directly related to differential equations. Instead, I will provide a tool for measuring, storing and analyzing the performance of DUNE applications. This project consists of several parts, and the final workflow will look like this.

  • Measure the time spent compiling and running a program
  • Store the result into a database, so they can be compared with previous data
  • Perform statistical analysis on the data, find outliers, general trends, and quantify optimizations.
  • Visualize the data as a HTML page with pretty graphs
  • Upload local data to a central server, where performance of the same program on different computers with different compilers can be compared
  • More statistics and visualizations performed on the central server

DUNE currently has helpers for some of the parts (log format, uploading), but the project will still require a lot of work with different languages and frameworks. I plan to use Python for the most part, but charts will require JavaScript. I will also start with some example DUNE programs, which will be written in C++.

My work will progress roughly in the order written above. I’m starting with a small script that measures the  running time of a program and writes it to a logfile. I’m leaving for a conference in England after this week, so I hope to complete at least this first item by then.

Finally, KDevelop file templates

The title of this post is slightly misleading, because you can’t use them in the stable version of KDevelop yet. However, after months of laziness schoolwork, I finally managed to upload some to This means that anyone running KDevelop master can now download some of the templates they need. So far, I only made templates for things I am using: QObject subclasses with bells and whistles (properties, signals, private pointers), QML items, and a differential equation solver. This covers all my current programming activities, although I do hope to write some code (and templates) in Java for Android and/or some web framework, preferably Meteor.


Example class made with a template, only by specifying its name and members. Note the proper insertion of const references for QString (which is known to KDevelop as a class), and lack thereof for QDateTime which has not been included and parsed yet.

Now, I am certain that at least some of you write code in different languages and framework. I would like to have some more templates ready for the 4.5 release, so people won’t go “oh, there’s only three” and never look at it again. I am having some trouble with non-C++ languages, because they all have simpler syntaxes with less boilerplate and fewer files. Java comes close, but I as far as I know most people use Eclipse with it. In C++, using Qt and all the best practices, you have to do the following for every class member:

  • Add a public member to the private class
  • Add a Q_PROPERTY() declaration to the main class
  • Declare a getter, and a setter if the property is not read-only
  • Declare a signal
  • Define the getter and the setter in the implementation file

In QML or ORM’s like Django, you only need a single-line property declaration, while in pure Python you don’t really have to do anything. Obviously templates, as pretty much all code generation, are much more useful in C++. I presume there are other examples of such complicated languages, or at least different uses of the same ones. So I ask you for ideas, what kind of template would help you write code quicker. They don’t really have to be classes, for example unit testing frameworks require quite a bit boilerplate code and could be sped up by using pre-made templates. What are the things you write again and again, but are different enough to not be able to simply copy?