Archive for the ‘Programming’ Category

GSoC Wrap-Up Review

Hello!

This year’s summer has just come to an end, and so has the Summer of Code. It’s time to go over what I did while working on a performance testing framework for DUNE.

First, I wrote a Python program that measures the run time and resource consumption of an external program. It stores quite a lot of data, but the most useful are definitely time spent, peak memory consumption, and computer parameters such as the number of CPUs. The tool can then write this data to a temporary log file, store it in an SQLite database, or upload it to a central server. Furthermore, it can generate nice graphical representations of the data in the form of HTML pages with JavaScript graphs. It is written in a very modular way, so while there is a script to tie it all together, each of the described actions can be done separately. This also minimizes external dependencies, so if the user doesn’t have the Python SQLite module installed, the database part is skipped.
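To make the measurement step more concrete, here is a minimal sketch of how such a tool could time an external command and record basic host information. It is not the actual dune-perftest code: the function name `measure` and the dictionary keys are made up for illustration, and it assumes a Unix system, because the standard `resource` module is not available on Windows.

```python
import os
import platform
import resource  # standard library, Unix only
import subprocess
import sys
import time


def measure(command):
    """Run `command` (a list of arguments) and return a dict of measurements."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "command": " ".join(command),
        "exit_code": completed.returncode,
        "wall_time_s": elapsed,
        # CPU time is cumulative over all finished children, so take the difference.
        "user_time_s": after.ru_utime - before.ru_utime,
        "system_time_s": after.ru_stime - before.ru_stime,
        # Peak resident set size over all children waited for so far
        # (kilobytes on Linux, bytes on macOS), not only this command.
        "max_rss": after.ru_maxrss,
        "cpu_count": os.cpu_count(),
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    # Hypothetical usage: time a short Python one-liner.
    result = measure([sys.executable, "-c", "sum(range(10**6))"])
    for key, value in result.items():
        print(f"{key}: {value}")
```

Returning a plain dictionary keeps the measurement independent of what happens next, so the same result could be logged, stored, or uploaded by separate modules.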

Then, with help from my mentors, I tied the measurement tool into the DUNE build system, or rather, both of DUNE’s currently available build systems, autotools and CMake. This lets you set up a minimal configuration and then just run “make perftest” from the build directory, and all performance tests are performed, measured, stored, uploaded and visualized. By using the build system directly, we can get information about the compiler used and its flags. This is important if you ever want to compare builds made with different compiler options. When run this way, the tool measures the compilation time and the run time separately. Compilation can take quite long with a lot of templates, unnecessary includes and large code files in general, so such a test comes in handy for identifying compilation bottlenecks.

For displaying the results, I used Twitter Bootstrap, Dygraphs and Table.js, so the generated pages look quite nice. Graphs are interactive, and some table columns can be filtered for easier browsing. Some examples are shown here.

Results of a single test run

Graphical representation of results of repeated runs of the same test

Finally, I added a server component, implemented as a number of CGI scripts in Python. One of these endpoints receives uploaded log files, while another stores them in a ‘proper’ PostgreSQL database. This two-step process is used so that processing can be done in batches and separately from uploading. For example, one could easily upload the data in some other way, such as secure copying over SSH. The current uploading setup is not completely open, as it requires a username and password, but these are stored in plaintext on the server.

With a server that accepts data from multiple computers, I could add some additional views. For example, there is a page that identifies outliers with adjustable tolerance. Outliers are data points with considerable deviation from the mean, which in our case means unusually long run or compile times.
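The outlier idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the post does not say how the deviation is measured, so the sketch expresses the tolerance in sample standard deviations above the mean, and the name `find_outliers` is hypothetical.

```python
from statistics import mean, stdev


def find_outliers(values, tolerance=2.0):
    """Return (index, value) pairs more than `tolerance` standard deviations above the mean."""
    if len(values) < 2:
        return []  # a spread cannot be estimated from a single point
    average = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []  # all measurements are identical
    return [
        (index, value)
        for index, value in enumerate(values)
        if (value - average) / spread > tolerance
    ]


if __name__ == "__main__":
    run_times = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 5.0]  # made-up durations in seconds
    print(find_outliers(run_times, tolerance=2.0))
```

Only values above the mean are reported, since an unusually short run is rarely a problem worth flagging. Raising the tolerance makes the check stricter about what counts as an outlier.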

Server overview of all collected data

Results of a single run on a server

Aggregated results of a single test on a server

A page for finding outliers on a server

All in all, I would say my summer project was a success. It started a little slowly, first with a two-week pause because of a summer school I attended, and then when I had to finish my master’s thesis a month earlier than I expected. However, despite not always following the set schedule, I tried really hard to complete everything I had set out in the timeline. This project had several different components, from the DUNE libraries in C++ and the two build systems to Python, database programming and websites, so I had to learn some new things over the summer. For this I am grateful, and I would like to thank the DUNE developers again for giving me this chance.

I’m sorry I can’t attend the DUNE developer meeting this week, even though the developers invited me and even offered to cover my expenses. I’m giving a talk on the optics of liquid crystals at a conference in Kranjska Gora that happens to take place the exact same three days. However, I can say I enjoyed working on this project, and can only hope that my contributions will help others.

Thank you!

Dune mid-term review

Hello!

Last week was the mid-term review for the GSoC. Because of this, I spent more time polishing and completing existing things than adding new ones. The biggest item was documentation: I added docstrings to all the functions I’ve written since the start of coding. This should hopefully make it easier for everyone else to see what I did, but more importantly to extend it in the future.

This was, however, not all. Dune-perftest now has a couple (= two) example programs, written in C++ using the DUNE libraries. One is mostly empty and basically just measures the time needed for MpiHelper initialization, while the other works with matrices. Such programs will be used for monitoring the performance of DUNE itself. In order to build these C++ programs, I had to use the DUNE build system, based on autotools. I probably spent far more time than I should have on this one. As mostly a KDE developer, I am only used to CMake. I know that DUNE already supports CMake, and if I understand it correctly a complete move is planned, but at the moment I will include both.

There are no new screenshots, because graphically nothing has changed since the last post. The actual generation of templates is somewhat improved, and the page (and graph) only shows data for the same command. I’m pretty happy with how both Bootstrap and Dygraphs turned out. I will probably redesign the page a little, but the graphs look good enough to me. However, I will add more information, starting with the memory footprint.

Now that the first half is over, I have to start planning ahead. My short-term goals are more automation and some statistics. More automation means you should be able to test multiple programs with one command. A couple more example C++ programs would help a lot for testing this. I will also make it possible to define both compile and run commands and have those associated with the same program. DUNE is mostly a template library, and templates can often cause very long compile times. Once testing is automatic enough, there will be more data, so a need for meaningful statistics will arise. These can be fairly basic: identifying outliers and general trends will be my first priorities.

Dune performance visualization – first graphs

This week I managed to put together all the separate parts of measuring, storing and visualizing program performance. Now, there is a single Python command that runs an external executable, measures its time and memory consumption, stores the results first in a log file and then in an SQL database, and finally produces an HTML report with a graph. A sample output can be seen in the following screenshot.

First visual results of performance testing

The document formatting is courtesy of Twitter Bootstrap, while the graphs are made with JavaScript using the free library Dygraphs. Of course I plan to add more data to them, not just how long a program takes vs. when it was run. There is also no filtering yet: the two measurements with noticeably higher durations were actually made with a slightly different test program.

Instructions for running the test are included in the README file in the code repository. Neither Bootstrap nor Dygraphs is included in the repository, and both have to be in a specific location to work. Apart from that, you just have to run “perftest.py” a couple of times (so that you have more than one point on the graph), and you can already see results similar to the ones above.

This week in Dune performance testing

I started my project of bringing performance measuring to DUNE almost a month ago. Unfortunately I was attending a physics summer school in Cambridge for two weeks, so I didn’t have any results to write about yet. Now I managed to put together the first week of actual work.

So far, it is possible to measure the running time of any external command, as well as some other data like memory consumption and CPU utilization. These measurements, together with information about the host computer, are then stored in a temporary log file. Plain text log files are not very useful for comparisons and finding trends, so I started on a kind of toolchain. A measurement is first stored in a log file, then a second program reads the contents of the file and stores them in an SQL database, and finally a third script reads the values from the database and outputs an HTML file with tables and charts.
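The three stages described above might look roughly like the following sketch. Everything in it is an assumption made for illustration: the log format (one JSON object per line), the table name `runs`, its columns, and the function names are not taken from the real code.

```python
import html
import json
import os
import sqlite3
import tempfile


def write_log(path, measurement):
    """Stage 1: append one measurement to a plain-text log, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(measurement) + "\n")


def import_log(log_path, connection):
    """Stage 2: copy every log entry into a database table and return the row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS runs "
        "(command TEXT, wall_time_s REAL, max_rss_kb INTEGER)"
    )
    with open(log_path, encoding="utf-8") as log:
        rows = [json.loads(line) for line in log if line.strip()]
    connection.executemany(
        "INSERT INTO runs VALUES (:command, :wall_time_s, :max_rss_kb)", rows
    )
    connection.commit()
    return len(rows)


def render_html(connection):
    """Stage 3: read the table back and return a minimal HTML report."""
    query = "SELECT command, wall_time_s, max_rss_kb FROM runs ORDER BY rowid"
    body = "".join(
        f"<tr><td>{html.escape(command)}</td><td>{wall:.3f}</td><td>{rss}</td></tr>"
        for command, wall, rss in connection.execute(query)
    )
    header = "<tr><th>Command</th><th>Time (s)</th><th>Max RSS (kB)</th></tr>"
    return f"<table>{header}{body}</table>"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        log_path = os.path.join(directory, "perftest.log")
        write_log(log_path, {"command": "./example", "wall_time_s": 1.5, "max_rss_kb": 2048})
        database = sqlite3.connect(":memory:")
        import_log(log_path, database)
        print(render_html(database))
        database.close()
```

Because each stage only talks to the next one through a file or a database, the first stage needs nothing beyond the standard library and can run on a machine that has none of the visualization dependencies.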

The split into three separate Python programs/modules is done so that only the first part has to be run locally. A user could thus measure the performance of DUNE and their own programs without installing a bunch of dependencies, which are needed only for database operations and visualization.

So far, the first part (measurement) pretty much works. I only say “pretty much” because we will probably decide to add more measured data later. The second part (database) is a little behind, because I want to first decide on the data entry format and at least most of the measured fields. These are details such as whether to store maximum or average RAM usage, or maybe both. Otherwise, interfacing with a SQLite3 database is pretty straightforward and I don’t anticipate any troubles here. I have only just started on the third, visualization part. This one is the most flexible (and the most fun), so it’s hard to tell how long it will take. I created a couple of HTML template files, and am now adding the programmatic part of reading from the DB and displaying the data.

First week of DUNE

I’m not able to do much work while in Cambridge, but I do manage to blog about it. In the week before I came here, I started working on DUNE as part of this year’s Summer of Code. With my mentors, we decided it’s best to use Python for a utility that will measure and report the compilation and running time of DUNE programs. In the future, it will report more than just time, such as memory consumption and I/O, but I’m only starting now.

Basically, I now have a Python script that runs /usr/bin/time on a specified command and extracts the relevant information. The next step, which is only partially done, is to store this data in a log file. A separate script will then read the log files, store the data in a database and display it in an HTML file. However, this will have to wait for as long as I’m here in England.
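As a rough idea of what wrapping /usr/bin/time can look like, here is a small sketch. It assumes GNU time, whose `-f` option accepts a format string (`%e` elapsed seconds, `%U` user seconds, `%S` system seconds, `%M` peak resident set size in kilobytes). The function names and the chosen fields are my own and not necessarily what the script uses.

```python
import subprocess

# GNU time format: elapsed, user and system seconds, then peak memory in kB.
TIME_FORMAT = "%e %U %S %M"


def parse_time_output(stderr_text):
    """Extract the measurements from the last line written to stderr.

    The measured program may print to stderr as well, and GNU time may add
    a notice about a non-zero exit status, so only the final line is parsed.
    """
    last_line = stderr_text.strip().splitlines()[-1]
    elapsed, user, system, max_rss = last_line.split()
    return {
        "elapsed_s": float(elapsed),
        "user_s": float(user),
        "system_s": float(system),
        "max_rss_kb": int(max_rss),
    }


def run_with_time(command):
    """Run `command` (a list of arguments) under /usr/bin/time and parse the result."""
    completed = subprocess.run(
        ["/usr/bin/time", "-f", TIME_FORMAT] + list(command),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    return parse_time_output(completed.stderr)


if __name__ == "__main__":
    # Hypothetical usage; requires GNU time to be installed at /usr/bin/time.
    print(run_with_time(["sleep", "0.1"]))
```

Keeping the parsing in its own function means it can be tested with a canned string, without actually running anything.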

[GSoC] Getting started

Today is the start of coding for Google Summer of Code 2013. This year I was accepted by DUNE, a Distributed and Unified Numerics Environment. It is a toolbox for solving partial differential equations, and as a student of Computational Physics I have some experience with that.

My project, however, won’t be directly related to differential equations. Instead, I will provide a tool for measuring, storing and analyzing the performance of DUNE applications. This project consists of several parts, and the final workflow will look like this.

  • Measure the time spent compiling and running a program
  • Store the results in a database, so they can be compared with previous data
  • Perform statistical analysis on the data, find outliers and general trends, and quantify optimizations
  • Visualize the data as an HTML page with pretty graphs
  • Upload local data to a central server, where performance of the same program on different computers with different compilers can be compared
  • More statistics and visualizations performed on the central server

DUNE currently has helpers for some of the parts (log format, uploading), but the project will still require a lot of work with different languages and frameworks. I plan to use Python for the most part, but charts will require JavaScript. I will also start with some example DUNE programs, which will be written in C++.

My work will progress roughly in the order written above. I’m starting with a small script that measures the running time of a program and writes it to a logfile. I’m leaving for a conference in England after this week, so I hope to complete at least this first item by then.

Finally, KDevelop file templates

The title of this post is slightly misleading, because you can’t use them in the stable version of KDevelop yet. However, after months of laziness (sorry, schoolwork), I finally managed to upload some to http://kde-files.org. This means that anyone running KDevelop master can now download some of the templates they need. So far, I only made templates for things I am using: QObject subclasses with bells and whistles (properties, signals, private pointers), QML items, and a differential equation solver. This covers all my current programming activities, although I do hope to write some code (and templates) in Java for Android and/or some web framework, preferably Meteor.

Example class made with a template, only by specifying its name and members. Note the proper insertion of const references for QString (which is known to KDevelop as a class), and lack thereof for QDateTime which has not been included and parsed yet.

Now, I am certain that at least some of you write code in different languages and frameworks. I would like to have some more templates ready for the 4.5 release, so people won’t go “oh, there’s only three” and never look at it again. I am having some trouble with non-C++ languages, because they all have simpler syntaxes with less boilerplate and fewer files. Java comes close, but as far as I know most people use Eclipse with it. In C++, using Qt and all the best practices, you have to do the following for every class member:

  • Add a public member to the private class
  • Add a Q_PROPERTY() declaration to the main class
  • Declare a getter, and a setter if the property is not read-only
  • Declare a signal
  • Define the getter and the setter in the implementation file
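To show how repetitive these steps are, here is a small Python sketch that generates the declarations for one member. It uses Python’s `string.Template` purely as a stand-in and is not one of the actual KDevelop templates. The helper names and the exact output layout are assumptions.

```python
from string import Template

# Declarations that Qt code typically needs for a single property.
DECLARATIONS = Template(
    "Q_PROPERTY($type $name READ $name WRITE $setter NOTIFY ${name}Changed)\n"
    "$type $name() const;                    // getter\n"
    "void $setter(const $type& $name);       // setter\n"
    "void ${name}Changed(const $type& $name); // goes under signals:\n"
)


def setter_name(name):
    """Build the conventional setter name, e.g. 'title' becomes 'setTitle'."""
    return "set" + name[0].upper() + name[1:]


def render_member(type_name, name):
    """Return the boilerplate declarations for one class member."""
    return DECLARATIONS.substitute(type=type_name, name=name, setter=setter_name(name))


if __name__ == "__main__":
    # Hypothetical members of an example class.
    for member_type, member_name in [("QString", "title"), ("QDateTime", "created")]:
        print(render_member(member_type, member_name))
```

Only the type and the name change from one member to the next, which is exactly the kind of repetition a template can take over.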

In QML or ORMs like Django’s, you only need a single-line property declaration, while in pure Python you don’t really have to do anything. Obviously templates, like pretty much all code generation, are much more useful in C++. I presume there are other examples of such complicated languages, or at least different uses of the same ones. So I ask you for ideas: what kind of template would help you write code quicker? They don’t really have to be classes. For example, unit testing frameworks require quite a bit of boilerplate code, and writing tests could be sped up by using pre-made templates. What are the things you write again and again, but are different enough that you can’t simply copy them?

Loud talkers be-gone!

In the venerable words of ESR, every good work of software starts by scratching a developer’s personal itch. One such itch of mine was the fact that my mother is very (VERY) loud when talking on the phone. Invariably, I had to turn the volume down every time she called me, or at least hold the phone away from my head.

No more, I said last week, and promptly wrote an app for that. It’s called Personal Volume and allows you to have customized per-contact call volume settings. I only have a handful of such contacts listed, but it really feels like a great headache-preventer.

The app is now available on the Google Play store for free. See it for yourself here.

[GSoC] Class and Test Templates in KDevelop

 

I have generalized the “Create Class” dialog, so that it starts with a template selection page that is similar to what you see when starting a new project. It offers you a selection of templates of different types (currently only Class and Test) and programming languages. Selecting a template leads to further assistant pages, which are chosen dynamically depending on the selected template’s type.

The first page of the “Create from Template” dialog

I have added wrapper classes for template rendering with Grantlee (KDevelop::TemplateRenderer) and template archives (KDevelop::SourceFileTemplate), so the assistant only deals with those to keep the code clean and readable. Both classes and tests use the same format of template archives (described here), only with some different variables.

As you can see from the screenshot, there are already quite a few templates available to choose from. Considering a template only has to be written once, every project could have one or more preferred templates for new code. Existing ones can always be re-used, or maybe just tweaked slightly, so a little work can improve coding style and consistency.

 

 

[GSoC] Templates in KDevelop – Week 5

My last report (here) was full of pictures, but since then I’ve spent more time polishing the functionality, making behind-the-scenes improvements, and fixing bugs. However, the mid-term evaluation is approaching, so I think I should let everybody know the state of things. I’m happy with my progress. My schedule was a bit vague, but I think I’m ahead of it. So far I’m still enjoying it, and I bought a more comfortable chair, so I have few problems with working long hours.

I split out some functionality to make code more modular, which also allowed me to write unit tests for much of the newly added classes. The main parts of the code (TemplateRenderer, TemplateClassGenerator and TemplatesModel) are covered with test cases.

Class Templates

My main focus was still on templates for creating new classes. I slowly decided on the variables passed to templates. They are also documented, and I think I can start writing more templates about now. The C++ plugin adds some variables of its own, such as namespaces,

Since KDevelop’s Declarations must always point to a location in a source file, they cannot be created directly, so I had to add my own classes for describing code before it is generated. This way, data members of a class can be declared. These code description classes are very simple, with only a couple of members, and are written so that Grantlee templates have access to all their properties.

The state of template class generation is such that it covers all functionality of the existing “Create Class” dialog. I have already written a basic C++ template that produces the same output. Of course, it is possible to create different classes, such as ones with private pointers, QObject’s macros, or in different languages.

Writing the Class Templates

As I said, I already wrote a template for a basic C++ class, as well as one with a d-pointer. Since I figured a lot of the code would be shared between templates for the same language, I added some basic templates that can be included. These are for method declarations, argument lists, namespaces, include guards, and some other small conveniences. There is also a library of custom template filters in kdevplatform. The end result is that the templates themselves can be relatively small. I even reduced the amount of whitespace in the rendered output, so that templates can be more readable while the generated classes are compact enough.

However, I don’t think it’s practical to store templates and filters for all possible languages in kdevplatform. So I intend to add a way for templates to specify dependencies on language plugins. Of course, they could still be written from scratch, or simply ship with the needed includes. It is merely a convenience.

The plan is to have language plugins provide some of their own templates and filters, but so far they are only for C++.

Templates, Templates Everywhere

Of course, KDevelop has other utilities for code generation, and I figured templates would be useful there as well. I started with inserting API documentation. The previous implementation manually constructed a Doxygen C++ style comment. I replaced that with a renderer template, which now supports C++, PHP and Python.

Pressing Alt-Shift-D on a declaration of a C++ function results in this

API documentation with Doxygen for C++

While doing the same thing on a Python function produces this

API documentation with reST for Python

Not only is it formatted in reST, the most common format for Python documentation, but it is also positioned below the declaration, as a true Python docstring. Obviously, we still need to convert __kdevpythondocumentation_builtin type names to Python types, but stripping a prefix can be done within a template thanks to Grantlee’s built-in filters.