Posts Tagged ‘python’

GSoC Wrap-Up Review

Hello!

This year’s summer has just come to an end, and so has the Summer of Code. It’s time to go over what I did while working on a performance testing framework for DUNE.

First, I wrote a Python program that measures the run time and resource consumption of an external program. It stores quite a lot of data, but the most useful are definitely time spent, peak memory consumption, and computer parameters such as the number of CPUs. The tool can then output this data into a temporary log file, store it in an SQLite database, or upload it to a central server. Furthermore, it can generate nice graphical representations of the data in the form of HTML pages with JavaScript graphs. It is written in a very modular way, so while there is a script to tie it all together, each of the described actions can be done separately. This also minimizes external dependencies, so if the user doesn’t have the Python SQLite module installed, the database part is skipped.
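
The measuring step can be illustrated with a minimal sketch. This is not the code from the project: the function name `measure` and the record keys are hypothetical, and it assumes only the Python standard library (the `resource` module exists on Unix-like systems only, so memory figures are skipped elsewhere).

```python
import os
import subprocess
import sys
import time

try:
    import resource  # Unix only; gives resource usage of child processes
except ImportError:
    resource = None


def measure(command):
    """Run `command` (a list of arguments) and return a dict of measurements.

    Hypothetical helper for illustration, not the project's actual API.
    """
    start = time.perf_counter()
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    elapsed = time.perf_counter() - start

    record = {
        "command": " ".join(command),
        "exit_code": completed.returncode,
        "wall_time_s": elapsed,
        "cpu_count": os.cpu_count(),
    }
    if resource is not None:
        # RUSAGE_CHILDREN covers all children waited for so far, so the
        # values are cumulative if measure() is called more than once.
        usage = resource.getrusage(resource.RUSAGE_CHILDREN)
        # Peak resident set size: kilobytes on Linux, bytes on macOS.
        record["max_rss"] = usage.ru_maxrss
        record["cpu_time_s"] = usage.ru_utime + usage.ru_stime
    return record


if __name__ == "__main__":
    print(measure([sys.executable, "-c", "sum(range(10**6))"]))
```

Running each measurement in a fresh Python process avoids the cumulative behaviour of `RUSAGE_CHILDREN` noted in the comments.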

Then, with help from my mentors, I tied the measurement tool into the DUNE build system, or rather, both of DUNE’s currently available build systems, autotools and cmake. This allows you to set minimal configuration, and then just run “make perftest” from the build directory, and all performance tests are performed, measured, stored, uploaded and visualized. By using the build system directly, we can get information about the compiler used and its flags. This is important if you ever want to compare compilation with different compiler options. When run this way, the tool separately measures both the compilation time and run time. The compilation time may get quite long with a lot of templates, unnecessary includes and in general large code files, so such a test comes in handy for identifying compilation bottlenecks.

For displaying the results, I used Twitter Bootstrap, Dygraphs and Table.js, so the generated pages look quite nice. Graphs are interactive, and some table columns can be filtered for easier browsing. Some examples are shown here.

Results of a single test run

Graphical representation of results of repeated runs of the same test

Finally, I added a server component, implemented as a number of CGI scripts in Python. One of these endpoints receives uploaded log files, while another stores them in a ‘proper’ PostgreSQL database. This two-step process is used so that processing can be done in batches and separately from uploading. For example, one could easily upload the data in some other way, such as secure copying over SSH. The current uploading setup is not completely open, as it requires a username and password, but these are stored in plaintext on the server.

With a server that accepts data from multiple computers, I could add some additional views. For example, there is a page that identifies outliers with adjustable tolerance. Outliers are data points with considerable deviation from the mean, which in our case means unusually long run or compile times.
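
One way to express such an outlier check is sketched below. It assumes that "tolerance" means a number of standard deviations above the mean; the server's actual criterion may differ, and the function name `find_outliers` is hypothetical.

```python
from statistics import mean, pstdev


def find_outliers(durations, tolerance=2.0):
    """Return (index, value) pairs that lie more than `tolerance`
    standard deviations above the mean.

    Only unusually long times are reported; unusually short ones are ignored.
    """
    if len(durations) < 2:
        return []
    mu = mean(durations)
    sigma = pstdev(durations)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, d) for i, d in enumerate(durations)
        if (d - mu) / sigma > tolerance
    ]


if __name__ == "__main__":
    run_times = [1.0, 1.1, 0.9, 1.0, 1.05, 5.0]
    print(find_outliers(run_times, tolerance=2.0))
```

Lowering the tolerance flags more points, raising it flags fewer. With very few measurements a single extreme value also inflates the standard deviation, so a high tolerance may report nothing at all.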

Server overview of all collected data

Results of a single run on a server

Aggregated results of a single test on a server

A page for finding outliers on a server

All in all, I would say my summer project was a success. It started a little slowly, at first with a two-week pause because of a summer school I attended, and then when I had to finish my master’s thesis a month before I expected. However, despite not always following the set schedule, I tried really hard to complete everything I set out in the timeline. This project had several different components, from the DUNE libraries in C++, the two build systems, Python, database programming and websites, so I had to learn some new things over the summer. For this I am grateful, and I would like to thank the DUNE developers again for giving me this chance.

I’m sorry I can’t attend the DUNE developer meeting this week, even though the developers invited me and even offered to cover my expenses. I’m giving a talk on the optics of liquid crystals at a conference in Kranjska Gora that happens to take place on the exact same three days. However, I can say I enjoyed working on this project, and can only hope that my contributions will help others.

Thank you!

Dune performance visualization – first graphs

This week I managed to put together all the separate parts of measuring, storing and visualizing program performance. Now, there is a single Python command that runs an external executable, measuring its time and memory consumption, stores it first in a log file and then in a SQL database, and finally produces an HTML report with a graph. A sample output can be seen on the following screenshot.

First visual results of performance testing

The document formatting is courtesy of Twitter Bootstrap, while the graphs are made with JavaScript using the free library Dygraphs. Of course I plan to add more data to them, not just how long a program takes vs. when it was run. There is also no filtering yet; the two measurements with noticeably higher durations were actually made with a slightly different test program.

Instructions for running the test are included in the code repository in the README file. Neither Bootstrap nor Dygraphs is included in the repository, and they both have to be in a specific location to work. Apart from that, you just have to run “perftest.py” a couple of times (so that you have more than one point on the graph), and you can already see results similar to the ones above.

This week in Dune performance testing

I started my project of bringing performance measuring to DUNE almost a month ago. Unfortunately I was attending a physics summer school in Cambridge for two weeks, so I didn’t have any results to write about until now. I have now managed to put together the first week of actual work.

So far, it is possible to measure the running time of any external command, as well as some other data like memory consumption and CPU utilization. These measurements, together with information about the host computer, are then stored in a temporary log file. Plain text log files are not very useful for comparisons and finding trends, so I started on a kind of toolchain. A measurement is first stored in a log file, then a second program reads the contents of the file and stores them in an SQL database, and finally a third script reads the values from the database and outputs an HTML file with tables and charts.
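
The three stages can be sketched roughly as follows. This is only an illustration under stated assumptions: the log is assumed to hold one JSON object per line, and the table layout, field names (`name`, `duration`, `max_rss`) and function names are all hypothetical rather than taken from the project.

```python
import json
import sqlite3
from html import escape


def write_log(path, record):
    """Stage 1: append one measurement to a plain text log (one JSON object per line)."""
    with open(path, "a") as log:
        log.write(json.dumps(record) + "\n")


def load_into_db(path, conn):
    """Stage 2: read the log file and store its entries in an SQL database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs (name TEXT, duration REAL, max_rss INTEGER)"
    )
    with open(path) as log:
        rows = [json.loads(line) for line in log if line.strip()]
    conn.executemany("INSERT INTO runs VALUES (:name, :duration, :max_rss)", rows)
    conn.commit()
    return len(rows)


def render_html(conn):
    """Stage 3: read the values back and produce a simple HTML table."""
    query = "SELECT name, duration, max_rss FROM runs ORDER BY rowid"
    cells = "".join(
        "<tr><td>{}</td><td>{:.3f}</td><td>{}</td></tr>".format(
            escape(name), duration, max_rss
        )
        for name, duration, max_rss in conn.execute(query)
    )
    header = "<tr><th>Test</th><th>Duration (s)</th><th>Max RSS</th></tr>"
    return "<table>" + header + cells + "</table>"
```

Because each stage only communicates through the log file or the database, the stages can run on different machines, and only the first one has to run where the measurement takes place.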

The split into three separate Python programs/modules is done so that only the first part has to be run locally. A user could thus measure the performance of DUNE and their own programs without installing a bunch of dependencies, which are needed for database operations and visualization.

So far, the first part (measurement) pretty much works. I only say “pretty much” because we will probably decide to add more measured data later. The second part (database) is a little behind, because I want to first decide on the data entry format and at least most of the measured fields. These are details such as whether to store maximum or average RAM usage, or maybe both. Otherwise, interfacing with a SQLite3 database is pretty straightforward and I don’t anticipate any troubles here. I have only just started on the third, visualization part. This one is the most flexible (and the most fun), so it’s hard to tell how long it will take. I created a couple of HTML template files, and am now adding the programmatic part of reading from the DB and displaying the data.

[GSoC] Templates in KDevelop – Week 5

My last report (here) was full of pictures, but since then I’ve spent more time polishing the functionality, making behind-the-scenes improvements, and fixing bugs. However, the mid-term evaluation is approaching, so I think I should let everybody know the state of things. I’m happy with my progress; my schedule was a bit vague, but I think I’m ahead of it. So far I’m still enjoying it, and I bought a more comfortable chair, so I have few problems with working long hours.

I split out some functionality to make the code more modular, which also allowed me to write unit tests for many of the newly added classes. The main parts of the code (TemplateRenderer, TemplateClassGenerator and TemplatesModel) are covered with test cases.

Class Templates

My main focus was still on templates for creating new classes. I slowly decided on the variables passed to templates. They are also documented, and I think I can start writing more templates about now. The C++ plugin adds some variables of its own, such as namespaces.

Since KDevelop’s Declarations must always point to a location in a source file, they cannot be created directly, so I had to add my own classes for describing code before it is generated. This way, data members of a class can be declared. These code description classes are very simple, with only a couple of members, and are written so that Grantlee templates have access to all their properties.

The state of template class generation is such that it covers all functionality of the existing “Create Class” dialog. I have already written a basic C++ template that produces the same output. Of course, it is possible to create different classes, such as ones with private pointers, QObject’s macros, or in different languages.

Writing the Class Templates

As I said, I already wrote a template for a basic C++ class, as well as one with a d-pointer. Since I figured a lot of the code would be shared between templates for the same language, I added some basic templates that can be included. These are for method declarations, argument lists, namespaces, include guards, and some other small conveniences. There is also a library of custom template filters in kdevplatform. The end result is that the templates themselves can be relatively small. I even reduced the amount of whitespace in the rendered output, so that templates can be more readable while the generated classes are compact enough.

However, I don’t think it’s practical to store templates and filters for all possible languages in kdevplatform. So I intend to add a way for templates to specify dependencies on language plugins. Of course, they could still be written from scratch, or simply ship with the needed includes. It is merely a convenience.

The plan is to have language plugins provide some of their own templates and filters, but so far they are only for C++.

Templates, Templates Everywhere

Of course, KDevelop has other utilities for code generation, and I figured templates would be useful there as well. I started with inserting API documentation. The previous implementation manually constructed a Doxygen C++ style comment. I replaced that with a rendered template, which now supports C++, PHP and Python.

Pressing Alt-Shift-D on a declaration of a C++ function results in this

API documentation with Doxygen for C++

While doing the same thing on a Python function produces this

API documentation with reST for Python

Not only is it formatted in reST, the most common format for Python documentation, but it is also positioned below the declaration, as a true Python docstring. Obviously, we still need to convert __kdevpythondocumentation_builtin type names to Python types, but stripping a prefix can be done within a template thanks to Grantlee’s built-in filters.

Getting into Web Development

I’ve always been a desktop guy, preferring to use desktop applications over web ones wherever possible. And since I started programming, I only wrote desktop applications with Qt and KDE. Maybe it was the whole paradigm, or just a lucky choice of language and toolkit, but I liked it immediately and soon learned to feel at home with programming. I got used to C++, its strictness and plenty of compile-time error checking.

It was only recently that I decided to broaden my horizons. I got a smartphone, I’m using both a desktop computer and a laptop, so I’m starting to find it easier to store non-sensitive data in the cloud. I’ve had GMail for a long time, WordPress as well, and even used Facebook until I deleted my profile because it was a waste of time. This movement of course reflected on my programming as well. After writing Opeke and Knights (C++/KDE), contributing to Orange (Python/Qt), I have decided it’s time to learn something new again.

Platform Choices

The first problem I encountered was the choice of platform. On the desktop, everything is free (at least as in beer), no matter if you choose Windows, Qt, Gtk, or something else. You write a program, other people compile and run it. After you’ve written the code, that’s it. On the web, everything is different: you (as the author/developer) have to run the application on your own infrastructure, and more importantly you also have to cover the cost. Fortunately, there are a couple of options that do it easily and cheaply. There are probably more, but these three are the most used ones:

Amazon Elastic Compute Cloud

It lets you run any OS, but you have to manage the entire software stack yourself. This option grants you the most freedom, and it’s easy to migrate your application from here to a dedicated host and back. This would in my opinion be the best option, if it weren’t for the price: the smallest package (micro instance) costs around $15 a month, even if you don’t use it very much.

There is a free tier that lets you run the $15 package for free, but it only lasts for a year. I’ve tried it with my http://noughmad.eu website, and it’s noticeably slow even with a small number of visitors (I had around 100 per day). I understand pricing gets better with volume, so as long as you’re prepared to use your own computer or another dedicated host while learning, writing and testing your app, it could be useful when your app is ready for the public. Then again, because of its cost, it’s not really appropriate for non-commercial fun projects.

Google AppEngine

Unlike the Amazon cloud, Google provides a complete managed software stack. This means less work and fewer worries for you, but it comes at a heavy price: only Java, Python and Go are supported. The runtime environment (especially the database) is designed with scalability in mind, rather than compatibility with existing systems, so porting an app to and from GAE is not trivial.

Still, it’s possible to develop apps that will be compatible with the App Engine and still work without it. For Python, this can be done with Django and django-appengine. For Java, you can use GWT for presentation and JDO for database objects, both of which can work on GAE and on your own server. However, you always have to keep in mind the restrictions Google’s datastore imposes on your data query to ensure scalability.

Windows Azure

It comes from Microsoft, so I’m not touching it. I don’t even know what it offers, but from what I could tell it’s not free, which means bad news for a hobby programmer.

My Choice

I chose to start with Google App Engine. The main reason was cost: It’s free for small apps permanently. Out of the three supported languages I only know Python, so I used that, combined with the power and comfort of Django. There is a large number of modules for it, including Dajax (asynchronous requests) and a REST framework (to easily create an API).

Appearance

Another important difference between desktop and web programming is the application’s visual appearance. On the desktop, the toolkit will guarantee a consistent look and feel with other programs, so you don’t have to think about how your buttons look. You just create a button, and users will see it with all sorts of decorations and text effects, and it will be just like all the buttons in all other programs.

On the web, the default appearance in most browsers looks horrible. You have to specify the style of every single element of the page, setting colors, borders and layout. The separation between content and appearance is nice, but it would be even nicer if I didn’t have to worry about the appearance at all.

My First Web Application

For the learning experience, I decided on a task management tool. You can find it at http://ntasks.appspot.com. It’s not much yet, and most certainly it doesn’t look like much, but I’m starting to like Django so maybe it’s going to progress faster in the future.

Sudoku solver

I have programming experience mostly in C++, but we’re learning Python in school this year. The latest assignment was to write a sudoku solver, first a brute-force one and then a more intelligent implementation.

My current result takes about 0.1 seconds for a hard sudoku, which I think is fast enough, especially as I have no idea how one could improve it.
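
For reference, a brute-force solver of the kind described in the assignment can be written as plain backtracking. The sketch below is an illustration only, not the solver from this post: it assumes the puzzle is a 9×9 list of lists with 0 marking an empty cell, and the function names are hypothetical.

```python
def candidates(grid, row, col):
    """Digits that do not yet appear in the cell's row, column or 3x3 box."""
    used = set(grid[row]) | {grid[r][col] for r in range(9)}
    top, left = 3 * (row // 3), 3 * (col // 3)
    used |= {
        grid[r][c]
        for r in range(top, top + 3)
        for c in range(left, left + 3)
    }
    return [digit for digit in range(1, 10) if digit not in used]


def solve(grid):
    """Fill `grid` in place; return True if a solution was found."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:
                # Try every digit that is still allowed in the first empty cell.
                for digit in candidates(grid, row, col):
                    grid[row][col] = digit
                    if solve(grid):
                        return True
                # No digit worked: undo and let the caller try something else.
                grid[row][col] = 0
                return False
    return True  # no empty cells left


if __name__ == "__main__":
    puzzle = [
        [5, 3, 0, 0, 7, 0, 0, 0, 0],
        [6, 0, 0, 1, 9, 5, 0, 0, 0],
        [0, 9, 8, 0, 0, 0, 0, 6, 0],
        [8, 0, 0, 0, 6, 0, 0, 0, 3],
        [4, 0, 0, 8, 0, 3, 0, 0, 1],
        [7, 0, 0, 0, 2, 0, 0, 0, 6],
        [0, 6, 0, 0, 0, 0, 2, 8, 0],
        [0, 0, 0, 4, 1, 9, 0, 0, 5],
        [0, 0, 0, 0, 8, 0, 0, 7, 9],
    ]
    if solve(puzzle):
        for line in puzzle:
            print(" ".join(str(d) for d in line))
```

A common refinement, which may or may not resemble the "more intelligent" version mentioned above, is to always branch on the empty cell with the fewest candidates instead of the first one found.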