Thursday, 30 June 2022

Some praise for my Qt book

You might perhaps already know that I recently wrote a book (blogged about it here)! In the book I discussed various general program performance topics but also some more specific, Qt related ones, for example Qt's graphical or networking performance.
Recently I was very happy (and very surprozed) to have read some kind words about my book, and I just wanted to show off a little πŸ˜‡ and cite them here. So let's start!

First Ivica* mentioned he had my book, and it wouldn't be that bad:

After I thanked him, he said he actually liked it!

What I was particularly pleased about is (beside the "well written" phrase) that:
  • 95% ... can be applied to other frameworks
  • very good read for intermediate/advanced C++ developer
Why? Because all of that were my goals when writting the book, so I can congratulate myself on a good job! What can I say, thanks a lot, Ivica!

Thank you!

You may ask what the book was about, really? Let's have a look:
  • Part I - Basics
    • Intro - basic performance wisdom and techniques, hardware architecture and its impact.
    • Profiling - performance tools and how to use them. As I said, we don't want to make it easy for us and look at the Windows platform!
    • C++ - how performant is C++ really? And what are optimizations the compiler (and linker) can do?
  • Part II - General Techniques
    • Data structures and algorithms - what is the performance of Qt containers and strings? Do we have to use them?
    • Multithreading - Qt's take on threading and how to speed up your program with concurrency.
    • Fails - some more memorable performance problems I encountered in my programming career.
  • Part III - Selected Qt Modules
    • File I/O etc - Qt and file system performance, JSON and XML parsing. Also memory mapped files and caching.
    • Graphic - GUI, OpenGL, widget and QML performance. Probably the most interesting part of the book.
    • Networking - network performance and network support in Qt. Because we live in the ubiquitous connectivity era!
  • Part IV - Goodies
    • Mobile and embedded - how to use Qt on mobile and embedded (a kind of advanced chapter applying all we have learned before to two specific platforms)
    • Testing - a goodie chapter about testing techniques, GUI testing and performance regression tests. At the end this chapter turned out pretty interesting!
Got you interested? Then read my detailed blogpost introducing the book!

* I maybe have to mention, that Ivica has a well established reputation in the performance community -  just look up these links:
Impressive, isn't it?

Tuesday, 4 January 2022

Named parameters for C++11 with variadic templates vs a language feature

As I'm a little obsessed with emulating Python-style named parameters in C++ (of which several previous posts on this blog could serve as witnesses) when I saw the following line of code*:
  auto p = Popen({"cat", "-"}, input{PIPE}, output{"cat_fredirect.txt"});
I wanted to see how named parameters are implemented in this library. 

Variadic templates implementation

So, without much ado, here is the implementation of Popen. It is using variadic templates, but don't fear, it's not as complicated as it sounds:
  // 1. named parameters are bound to the variadic param set ...args

  template <typename... Args>
  Popen(std::initializer_list<const char*> cmd_args, Args&& ...args)
    vargs_.insert(vargs_.end(), cmd_args.begin(), cmd_args.end());
    // ...

  // 2. named params are forwarded to:

  template <typename F, typename... Args>
  inline void Popen::init_args(F&& farg, Args&&... args)
    // 3. let ArgumentDeducer do the job!
    detail::ArgumentDeducer argd(this);    
    // 4. now process next named param from variadic parameter pack:

  // the ArgumentDeducer class has an overload for each of the named parameter types

  inline void ArgumentDeducer::set_option(output&& out) {
    if (out.wr_ch_ != -1) popen_->stream_.write_to_parent_ = out.wr_ch_;
    if (out.rd_ch_ != -1) popen_->stream_.read_from_child_ = out.rd_ch_;
  inline void ArgumentDeducer::set_option(environment&& env) {
    popen_->env_ = std::move(env.env_);
  // etc, etc...  
This all is very nice, but also very tedious. You can implement it as part of your library to be friendly to your users, but most people won't be bothered to go to that lengths. What about some support from our language? Let have a look at the shiny new C++ standard (most of us aren't allowed to use yet...): 

The C++ 20 hack

In C++20 we can employ the following hack** that is using C99's init designators:
  struct Named 
    int size; int defaultValue; bool forced; bool verbose; 
    Named() : size(0), defaultValue(0), forced(false), verbose(false) {};

  void foo(Named);

  void bar()
    foo({ .size = 44, .forced = true});
Discussion: OK, this just looks like a hack, sorry! We could as well use a JSON literal as input parameter and circumvent the normal C++ function parameter mechanism altogether: 
  foo({ "size": 44, "forced": true});
The difference is that the Named-hack will result in members passed in registers, and JSON-hack won't. Unfortunately, the latter is more readable, and that's why we are doing all that in first place! 😞
The language proposal

As so often with C++, the various workarounds don't really cut it. So please, please Mr. Stroustrup, can we have named parameters as a language feature?

As a  matter of fact, there's even a "minimal" proposal for this feature:

but unfortunately, I don't know what it's status is just now... Is there even a possibility to check for status of different standard proposals??? Does anybody know?  πŸ€” (Please respond in the comments!)

If accepted, we could write in our case:
  void bar()
    foo(size: 44, forced: true);
Another example taken from the proposal:
  gauss(x: 0.1, mean: 0., width: 2., height: 1.);
Here you can see the benefits of that feature in its entirety as all the parameter passed to the gauss() functions are floats!!! 

Alas, althought this minimalistic proposal stems from 2018 it not part of C++20 (nor of C++23
AFAIK). What can I say - this just fits nicely in the more general notion of legendary user-unfriendliness of C++... 😞 To cheer everybody up - a classic: "but... we have ranges!".


Reading the "2021 C++ Standardization Highlights" blogpost*** I stumbled upon following formulation:
"Named Arguments Making a Comeback? 
While it has not yet been officially submitted as a P-numbered paper, nor has it been reviewed by EWG, I’ve come across a draft proposal for named arguments (in this formulation called designated arguments) which was circulated on the committee mailing lists (including the public std-discussion list); there’s also an accompanying video explainer. 
This looks to me like a thorough, well-researched proposal which addresses the concerns raised during discussions of previous proposals for named arguments ..."

 So maybe there's hope after all? Never say never!


** source: -> "Actually you don't even need to repeat "Named". The wonderful ({ .params }) style!"

*** found here: the referenced quasi-proposal is: "D2288R0 Proposal of Designated Arguments DRAFT 2" to be found at Google Docs here.

Friday, 31 December 2021

Beauty in Software Design (and Programming)

Recently I repeatedly heard the phrases "beautiful"/"not beautiful" used to describe and grade a software design, a bugfix or a piece of existing code. 

Personally, I didn't like it at all, as it seemed somehow simplistic, but I told to myself: be humble, don't judge, think before you speak... But then, out of the blue, I realized what disturbed me in that phrase! Let me present my argument.

Why didn't I like the "beauty" argument? Because it's too easy, too general and too unspecific.

Software design, as an engineering effort, is an exercise in finding a right tradeoff between different vectors: performance and code readibility, code quality and release deadlines, programming effort and importance of a feature, etc.

When we just fall back to the "beauty" criterion we are just ignoring the more complex reality behind our code. I'd even go as far and venture to say that it is the sign of an immature engineer.  

Let me cite from a blog post*:
"- Mature engineers make their trade-offs explicit when making judgements and decisions."
Let me state it again - we are not looking for beauty when programming, we are trying to survive in a world full of trade-offs. And a mature engineer won't try to ignore that and lock himself up in an ivory tower. Let me cite from a blog post again*:
"The tl;dr on trade-offs is that everyone cuts corners, in every project. Immature engineers discover them in hindsight, disgusted. Mature engineers spell them out at the onset of a project, accept them and recognize them as part of good engineering."

Have you seen the mammoths? They weren't beautiful**, not by any means! But they were optimally adapted to thrive in their habitat!

Here is one - should we reject him in a code review because of his ugliness?


What about beauty in mathematics***? Isn't programming "mathematics done with other means"

Isn't there the old adage among mathematicians that "if it's not beautiful, than it's probably wrong"? And des it apply to programming (via the argument that programming is "mathematics done with other means")? 

Well, my response to that is: mathematics is pure, in the sense that it doesn't have to make any trade-offs. 

Besides, I can't see much beauty in modern advanced mathematic like, for example, in proof of Fermat's theorem. Rather than that, it's just complicated and tedious. Looks like the low hanging fruit in mathematics has been already picked... 


A recent tweet expresses much the same sentiment:

** as a classic polish rock songtext stated (credits where credit is due!)

*** "there is some cold beauty in mathematics..." as Bertrand Russel said

My talk about the Emma architectural pattern

When working for one of my clients I learned about Emma (or rather EMMA :-) pattern and it got me quite intrigued. As far as I know, it's never been published anywhere, and frankly, there aren't many publications about embedded systems architecture anyway.

In the meanwhile, Burkhard Stuber from Embedded Use gave a talk about Hexagonal Architecture* pattern and argued in his newsletter that:
"In my talk at Meeting Embedded 2021, I argue that the Hexagonal Architecture should be the standard architecture for UI applications on embedded devices. As software and system architects we should never have to justify why we use the Hexagonal Architecture. In contrast, the people, who don't want to use Hexagonal Architecture, should justify their opinion." 
However, at the time of this talk, these ideas werent yet communicated, but at the moment I still think that even for UI-based applications on embedded devices EMMA could also be an option!

The EMMA architecture was successfully used in embedded devices at my client's, it follows an interesting idea and it wasn't published outside of the company before. So I secured an OK from my client and presented it for the general public!

The Talk

As they gave me permission to blog and speak about it, I gave a presentation on EMMA at the emBO++ 2021** conference:
The general TOC of the talk looks like this:
  • General intro about architecture patterns 
    - and also about embedded software architectures

  • Motivations and trade-offs
    - the motivation for an the trade-offs taken in the EMMA pattern

  • Overview of EMMA 
     - its layers, the various conventions it imposes on developers, the standard project structure
     - the basic control flow
     - the startup phase

  • Use Case 1
    - Bootloader on a Device Control MCU

  • Use Case 2
    - Target Tests for a Device

  • Use Case 3
    - Qt GUI on Yocto Linux and iMX-6

EMMA Architecture

And here it is, the short, non-formal description of Emma architecture pattern.

The name of the pattern is an acronym of:
  • – event-driven (asynchronous events for communication)
  • – multi-layered (layering to organize the )
  • – multi-threaded (i.e. threading has to follow guidelines)
  • – autonomous (i.e. minimal interfaces)
The motivation of the pattern is to avoid designs like this:

which, unfortunately, tend to spring to life in many programming projects!

The inspiration for the pattern comes from the ISO's seven layer model of a networking stack:

We can clearly see, that there are analogies between networking stack decomposition and embedded architecture layers! This is what got me interesten in first pplace

When to use EMMA? The original design argues that it is good for: 
  • Low-power 8-bit MCUs
  • Cortex M0-M4 class, 32-bit MCUs (e.g. STM32F4xx), RTOS, C
But in a recent project we also used it for:
  • Cortex A class, Linux, C++, Qt, UI
I'd say that it can be naturally extended to UI-driven embedded devices, where the A7 (Application) layer isn't a message loop but a Qt-based UI!

Thus it can be seen as an alternative to Hexagonal Architecture*, offering the one benefit, that it follows the more widely known Layered Architecture pattern!



** emBO++ 2021 | the embedded c++ conference in Bochum, 25.-27.03.2021

Thursday, 30 December 2021

My two talks about polymorphic memory resources in C++ 17/20

Quite a time ago I was listening to Jason Turner's C++ Weekly‘s Episode 250* about PMRs (i.e. C++17's Polymorphic Memory Resources) and he mentioned some advanced usage techniques that can be accomplished using them. 

Althought I dare to consider myself to be quite an knowlegeable C++ programmer, I just couldn't get it - I didn't understand how these techniques are working and what are they good for! What a disaster! Myself not understanding C++??? That cannot be true! Well, it was, but I wasn't willing to leave it like that.

So I started to look into the whole memory allocators story in C++. 

What have I found? Well:

  • the surprisingly lacking design of C++98 allocators
  • a fascinating story of library design and evolution
  • understanding of some techniques which aren‘t generally known

Talk 1

So I told myself, that this could be a material for a conference talk, and I just did it!  I mean, I gave a presentation at  C++Italy 2021 conference**:

I titled the talk "PMRs forPerformance in C++17-20", but as I can see in retrospect, it was a little misleading because performance wasn't the main theme of it. Instead its TOC looked like this:

  • Memory Allocators for Performance
    - how allocators can increase performance (OK, at last some general performance knowledge was brought forth here!)

  • Allocators in C++98 and C++11
    - how (and why) allocators were broken in C++98, how Bloomberg's design tried to fix it, and what parts of it were incorporated in C++11

  • Allocators in C++17 and PMRs
    - how the remaining parts of Bloomberg's design were addded in C++17 and what classes in the pmr namespace are implementing it

  • Usage Examples
    - here more pmr:: classes, functions and mechanisms were introduced

  • Advanced PMR Techniques
    - at last, the 2 techniques that started all of that were explained here - Wink-Out and Localized Garbage Collection, but also the EBO optimization technique for allocators was mentioned

  • Allocators in C++20 (and Beyond)
    - here I discussed PMR extensions added in C++20, talked about C++23's test_resource proposal and hinted at the possibility of delegating all of the allocator mess (sic!) to the compiler

So, it was more a journey through allocator's evolution up to the PMRs and about some more advanced  techniques than PMR's impact on performance, I must admit. But all in all, I was rather happy with my talk - there was lot of material covered! 

However, promptly, the inevitable question popped up - is the Wink-Out technique UB-safe***? Ooops, I didn't expect that! I tried to conjecture something but in the end I had to admit that I wasn't sure. πŸ˜• 

Besides, I was only able to give Wink-Out and Localized GC examples in C++20 as C++17 lacked some PMR features added in C++20. πŸ˜• And I also wanted to look on the interplay of PMRs and smart pointers but didn't quite go round to it...

Talk 2

At that point was clear, that I have to do more digging! The result of it was a second talk, which I proposed for the 2021 Meeting C++ conference. Luckily, it got accepted (yay!): 

This time I wanted to concertrate on the 2 techniques, that I could only shortly discuss in the previous talk, so I tried to cut out the library evolution stuff and add discussion of the UB problems, C++17 vs C++20 code and smart pointer usage. 

Finally the TOC of the presentation looked like this:
  • Intro: memory allocators for performance and more
    - again, how and why allocators can increase performance

  • STL Allocators vs PMRs
    - how PMRs improve on the traditional C++ allocator model (spoiler - by using a virtual base class!)

  • Advanced techniques
    1. Wink Out (and/or arenas)
      - here I also cleared the Wink-Out UB question. It was already answered in some proposal or TR (as I noticed later!) but I found a C++ Standard's section  9.2.2. [] which says:

      "For an object of a class type with a non-trivial destructor, the program is not required to call the destructor explicitly before the storage which the object occupies is reused or released"

      So winking-out is allowed within C++ object lifetime framework!

      - also real-life examples of the technique were presented: Google's Protocol Buffers and JSON data struct implementation in C++ Actor Framework

    2. Localized GC (aka. self contained heaps)
      - with localized GC we also discussed how smart pointers work with PMRs, where they both collide and we even touched on C++20 destroying delete feature!

  • Some lessons
    - this time we also discussed the usage costs and open problems with PMRs and concluded that there are still some of them. Maybe pushing it all down to the compiler via a language mechanism could improve the situation?
All in all, I though that I cut out so much from my initial slide set, but I was still short for time! I wonder how I managed to squeeze in so much material in the first talk!

The two techniques

The example code for both talks can be found here: PmrTests.cpp, so you can just check it out (or listen to my presentations πŸ™‚), but for completeness' sake I'll briefly introduce those two advanced techniques also in this post.

Winking out tries to omit all calls to a destructors of in given container and reclaims the used memory when the allocator (and it's PMR) goes out of scope. Of course no side-effects in destructors are allowed here!

The slide below shows the example code:

Localized GC uses the above technique to avoid unbounded recursion and stack exhaustion when deleting complicated graph structures where nodes are connected by smart pointers. We just fall back to naked pointers and wink-out the entire graph. Graph cycles are also no problem now!

The slide below shows the example code:

Want to learn more? Listen to presentations or read the slides. Or just ask your question in the comment section.


Here I will add a list of the resources I used when preparing the 2 talks: mainly standard proposals and technical reports of the C++ standard committee. 

--> OPEN TODO: please be patient a liitle more, I promise it comes soon!


* C++ Weekly, Ep.250: "Custom Allocation -How, Why, Where (Huge multi threaded gains and more!) "

** C++Italy 2021 - Italian C++ Conference 2021, 19.06.2021, online

*** UB: Undefined Behaviour, the bane of C++ developer's existence

Thursday, 31 December 2020

A new code example for the "Qt Performance" book

Recently I added a new example* to my "Qt5 Performance" book's resources that I didn't manage to include in the original release - the trace macros creating output in chrome:tracing format. 

I think it's a very cool technique - instrument your code and than inspect the result as a graphic! On Linux we have flame graph support in the standard tooling integrated with Qt Creator.

On Windows, however, we do not have such a thing! And because I decided to use Windows as the deveolpment platform for my book, we have a kind of problem here, so creative solutions are needed!

And I already came up with an idea for that in the book - we can use libraries (like minitrace, the library we are using in this example ) to generate profiling output in a format that the Chrome's** trace viewer will understand! You didn't know that Google's Chrome had a built-in profiler? Some people assume it’s only for profiling “web stuff” like JavaScript and DOM, but we can use it as a really nice frontend for our own profiling data.

However, due to lack of time I couldn't try it out when I was writing the book 😞, but with the new example I added  I was able to generate output you see visualizedin the figure below:

Here is the code example I used - as you can see, you have only to insert some MTR_() macros here and there:
  int main(int argc, char *argv[])

    MTR_META_THREAD_NAME("Main GUI tread");
    MTR_BEGIN("main", "main()");

    QGuiApplication app(argc, argv);

    QStringList countryList;
    countryList << "Thinking...";

    MTR_BEGIN("main", "load QML View");
    QQuickView view;
    QQmlContext *ctxt = view.rootContext();
    ctxt->setContextProperty("myModel", QVariant::fromValue(countryList));

    MTR_END("main", "load QML View");

    // get country names from network
    MTR_BEGIN("main", "init NetworkMgr");
    QNetworkAccessManager networkManager;
    MTR_END("main", "init NetworkMgr");

PS: heob-3 (i.e the new version of the heob memory profiler we discussed in the book) has a sampling profiler with optional flame graph output!!! I think I have to try it out in the near future!

* you can find the new example here, and the generated JSON file here

** Chrome browser's about:tracing tool. You can use it like this:

  • go to chrome://tracing in Chrome
  • click “Load” and open your JSON file (or alternatively drag the file into Chrome)
  • that's all

More about basic QML optimizations

You might perhaps already know that I recently wrote a book (blogged about it here)! In the book we discuss various general program performance topics but also some more specific, Qt related ones, for example Qt's graphical or networking performance.

Recently, as I was watching some QML presentation I realized that some very basic pieces of performance advice I just glossed over in the book could (and should) be explained in a much more detailed manner to build a general understanding of this technology's caveats.

In Chapter 8.4, where the book discusses QML's performance, I simply wrote that:
"If an item shouldn't be visible, set its visible attribute to false, this will spare the GPU some work. In the same vein, use opaque primitives and images where possible to spare GPU from alpha blending work."
As I re-read this section on some occasion I somehow had to notice that what seemed pretty clear in writing, isn't that clear when read a year later! I started to think why this passage is not that lucid as it should be, and I noticed, that my intended formatting was gone! Now we have in the book:
"... set its visible attribute to false" 
instead of
"... set its visible attribute to false"
as I wrote it initially!!! The formatting went somehow missing and I didn't notice it while proofreading. For me this change rendered the sentence pretty unintelligible :-/.

But anyway, I could have explained it in a little more detailed manner* - so let's try to make a better job and explain why the visible attribute is so important!

To state it bluntly - QML won't do any optimizations to prevent drawing items the user cannot see, like items which are out of bounds or are completely obscured or clipped** by other items! It just draws every single item with visible set to true, totally unconcerned with its real visibility! So we have to optimize manually and:
"... set its visible attribute to false"
But how on earth should we know which items are invisible in which situation? Easy, just set this environment variable: QSG_VISUALIZE=overdraw and Qt will show the overdrawn items using a different color and making them visible as can be seen in this pic from Qt documentaion:

That's pretty cool!

This was my state of knowledge until recently (learned about that in this KDAB presentation) but than I read this line in Qt Quick 2D Renderer's documentation:
Qt Quick 2D Renderer will paint all items that are not hidden explicitly with either the visibility property or with an opacity of 0. Without OpenGL there is no depth buffer to check for items completely obscured by opaque items, so everything will be painted - even if it is unnecessary.
Qt Quick 2D Renderer is a raster-graphic replacement for the standard OpenGL-based QML renderer. Does that (by tertium non datur) mean that with OpenGL we do use the depth buffer after all? This could be an unfortunate formulation, so we'd like to stay on the conservative side but maybe a little more investigation would be appropriate in this case. Have you heard anything about it?

* This might be due to the very tight deadlines :-(

** as Qt documentation says:
"If clipping is enabled, an item will clip its own painting, as well as the painting of its children, to its bounding rectangle."