You might know that I recently wrote a book (and blogged about it here). In its second chapter, where we discuss profiling and profilers, I mentioned Intel's VTune Amplifier, but because the book only uses open source tools, I just wrote that:
"The commercial Intel performance suite comprises of VTune Amplifier, Intel Advisor, and Intel Inspector. VTune Amplifier is an advanced sampling profiler using hardware and OS counters, providing information on CPU, threading, memory, cache, and storage…"

Admittedly, there was a free version of VTune at the time, but you had to request it and you could only use it for a limited evaluation period (if my memory serves me right), so I just classified it as a commercial tool.
But this week I learned that there is a free version of VTune 2019 that everybody can use. I didn’t believe it at first, so I went to VTune’s homepage to check, and indeed, an entirely free version is available there. The commercial version simply comes with commercial-level support, and there are no other differences AFAICS.
Stress-Test with MinGW
So I downloaded the free version and wanted to try it. I knew that Intel tools integrate tightly with Visual Studio, and that Intel’s compiler is (still is?) binary compatible with Microsoft’s compilers, so I was curious whether VTune would be able to understand MinGW’s symbol information. I know for sure that in the past VTune could only understand Microsoft’s debug info format.
I wanted to test this because in my book I am using the standard Qt distribution for Windows (at the time of writing it was Qt 5.9), and this distribution uses MinGW as its compiler. This also means that the symbol information in the provided Qt DLLs is stored in the MinGW format (i.e. DWARF 2).
To test VTune's support for the various debug info formats, I profiled the following application of mine:
+-----------------+
|                 |          +-------------+
|                 |          |             |
|     QML GUI     +--------->+  QProcess   |
|                 |          |             |
|     (MinGW)     |          +---+----^-+--+
|                 |              |    | |
+-----------------+        start |    | | cin, cout, cerr
                                 |    | |
                                 |    | |
                                 |    | |
                          +------v----+-v---------+
                          |                       |        +------------------+
                          |                       |        |                  |
                          | "Backend" Executable  +--------+  Some SDK DLLs   |
                          |                       |        |                  |
                          |    (Visual Studio)    |        |     (MinGW)      |
                          |                       |        +------------------+
                          +-----------------------+

As you can see, it’s a rather complicated situation: a MinGW-based GUI starts a Visual Studio-based executable, which loads some SDK DLLs that export a plain C API but are internally built with MinGW! So let’s give it a try and see how VTune copes with that mess.
Surprisingly, when I started VTune and let it launch and profile my example application, I saw the following picture:
We see that VTune can indeed see and understand symbols from Qt’s QML libraries, as well as from my own C++ code in some QML extension modules, which were also compiled with MinGW. A great hooray! But that’s not all. On closer inspection I found that the symbols of the “backend” executable (Microsoft format) were also visible in the profile data, and, one step further, so were the internal symbols of the SDK DLL it is using (MinGW format again)!
Surprise, surprise, it seems to be working rather neatly. Nice one! It turned out that in a new profiling project the "Analyze child processes" option is turned on by default. I only had to provide the location of my sources, and everything else just worked!
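Why does that option matter here? Because the process I launch from VTune (the GUI) is not the one doing the heavy lifting; the backend is a child process that the GUI starts and talks to over its standard streams. Below is a minimal sketch of that parent/child layout. Note the assumptions: it uses Python's subprocess module instead of Qt's QProcess to keep it short, and the "backend" is a hypothetical stand-in script, not my real executable.

```python
import subprocess
import sys

# Hypothetical stand-in for the "backend" executable: a tiny child program
# that answers every stdin line in upper case on stdout and logs to stderr.
BACKEND_SOURCE = r"""
import sys
for line in sys.stdin:
    sys.stdout.write(line.upper())
    sys.stderr.write("handled: " + line)
"""


def run_backend(requests):
    """Start the child, send requests via stdin, collect stdout and stderr."""
    child = subprocess.Popen(
        [sys.executable, "-c", BACKEND_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() writes the input, closes stdin and waits for the child.
    out, err = child.communicate("".join(r + "\n" for r in requests))
    return child.returncode, out.splitlines(), err.splitlines()


if __name__ == "__main__":
    code, replies, log = run_backend(["ping", "status"])
    print("exit code:", code)
    print("replies:", replies)
    print("log:", log)
```

A profiler that only watched the parent would see little more than a process waiting on pipes; the interesting hotspots live in the child, so the profiler has to follow it.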
Summing Up
As an addendum (or call it a correction) to the book, I can say that VTune Amplifier is another option when profiling Qt applications on Windows!
Besides traditional profiling to find execution hotspots, VTune has very nice support for diagnosing multithreaded performance and visualizing lock contention. On Intel CPUs it can show cache misses and pipeline stalls, and it can also diagnose memory access bottlenecks and profile an entire system with multiple applications. You’ll admit it’s quite a mighty* tool!
--
* Want to see how to use all these features? There is a video guiding you around the GUI here, and here is an article about "Thread, Memory, and Vector Optimizations" using VTune and Intel Advisor, showing how to diagnose and improve usage of the available cores, optimize cache usage, and introduce vectorization.