Monday, 25 February 2019

Holy cow, I wrote a book!


I have wanted to say that ever since I saw Raymond Chen's blog, and now, at last, I can! The title of the book is "Hands-On High Performance Programming with Qt 5" and you can get it from Amazon (here) or Packt Publishing (here). The book's cover looks like this:


I have to say that I rather like the look of it!

TL;DR

A new book about Qt and C++ performance. Uses Qt 5.9 - 5.12 on Windows. Guides the reader from an intermediate to the (lower) advanced level. 4 of 5 stars 😉.

There are also code examples illustrating some of the discussed concepts (to be found at https://github.com/PacktPublishing/Hands-On-High-performance-with-QT).

What this book is about.

This book is about many things. Judging from the title, it's about performance optimization of Qt 5 (Qt 5.9 - Qt 5.12) programs. But because Qt's underlying language is C++, it is also about C++ optimizations and performance. Because C++ is a close-to-the-metal language (some would say low-level), we also discuss hardware architectures. And because the Qt framework covers so many areas, we also discuss data structures and algorithms, multithreading, file I/O and parsing, GUI and graphics, networking, and even mobile and embedded platforms.

Quite a mouthful, you'd say? Yes, it's true. But that also makes it interesting, and shows how many facets there are to performance optimization.

Additionally, I decided to use Windows as the development platform for this book. Most Qt books use Linux, probably because the Qt Creator IDE offers much better tooling there. But because Windows is also a very popular platform, I wanted to take the road less traveled and find out how to compensate for the lack of the standard Linux performance tools on Windows.

We use only open-source or free tools, Qt Creator as the development environment, and the Qt 5.9 LTS version for the code examples (as it was the latest LTS version at the time I started writing this book).

Why did I write it?

Well, because I have always found performance optimization very interesting. And because I have been working with Qt for quite a long time now, to say nothing of C++! So, when the publishers approached me with a proposed book title, I knew I could write rather a good book about it!

The second reason was that there wasn't a resource about performance optimization I would be content with. There are a couple of C++ performance books, but I wasn't entirely convinced by them. As for Qt and its specific problems, there was nothing! Well, there was a host of information scattered in blogs, Stack Overflow, books, articles and Twitter threads. So I thought that before I forget it all, I'd better put it on paper!

I also wanted to write the book I'd have liked to read when I first started to learn about performance and optimizations, because there wasn't such a thing either. Thus I settled on an intermediate-level, approachable, but not trivial book.

What material is covered

This book was planned as an intermediate-level read. If you can write some C++ and have learned to write a basic Qt application, then it is for you. No further knowledge is required, as every topic (e.g. data structures, graphics, networking) will be introduced in an understandable and (I hope) entertaining manner.

My initial plan for the book was:
  • Part I - Basics
    • Intro - basic performance wisdom and techniques, hardware architecture and its impact.
    • Profiling - performance tools and how to use them. As I said, we don't want to make it easy for ourselves, so we look at the Windows platform!
    • C++ - how performant is C++ really? And what optimizations can the compiler (and linker) do?
  • Part II - General Techniques
    • Data structures and algorithms - what is the performance of Qt containers and strings? Do we have to use them?
    • Multithreading - Qt's take on threading and how to speed up your program with concurrency.
    • Fails - some of the more memorable performance problems I encountered in my career.
  • Part III - Selected Qt Modules
    • File I/O etc - Qt and file system performance, JSON and XML parsing. Also memory mapped files and caching.
    • Graphics - GUI, widget and QML performance. Probably the most interesting part of the book.
    • Networking - network performance and network support in Qt. Because we live in the era of ubiquitous connectivity!
However, the editors wished for two additional chapters, so I added them as well (unfortunately, this made the three-part structure obsolete):
  • Mobile and embedded - how to use Qt on mobile and embedded (a kind of advanced chapter applying all we have learned before to two specific platforms)
  • Testing - a goodie chapter about testing techniques, GUI testing and performance regression tests. In the end this chapter turned out pretty interesting!

I really think that this book can do a good job of guiding the reader from an intermediate to the (lower) advanced level!

Table of Contents

On the publisher's book page there is a TOC (here), but it is very high level, and doesn't really show the real contents of the book. It goes rather monotonously like "Some intro for topic", "Qt classes for that", "Performance techniques for that", "Summary", "Questions" and "Further reading". I agree with you that one cannot judge the level and quality of the book from that. It could be anything - from horrible to superb.

For that reason I include here the complete TOC as it appears in the book, so you can get a better idea of the themes covered. Without further ado, here it is:

   
OK, I tried, but it's way too long, so it follows at the end* of the post.


You can see from that more detailed TOC that there is a wealth of information and techniques in there! I really think that this book can do a good job of guiding the reader from an intermediate to the (lower) advanced level!

The "Questions" and "Further Reading" sections

As unassuming as they look in the TOC, these sections were my secret personal favorites. The "Questions" sections contain, as you probably guessed, some questions to test your understanding of the themes discussed in the given chapter, but they also try to deepen your understanding and sometimes even introduce new and interesting information! I enjoyed writing them because it was fun trying to find out how I could keep an intelligent reader interested after the real material had already been introduced.

As this is an intermediate-level book, and because I didn't want to write a 500-600 page tome, it couldn't go deep on every one of the introduced themes. Because of that I also included the "Further Reading" sections, where more advanced materials are referenced. Qt covers so many areas, and some themes like networking, graphics, embedded or mobile are so deep, that you will definitely need to consult more books and articles! In hindsight I should have included even more references, but you know, if I had more time, etc...

Summary

If I had had more time, this book could (of course) be much better. But even so I'm rather content with it, considering that I wrote it in only half a year, alongside my normal working hours. I hope you will enjoy it!

PS: Maybe I will start some kind of online addendum/errata for this book, as there are quite a few things I'm still learning after I finished writing it.

--
* Here comes the TOC in all its glory:

Preface

Chapter 1: Understanding Performant Programs 1

  Why performance is important 1

    The price of performance optimization 2

  Traditional wisdom and basic guidelines 2

    Avoiding repeated computation 4

    Avoiding paying the high price 4

    Avoiding copying data around 5

    General performance optimization approach 6

  Modern processor architectures 7

    Caches 7

    Pipelining 8

      Speculative execution and branch prediction 10

      Out-of-order execution 10

    Multicore 11

    Additional instruction sets 12

    Impact on performance 13

      Keeping your caches hot 14

      Don't confuse your branch predictor 15

      Parallelizing your application 16

  Summary 16

  Questions 17

  Further reading 17


Chapter 2: Profiling to Find Bottlenecks 19

  Types of profilers 20

    Instrumenting profilers 20

    Sampling profilers 21

    External counters 22

      Note on Read Time-Stamp Counter 22

  Platform and tools 22

    Development environment 23

    Profiling tools 24

      Just use gprof? 25

      Windows system tools 25

      Program profiling tools 27

      Visualizing performance data 30

    Memory tools 30

  Profiling CPU usage 31

    Poor man's sampling technique 31

    Using Qt Creator's QML profiler 32

    Using standalone CPU profilers 35

      Reiterating on sampling profiling's limitations 39

  Investigating memory usage 39

    Poor man's memory profiling 40

    Using Qt Creator's heob integration 41

  Manual instrumentation and benchmarks 45

    Debug outputs 45

    Benchmarks 46

      Benchmarks in regression testing 46

    Manual instrumentation 47

  Further advanced tools 48

    Event Tracing for Windows (ETW) and xperf 48

      Installation 48

      Recording and visualizing traces 50

      Conclusion 53

    GammaRay 54

      Building GammaRay 54

      When can we use it? 56

    Other tools 57

      Graphic profilers 57

      Commercial Intel tools 58

      Visual Studio tools 58

  Summary 59

  Questions 59


Chapter 3: Deep Dive into C++ and Performance 61

  C++ philosophy and design 61

    Problems with exceptions 62

      Run-time overheads 63

      Non-determinism 63

      RTTI 64

      Conclusion 64

    Virtual functions 64

  Traditional C++ optimizations 65

    Low-hanging fruit 65

      Temporaries 66

      Return values and RVO 66

      Conversions 67

    Memory management 68

      Basic truths 68

      Replacing the global memory manager 69

      Custom memory allocators 70

        Where they do make sense 71

        Stack allocators 71

        Conclusion 71

      Custom STL allocators 72

    Template trickery 73

      Template computations 73

      Expression templates 74

      CRTP for static polymorphism 75

      Removing branches 76

  C++11/14/17 and performance 77

    Move semantics 77

      Passing by value fashionable again 78

    Compile time computations 78

    Other improvements 80

  What your compiler can do for you 81

    Examples of compiler tricks 81

    More on compiler optimizations 85

      Inlining of functions 86

      Loop unrolling and vectorization 86

    What compilers do not like 87

      Aliasing 87

      External functions 88

    How can you help the compiler? 89

      Profile Guided Optimization 90

    When compilers get overzealous 90

  Optimization tools beyond compiler 92

    Link time optimization and link time code generation 93

    Workaround – unity builds 93

    Beyond linkers 94

  Summary 95

  Questions 95

  Further reading 96


Chapter 4: Using Data Structures and Algorithms Efficiently 97

  Algorithms, data structures, and performance 98

    Algorithm classes 98

      Algorithmic complexity warning 100

    Types of data structures 100

      Arrays 100

      Lists 101

      Trees 101

      Hash tables 102

  Using Qt containers 103

    General design 103

      Implicit sharing 103

      Relocatability 105

    Container classes overview 105

      Basic Qt containers 106

      QList 106

      QVarLengthArray 107

      QCache 108

    C++11 features 108

     
    Memory management 109

    Should we use Qt containers? 110

  Qt algorithms, iterators, and gotchas 110

    Iterators and iterations 111

    Gotcha - accidental deep copies 111

  Working with strings 113

    Qt string classes 113

      QByteArray 113

      QString 114

      QStringBuilder 114

      Substring classes 115

    More string advice 115

      Interning 115

      Hashing 116

      Searching substrings 117

      Fixing the size 117

  Optimizing with algorithms and data structures 118

    Optimizing with algorithms 118

      Reusing other people's work 120

    Optimizing with data structures 120

      Be cache-friendly 121

      Flatten your data structures 122

      Improve access patterns 122

        Structure of arrays 122

        Polymorphism avoidance 123

        Hot-cold data separation 123

        Use a custom allocator 124

      Fixed size containers 124

      Write your own 125

  Summary 125

  Questions 125

  Further reading 126


Chapter 5: An In-Depth Guide to Concurrency and Multithreading 128

  Concurrency, parallelism, and multithreading 128

    Problems with threads 130

    More problems – false sharing 131

  Threading support classes in Qt 133

    Threads 134

    Mutexes 134

    Condition variables 135

    Atomic variables 136

    Thread local storage 136

    Q_GLOBAL_STATIC 136

  Threads, events, and QObjects 137

    Events and event loop 137

    QThreads and object affinities 138

      Getting rid of the QThread class 142

    Thread safety of Qt objects 142

  Higher level Qt concurrency mechanisms 142

    QThreadPool 143

    QFuture 143

    QFutureInterface 145

      Should we use it? 146

    Map, filter, and reduce 147

    Which concurrency class should I use? 149

  Multithreading and performance 150

    Costs of multithreading 150

      Thread costs 150

      Synchronization costs 151

        QMutex implementation and performance 151

      Atomic operation costs 151

      Memory allocation costs 152

      Qt's signals and slots performance 152

    Speeding up programs with threads 153

      Do not block the GUI thread 153

      Use the correct number of threads 154

      Avoid thread creation and switching cost 154

      Avoid locking costs 154

        Fine-grained locks 155

        Lock coarsening 155

        Duplicate or partition resources 156

        Use concurrent data structures 157

        Know your concurrent access patterns 157

        Do not share any data 157

        Double-checked locking and a note on static objects 158

      Just switch to lock-free and be fine? 159

        Lock-free performance 159

        Progress guarantees 160

      Messing with thread scheduling? 161

      Use a share nothing architecture 162

        Implementing a worker thread 162

        Active object pattern 163

        Command queue pattern 164

  Beyond threading 164

    User-space scheduling 164

    Transactional memory 165

    Continuations 165

    Coroutines 166

  Summary 167

  Questions 168

  Further reading 168


Chapter 6: Performance Failures and How to Overcome Them 170

  Linear search storm 170

    Context 171

    Problem 172

    Solution 172

    Conclusion 173

  Results dialog window opening very slowly 173

    Context 173

    Problem 174

    Solution 174

    Conclusion 174

  Increasing HTTP file transfer times 174

    Context 175

    Problem 175

    Solution 176

    Conclusion 177

  Loading SVGs 177

    Context 177

    Problem 178

    Solution 178

    Conclusion 179

  Quadratic algorithm trap 179

    Context 180

    Problem 180

    Solution 180

    Conclusion 180

  Stalls when displaying widget with QML contents 181

    Context 181

    Problem 181

    Solution 182

    Conclusion 182

  Too many items in view 182

    Context 183

    Problem 183

    Solution 183

    Conclusion 183

  Two program startup stories 184

    Time system calls 184

    Font cache 184

    Conclusion 185

  Hardware shutting down after an error message 185

    Context 185

    Problem 185

    Solution 185

    Conclusion 186

  Overly generic design 186

    Context 186

    Problem 187

    Solution 187

    Conclusion 187

  Other examples 187

  Summary 188

  Questions 189

  Further reading 189


Chapter 7: Understanding I/O Performance and Overcoming Related Problems 190

  Reading and writing files in Qt 191

    Basics of file I/O performance 191

      Buffering and flushing 191

      Tied and synchronized streams 192

      Reading and writing 193

      Seeking 194

      Caching files 194

    Qt's I/O classes 195

      QFile 195

      QTextStream and QDataStream 196

      Other helper I/O classes 198

      QDebug and friends 198

  Parsing XML and JSON at the speed of light 199

    QtXml classes 200

      QDomDocument 200

      QXmlSimpleReader 201

    New stream classes in QtCore 201

    Quick parsing of XML 202

    Reading JSON 203

      QJsonDocument's performance 204

  Connecting databases 204

    Basic example using SQLite 204

    Some performance considerations 205

  More about operating system interactions 206

    Paging, swapping, and the TLB 206

    Reading from disk 207

    Completion ports 208

  Summary 208

  Questions 208

  Further reading 209


Chapter 8: Optimizing Graphical Performance 210

  Introduction to graphics performance 211

    Graphics hardware's inner workings 211

    What is a GPU? 211

    OpenGL pipeline model 213

    Performance of the graphics pipeline 215

      CPU problems 217

      Data transfer optimization 217

      Costly GPU operations 217

    Newer graphics programming APIs 218

  Qt graphics architecture and its history 218

    The graphics API Zoo 219

      Qt Widget 219

      QGraphicsView 220

      QOpenGLWidget 221

      QVulkanWindow 222

      Qt Quick 223

      QtQuick Controls 1 and 2 224

      Extending QML 224

        Canvas 2D 224

        QQuickPaintedItem 225

        QQuickItem 226

        QQuickFrameBufferObject 227

        More APIs 228

      Qt 3D 229

    OpenGL drivers and Qt 231

      Graphic drivers and performance 231

      Setting the OpenGL implementation for QML 233

  Qt Widget's performance 234

    QPainter 234

      Images 234

      Optimized calls 235

    OpenGL rendering with QOpenGLWidget 236

      Images 236

      Threading and context sharing 236

      Usage of QPainter 237

    QGraphicsView 237

    Model/view framework 237

  QML performance 238

    Improvements in 5.9 and beyond 239

    Measuring QML performance 240

    Startup of a QML application 242

    QML rendering 243

      Scene graph optimizations 243

      Scene graph and threading 245

      Scene graph performance gotchas 245

        Batching 245

        Texture atlas 246

        Occlusion, blending, and other costly operations 246

        Antialiasing 246

        Use caching 247

      Which QML custom item should you choose? 247

      JavaScript usage 247

      Qt Quick Controls 248

  Other modules 248

    Qt 3D performance 249

    Hybrid web applications 249

  Summary 249

  Questions 250

  Further reading 251


Chapter 9: Optimizing Network Performance 252

  Introduction to networking 253

    Transport layer 254

      User Datagram Protocol (UDP) 254

      Transmission Control Protocol (TCP) 254

      A better TCP? 256

    Application layer 256

      Domain Name Service (DNS) 256

      HyperText Transfer Protocol (HTTP) 257

      Secure data transfer 258

      A better HTTP? 259

  Qt networking classes 259

    TCP and UDP networking classes 259

      QTcpServer and QTcpSocket 260

      QUdpSocket 261

      QAbstractSocket 262

      QSslSocket 264

      Other socket types 265

    HTTP networking using Qt classes 265

      DNS queries 265

      Basic HTTP 266

      HTTPS and other extensions 267

      Qt WebSocket classes 267

      Miscellaneous classes 268

    Other higher-level communication classes 269

      Qt WebChannel 269

      Qt WebGL streaming 269

      Qt remote objects 269

  Improving network performance 270

    General network performance techniques 270

    Receive buffers and copying 271

    TCP performance 271

    HTTP and HTTPS performance 272

      Connection reuse 273

      Resuming SSL connections 273

      Preconnecting 274

      Pipelining 275

      Caching and compression 276

      Using HTTP/2 and WebSocket 276

  Advanced networking themes 278

  Summary 278

  Questions 279

  Further reading 279


Chapter 10: Qt Performance on Embedded and Mobile Platforms 281

  Challenges in embedded and mobile development 282

    Basic performance themes 282

    Run to idle 283

    Some hardware data 283

    Embedded hardware and performance 285

  Qt usage in embedded and mobile worlds 285

    Qt for embedded 286

      Qt usage on embedded Linux 286

      Qt's embedded tooling 287

      Supported hardware 288

      Example usage with Raspberry Pi 288

    Qt for mobile 289

      Android support in Qt Creator 289

      Profiling Android applications 290

      Mobile APIs in Qt 290

  Embedded Linux and Qt performance 291

    Executable size 291

    Minimizing assets 292

    Power consumption 292

    Start-up time 293

      Using the current Qt version 293

      Using loaders 294

      3D asset conditioning 294

      Linux start-up optimizations 294

      Hardware matters! 295

    Graphical performance 295

    Time series chart display 296

      Qt Charts and OpenGL acceleration 296

      Polyline simplifications 297

    Floating-point considerations 298

  Mobile-specific performance concerns 299

    Executable size 299

    Power usage 299

    Mobile networking 300

      Batch and piggyback 301

      Consider a push model 302

      Prefetch data 302

      Reuse connections 303

      Adapting to the current network connection type 303

    Graphic hardware 304

  Summary 304

  Questions 305

  Further reading 305


Chapter 11: Testing and Deploying Qt Applications 307

  Testing of Qt code 307

    Unit testing 308

      Qt Test 308

      Test support in Qt Creator 310

    Automated GUI testing 312

      Squish 312

      Example Squish test 313

    Performance regression testing 316

      Adding a qmlbench benchmark 316

      Using Squish 317

  Deploying Qt applications 318

    Flying parts 318

    Static versus dynamic builds 319

    Deploying on Windows 320

      Windows deployment tool 320

    Installation and paths 320

  Summary and farewell 321

  Questions 322

  Further reading 323



Appendix A: Responses to questions 324

  Chapter 1 324

  Chapter 2 325

  Chapter 3 326

  Chapter 4 328

  Chapter 5 329

  Chapter 6 331

  Chapter 7 332

  Chapter 8 333

  Chapter 9 334

  Chapter 10 336

  Chapter 11 337