What We Learned about Continuous Integration from Analyzing 2+ Million Travis Builds

This is a guest post from Moritz Beller from the Delft University of Technology in The Netherlands. His team produced amazing research on several million Travis CI builds, creating invaluable insights for us and, more importantly, the open source community as a whole. We’re blown away by the results and the level of detail they have put into this study, and we’re excited that they’re sharing these results with the wider community. Thank you! – Mathias Meyer, CEO

Over the past five years, Travis CI has performed hundreds of millions of free builds for the open source community. Along with every build, Travis gives the community access to its build log, a textual execution trace of the build output. So far, this rich source of data has gone largely unexplored. By combining build and build log information with data from GitHub and the commit that triggered each build, we, researchers at the Delft University of Technology, set out to explore how Travis CI is used. In our investigation of 2,640,825 builds, we focused on the adoption of Travis CI in the GitHub community, why builds commonly break, and whether the use of multiple integration environments leads to different build results, thus possibly exposing bugs that would not otherwise have been found. In the following, we provide a short overview of our findings.

We measured the adoption of Travis CI on a pre-filtered sample of 58,032 active, non-toy and non-fork GitHub projects. Each of them had more than 50 stars and was written in one of the 19 most popular languages on GitHub. 16,159 of these projects used Travis CI for at least one build, resulting in an overall Travis CI usage rate of 27.8%. When only considering projects in one of the 26 languages supported by Travis CI, we are left with 43,695 projects (75.3% of all projects). Out of these, 13,590 (31.1%) actually used Travis CI for at least one build. Even projects whose primary language Travis does not yet support use it! This might be because a secondary language in the project is set up for building, or because developers adopted a custom-made build process.

A third of popular GitHub projects already make use of the free Travis CI service.
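The adoption rates above follow directly from the project counts. As a minimal sketch (the variable and function names are our own, chosen only for illustration), the arithmetic can be reproduced like this. The last value, the number of Travis-using projects whose primary language is unsupported, is derived here by subtraction and is not stated explicitly in the study summary:

```python
# Project counts quoted in the text above.
SAMPLED_PROJECTS = 58_032
TRAVIS_USERS = 16_159
SUPPORTED_LANGUAGE_PROJECTS = 43_695
SUPPORTED_LANGUAGE_TRAVIS_USERS = 13_590


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Overall usage rate across the whole sample.
overall_rate = percentage(TRAVIS_USERS, SAMPLED_PROJECTS)

# Share of sampled projects written in a Travis-supported language.
supported_share = percentage(SUPPORTED_LANGUAGE_PROJECTS, SAMPLED_PROJECTS)

# Usage rate among projects in a supported language.
supported_rate = percentage(
    SUPPORTED_LANGUAGE_TRAVIS_USERS, SUPPORTED_LANGUAGE_PROJECTS
)

# Derived by subtraction: Travis users whose primary language is unsupported.
unsupported_language_users = TRAVIS_USERS - SUPPORTED_LANGUAGE_TRAVIS_USERS

print(overall_rate, supported_share, supported_rate, unsupported_language_users)
```

Running it should print the three percentages from the text (27.8, 75.3, 31.1) followed by 2569.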

Similar to our 50-star filter, we further explored only projects that have more than 50 Travis builds and thus a reasonably long build history. Figure 1 shows a boxplot of the number of builds per project, broken down by language.

Fig 1: Boxplots of the median (|) and mean (⊕) number of builds per project and language

When looking at the number of builds per project and language in Figure 1, we find that projects in many of these very different programming languages use Continuous Integration (CI) with relatively similar frequency. It is not the case that projects in dynamic languages often associated with quick coding and rapid prototyping (like Ruby) have more builds than projects in a more traditional, statically typed language like C or C++.

Continuous Integration seems to be a concept that lends itself equally well to projects in many different language categories.

What causes builds to break?

Inevitably, when doing Continuous Integration, you will have to deal with broken builds. We generally consider a build broken when its end result is not successful for any reason, be it a provisioning or compile error. However, when the build breaks during testing or static analysis, we consider it failed. Failed builds are helpful in that they often allow developers to uncover a regression or bug before it is accidentally released. However, they also inevitably mean work, as someone has to check what went wrong. A compilation error is fundamentally different to debug than a testing error or a provisioning problem, and different people might be responsible for handling the different types of problems. It is therefore important to know why builds break.

Fig 2: Build status breakdown of 2,640,825 analyzed Travis builds: Most broken builds fail due to test executions.

Figure 2 shows a breakdown of build status for our 2,640,825 analyzed Travis builds. We see that only a small percentage of builds are canceled by users, and that builds that errored due to provisioning or infrastructure problems make up only another 5 percentage points. By and large, builds are successful.

The single dominant reason for builds to break is failing tests, which are more prevalent than all other reasons combined.

This is true irrespective of the language of the project. Ruby builds have a 33% higher likelihood of failing than Java builds, particularly in the testing phase. We believe that the dynamic nature of Ruby plays a large role in this. Of course, the Ruby projects also had ten times more tests on average than the Java projects.

All other things being equal, developers in a dynamically typed language like Ruby can expect a higher number of failed builds than in a statically typed language like Java. On the other hand, CI might be even more important for them to capture bugs early on.

Do multiple integration environments help me catch regressions?

Travis makes building in multiple environments easy. But their use comes at a cost: builds take longer, and it is unclear whether they really expose problems in the product. To identify whether multiple integration environments do indeed help, we counted the number of times a build result from one integration environment deviated from the others. When a result deviates, we consider the extra environment helpful, as it catches a potential problem that would not otherwise have been found.

We observe that in total, 11.4% of the builds that were performed in multiple environments have a differing integration result, meaning that at least two jobs of the same build ended with a different status. This effect is much more pronounced for Ruby (15.6%) than for Java (2.3%) systems.

The use of multiple integration environments is most helpful for highly diverse ecosystems with many possible build configurations, like Ruby.
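The counting described above can be sketched in a few lines. This is only an illustration under our own assumptions, not the analysis code used in the study: each build is represented as a list of job statuses, one job per integration environment, and the build ids, helper names and sample data are all hypothetical.

```python
from typing import Dict, List


def has_divergent_jobs(job_statuses: List[str]) -> bool:
    """True if at least two jobs of one build ended with different statuses."""
    return len(set(job_statuses)) > 1


def divergence_rate(builds: Dict[str, List[str]]) -> float:
    """Fraction of multi-environment builds whose jobs disagree."""
    # Only builds with more than one job ran in multiple environments.
    multi_env = [jobs for jobs in builds.values() if len(jobs) > 1]
    if not multi_env:
        return 0.0
    divergent = sum(has_divergent_jobs(jobs) for jobs in multi_env)
    return divergent / len(multi_env)


# Hypothetical data: build id -> status of each job in that build.
example_builds = {
    "build-1": ["passed", "passed", "passed"],
    "build-2": ["passed", "failed"],
    "build-3": ["failed", "failed"],
    "build-4": ["passed"],  # single environment, excluded from the rate
    "build-5": ["passed", "errored", "passed"],
}

print(divergence_rate(example_builds))
```

In this made-up sample, two of the four multi-environment builds have jobs that disagree, so the rate is 0.5. A build whose jobs all fail in the same way does not count as divergent, because the additional environments revealed nothing new.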

How can I apply this to my own projects?

In summary, we found that close to a third of active and popular GitHub projects already use Travis CI. Its use seems to be emerging more and more as a best practice, irrespective of the language of your project. Testing is an integral part of CI for most projects, so make sure your tests are also executed on Travis CI. Don’t lightly compare failing build rates among projects in search of the “higher quality project”, however, and absolutely do not compare projects written in different languages: due to their test-oriented and dynamically typed nature, we observed Ruby builds to have a higher tendency to fail than Java builds. We highly recommend the use of multiple integration environments for languages with a highly diverse library and tool ecosystem, such as Ruby.

Are you still hungry for more Travis statistics? Read our scientific pre-print in full detail. Agree or disagree with our findings from your personal experience? Leave a comment below! Do you want to do your own research with the data set? Have a look at our freely accessible TravisTorrent!