Travis

Service outages, 9-14 July 2016

Joe Corcoran

On Saturday 9 July, some users experienced a service outage on travis-ci.org. For a three-hour period, beginning at approximately 14:30 UTC, some builds were not processed. This was followed by five hours of slowed scheduling performance as we cleared the backlog. By 23:04 UTC, the incident was fully resolved.

A very similar, slightly shorter outage also occurred on Thursday, 14 July, between approximately 16:00 and 02:30 UTC.

We know our users rely on Travis CI to be running smoothly at all times and we are very sorry for the trouble these outages will have caused.

What happened?

We are currently rolling out a rewritten version of the application that handles the data we receive from GitHub webhooks. Our rollout process means that, in theory, the number of users exposed to the rewritten application increases in small increments over time, while we closely monitor performance.

Sadly, this application left some database records in an unexpected state. This had severe consequences for another application in our system, which is responsible for scheduling the jobs for each build. As this application struggled to parse the bad data, other jobs were left waiting in a backed-up queue.

In between the two outages we found what we thought was the cause and released an update. After a couple of days without a repeat occurrence, we assumed that everything was fine. As it turned out, our initial diagnosis was incorrect.

Next steps

We are currently taking a number of steps to prevent this from repeating. Firstly, we immediately paused the rollout of the rewritten application. The rollout will continue this week, once we are confident that we have made its handling of database records more robust.

Secondly, the job scheduling application has already been patched in order to ensure that it does not struggle under these specific circumstances again. We are also exploring different job scheduling strategies as a result of this investigation.
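
One way to picture this kind of hardening is a scheduling loop that sets aside records it cannot parse, rather than letting them hold up the jobs queued behind them. The sketch below is purely illustrative and is not the actual patch: `parse_job`, `schedule_jobs`, and the `job_id` field are hypothetical names chosen for the example.

```python
from __future__ import annotations

from typing import Callable


def parse_job(record: dict) -> str:
    """Hypothetical parser: return the job id, or raise ValueError on bad data."""
    job_id = record.get("job_id")
    if not isinstance(job_id, str) or not job_id:
        raise ValueError(f"unexpected record state: {record!r}")
    return job_id


def schedule_jobs(records: list[dict], enqueue: Callable[[str], None]) -> list[dict]:
    """Enqueue every parseable job and return the records that were set aside."""
    quarantined = []
    for record in records:
        try:
            job_id = parse_job(record)
        except ValueError:
            # Assumption: bad records are set aside for later inspection,
            # so the jobs queued behind them are not left waiting.
            quarantined.append(record)
            continue
        enqueue(job_id)
    return quarantined
```

With this shape, a single malformed record costs one skipped job instead of a backed-up queue, and the returned list gives a natural place to hang an alert.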

Lastly, but perhaps most importantly, we are looking into how we can be more promptly alerted to these kinds of issues. It took around three hours for us to tackle this outage in the first instance, mostly because it had evaded our alert system. We don’t ever want that to happen again.

Conclusion

These outages will have made things difficult for a lot of our users and we want to reiterate our apologies for the stress this will have caused. We are fully committed to always improving the performance and reliability of Travis CI.

Thanks for your understanding!


Welcome Renée!

Lisa Passing

Renée skiing north of the wall

Again we're super excited to introduce a new team member! Clap your hands and welcome Renée!

The story of Renée and Travis CI goes way back to 2011. Our own Sven and Josh went on a promotion tour for the Travis Love Campaign and first met Renée at Conferencia Rails 2011. Now that she's joining the team it's time for you to get to know her as well.

Renée is a world traveller, having eaten a bug and been lost at least once on every continent except Antarctica. But travelling doesn't stop on this very planet!

Her favourite (childhood?) movie is The Last Starfighter and it was soon clear that she would try to join the Star League to defend the Frontier from Xur and the Ko-Dan Armada. But that sadly didn't work out. Instead she deepened her knowledge of space by watching every Star Trek series except The-Series-That-Shall-Not-Be-Named (apparently fellow Trekkies know what she's talking about?).

Travelling aside, Renée has been a skiing pro since she was 2 and has collected experience as a ski racer, instructor, and ski patrol person over the years. Impressive! Her next item on the list is taking a year off and living the life of a global ski bum.

Until then she will help awesome-ize our backend. You can follow her as @gigglegirl4e on Twitter.

Welcome, Renée!


Security Advisory: Encrypted Environment Variables

Henrik Hodne

We've had a feature for a while that allows you to encrypt environment variables in your .travis.yml as a way to include credentials that can be used in your builds without making them readable by everyone with access to your .travis.yml.

Originally these variables weren't available in pull request builds, since anyone could submit a pull request against the repository and print out the variables. Later we changed this a bit to allow encrypted environment variables in pull requests from branches on the same repository. This is safe because anyone who can push to branches on the repository can already create builds that see the encrypted environment variables.

About 2 months ago, on April 11th, 2016, we were alerted about an issue with the way we determined if a pull request was coming from the same repository or a fork. Due to the way GitHub stores forks internally, there was a way to make a pull request with commits from a fork, but make it look like it came from the main repository.

This meant that you could fork a repository, make a commit that reveals the encrypted environment variables, submit a pull request and thereby get access to the encrypted environment variables.

As of April 14th, 2016, we had a patch deployed to production that closed this exploit. We then ran a query against our database to find pull requests that had used this exploit, but found no evidence that it had been used to gain access to encrypted environment variables.

We would like to thank ChALkeR for responsibly disclosing this to us and for the help in getting this resolved.

Technical details

In order to explain how this exploit worked, we're going to use two example repositories: henrikhodne/test-project-1 and travis-repos/test-project-1. henrikhodne/test-project-1 is a fork of travis-repos/test-project-1.

It's also useful to know some pull request terms. The base of the pull request is its "target": the branch you want it to be merged into. This would often be master on an upstream repository, but could be any branch. The head of the pull request is its "source". Usually this would be a feature branch, but it could be anything that's "commitish", which we'll get back to.

When you fork a repository on GitHub, GitHub saves space by sharing the commit data between the two repositories instead of copying the entire repository. This has the perhaps unexpected side effect of making commits made to forks available through the upstream repository as well, although you wouldn't get them as part of a clone. For example, I pushed commit 6e940c3 to henrikhodne/test-project-1, which you can see at https://github.com/henrikhodne/test-project-1/commit/6e940c3. But you can also go to https://github.com/travis-repos/test-project-1/commit/6e940c3 and see the same commit.

The head of a pull request can be anything commitish, a term used often in Git documentation to mean anything that can resolve to a commit reference: commit SHAs (6e940c3), branch names (new-cool-feature) and more advanced things like master@{yesterday}. You can therefore create a pull request on travis-repos/test-project-1 that tries to merge travis-repos/test-project-1@6e940c3 into travis-repos/test-project-1@master. If you look at the pull request with the GitHub API (which is what Travis CI uses), the API reports the "head commit" as being a part of the main repository, which then causes Travis CI to include encrypted environment variables.

We've now worked around this issue by only allowing access to encrypted environment variables to pull requests where the head reference is also on the list of known branches for the repository, so commit SHAs and other non-branch references would no longer get access to encrypted environment variables.
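
To make the difference concrete, here is a minimal sketch of the two checks, assuming a simplified pull request record. It is not Travis CI's actual code: the `PullRequest` type, its field names, and both function names are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    base_repo: str  # repository the pull request targets
    head_repo: str  # repository the API reports for the head commit
    head_ref: str   # branch name, commit SHA, or another commitish


def secure_env_allowed_before(pr: PullRequest) -> bool:
    """Old behaviour (as described above): trust the reported head repository."""
    return pr.head_repo == pr.base_repo


def secure_env_allowed_after(pr: PullRequest, known_branches: set[str]) -> bool:
    """Workaround: the head reference must also be a known branch of the repository."""
    return pr.head_repo == pr.base_repo and pr.head_ref in known_branches


# A pull request built from a fork's commit SHA, reported as same-repository.
exploit = PullRequest(
    base_repo="travis-repos/test-project-1",
    head_repo="travis-repos/test-project-1",
    head_ref="6e940c3",
)
```

Under the old check, `exploit` is treated as a same-repository pull request. Under the new check it is refused unless `6e940c3` appears in the repository's branch list, while an ordinary pull request from a branch such as `new-cool-feature` still receives the variables.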