A guide to making on-call holidays suck less

on November 29th 2022

December is a time for basking in the glow of a menorah, hanging boughs of holly, and pondering new year resolutions as you sip spiced apple cider and unwind from the chaos of 2022. Unless you’ve been handed the pager and are on-call during the holiday season. 

Balancing holiday cheer with one person's on-call rotation is tricky, but take it from me: two pagers under one roof is madness! Before I retired from SRE, my partner and I were both on call for mission-critical infrastructure and software. Sometimes our rotations lined up, and sometimes a whole month would go by with one or the other of us always on primary or secondary.

One winter, the siren call of holiday lights and ice skating beckoned. This was especially momentous for us because he had been unlucky enough to draw primary on-call over Christmas for the previous two years! But that holiday season, we got to enjoy a night free from thinking about computers and the spectacular ways in which they can fail.

It shouldn’t be the norm to depend on your teammates’ goodwill and open schedules to partake in holiday cheer. In fact, if you’re reading this at the beginning of December, there’s still time to invest in smoother holiday on-call operations!

Why holiday on-call sucks: the missteps

Ultimately, it is the end-of-year changes to people, processes, and technology that create the unique stressors of holiday on-call.

People-wise, having most of the company offline means a smaller pool of engineers and support staff is available to troubleshoot any issues that arise. When you’re flying solo, what could have been a quick Slack message saying, “Oh yeah, that error? We just ignore it” turns into an anxiety-inducing investigation. Odds are your own routine is a bit different, too, and you may face spotty internet connectivity while out on holiday excursions.

Process-wise, many organizations institute a deployment freeze, which means no new code, configuration, or infrastructure changes are made during the specified window. It also means that any change that truly must ship likely goes through new and unfamiliar steps, such as requiring extra reviewers and getting special approval from leadership.

Technology-wise, deploy freezes narrow down the likely causes when investigating issues (if nothing shipped, a recent change is less likely to be the culprit), but they bring their own challenges when thawing out after the holiday season. One trick is to ensure your systems are ready to handle the atypical traffic patterns of this time of year. For example, an e-commerce site must handle peak loads, while business tooling, such as business chat or document-sharing infrastructure, sees reduced load.

Directives meant to ease the difficulties for your holiday on-call teams actually end up placing them in uncharted waters when something does happen. Reduced access to help when troubleshooting, changed remediation processes, and modified infrastructure behavior all help ratchet up their stress levels.

A beginner’s guide to de-stressing holiday on-call

Many organizations have great intentions when they put steps in place to ensure a smooth, painless, and stress-free holiday on-call experience for their teams. The problem is that most of these measures add to stress levels, the exact opposite of the desired outcome.

I’ve compiled a list of my holiday on-call wishes. This is not a complete list of what every organization should do, but a list to cherry-pick from when managers and department heads consider the sacrifices their on-call teams are making. Holiday on-call is more than just doing the job: it costs people part of their holiday break, and that sacrifice deserves to be acknowledged.

Here is my wish list for engineering leaders and managers to consider when implementing holiday on-call schedules:

Holiday on-call wish list:

  • On-call holiday compensation
    • Spot bonuses per shift or a flat rate 
    • Additional time off if paged
  • Check in with your team(s)
    • Ask which holidays are important to each individual. Many holidays observed by different faiths and cultures around the world are not days off in the United States (e.g., Rosh Hashanah or Yom Kippur).
    • Stay aware of where your on-callers are located and the potential for adverse weather events to affect them, and be prepared with a back-up plan for coverage.
    • Find out who was on holiday on-call the previous year to avoid doubling up their bad luck. You may not remember who was primary, but they sure will!
    • Survey engineers to identify on-call stressors worth addressing, using statements they can rate on a scale from 1 to 5, such as:
      • I am able to dedicate time to proactively improving on-call during primary.
      • I feel my manager has sufficient insight into my on-call duties.
      • I have been able to get schedule swaps when needed.
      • I am able to decline project work and interviews during primary rotation.
      • I feel confident picking up the pager and going on-call.
      • I am confident when I am paged the alert will be actionable.
      • At the end of my primary rotation week, I feel burned out by the demands of on-call.
  • Listen to the Voice of the Customer
    • Connect with Customer Success and Support to identify any particular customer considerations or concerns.
    • Send proactive messages to customers about what to expect for engaging with support during the holiday.
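Once the survey responses come in, a few lines of code can turn the raw ratings into a ranked list of stressors. The Python sketch below is one hypothetical way to do it, not a tool we ship: the shortened statement labels and the sample ratings are made up, and it assumes 5 means "strongly agree." Because the burnout statement is phrased negatively, it is reverse-scored (6 minus the rating) so that a lower adjusted average always signals a bigger problem.

```python
from statistics import mean

def adjusted_scores(responses: dict[str, list[int]], negative: set[str]) -> dict[str, float]:
    """Average each statement's 1-5 ratings, reverse-scoring negatively phrased ones."""
    scores = {}
    for statement, ratings in responses.items():
        if not ratings or any(r < 1 or r > 5 for r in ratings):
            raise ValueError(f"Need at least one 1-5 rating for {statement!r}")
        if statement in negative:
            # Agreeing with a negative statement (e.g. "I feel burned out") is bad,
            # so flip the scale: 5 becomes 1, 4 becomes 2, and so on.
            ratings = [6 - r for r in ratings]
        scores[statement] = mean(ratings)
    return scores

def top_stressors(responses, negative, threshold=3.0):
    """Statements whose adjusted average falls below the threshold, worst first."""
    scores = adjusted_scores(responses, negative)
    flagged = [(s, v) for s, v in scores.items() if v < threshold]
    return sorted(flagged, key=lambda item: item[1])

# Hypothetical ratings from three engineers, keyed by shortened statement labels.
responses = {
    "confident_on_call": [4, 5, 3],
    "alerts_actionable": [2, 1, 3],
    "burned_out": [4, 5, 4],
}
negative = {"burned_out"}
for statement, score in top_stressors(responses, negative):
    print(f"{statement}: {score:.2f}")
# burned_out: 1.67
# alerts_actionable: 2.00
```

The 3.0 threshold is an arbitrary starting point; adjust it to whatever level your team considers worth a follow-up conversation.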

Engineers can also take steps to help their leaders and managers improve the on-call experience for all. The following things have worked very well for me in Christmases past:

  • Add any travel or unavailable times to your calendar as soon as they’re booked.
  • Aggressively review and mute spammy alerts. A question to consider: “What level of business impact is serious enough to justify interrupting holiday festivities?”
  • Run through an on-call checklist
    • Ensure pager notifications are loud, or change them to something unexpected, such as PagerDuty’s barbershop song singing “the network is doooooown”.
    • Double-check the process for deploying during the code freeze. 
    • Understand how to page your secondary, customer success, and other rotations.
    • Review on-call handoff notes and confirm your scheduled shifts.
  • Use a polling tool to identify which times and days during the holidays are most important to each teammate. Maybe one family does a big New Year’s brunch while another celebrates solely on Christmas Eve. If everyone can protect their special holiday moments, that is a win-win.
  • Roll dice to decide who takes which shifts.
  • Carve up the duration of on-call shifts to spread the load:
    • Half-day split between day/night 
    • 1 day 
    • Weekday vs. weekend (M-F, Sa-Su)
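The polling, dice-rolling, and shift-splitting ideas can be combined into a small shift lottery. The Python sketch below is one hypothetical way to do it, assuming half-day day/night shifts and a set of protected dates per person gathered from the poll; the function names, team members, and dates are all made up for illustration. It shuffles the shifts with a seeded random generator (the "dice roll"), then gives each shift to whichever available teammate currently has the fewest shifts, breaking ties randomly.

```python
import random
from datetime import date, timedelta
from typing import Optional

def make_shifts(start: date, days: int) -> list[tuple[date, str]]:
    """Split a holiday window into half-day shifts (day and night)."""
    shifts = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        shifts.append((day, "day"))
        shifts.append((day, "night"))
    return shifts

def assign_shifts(
    shifts: list[tuple[date, str]],
    people: list[str],
    protected: dict[str, set[date]],
    seed: Optional[int] = None,
) -> dict[tuple[date, str], str]:
    """Hand out shifts in random order, balancing load and honoring protected dates."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced and checked
    order = list(shifts)
    rng.shuffle(order)  # the "dice roll"
    load = {person: 0 for person in people}
    schedule = {}
    for shift in order:
        day = shift[0]
        # Skip anyone who protected this date in the holiday poll.
        available = [p for p in people if day not in protected.get(p, set())]
        if not available:
            raise ValueError(f"Nobody is available on {day}; find backup coverage")
        # Prefer whoever has the fewest shifts so far; break ties randomly.
        fewest = min(load[p] for p in available)
        pick = rng.choice([p for p in available if load[p] == fewest])
        schedule[shift] = pick
        load[pick] += 1
    return dict(sorted(schedule.items()))  # chronological, day before night

team = ["ana", "ben", "chi"]
protected = {
    "ana": {date(2022, 12, 24)},
    "ben": {date(2022, 12, 31), date(2023, 1, 1)},
}
schedule = assign_shifts(make_shifts(date(2022, 12, 23), 11), team, protected, seed=7)
for (day, half), who in schedule.items():
    print(day.isoformat(), half, who)
```

Protected dates are treated as a hard constraint: if nobody is free on a given day, the sketch raises an error instead of quietly scheduling someone over their special moment. Publishing the seed lets the team re-run the draw and confirm nobody stacked the dice.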

No matter what checklist you follow, what steps you take, or how happy the on-call team is with the final schedule, holiday on-call still sucks. I hope these insights, ideas, and tips make your organization’s holiday on-call experience just a little less stressful every year.

From the entire crew at Chronosphere, we wish you a zero-downtime, uneventful, silent-pager, reliable, and stress-free holiday season!
