A misadventure with Terraform Sets & PagerDuty Schedules
How Terraform's setunion() disregards ordering.

I'm a developer from Vancouver, BC who's had an interesting journey in tech, starting in support and moving through cloud infrastructure and project management. Currently I work as an SRE at Lightstep, helping build and "operationalize" things that help guide others towards better o11y :)
"T, why didn't I get this page?"
"Wait, why does it show that <other_person> is on call? They just did it the other week."
These are two phrases you don't want to hear after making changes to your PagerDuty schedules in Terraform.
Intro
Over the last couple of weeks, I've been leading the effort to onboard 3 new engineers to our on-call rotation. As part of that work, one of the tasks is to get those engineers added to PagerDuty (PD), the app we use for managing on-call shifts and alerting. While this can easily be done in the PD UI, we implement these changes via Terraform so that they're documented, codified, and tracked via version control, which also adds another layer of auditability.
Some key concepts for working with PagerDuty:

- A schedule determines the WHO and WHEN (who will be in the rotation, how long the rotation will be, and when the rotation starts).
- An escalation policy determines the ordering/logic for which schedules get paged.
- A service is what represents your service (or system) and will be linked to an escalation policy.
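To make the relationship between these three concepts concrete, here is a minimal sketch of how they might be wired together with the PagerDuty Terraform provider. Every name, email address, and timing value below is a made-up placeholder, not taken from my actual configuration.

```hcl
# Hypothetical user, included only so the example is self-contained.
resource "pagerduty_user" "example" {
  name  = "Example Engineer"
  email = "example.engineer@company.com"
}

# The schedule: WHO is on call and WHEN.
resource "pagerduty_schedule" "example" {
  name      = "Example schedule"
  time_zone = "America/Los_Angeles"

  layer {
    name                         = "weekly"
    start                        = "2023-01-01T09:00:00-08:00"
    rotation_virtual_start       = "2023-01-01T09:00:00-08:00"
    rotation_turn_length_seconds = 604800 # 7 days
    users                        = [pagerduty_user.example.id]
  }
}

# The escalation policy: which schedule gets paged, and in what order.
resource "pagerduty_escalation_policy" "example" {
  name      = "Example escalation policy"
  num_loops = 2

  rule {
    escalation_delay_in_minutes = 10

    target {
      type = "schedule_reference"
      id   = pagerduty_schedule.example.id
    }
  }
}

# The service: the thing that alerts, linked to the escalation policy.
resource "pagerduty_service" "example" {
  name              = "Example service"
  escalation_policy = pagerduty_escalation_policy.example.id
}
```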

So from the top:
- When a service has an alert, PD will look at the escalation policy.
- Based on the escalation policy and the current situation (i.e. first alert, first loop), PD will notify the appropriate schedule.
You can see a full gist of the old code here.
An important note for this example is that my team is actually considered a subteam (A) that shares its pager with subteam (B).
Before
Prior to this work, I had originally set my schedule up as follows:

I also had each person's membership in a PagerDuty team like this:

Given that I was (1) specifying an association to a user twice AND (2) creating a new resource for each team membership, I wondered if I could refactor this.
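For readers without the gist open, the shape of that duplication looks roughly like the sketch below. This is an illustration of the pattern only, not the original code: the order of users and the resource names for the memberships are assumptions, and the pagerduty_user and pagerduty_team resources are assumed to be defined elsewhere.

```hcl
resource "pagerduty_schedule" "myteam_schedule" {
  name      = "My Team"
  time_zone = "America/Los_Angeles"

  layer {
    name                         = "weekday"
    rotation_turn_length_seconds = 1209600 # 14 days
    rotation_virtual_start       = "2023-01-01T09:00:00-08:00"
    start                        = "2023-01-01T09:00:00-08:00"

    # First mention of each user: an explicit, ordered list (order here is hypothetical).
    users = [
      pagerduty_user.thilina_ratnayake.id,
      pagerduty_user.teammate_a.id,
      pagerduty_user.teammate_b.id,
    ]
  }
}

# Second mention of each user: one membership resource per person.
resource "pagerduty_team_membership" "thilina_ratnayake" {
  user_id = pagerduty_user.thilina_ratnayake.id
  team_id = pagerduty_team.my_team_subteam_a.id
}

resource "pagerduty_team_membership" "teammate_a" {
  user_id = pagerduty_user.teammate_a.id
  team_id = pagerduty_team.my_team_subteam_b.id
}
```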
Enter, the Good Idea Fairy
Since my last brush with Terraform, I'd like to think I'd gotten better with it - especially with the use of for_each statements. So when looking at a solution to this "problem" - I thought:
Why not just create a local.members list with all the users, and then use that (1) as the members for the schedule and (2) in a single statement that creates the team_memberships via a for_each? In FACT! Since we have two subteams, I could create two lists and simply combine them!
After
This is what I ended up with after refactoring and making what I thought were good changes. Gist.
I thought I was pretty slick by doing the following:
- Setting up the list of teammates in a local variable.
locals {
  # toset() turns each list into a set, which has no defined order.
  my_team_subteam_a_members = toset([
    pagerduty_user.thilina_ratnayake.id,
    pagerduty_user.teammate_b.id,
    pagerduty_user.teammate_c.id,
  ])
  my_team_subteam_b_members = toset([
    pagerduty_user.teammate_a.id,
    pagerduty_user.teammate_d.id,
  ])
}
- Using the lists from above with setunion() to combine both subteams A and B.
resource "pagerduty_schedule" "myteam_schedule" {
  name        = "My Team"
  time_zone   = "America/Los_Angeles"
  description = "PD Schedule for My Team, Slack #my-team, Email: my-team@company.com"

  layer {
    name                         = "weekday"
    rotation_turn_length_seconds = 1209600 # 14 days
    rotation_virtual_start       = "2023-01-01T09:00:00-08:00"
    start                        = "2023-01-01T09:00:00-08:00"
    # This is the problem line: setunion() returns a set, so rotation order is lost.
    users = setunion(local.my_team_subteam_a_members, local.my_team_subteam_b_members)
  }
}
- Iterating through the members of each subteam with for_each to create the memberships.
# One membership per user in subteam A.
resource "pagerduty_team_membership" "my_team_subteam_a_members" {
  for_each = local.my_team_subteam_a_members
  user_id  = each.value
  team_id  = pagerduty_team.my_team_subteam_a.id
}

# One membership per user in subteam B.
resource "pagerduty_team_membership" "my_team_subteam_b_members" {
  for_each = local.my_team_subteam_b_members
  user_id  = each.value
  team_id  = pagerduty_team.my_team_subteam_b.id
}
Except I wasn't. This didn't go as planned: the day after I made the changes, we noticed that the PagerDuty schedules were completely off.
The Reason
In a schedule, ordering matters.
Before, we had specified the order explicitly and anchored it to a start date. That meant that after every interval (rotation), the next person in the list would be in the hot seat to carry the pager.
However, when we did:
users = setunion(local.my_team_subteam_a_members, local.my_team_subteam_b_members)
This produced a union of the two sets, and a set has no defined order, so the rotation order we had specified was lost. In fact, the documentation says exactly this, and I missed it:
> setunion(["a", "b"], ["b", "c"], ["d"])
[
"d",
"b",
"c",
"a",
]
The given arguments are converted to sets, so the result is also a set and the ordering of the given elements is not preserved.
By doing a setunion on local.my_team_subteam_a_members and local.my_team_subteam_b_members, the ordering was completely disregarded, which led to PagerDuty putting someone who wasn't scheduled on call for the rotation.
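The difference is easy to see in isolation. Here is a small sketch with no provider required, using placeholder strings in place of real PagerDuty user IDs. It contrasts setunion() with concat() and distinct(), which work on lists and keep the order they are given.

```hcl
locals {
  # Placeholder values; lists keep the order they are written in.
  subteam_a = ["user_c", "user_a"]
  subteam_b = ["user_b", "user_a"]

  # concat() joins the lists in argument order; distinct() drops repeats,
  # keeping the first occurrence of each value.
  ordered_rotation = distinct(concat(local.subteam_a, local.subteam_b))

  # setunion() converts its arguments to sets and returns a set,
  # so nothing about the written order is preserved.
  unordered_members = setunion(local.subteam_a, local.subteam_b)
}

output "ordered_rotation" {
  value = local.ordered_rotation # ["user_c", "user_a", "user_b"]
}

output "unordered_members" {
  value = local.unordered_members # same three values, order not guaranteed
}
```

Running terraform apply on this file (or evaluating the same expressions in terraform console) shows both results side by side.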
Conclusion
While it's great to be DRY and avoid the repetition of values - that shouldn't get in the way of functionality. With regards to Terraform:
- If ordering matters in a list, don't use setunion().
- Especially if you're setting up a PagerDuty schedule, just "hardcode" / manually specify the rotation order.
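For anyone who still wants to avoid repeating the user IDs, one possible middle ground is to keep the locals as ordered lists and only convert to a set where order doesn't matter. The sketch below is a variation on the code above, not something I've run in production. The rotation_order local is a name introduced here for illustration, and it assumes the desired rotation is simply subteam A followed by subteam B; for any other order, write that list out by hand.

```hcl
locals {
  # Lists, not sets: the position of each ID is its position in the rotation.
  my_team_subteam_a_members = [
    pagerduty_user.thilina_ratnayake.id,
    pagerduty_user.teammate_b.id,
    pagerduty_user.teammate_c.id,
  ]
  my_team_subteam_b_members = [
    pagerduty_user.teammate_a.id,
    pagerduty_user.teammate_d.id,
  ]

  # Assumption: rotation is subteam A followed by subteam B.
  rotation_order = concat(local.my_team_subteam_a_members, local.my_team_subteam_b_members)
}

resource "pagerduty_schedule" "myteam_schedule" {
  name      = "My Team"
  time_zone = "America/Los_Angeles"

  layer {
    name                         = "weekday"
    rotation_turn_length_seconds = 1209600 # 14 days
    rotation_virtual_start       = "2023-01-01T09:00:00-08:00"
    start                        = "2023-01-01T09:00:00-08:00"
    users                        = local.rotation_order # a list, so order is kept
  }
}

resource "pagerduty_team_membership" "my_team_subteam_a_members" {
  for_each = toset(local.my_team_subteam_a_members) # order is irrelevant here
  user_id  = each.value
  team_id  = pagerduty_team.my_team_subteam_a.id
}

resource "pagerduty_team_membership" "my_team_subteam_b_members" {
  for_each = toset(local.my_team_subteam_b_members)
  user_id  = each.value
  team_id  = pagerduty_team.my_team_subteam_b.id
}
```

As in the earlier code, this assumes the pagerduty_user and pagerduty_team resources are defined elsewhere and already exist, since for_each needs its keys to be known at plan time.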





