docs: Add an incident response guide #133567

Open
wants to merge 1 commit into
base: main

Conversation

tstellar
Collaborator

This will serve as a guide for administrators when handling security incidents on LLVM managed systems.

tstellar requested review from asl and kbeyls on March 29, 2025 04:39
5. Notify the community for what was done and why
-------------------------------------------------

This should be done in a Discourse post in the LLVM Project category.
Collaborator

Should we also send a notification on Discord?

Maybe on Discord, just send a link to the Discourse thread? IMHO, it is better to only have to manage one thread.

@@ -0,0 +1,87 @@
============================
LLVM Incident Response Guide
Collaborator

My understanding is that this guide covers Incident Response, for a specific class of Incidents, but not all possible things people might think of as "incidents"?

I'm guessing some might be security incidents, others might be more "accidental outage" kind of incidents?
And this only relates to incidents related to project infrastructure, rather than e.g. a security issue discovered in one of the LLVM projects?

I'm wondering if a more accurate title is possible?
Maybe "LLVM Infrastructure Incident Response Guide"?

malicious or unwanted content that appears on LLVM infrastructure. This includes but
is not limited to: malicious code checked into the GitHub repository, unauthorized access
to LLVM controlled servers, or compromise of community owned resources like buildbots
or GitHub Actions runners.
Collaborator

Given there are different classes of incidents, and this document only covers a subset of all classes, I think it would be useful to have a pointer here to how other classes of incidents should be handled.
At the least, I think there should be a pointer to https://llvm.org/docs/Security.html#how-to-report-a-security-issue, and a description of when to use that process instead of the one in this document. We could also update https://llvm.org/docs/Security.html to point to this document, with a description of when to use the process documented here.

I'm not sure if there is a class of incidents that are not covered by either process document, but maybe there could be a "fall-back", saying what to do when you have an incident that requires some action that is not covered by either process document?

The purpose of this document is to outline how a project administrator should respond to
malicious or unwanted content that appears on LLVM infrastructure. This includes but
is not limited to: malicious code checked into the GitHub repository, unauthorized access
to LLVM controlled servers, or compromise of community owned resources like buildbots
Contributor

Suggested change
to LLVM controlled servers, or compromise of community owned resources like buildbots
to LLVM controlled servers, or compromise of community-owned resources like buildbots

so we want to avoid creating regulations or rules that will slow down or limit their ability to
quickly resolve it. However, we do want to provide some general guidelines for admins
to follow during an incident, mainly to ensure that the problem and the steps taken to
resolve it are being communicated effectively. Here is a checklist admins should follow
Contributor

Suggested change
resolve it are being communicated effectively. Here is a checklist admins should follow
resolve it are being communicated effectively. Here is a checklist admins should follow

1. Communicate the problem to another admin
-------------------------------------------

It's important to let someone else know what is going on. It can be an email,
Contributor

Suggested change
It's important to let someone else know what is going on. It can be an email,
It's important to let someone else know what is going on. It can be an email,

-------------------------------------------

It's important to let someone else know what is going on. It can be an email,
slack, or Discord message, and you don't have to wait for a response before
Contributor

Suggested change
slack, or Discord message, and you don't have to wait for a response before
Slack, or Discord message, and you don't have to wait for a response before

---------------------------------------------------------

For a short-term solution the goal should be to protect the community or users from
being impacted by the incident. An example of a short-term action would be to
Contributor

Suggested change
being impacted by the incident. An example of a short-term action would be to
being impacted by the incident. An example of a short-term action would be to

----------------------------------------------------

A single admin can do this on their own as long as they've communicated what they're doing to
someone else. There are cases where waiting for confirmation could leave users or community
Contributor

Suggested change
someone else. There are cases where waiting for confirmation could leave users or community
someone else. There are cases where waiting for confirmation could leave users or community

-------------------------------------------------------

Once the immediate risk has been eliminated, admins should meet together and discuss
a long-term solution. Unlike the short-term solution, this conversation should be
Contributor

Suggested change
a long-term solution. Unlike the short-term solution, this conversation should be
a long-term solution. Unlike the short-term solution, this conversation should be


Once the immediate risk has been eliminated, admins should meet together and discuss
a long-term solution. Unlike the short-term solution, this conversation should be
done with two or more admins. The discussion could also include key community members
Contributor

Suggested change
done with two or more admins. The discussion could also include key community members
done with two or more admins. The discussion could also include key community members

4. Take action and implement the solution.
5. Notify the community of what was done.
6. Meet with one or more admins to discuss long-term solution.
7. Implement long-term solution.
Collaborator

Below you say "Implement the long-term solution"; we should probably keep them consistent.

6 participants