docs: Add an incident response guide #133567
This will serve as a guide for administrators when handling security incidents on LLVM-managed systems.
5. Notify the community for what was done and why
-------------------------------------------------

This should be done in a Discourse post in the LLVM Project category.
Should we also send a notification on Discord?
Maybe on Discord, just send a link to the Discourse thread? IMHO, it is better to only have to manage one thread.
@@ -0,0 +1,87 @@
============================
LLVM Incident Response Guide
My understanding is that this guide covers Incident Response, for a specific class of Incidents, but not all possible things people might think of as "incidents"?
I'm guessing some might be security incidents, others might be more "accidental outage" kind of incidents?
And this only relates to incidents related to project infrastructure, rather than e.g. a security issue discovered in one of the LLVM projects?
I'm wondering if a more accurate title is possible?
Maybe "LLVM Infrastructure Incident Response Guide"?
malicious or unwanted content that appears on LLVM infrastructure. This includes but
is not limited to: malicious code checked into the GitHub repository, unauthorized access
to LLVM controlled servers, or compromise of community owned resources like buildbots
or GitHub Actions runners.
Given there are different classes of incidents, and this document only covers a subset of all classes, I think it would be useful to have a pointer here to how other classes of incidents should be handled.
At the least, I think there should be a pointer to https://llvm.org/docs/Security.html#how-to-report-a-security-issue, and a description of when to use that process instead of the one in this document. We could also update https://llvm.org/docs/Security.html to point to this document, with a description of when to use the process documented here.
I'm not sure if there is a class of incidents that are not covered by either process document, but maybe there could be a "fall-back", saying what to do when you have an incident that requires some action that is not covered by either process document?
The purpose of this document is to outline how a project administrator should respond to
malicious or unwanted content that appears on LLVM infrastructure. This includes but
is not limited to: malicious code checked into the GitHub repository, unauthorized access
to LLVM controlled servers, or compromise of community owned resources like buildbots
to LLVM controlled servers, or compromise of community owned resources like buildbots
to LLVM controlled servers, or compromise of community-owned resources like buildbots
so we want to avoid creating regulations or rules that will slow down or limit their ability to
quickly resolve it. However, we do want to provide some general guidelines for admins
to follow during an incident, mainly to ensure that the problem and the steps taken to
resolve it are being communicated effectively. Here is a checklist admins should follow
resolve it are being communicated effectively. Here is a checklist admins should follow
resolve it are being communicated effectively. Here is a checklist admins should follow
1. Communicate the problem to another admin
-------------------------------------------

It's important to let someone else know what is going on. It can be an email,
It's important to let someone else know what is going on. It can be an email,
It's important to let someone else know what is going on. It can be an email,
-------------------------------------------

It's important to let someone else know what is going on. It can be an email,
slack, or Discord message, and you don't have to wait for a response before
slack, or Discord message, and you don't have to wait for a response before
Slack, or Discord message, and you don't have to wait for a response before
---------------------------------------------------------

For a short-term solution the goal should be to protect the community or users from
being impacted by the incident. An example of a short-term action would be to
being impacted by the incident. An example of a short-term action would be to
being impacted by the incident. An example of a short-term action would be to
----------------------------------------------------

A single admin can do this on their own as long as they've communicated what they're doing to
someone else. There are cases where waiting for confirmation could leave users or community
someone else. There are cases where waiting for confirmation could leave users or community
someone else. There are cases where waiting for confirmation could leave users or community
-------------------------------------------------------

Once the immediate risk has been eliminated, admins should meet together and discuss
a long-term solution. Unlike the short-term solution, this conversation should be
a long-term solution. Unlike the short-term solution, this conversation should be
a long-term solution. Unlike the short-term solution, this conversation should be
Once the immediate risk has been eliminated, admins should meet together and discuss
a long-term solution. Unlike the short-term solution, this conversation should be
done with two or more admins. The discussion could also include key community members
done with two or more admins. The discussion could also include key community members
done with two or more admins. The discussion could also include key community members
4. Take action and implement the solution.
5. Notify the community of what was done.
6. Meet with one or more admins to discuss long-term solution.
7. Implement long-term solution.
Below you say "Implement the long-term solution"; we should probably keep them consistent.