Add task metadata to /apps endpoint #118
Conversation
We will reference TaskInfo in a map of container IDs to task info in a later commit.
ContainerTaskRels has a single member, a map of container IDs to TaskInfo references. It has a getter and a setter for easy, threadsafe access to the relationship map, and an update method which populates it from agentState.
The new ContainerTaskRels argument provides access to the task metadata for each container.
We add a new mock object with appropriate metadata and supply it to avroRecord.extract(), checking that the task IDs in the ContainerTaskRels are the same ones that correspond to the container ID.
This is orthogonal to the intent of this PR; however, it's a small change and seemed like a quick win.
This reference can be provided to the extract method from the framework collector.
We update the task mapping after every new fetch from agent state.
Now the framework collector has access to the ContainerTaskRels object which belongs to the mesos agent collector.
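For orientation, a minimal sketch of the shape the commits above describe, assuming a mutex-guarded map; the TaskInfo fields shown here are illustrative, not copied from the PR:

```go
package agent

import "sync"

// TaskInfo carries the task metadata exposed per container (fields assumed).
type TaskInfo struct {
	ID   string
	Name string
}

// ContainerTaskRels maps container IDs to the task running in each container.
// The embedded mutex guards concurrent access from the two collectors.
type ContainerTaskRels struct {
	sync.Mutex
	rels map[string]*TaskInfo
}
```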
I get a weird feeling from reading this PR for a couple of reasons.
First, I feel like ContainerTaskRels should be built/instantiated outside of the agent collector and passed in to both the constructors for the agent collector and the framework collector. I can't really put my finger on it, but it is weird to me to ask for the agent collector's "rels" variable and then give it to the framework collector.
Second, it seems like there's a race condition here. Because the agent collector and framework collector run independently of each other, we can't guarantee that the agent collector runs first before the framework collector does. This suggests that we will get a partitioned time series from time to time. For example:
- framework collector runs, no data in the rels yet
- framework collector produces a time series with blank TaskId and a blank TaskName
- agent collector runs
- framework collector runs, now there is data in the rels
- framework collector produces a new time series for what is essentially the same metric, because it now has a non-empty TaskID and a non-empty TaskName.
I'm not sure how big of a deal it is, but it seems like it could potentially double the number of unique time series coming out of the framework collector. I wonder if it makes sense to not emit a time series from the framework collector if no taskID or taskName could be found?
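In code, that suggestion would amount to a guard along these lines in the framework collector; the emit callback and names here are placeholders, not the project's actual identifiers:

```go
// emitIfKnown is a sketch of the suggested guard: skip the datapoint until the
// agent collector has populated the container->task mapping, so we never emit
// a series with blank task dimensions.
func emitIfKnown(rels *ContainerTaskRels, containerID string, emit func(taskID, taskName string)) {
	info := rels.Get(containerID)
	if info == nil {
		return // no task metadata yet; drop rather than emit blanks
	}
	emit(info.ID, info.Name)
}
```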
collectors/mesos/agent/agent.go
Outdated
@@ -53,6 +54,53 @@ type Collector struct {
	metricsChan chan producers.MetricsMessage
	nodeInfo    collectors.NodeInfo
	timestamp   int64

// ContainerTaskRels defines the relationship between containers and tasks.
type ContainerTaskRels struct {
I'd probably put the typedef and the associated methods into a new file container_task_rels.go
collectors/mesos/agent/agent.go
Outdated
// Get is a utility method which handles the mutex lock and abstracts the inner
// map in ContainerTaskRels away. If no task info is available for the supplied
// key, returns nil.
func (ctr *ContainerTaskRels) Get(key string) *TaskInfo {
Is key a containerID? If so I'd probably rename this to containerID.
collectors/mesos/agent/agent.go
Outdated
func (ctr *ContainerTaskRels) Get(key string) *TaskInfo {
	ctr.Lock()
	defer ctr.Unlock()
	if ctr.rels != nil {
Instead of worrying about whether or not rels had been initialized, I think a constructor method for the type would be a better way of creating instances of this type.
func NewContainerTaskRels() *ContainerTaskRels {
return &ContainerTaskRels{rels: make(map[string]*TaskInfo)}
}
collectors/mesos/agent/agent.go
Outdated
		ctr.rels = map[string]*TaskInfo{}
	}
	ctr.rels[key] = info
	ctr.Unlock()
Before you used a defer ctr.Unlock(). Since the lock/unlock in this method happens at the method boundaries, maybe it makes more sense to do it that way here for clarity?
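Applied to Set, that suggestion would look roughly like this (a sketch mirroring the diff above):

```go
// Set adds or updates an entry, deferring the unlock so the lock/unlock pair
// sits at the method boundaries, as in Get.
func (ctr *ContainerTaskRels) Set(key string, info *TaskInfo) {
	ctr.Lock()
	defer ctr.Unlock()
	if ctr.rels == nil {
		ctr.rels = map[string]*TaskInfo{}
	}
	ctr.rels[key] = info
}
```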
collectors/mesos/agent/agent.go
Outdated
	for _, e := range f.Executors {
		for _, t := range e.Tasks {
			for _, s := range t.Statuses {
				rels[s.ContainerStatusInfo.ID.Value] = &t
This looks like a bug since you're taking the address of a range variable. The golang range construct only allocates the memory once to hold the value, and each iteration overwrites the same location in memory.
So let's assume that you have 2 values in e.Tasks...
On the first iteration over e.Tasks, Go will write the Task into t, which has memory address M.
On the second iteration over e.Tasks, Go will write the Task into t, which also has the same memory address M.
So when you have multiple tasks, you're taking the address of the same memory location.
What you probably want to do here is force a copy of the variable:
copy := t
rels[s.ContainerStatusInfo.ID.Value] = &copy
It should be easy to write a unit test which verifies this behavior (with multiple e.Tasks).
You can see this in action at https://play.golang.org/p/s7zGgU_Nz6
Alternatively, if it is acceptable and would not impact other things, you could have rels be a map[string]TaskInfo instead of a map[string]*TaskInfo, which would force the copy of the range variable and avoid this scenario.
This is a sharp edge of Go that i've skinned my knees on more times than I would like to count 😬
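For reference, a standalone program (separate from the project code) that reproduces the aliasing:

```go
package main

import "fmt"

type Task struct{ ID string }

func main() {
	tasks := []Task{{ID: "a"}, {ID: "b"}}
	byID := map[string]*Task{}
	for _, t := range tasks {
		byID[t.ID] = &t // every entry points at the single loop variable
	}
	// Under pre-Go 1.22 loop semantics this prints "b b": both entries alias
	// the same memory, which now holds the last task. Forcing a copy
	// (c := t; byID[t.ID] = &c) restores the expected "a b".
	fmt.Println(byID["a"].ID, byID["b"].ID)
}
```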
I absolutely follow the logic of what you're saying, but I'm a little puzzled because I can't force this bug to emerge. I am running four identical applications on the same node, each called emitter-N where N is in the range 1-4.
It seems to me that if this bug exists, the metrics results for all four containers should have the same name and task ID. But their task IDs remain distinct, correct and consistent.
Since the loop goes first over executors, and then tasks, this bug will not appear if there is only ever one task per executor, I think. Is it possible that an executor can ever have more than one task?
@@ -288,6 +288,37 @@ func TestGetAgentState(t *testing.T) {
	})
}

func TestUpdateContainerRels(t *testing.T) {
Probably want to add multiple tasks to the mockAgentState here to help prevent a regression of the range variable address bug (if it is a bug :) ).
collectors/mesos/agent/agent.go
Outdated
// Set adds or updates an entry to ContainerTaskRels and, if necessary,
// initiates the inner map. It is only currently used in tests.
func (ctr *ContainerTaskRels) Set(key string, info *TaskInfo) {
Would rename key in this case to containerID for readability.
Hi @collinvandyck, thanks for all the feedback! I just wanted to address this bit from your comment above:
Unfortunately that is not possible, because some containers don't have associated tasks. It's not possible to know when dealing with a container ID in isolation whether you expect an associated task. You could simply not return metrics at all if agentState is not populated, but that doesn't fix the problem either - if a freshly launched container emits metrics, we will need to offer them in the API before we know the associated task ID, if one even exists. I don't see any way around this unfortunately.
@philipnrmn perhaps we should unite the agent and framework polling mechanism so that the agent always polls first and then, once successful, the framework gets polled next, supplying data from the former to the latter?
@collinvandyck that is unfortunately also not possible; the framework collector does not poll - it receives data from apps emitting statsd messages over UDP (via the mesos module). I don't believe it is possible to force consistency.
In scripts/test.sh we naively run go get to update golint. A new rule has just been added to golint (golang/lint#319) and I don't want to complicate my current PR (#118). I will revert this commit, fix any outstanding nil-returns and consider pinning golint in a future PR.
Rather than initiating ContainerTaskRels in the framework collector, we now initiate it in dcos-metrics.go and pass a reference into each collector instead.
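The wiring described here would look roughly as follows in dcos-metrics.go; the package aliases and constructor signatures are assumptions for illustration, not the project's actual API:

```go
// Hypothetical wiring: create the shared mapping once, then hand the same
// pointer to both collectors so the agent collector's updates are visible
// to the framework collector.
rels := mesosAgent.NewContainerTaskRels()
agentCollector := mesosAgent.New(agentConfig, nodeInfo, rels)
frameworkCollector := mesosFramework.New(frameworkConfig, rels)
```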
Hey @collinvandyck, I addressed all your feedback. Would we be OK to merge?
collectors/mesos/agent/agent.go
Outdated
@@ -95,7 +95,10 @@ func (ctr *ContainerTaskRels) update(as agentState) {
	for _, e := range f.Executors {
		for _, t := range e.Tasks {
			for _, s := range t.Statuses {
				rels[s.ContainerStatusInfo.ID.Value] = &t
				rels[s.ContainerStatusInfo.ID.Value] = &TaskInfo{
👍
collectors/mesos/agent/agent.go
Outdated
func (ctr *ContainerTaskRels) Get(containerID string) *TaskInfo {
	ctr.Lock()
	defer ctr.Unlock()
	if ctr.rels != nil {
Do you need this anymore if you're already constructing the type with a non-nil map, here and otherwise?
No, I guess not. Just disguising my laziness behind a front of belt-and-braces.
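With the constructor guaranteeing a non-nil map, Get reduces to roughly the following (a sketch):

```go
// Get returns the TaskInfo for the supplied container ID, or nil if none is
// known; a lookup on a missing key simply yields the zero value (nil pointer).
func (ctr *ContainerTaskRels) Get(containerID string) *TaskInfo {
	ctr.Lock()
	defer ctr.Unlock()
	return ctr.rels[containerID]
}
```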
Hi @philipnrmn, yeah, good to go. One or two minor comments here and there.
aaaaand... fixed ✨
* Export TaskInfo from the mesos agent collector - We will reference TaskInfo in a map of container IDs to task info in a later commit.
* Add a struct for container->task relationships - ContainerTaskRels has a single member, a map of container IDs to TaskInfo references. It has a getter and a setter for easy, threadsafe access to the relationship map, and an update method which populates it from agentState.
* Return task_id and task_name in /app output - The new ContainerTaskRels argument provides access to the task metadata for each container.
* Add tests for extracting a record with task data - We add a new mock object with appropriate metadata and supply it to avroRecord.extract(), checking that the task IDs in the ContainerTaskRels are the same ones that correspond to the container ID.
* Add tests for executor ID and framework ID - This is orthogonal to the intent of this PR; however, it's a small change and seemed like a quick win.
* Refer to ContainerTaskRels in Framework collector - This reference can be provided to the extract method from the framework collector.
* Pass ref to ContainerTaskRels into transform
* Update task mapping from agent collector - We update the task mapping after every new fetch from agent state.
* Pass reference to mapping into framework collector - Now the framework collector has access to the ContainerTaskRels object which belongs to the mesos agent collector.
* Clarify arg name
* Defer unlock for consistency
* Temporarily skip golint in build - In scripts/test.sh we naively run go get to update golint. A new rule has just been added to golint (golang/lint#319) and I don't want to complicate my current PR (#118). I will revert this commit, fix any outstanding nil-returns and consider pinning golint in a future PR.
* Create new TaskInfo object to avoid memory issues
* Create ContainerTaskRels in dcos-metrics.go - Rather than initiating ContainerTaskRels in the framework collector, we now initiate it in dcos-metrics.go and pass a reference into each collector instead.
* Initiate ContainerTaskRels with a utility method
* Remove unnecessary nil check
This PR is the successor to #114. It adds the same data - task ID and task name - to the /metrics/v0/containers/<container-id>/app endpoint as #114 added to the /metrics/v0/containers/<container-id> endpoint. It had to be separated because the two collectors don't share data. Since task IDs are derived from the agent state, adding them to the agent collector was not especially complicated. Adding them to the framework collector was a bit trickier.
In order to do so, we add a mapping between container IDs and references to task info (ContainerTaskRels), held on the agent collector. From dcos-metrics.go, where both collectors are initiated, we then pass a reference to the ContainerTaskRels object into the framework collector. When the framework collector receives a container_id tag, it looks up the associated task metadata and adds it to the metrics message.
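In outline, that lookup in the framework collector might look like the following; the tag and field names are assumptions for illustration, not quoted from the PR:

```go
// Sketch: when a datapoint arrives tagged with a container ID, enrich it with
// the task metadata held in the shared ContainerTaskRels before emitting.
if info := rels.Get(containerID); info != nil {
	tags["task_id"] = info.ID
	tags["task_name"] = info.Name
}
```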