Pi0 is great work, and I have seen great results on real devices. I want to explore the model's attention and visualize it to better understand Pi0. I ran tests in the simulation evaluation environment provided by RoboTwin and returned the attention scores from Pi0. Some results look promising, but perhaps my parameters are not tuned correctly: the attention is not well focused and is instead scattered around the padding tokens. Taking a bottle-grasping task as an example, here are some of my visualization results:
Here is some code where I process the attention score matrix:
Here attn_maps is returned from gemma.py layer by layer. Its shape is (10, 18, 1, 8, 51, 867), corresponding to the diffusion time steps, number of layers, batch, heads, suffix tokens, and all tokens, respectively.
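For reference, here is a minimal sketch of how I would aggregate and plot a tensor with this shape. The token layout (3 cameras x 256 image patches = 768 image tokens, then language tokens, then the 51 suffix tokens), the `NUM_PATCHES` constant, the layer choice, and the `valid_mask` placeholder are all assumptions; replace them with the actual values from your config and padding mask:

```python
import numpy as np
import matplotlib.pyplot as plt

# attn_maps: (diffusion_steps, layers, batch, heads, suffix_tokens, all_tokens)
# = (10, 18, 1, 8, 51, 867). Assumed token layout: 768 image tokens
# (3 cameras x 256 patches), then language tokens, then the 51 suffix tokens.

NUM_PATCHES = 256                    # assumed patches per camera view
attn = np.asarray(attn_maps)[-1]     # last diffusion step -> (18, 1, 8, 51, 867)
attn = attn[:, 0]                    # drop batch dim -> (18, 8, 51, 867)

# Zero out padding keys *before* renormalizing, so padding tokens
# cannot absorb attention mass in the visualization.
valid_mask = np.ones(attn.shape[-1], dtype=bool)  # replace with your real padding mask
attn = np.where(valid_mask[None, None, None, :], attn, 0.0)
attn = attn / (attn.sum(axis=-1, keepdims=True) + 1e-9)

# Average over heads, then pick one layer and average over the suffix queries.
layer = 12                                        # hypothetical choice; sweep 0..17
per_layer = attn.mean(axis=1)                     # (18, 51, 867)
action_to_img = per_layer[layer, :, :NUM_PATCHES].mean(axis=0)  # first camera only

# 256 patches reshape into a 16x16 grid for a heatmap over the image.
plt.imshow(action_to_img.reshape(16, 16), cmap="viridis")
plt.title(f"layer {layer}: suffix-token attention over image patches")
plt.colorbar()
plt.show()
```

The order of operations matters here: masking padding and renormalizing before averaging over heads and queries changes the picture considerably, which may be related to the scattering I am seeing.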
The original probs tensor looks like this:
Does anyone have experience visualizing Pi0 attention scores? What might be wrong with my processing, and how should the number of visualized layers be chosen? Any help would be greatly appreciated!