Description
Thank you to the authors for contributing such meaningful work to the motion generation community. After reading the paper, I have two questions:
- I noticed that the FID metric on the HumanML3D dataset converges to 0.10, whereas recent works such as MoMask and LaMP, which use masked transformers, report better results, with FID as low as 0.03. I would like to know the authors' thoughts on the relative strengths and weaknesses of the two paradigms (autoregressive vs. masked transformer). For concreteness, a rough sketch of the standard FID computation is included after these questions.
- Recent research (https://openreview.net/forum?id=UxzKcIZedp, https://openreview.net/forum?id=Oh8MuCacJW) has discussed differences between datasets. The first paper analyzes the gap between InterX and HumanML3D, while the second paper's rebuttal reports that training on Motion-X and testing on HumanML3D did not yield good results. I am therefore concerned that multi-dataset training may introduce similar issues. What are the authors' views on this potential problem?
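To make the metric I am referring to concrete, here is a minimal sketch of how FID is typically computed over motion features, assuming features have already been extracted by the benchmark's motion encoder (the feature dimension, sample counts, and random placeholder data below are assumptions for illustration, not the paper's actual setup):

```python
# Minimal FID sketch over pre-extracted motion features.
import numpy as np
from scipy import linalg

def compute_fid(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to real and generated features."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)

    diff = mu_r - mu_g
    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error

    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))

if __name__ == "__main__":
    # Random placeholders standing in for encoder outputs (assumed 512-d here).
    rng = np.random.default_rng(0)
    real = rng.normal(size=(1000, 512))
    fake = rng.normal(loc=0.05, size=(1000, 512))
    print(f"FID: {compute_fid(real, fake):.4f}")
```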
Thanks in advance for your answer!