
Conflicting latent correlation layer implementation and equations from paper #21

Hello,

If I am not mistaken, you have mixed up the temporal and feature dimensions in the GRU input of the latent correlation layer.
The input data x of the latent_correlation_layer has shape (batch_size, window_size, number_of_nodes).
The permuted input to the GRU then has shape (number_of_nodes, batch_size, window_size).
Since you do not set the batch_first flag, the PyTorch GRU implementation expects an input of shape (sequence_length, batch_size, input_size) (see the documentation), so the GRU iterates over the nodes as if they were time steps and treats the window as the feature dimension.
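
For illustration, here is a minimal sketch of what I mean; the concrete sizes are just example values I picked, not taken from your config:

```python
import torch
import torch.nn as nn

batch_size, window_size, number_of_nodes = 32, 12, 140  # example values, assumed
x = torch.randn(batch_size, window_size, number_of_nodes)

# As in the current code: permute to (number_of_nodes, batch_size, window_size)
x_perm = x.permute(2, 0, 1).contiguous()

# Without batch_first=True, nn.GRU expects (sequence_length, batch_size, input_size),
# so here the nodes are consumed as the sequence and the window as the features.
gru = nn.GRU(input_size=window_size, hidden_size=number_of_nodes)
output, h_n = gru(x_perm)
print(output.shape)  # torch.Size([140, 32, 140]): one hidden state per node, not per time step
```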

Moreover, in the paper you say that you use the last hidden state.
In your code you simply use the output of the GRU, which contains the hidden state at every single time step, not just the last one.
If you actually used the last hidden state and fixed the allegedly mixed-up dimensions, you would end up with a single vector per batch element of length hidden_size (in the terminology of the PyTorch documentation), which corresponds to your units argument, i.e., the number of nodes.
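
Again a small sketch (shapes assumed) of the difference between the GRU output and the last hidden state:

```python
import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 12, 32, 140, 140  # assumed values
gru = nn.GRU(input_size=input_size, hidden_size=hidden_size)
inp = torch.randn(seq_len, batch_size, input_size)

# output stacks the hidden state of every time step; h_n holds only the last one.
output, h_n = gru(inp)
print(output.shape)  # torch.Size([12, 32, 140])
print(h_n.shape)     # torch.Size([1, 32, 140])

# For a single-layer, unidirectional GRU the last output equals h_n:
print(torch.equal(output[-1], h_n[0]))  # True
```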
Now I am wondering about the correct dimensionality of your linear projections W^Q and W^K for the query and key representations.
In the current implementation these are vectors, which would not make much sense in the equations from the paper, where they are combined with a vector R.
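
Assuming the usual formulation Q = R W^Q and K = R W^K, with R being the last hidden state of length hidden_size, W^Q and W^K would have to be matrices rather than vectors, e.g. something like the following sketch (the projection dimension d is my own placeholder):

```python
import torch
import torch.nn as nn

batch_size, hidden_size, d = 32, 140, 140  # d is a placeholder projection size

# R: the last hidden state per batch element
R = torch.randn(batch_size, hidden_size)

# For Q = R W^Q and K = R W^K to be well-defined with a vector-valued R,
# W^Q and W^K must be (hidden_size x d) matrices, not vectors:
W_Q = nn.Linear(hidden_size, d, bias=False)
W_K = nn.Linear(hidden_size, d, bias=False)
Q, K = W_Q(R), W_K(R)
print(Q.shape, K.shape)  # torch.Size([32, 140]) torch.Size([32, 140])
```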

My conclusion: it seems to me that there are major conflicts between the paper and the implementation regarding the latent correlation layer, and that the formal description in the paper either has dimensionality issues or is formulated inaccurately.

Please correct me if I have made any mistake.

Best regards
Chris
