This repository was archived by the owner on Nov 17, 2023. It is now read-only.

C_API for C++ iterator only supports one data and one label for one sample #12141

Open
@squidszyd

Description


I'm currently implementing a C++ iterator for performance purposes.
My iterator (call it MyIterator) provides one data array and multiple labels of different shapes.
The data and labels are stored in the data attribute of the DataBatch object, arranged as follows:

databatch.data[0] = data    // only one data array is provided in this case (image data)
databatch.data[1] = label1  // the 1st label, of shape shape<dim>(?)
databatch.data[2] = label2  // the 2nd label, of shape shape<dim>(?)
...

However, when I inspected the C API code, I found that its implementation does not account for the multiple-data, multiple-label case above. Only the 0-th entry of the DataBatch (Line767) is returned as data, and only the 1-st entry (Line745) is returned as the label. Everything else MyIterator provides is silently discarded. For instance, on the Python side, calling next returns an incomplete batch:

my_iter = mx.io.MyIterator(...)
databatch = my_iter.next()
print(len(databatch.data))
# 1: the data is preserved, since only one data array is provided
print(len(databatch.label))
# 1: only one label is preserved!
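The loss can be reproduced without MXNet. This is a minimal sketch, not MXNet code: `Array` and `Batch` are hypothetical stand-ins for `NDArray` and `DataBatch`, and the hypothetical `GetData`/`GetLabel` functions only mimic the behavior described above (data from slot 0, label from slot 1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for mxnet::NDArray: a shape plus flat float storage.
struct Array {
  std::vector<std::size_t> shape;
  std::vector<float> values;
};

// Hypothetical stand-in for mxnet::DataBatch: every array shares one vector.
struct Batch {
  std::vector<Array> data;
};

// Mimics the described C API behavior: only slot 0 is exposed as data.
const Array& GetData(const Batch& batch) {
  assert(!batch.data.empty());
  return batch.data[0];
}

// Mimics the described C API behavior: only slot 1 is exposed as the label.
const Array& GetLabel(const Batch& batch) {
  assert(batch.data.size() >= 2);
  return batch.data[1];
}

// Number of arrays the iterator produced that neither getter ever returns.
std::size_t CountDropped(const Batch& batch) {
  return batch.data.size() > 2 ? batch.data.size() - 2 : 0;
}
```

With an image in slot 0 and two labels in slots 1 and 2, `CountDropped` reports one array that never reaches the caller.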

Should this issue be labeled as a feature request? Perhaps the API should use a more extensible design that stores multiple data and label arrays as key-value pairs. For example:

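#include <cstdint>
#include <string>
#include <utility>
#include <vector>
#include <mxnet/ndarray.h>  // mxnet::NDArray

struct DataBatchEx {
    std::vector<std::pair<std::string, mxnet::NDArray>> data;    // instead of vector<NDArray>, store data as key-value pairs
    std::vector<std::pair<std::string, mxnet::NDArray>> labels;  // the API would explicitly use this field to construct labels
    std::vector<std::uint64_t> index;
    std::string extra_data;  // could possibly be dropped
    int num_batch_padd;      // number of padded samples in this batch
};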
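To illustrate how named entries would remove the fixed-slot assumption, here is a minimal sketch of the proposed structure with a hypothetical `Array` type substituted for `NDArray`. `FindByName` and `LabelNames` are hypothetical helpers, not part of MXNet; they only show that a consumer such as a C API getter could enumerate every label instead of reading a single slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for mxnet::NDArray.
struct Array {
  std::vector<std::size_t> shape;
  std::vector<float> values;
};

// The proposed layout, with Array substituted for NDArray.
struct DataBatchEx {
  std::vector<std::pair<std::string, Array>> data;
  std::vector<std::pair<std::string, Array>> labels;
  std::vector<std::uint64_t> index;
  std::string extra_data;
  int num_batch_padd = 0;
};

// Hypothetical lookup helper: returns nullptr when no entry has that name.
const Array* FindByName(const std::vector<std::pair<std::string, Array>>& entries,
                        const std::string& name) {
  for (const auto& entry : entries) {
    if (entry.first == name) return &entry.second;
  }
  return nullptr;
}

// Collects every label name in the order the iterator stored them.
std::vector<std::string> LabelNames(const DataBatchEx& batch) {
  std::vector<std::string> names;
  names.reserve(batch.labels.size());
  for (const auto& entry : batch.labels) names.push_back(entry.first);
  return names;
}
```

A linear search is used because a batch usually holds only a handful of entries; a `std::map` would also work but would sort the entries by name instead of keeping the iterator's order.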

(In the meantime, I can work around this by concatenating all my labels into a single array and decoding it on the Python side.)
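The workaround can be sketched as a pair of inverse functions. The sketch assumes each label is a 2-D, row-major array of shape (batch, width_i); `Matrix`, `ConcatLabels`, and `SplitLabels` are hypothetical names. It is written in C++ for consistency with the other snippets here, although in this workaround the splitting step would run on the Python side.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-D, row-major label array of shape (rows, cols).
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> values;  // rows * cols entries
};

// Concatenates labels along the column axis: (B, w1) + (B, w2) -> (B, w1 + w2).
Matrix ConcatLabels(const std::vector<Matrix>& labels) {
  assert(!labels.empty());
  Matrix out{labels[0].rows, 0, {}};
  for (const Matrix& m : labels) {
    assert(m.rows == out.rows);  // every label must cover the same samples
    out.cols += m.cols;
  }
  out.values.reserve(out.rows * out.cols);
  for (std::size_t r = 0; r < out.rows; ++r) {
    for (const Matrix& m : labels) {
      // Append row r of this label, keeping the labels in their original order.
      out.values.insert(out.values.end(), m.values.begin() + r * m.cols,
                        m.values.begin() + (r + 1) * m.cols);
    }
  }
  return out;
}

// Inverse of ConcatLabels: splits the columns back using the known widths.
std::vector<Matrix> SplitLabels(const Matrix& packed,
                                const std::vector<std::size_t>& widths) {
  std::vector<Matrix> labels;
  std::size_t total = 0;
  for (std::size_t w : widths) {
    labels.push_back(Matrix{packed.rows, w, {}});
    total += w;
  }
  assert(total == packed.cols);  // widths must exactly cover the packed row
  for (std::size_t r = 0; r < packed.rows; ++r) {
    std::size_t offset = r * packed.cols;
    for (Matrix& m : labels) {
      m.values.insert(m.values.end(), packed.values.begin() + offset,
                      packed.values.begin() + offset + m.cols);
      offset += m.cols;
    }
  }
  return labels;
}
```

Only the list of widths has to accompany the packed label so that the receiving side can split it back.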
