Description
Hi,
Thank you for developing this amazing tool!
I am trying to run Numbat on two samples from the same patient. My dataset consists of two conditions (diagnosis and MRD) from a patient affected by multiple myeloma (MM).
I would like to reconstruct the clonal evolution and selection of the malignant clones. In particular, I want to find out whether a clone present at diagnosis has been selected by the therapy and now represents the major clone at MRD.
I successfully ran pileup_and_phase.R in multi-sample mode, using only the barcodes of clonal plasma cells (tumor) from both conditions, which I identified through the CDR3 sequence. There are 2088 tumor cells at diagnosis but only 6 at MRD. Because I selected only tumor barcodes, my dataset should not contain normal cells.
I then merged the matrix_count and df_allele objects from the two samples and ran run_numbat() to create the "nb_object". However, I have a few questions:
1) Since my barcodes.tsv file only contains tumor cells, could this cause any issues? I noticed that Clone 1 consists only of normal cells. Can I consider it the most abundant clone, or does Numbat specifically use Clone 1 as the baseline for CNV calling? Should I include healthy plasma cells as well?
2) Should I merge the two conditions before running run_numbat(), or would it be better to keep them separate? (The goal is to build a tree of clonal evolution.)
3) Is there a way to keep information about the sample of origin (diagnosis or MRD) in the "numbat_object", or to discriminate between these two states?
4) Is there a tutorial that guides users through the downstream analysis? (Sorry, I’m a biotech researcher trying to learn bioinformatics!)
Thanks in advance for your help!
Best regards,
Emanuele