Open
Description
While generating the sequence-by-k-mer matrix A, each process keeps track of the unique k-mers it encounters locally. To generate S from these, the processes have to communicate to determine the global set of unique k-mers.
The way this is done in the code is as follows.
- Every process creates a boolean array of size |Alph|^k, where |Alph| is the size of the alphabet. For proteins, that is 25^k.
- This array serves as the process-local unique k-mer ID list.
- Once each process has found its list of k-mers, it participates in an `MPI_Allreduce` using `MPI_LOR`.
- This results in a globally unique k-mer list.
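
The steps above can be sketched roughly as follows. This is a serial illustration, not the project's code: it assumes a k-mer's ID is its base-|Alph| value, uses a 4-letter alphabet to keep the array small, and replaces the real `MPI_Allreduce` with an element-wise OR over per-process arrays. The names `kmer_id`, `mark_kmers`, and `logical_or_all` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumption: a k-mer's ID is its value read as a base-|Alph| number,
// where each letter's digit is its position in `alphabet`.
inline std::uint64_t kmer_id(const std::string& kmer, const std::string& alphabet) {
    std::uint64_t id = 0;
    for (char c : kmer) {
        std::size_t pos = alphabet.find(c);
        assert(pos != std::string::npos);  // letter must belong to the alphabet
        id = id * alphabet.size() + pos;
    }
    return id;
}

// Mark every k-mer of `seq` in the process-local presence array.
// `present` must already have |Alph|^k entries.
inline void mark_kmers(const std::string& seq, std::size_t k,
                       const std::string& alphabet, std::vector<char>& present) {
    if (k == 0 || seq.size() < k) return;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        present[kmer_id(seq.substr(i, k), alphabet)] = 1;
    }
}

// Serial stand-in for MPI_Allreduce with MPI_LOR: the result holds a 1
// wherever at least one process saw that k-mer.
inline std::vector<char> logical_or_all(const std::vector<std::vector<char>>& per_process) {
    assert(!per_process.empty());
    std::vector<char> global(per_process.front().size(), 0);
    for (const auto& local : per_process) {
        assert(local.size() == global.size());
        for (std::size_t i = 0; i < global.size(); ++i) {
            global[i] = global[i] || local[i];
        }
    }
    return global;
}
```

In an MPI program, every rank would hold one such array and the OR would be performed by the collective rather than by a loop over processes.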
Currently, the code relies on a single `MPI_Allreduce`, whose element count is a C `int`, so the boolean array must have fewer than 2^31 elements. This works for proteins with k=6 (25^6 = 244,140,625) but fails for any larger k, since 25^7 = 6,103,515,625 already exceeds 2^31.
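
The limit can be checked with a few lines of arithmetic. The helpers below are a hypothetical sketch (`kmer_space` and `fits_single_allreduce` are not names from the code base). They compare |Alph|^k against `INT_MAX`, the largest count a single `int`-count `MPI_Allreduce` can take.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Number of possible k-mers, |Alph|^k, computed in 64 bits.
inline std::uint64_t kmer_space(std::uint64_t alphabet_size, unsigned k) {
    std::uint64_t n = 1;
    for (unsigned i = 0; i < k; ++i) {
        // Guard against 64-bit overflow for very large k.
        assert(alphabet_size == 0 || n <= UINT64_MAX / alphabet_size);
        n *= alphabet_size;
    }
    return n;
}

// True if the whole boolean array can be described by one `int` count.
inline bool fits_single_allreduce(std::uint64_t alphabet_size, unsigned k) {
    return kmer_space(alphabet_size, k) <= static_cast<std::uint64_t>(INT_MAX);
}
```

For the protein alphabet of size 25, `fits_single_allreduce(25, 6)` holds and `fits_single_allreduce(25, 7)` does not.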
The solution to this would be to use multiple MPI_Allreduce calls over parts of the boolean array.
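
A minimal sketch of that chunking, under stated assumptions: the helper `chunked_reduce` and its callback type are hypothetical, and the reduction itself is passed in as a callback so the slicing logic can be read and tested without MPI. The comment shows what the callback might look like with MPI, assuming the flags live in a contiguous `bool` buffer (not a bit-packed `std::vector<bool>`).

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdint>
#include <functional>

// Callback that reduces `count` elements starting at `offset`.
// With MPI it might look like this (assumed buffer `bool* flags`):
//   [&](std::uint64_t off, int n) {
//       MPI_Allreduce(MPI_IN_PLACE, flags + off, n,
//                     MPI_CXX_BOOL, MPI_LOR, MPI_COMM_WORLD);
//   }
using ChunkReduce = std::function<void(std::uint64_t offset, int count)>;

// Walk an array of `total` elements in slices of at most `max_chunk`
// elements, invoking `reduce_chunk` once per slice.
// Returns the number of reduction calls issued.
inline int chunked_reduce(std::uint64_t total, int max_chunk,
                          const ChunkReduce& reduce_chunk) {
    assert(max_chunk > 0);
    int calls = 0;
    std::uint64_t offset = 0;
    while (offset < total) {
        const std::uint64_t remaining = total - offset;
        const int count = static_cast<int>(
            std::min<std::uint64_t>(remaining, static_cast<std::uint64_t>(max_chunk)));
        reduce_chunk(offset, count);
        offset += static_cast<std::uint64_t>(count);
        ++calls;
    }
    return calls;
}
```

Every rank must pass the same `total` and `max_chunk` so that all ranks issue the same sequence of collective calls. With `max_chunk = INT_MAX`, the 25^7 array for k=7 proteins would need three calls.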