Generate simulated DNA sequences for regulatory genomics problems
Author kchu25
Started In June 2022



Yes, we can generate synthetic DNA sequence motif datasets in the usual way: draw an i.i.d. background, define a profile that represents the motif as a product multinomial (i.e., a PWM), and then plant a realization of that profile at a randomly chosen position in each background sequence. But this motif problem is way too easy to tackle. How about we simulate the motif as a mixture of profiles, where profiles may share identical patterns (i.e., overlaps)? Moreover, what if a motif has a blocked structure with variable spacing between each pair of adjacent blocks (i.e., gaps)? Why not simulate a mixture of block-structured profiles as the ground-truth motif? This package creates such patterns.
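
To make the idea concrete, here is a minimal sketch of the generative process described above, written in plain Python. It is not this package's API. Every name in it (sample_background, sample_gapped_motif, simulate, near_consensus, and so on) is hypothetical, and the uniform background, the 0.85 consensus probability, and the gap ranges are assumptions chosen only for illustration.

```python
import random

ALPHABET = "ACGT"

def near_consensus(word, p=0.85):
    """Hypothetical helper: build a PWM (one column per position) that
    emits the consensus letter with probability p, others uniformly."""
    return [[p if b == c else (1 - p) / 3 for b in ALPHABET] for c in word]

def sample_background(length, rng):
    """Draw an i.i.d. background sequence (uniform over A, C, G, T)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def sample_from_pwm(pwm, rng):
    """Draw one realization of a product-multinomial profile."""
    return "".join(rng.choices(ALPHABET, weights=col)[0] for col in pwm)

def sample_gapped_motif(blocks, gap_ranges, rng):
    """Concatenate block realizations, separated by gaps whose lengths
    are drawn uniformly from the inclusive (lo, hi) ranges."""
    parts = [sample_from_pwm(blocks[0], rng)]
    for block, (lo, hi) in zip(blocks[1:], gap_ranges):
        parts.append(sample_background(rng.randint(lo, hi), rng))
        parts.append(sample_from_pwm(block, rng))
    return "".join(parts)

def plant(background, motif, rng):
    """Overwrite a randomly chosen window of the background with the motif."""
    start = rng.randint(0, len(background) - len(motif))
    planted = background[:start] + motif + background[start + len(motif):]
    return planted, start

def simulate(n, length, mixture, weights, rng):
    """Generate n sequences; each picks one mixture component, samples a
    gapped motif from it, and plants it. Returns (seq, component, start, motif)."""
    data = []
    for _ in range(n):
        k = rng.choices(range(len(mixture)), weights=weights)[0]
        blocks, gap_ranges = mixture[k]
        motif = sample_gapped_motif(blocks, gap_ranges, rng)
        seq, start = plant(sample_background(length, rng), motif, rng)
        data.append((seq, k, start, motif))
    return data

# Two components that share their first block (an "overlap") but differ
# in the second block and in the allowed spacing (the "gaps").
shared = near_consensus("TGACG")
mixture = [
    ([shared, near_consensus("CGTCA")], [(2, 5)]),
    ([shared, near_consensus("GGAAT")], [(0, 3)]),
]

dataset = simulate(n=5, length=60, mixture=mixture,
                   weights=[0.5, 0.5], rng=random.Random(0))
for seq, component, start, motif in dataset:
    print(component, start, motif, seq)
```

Returning the component index, start position, and realized motif with each sequence keeps the ground truth available for scoring a motif-discovery method against the simulated data.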

Basic examples

Coming soon

Undetectable patterns

Coming soon