h a l f b a k e r y
Baker Street Irregulars
OK, so this is a bit technical, and probably of minority interest. Sorry about that. I'll give an overview of current state of the art DNA sequencing, but please see link for something more in-depth.
Biologists want to sequence DNA. Please accept that this is currently done and desirable.
The standard method (Sanger sequencing, also called di-deoxy or dye-terminator sequencing) is to purify the DNA, then copy a section from one particular starting point in the presence of fluorescently labelled chain terminators. This reaction yields DNA fragments of different lengths, each fluorescently labelled according to its final base.
The final step is to sort these fragments by size, and read off their colours.
Now, the drawback to this is that it only works for a certain distance: because the chains terminate randomly at each base, longer fragments are rarer than shorter ones. Also, the larger fragments are harder to separate. Currently a good read will yield about 1000 bases, with quality (accuracy) falling off towards the end.
My method is proposed with the objective of making the read length longer (ideally as long as possible). Basically I propose replacing the single bases with longer units.
1) Single stranded DNA of the region is generated, perhaps using asymmetric PCR.
2) The sequencing reaction. The mixture includes all n-mers, where n may be around 3. (There are 4 bases, so this would be 4^3 or 64 different oligonucleotides.) Also included are di-deoxy terminator oligos, each with a different fluorophore. I don't know how many of these it is practical to distinguish; if 64 is too many, maybe 16 would be OK, giving n=2.
DNA ligase is used to sequentially bind the annealing oligonucleotides onto the primer.
The idea is that while the n-mers are too short to form stable association with the template DNA, the primer is long enough, and so oligos can be sequentially attached to the end.
3) The sequence can be read in the same way as for Sanger sequencing, in a 'sequencing machine'.
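To make the counting in step 2 concrete, here is a small Python sketch that enumerates the extender pool for a given n. The function name `all_nmers` is made up for illustration, and nothing here models the chemistry; it only shows how quickly the pool grows.

```python
from itertools import product

BASES = "ACGT"

def all_nmers(n):
    """Return every DNA oligo of length n, in alphabetical order."""
    return ["".join(p) for p in product(BASES, repeat=n)]

# Pool sizes for the two cases discussed in the post (n=2 and n=3).
for n in (2, 3):
    pool = all_nmers(n)
    print(n, len(pool), pool[:4])
```

The pool size is 4^n, so each extra base in the extender unit multiplies the number of oligos (and terminator labels) by four.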
The fragments which need to be distinguished differ by n bases, which both reduces the drop-off per base and increases the length over which fragments can be resolved. Hopefully this would result in longer, more accurate reads.
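A rough way to see the drop-off argument is to treat each extension step as having a fixed chance of picking up a terminator. The sketch below is a toy model, not a description of real reaction kinetics: the function name `fraction_reaching`, the termination probability 0.005 and the 1200-base target are all assumptions chosen for illustration, and it assumes the per-step termination chance is the same whether a step adds one base or n.

```python
def fraction_reaching(length_bases, step, p_term):
    """Fraction of growing chains still unterminated after covering
    length_bases, when each step adds `step` bases and terminates
    with probability p_term (assumed identical for every step)."""
    steps = length_bases // step
    return (1.0 - p_term) ** steps

# Compare single-base extension with 2-mer and 3-mer extension.
for step in (1, 2, 3):
    print(step, fraction_reaching(1200, step, 0.005))
```

Under these assumptions, a chain built from n-mers needs only 1/n as many steps to reach a given length, so more of the signal survives out to long fragments.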
As a final thought, I don't know how many different fluorophores it is possible to cheaply make and distinguish. I'm hoping the answer is more than 4, but if it isn't as many as 16, all is not lost. In any case, it would be desirable to be able to use the existing sequencing machines, at least for a prototyping and change-over period.
Suppose we label the terminators with more than one fluorescence.
Using only the four current types in combination, there are 16 (2^4) possibilities (which include labelled with all 4 colours, and labelled with no colours).
So the scheme could easily be tested, and if it worked well, could be improved to give better data. Adding only two extra fluorescent marker types would be sufficient to increase the oligo ('n') by one. Or one extra marker could make the base peak data more solid, since there would be no need to use every theoretical combination (like unlabelled or 4 labels).
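The dye-combination arithmetic can be checked with a short Python sketch. The dye labels "RGBY" and the function names are hypothetical, and the assignment of combinations to dinucleotides is arbitrary; the point is only the counting.

```python
from itertools import combinations, product

def dye_combinations(dyes, allow_empty=True, allow_all=True):
    """List the subsets of `dyes` usable as labels."""
    combos = []
    for r in range(len(dyes) + 1):
        if r == 0 and not allow_empty:
            continue  # skip the unlabelled terminator
        if r == len(dyes) and not allow_all:
            continue  # skip the terminator carrying every dye
        combos.extend(combinations(dyes, r))
    return combos

def dyes_needed(n, allow_empty=True, allow_all=True):
    """Smallest number of dyes whose usable combinations cover all 4^n n-mers."""
    k = 1
    while True:
        usable = 2 ** k - (0 if allow_empty else 1) - (0 if allow_all else 1)
        if usable >= 4 ** n:
            return k
        k += 1

# Arbitrary one-to-one labelling of the 16 dinucleotides with 4 dyes.
dinucleotides = ["".join(p) for p in product("ACGT", repeat=2)]
labelling = dict(zip(dinucleotides, dye_combinations("RGBY")))
print(labelling["AA"], labelling["TT"])
print(dyes_needed(2), dyes_needed(3), dyes_needed(2, False, False))
```

With every combination allowed, 2n dyes cover all n-mers (4 dyes for n=2, 6 for n=3). Dropping the unlabelled and all-dyes combinations leaves 14 labels from 4 dyes, which is why one extra dye is needed for n=2 in that case.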
from Wikipedia [Loris, Aug 04 2005]
Ligase Based Sequencing
A Variation on the Theme [Cuit_au_Four, Feb 20 2010]
||As far as I understand, Sanger sequencing breaks the DNA at each nucleotide, a different restriction enzyme batch for each of the four bases. That is, every base gets cut, and that's how every base is sequenced. If you want to cut your strand at every 3 base n-mer, you're going to have to use 63 different reactions (not 64, since you can infer the last one and fill in the gaps)
||[Daseva] That's not how Sanger
sequencing works - it works by
replicating the template strand from a
common starting point, and terminating
synthesis by means of di-deoxy bases.
The proportion of di-deoxy NTPs to
normal dNTPs in the mix determines
likelihood of the reaction terminating at
each step. In the simplest version, four
reactions are run, each one containing
a different ddNTP. Hence, the first
reaction produces a series of fragments,
each ending at one of the "A"s in the
sequence. The second reaction
produces a series ending in C's, etc. In
more modern implementations,
distinguishable fluorophores are
present on the ddNTPs, and the four
reactions are run in one tube. No
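The four-reaction version described above can be mimicked in a few lines of Python. This is an idealised sketch: it assumes termination happens at every position with equal efficiency and that fragment lengths are measured perfectly, and the function names are invented for illustration.

```python
def sanger_fragments(product_strand):
    """For each base, list the lengths of the fragments that end in
    that base's di-deoxy terminator (one 'reaction' per base)."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(product_strand, start=1):
        lanes[base].append(length)
    return lanes

def read_sequence(lanes):
    """Sort all fragments by length and read off each terminal base."""
    by_length = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in by_length)

lanes = sanger_fragments("GATTACA")
print(lanes)
print(read_sequence(lanes))
```

Running the four reactions in one tube with distinguishable fluorophores corresponds to keeping the (length, base) pairs together instead of in separate lanes.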
||[Loris] This is not a bad idea (I do this
kind of stuff, so I have seen plenty of
bad ideas. Are you in the field?). A few
||1) How do you ensure that the ligating
strand grows from a fixed starting
point, and doesn't 'self prime' in mid-
strand? I presume you've got a long
primer to initiate ligation, but given the
long length of 'free' template, the lower
level of ligation elsewhere may swamp
your signal. (The 'primed' ligation is
effectively a trimolecular reaction,
whilst false ligations requiring two or
more short oligos to bind will be
>tetramolecular, but I don't think the
advantage will be great.)
(2) There's still a
problem in distinguishing >4
fluorophores. ABI (and others) have
several more dyes, but probably not 16
good ones in total. Doubtless this will
be addressed at some point, but it'll be
a limiting factor. You might be able to
use dye combinations, though, to
expand the range.
(3) The main limitation on Sanger sequencing (I
think) is not the sequencing reaction
but the gel/capillary resolution. You
will have a considerable advantage if
you're resolving 2-base or 3-base steps
rather than 1-base increments, but I
don't know if you'd double or triple the
read length.
That said, it is a
bloody good idea. If you want to
discuss it further, email me (work
address is on my profile).
[EDIT] Sorry - I just
noticed that you already suggested dye
combinations in your posting. By the
way, I think you should be thinking
about single-molecule approaches.
||I'm getting clear on this now. Or at least, the thick fog has settled into a fine mist.
||So he wants to use base pairs (he hey!) and/or triplets for site specific termination? Am I not then correct that he will require 63 different reactions, 63 different dyes for the triplets, or 15 different dyes for the pairs? Will this alleviate the length problem at all?
||I've done a few PCRs and recombinations, etc. In a lab, it's pretty amazing stuff. I'd love to get into the field, once I get my damn degree!
||Yes, the essence of the idea is to do
Sanger-style reactions, but with two-
(or three- or four...) base steps instead
of single-base steps, and using ligase
to processively add di- (or tri- ....)
nucleotides instead of
The advantage is
that you read two bases at each step so,
given a constant relative gel resolution,
you could read twice as far. The
drawback is the need to have 16
distinguishable dyes (or dye
combinations) for 2-base steps (or 64
for 3-base steps etc), and possible
unwanted reactions and
I'm not sure if
dinucleotides would work well - but I
know that tandemly-annealing
hexanucleotides have been used to
prime conventional sequencing
reactions. You might need to use PNAs
(DNA analogues which anneal more
strongly) instead of short oligos. But
basically a bloody good idea.
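The "constant relative gel resolution" argument reduces to a one-line calculation. The sketch below assumes, purely for illustration, that two fragments can be told apart when their length difference divided by their length is at least some threshold, and it calibrates that threshold so single-base steps give the roughly 1000-base reads mentioned in the post. The function name and the threshold value are assumptions.

```python
def max_resolvable_length(step, min_relative_spacing):
    """Longest fragment at which neighbours differing by `step` bases
    are still separated, given a fixed relative resolution."""
    return step / min_relative_spacing

# Threshold chosen so that 1-base steps resolve out to about 1000 bases.
threshold = 1 / 1000
for step in (1, 2, 3):
    print(step, max_resolvable_length(step, threshold))
```

Real separations will not scale this cleanly, so the doubling or tripling is an upper bound under this simple model.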
||How will you get all your n-mers in solution? Usually, the nucleotides are just fully denatured to yield individual bases which then assemble via the ligase. You can't control denaturing specifically enough to produce only 2 or 3 base long pieces. Or.. am I off again?
||Guess you could always centrifuge or something?
||Ah - I'm not following here. There are
two possible reasons: (a) you're off or
(b) Bidoli pinot grigio. The oligos (2 or
3 or 4 base) would be synthetic, and
they'd dissolve just fine....
||[daseva] Maybe with a comet assay, your smaller bits would separate. Toss in a few ligase-coated quantum dots to hasten their departure, and voila?
||//bloody good idea// from someone who works in the field deserves a bun from me. I barely grasp it though, so forgive me.
||...which reminds me - [+]
||We've been discussing this in the group,
and nobody's sure whether you'd get
ligation happening elsewhere on the
template (effectively 'non-primed'
concatenation). But you might be OK -
I suspect that ligase needs more than 4
bases to grab hold of, so it probably
wouldn't ligate two 2-mers that were
sitting on the template, but it possibly
*would* ligate a 2-mer to the growing
strand. You'd need to suck it and see,
and you'd want to play around with
different ligases (and eventually
perhaps make some modified ligases to
do the job better).
||Thank you Basepair, I hoped you'd see this.
Yes, I do work in this area.
||Regarding the priming issue, I hope that wouldn't be a problem. If ligase could act on two n-mers bound adjacently at any significant frequency then the idea is sunk. This probably rules out longer extender units, although optimising the reaction temperature would help: one degree can make a dramatic difference to mispriming in PCR, due to the thermal instability of imperfect matches.
||My major reservation about the practicality of this is whether ligase would actually join on such a small oligonucleotide. Particularly a 2 or 3-mer.
Your suggestion of using more strongly binding DNA analogues is a good one.
||I also wonder whether the reaction would be very efficient, as the number of different types of extender unit increase. Thinking about it some more, perhaps we don't actually need all the different n-mer extenders. Using non-standard bases which pair with several bases like inosine might be sufficient. (Maybe only in the middle of an n-mer.)
Then the only combinatorial problem is the terminator units. It may be possible to increase their incorporation despite low concentration by adding extra 'any' bases to the 3' end.
||As regards mis-priming, we know that
ligase will link two consecutively-
annealed 8mers (I thought 6-mers, but
I'm not sure this is proven), so the limit
must be below 8. But I think 2 would
be safe, since that would only give the
ligase 4bp to sit on if it were mis-
priming. For "true" extensions, you've
got unlimited length on the upstream
side. I don't know if ligase will work to
seal a 'nick' that's only 2bp from the 3'
end, but it might. I don't think the
number of different extenders will
impact efficiency - at these short
lengths you should have good
bases like inosine might also help.
There are people here who work on a
range of 'universal' base
Incidentally, if you
can get sufficient read-length, you
don't necessarily need the complete
sequence. If you could read every
(say) 10th base, you would create a
sequence profile against which shorter
shotgun reads could be assembled.
The real value in long reads probably
kicks in at >10kb, and you might be
able to do this if you used 10mers (with
specificity only on the last base, hence
4 dyes and 'read one skip ten'
sequencing).
Edit - this of
course would be silly as the 10mers
would mis-prime everywhere. But
there are ways around this, I think.
||Yes, yes, yes, all very well, but does it mean we'll be able to breed dinosaurs from amber and frogs?
||Copro - sadly not. Though we're
getting mammoth and thylacine sorted.
||[Basepair] It seems like you're joking about the mammoth, but now I wonder as it would seem possible to try.
||We (and others) have got some genomic
sequence from mammoth, but only bits.
There'll be a genome project underway
soon to produce a draft sequence.
Same goes for thylacine and cavebear.
Of course, getting a draft sequence is
one thing; getting a decent finished
sequence is another. And then turning
a sequence into a living animal is still
probably 20 years away. But there's no
doubt that it'll be doable within 20-50
years at most.
than mammoth is tricky, though. As far
as we know, there is no DNA in
anything that old, even under
||/*Incidentally, if you can get sufficient read-length, you don't necessarily need the complete sequence. If you could read every (say) 10th base, you would create a sequence profile against which shorter shotgun reads could be assembled. The real value in long reads probably kicks in at >10kb, and you might be able to do this if you used 10mers (with specificity only on the last base, hence 4 dyes and 'read one skip ten' sequencing).*/
Well what about using ten different reactions, using ten different primers, where the i'th primer is one nucleotide longer than the last. That way you could combine ten of your read one skip ten sequences? Perhaps the cost per primer would be reduced that way, because each primer is a subsequence of the longest primer?
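The ten-primer suggestion amounts to interleaving ten phased reads. Here is a minimal Python sketch of that bookkeeping, assuming each reaction reads every tenth base perfectly and that reaction i starts one base later than reaction i-1. The function names and the example sequence are made up.

```python
def skip_read(sequence, offset, step=10):
    """Bases seen by a 'read one skip ten' reaction starting at `offset`."""
    return sequence[offset::step]

def interleave(reads, step=10):
    """Rebuild the full sequence from `step` phased skip-reads,
    where reads[i] covers positions i, i + step, i + 2*step, ..."""
    total = sum(len(read) for read in reads)
    out = [None] * total
    for offset, read in enumerate(reads):
        out[offset::step] = read  # place each read back on its own phase
    return "".join(out)

template = "ACGTTGCAAGGCTTAACCGGTACGATCGATTGCAGC"
reads = [skip_read(template, i) for i in range(10)]
print(reads)
print(interleave(reads) == template)
```

Each individual read is sparse, but the ten together cover every position exactly once.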
||Something like that might work, but then it's getting too
complex to be worthwhile. It would probably be more
economical to just get a long but intermittent sequence
profile, against which to align short reads.
||The newer sequencing technologies can give you short reads
by the million (literally) - all you need is a longer-range
profile against which to assemble them.
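To show what aligning short reads against an intermittent profile might look like, here is a toy Python sketch. It assumes the profile records every tenth base exactly, with no errors, and simply lists the start positions where a short read agrees with every profile base it overlaps. All names and the example sequence are hypothetical.

```python
def make_profile(sequence, step=10):
    """Keep only every `step`-th base, starting at position 0."""
    return sequence[::step]

def consistent_starts(profile, read, step=10):
    """Start positions at which `read` matches every profile base it overlaps."""
    span = (len(profile) - 1) * step + 1  # number of bases the profile covers
    hits = []
    for start in range(span - len(read) + 1):
        ok = True
        for k, known_base in enumerate(profile):
            pos = k * step
            if start <= pos < start + len(read) and read[pos - start] != known_base:
                ok = False
                break
        if ok:
            hits.append(start)
    return hits

genome = "GATTACA" * 10
profile = make_profile(genome)
read = genome[17:52]  # a 35-base read taken from position 17
print(consistent_starts(profile, read))
```

A 35-base read only overlaps three or four profile positions, so on a real genome many placements would stay consistent; the profile narrows the candidates, and overlaps between reads would have to do the rest.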
||One of the new-generation high-throughput sequencing systems - "SOLiD" - does actually use ligase and degenerate oligos, kind of like I was proposing here. They've gone for parallelism rather than length (as have all the new-generation systems.) You get millions of reads, but in ~35 base fragments. There's also some weirdness involved; they only have 4 fluorescences but relate these to two bases per ligation, presumably for experimental reasons. They do try to claim it as an advantage - 'each base is read twice'. (Of course with their system it has to be, just to find out what it is!)
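For anyone wondering how 4 colours can cover 16 dinucleotides, here is a Python sketch of two-base encoding. The colour table used (bases numbered A=0, C=1, G=2, T=3, colour = XOR of the two base numbers) is the scheme as I understand it from public descriptions of colour-space data; treat the exact assignment as an assumption rather than the vendor's specification. The function names are made up.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def encode(seq):
    """Return the first base plus one colour (0-3) per adjacent base pair."""
    colours = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colours

def decode(first_base, colours):
    """Recover the bases by walking the colours from a known first base."""
    out = [first_base]
    for colour in colours:
        out.append(BASE[CODE[out[-1]] ^ colour])
    return "".join(out)

first, colours = encode("ATGGCA")
print(first, colours)
print(decode(first, colours))
```

Each colour is shared by four dinucleotides, so a colour alone is ambiguous; a base is only pinned down once its neighbour is known, which is why every base has to be covered by two overlapping dinucleotide reads.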
||Just out of curiosity, Loris, are you in the field?