h a l f b a k e r y
Where life irritates science.




Ligase sequencing

A new method for sequencing DNA

OK, so this is a bit technical, and probably of minority interest. Sorry about that. I'll give an overview of the current state of the art in DNA sequencing, but please see link for something more in-depth.

Biologists want to sequence DNA. Please accept that this is currently done and desirable.

Currently the standard method (Sanger sequencing, or di-deoxy sequencing, or dye-terminator sequencing) is to purify the DNA, then copy a section starting from one particular point in the presence of fluorescently labelled chain terminators. This reaction yields DNA fragments of different lengths, each fluorescently labelled according to its final base.
The final step is to sort these fragments by size, and read off their colours.

Now, the drawback to this is that it only works for a certain distance - as the chains terminate randomly at each base, longer fragments are less abundant than shorter ones. Also, the larger fragments are harder to separate. Currently a good read will yield about 1000 bases, with quality (accuracy) falling off towards the end.
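
The abundance drop-off can be illustrated with a toy model. This is a minimal sketch, assuming every extension step terminates with the same fixed probability (`p_term` is a made-up illustrative value, not a measured one), so the fraction of fragments of a given length falls off geometrically:

```python
def fragment_fraction(length, p_term):
    """Fraction of chains that terminate exactly at position `length` (1-based).

    Assumes a constant, independent termination probability per base,
    which is a simplification of the real chemistry.
    """
    return p_term * (1.0 - p_term) ** (length - 1)


# p_term is a hypothetical value chosen only to show the trend.
p_term = 0.005
for length in (1, 100, 500, 1000):
    print(length, fragment_fraction(length, p_term))
```

Under this model the signal at base 1000 is much weaker than at base 1, before gel resolution is even considered.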

My method is proposed with the objective of making the read length longer (ideally as long as possible). Basically I propose replacing the single bases with longer units.

1) Single stranded DNA of the region is generated, perhaps using asymmetric PCR.

2) The sequencing reaction. In the mixture is included all n-mers, where n may be around 3. (There are 4 bases, so this would be 4^3, or 64, different oligonucleotides.) Also included are di-deoxy terminator oligos, each with a different fluorophore. I don't know how many of these it is practical to distinguish - if 64 is too many maybe 16 would be OK - giving n=2.
DNA ligase is used to sequentially bind the annealing oligonucleotides onto the primer.
The idea is that while the n-mers are too short to form stable association with the template DNA, the primer is long enough, and so oligos can be sequentially attached to the end.

3) The sequence can be read in the same way as for Sanger sequencing, in a 'sequencing machine'.
The fragments which need to be distinguished differ in length by n bases rather than by one, which should both reduce the drop-off per base and increase the length to which resolution is possible. Hopefully this would result in longer, more accurate reads.
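
To make the size of the oligo pool in step 2 concrete, here is a short sketch that enumerates every n-mer over the four bases. The function name `all_nmers` is just illustrative:

```python
from itertools import product

BASES = "ACGT"


def all_nmers(n):
    """Return every possible oligonucleotide of length n, in alphabetical order."""
    return ["".join(p) for p in product(BASES, repeat=n)]


print(len(all_nmers(2)))  # 16
print(len(all_nmers(3)))  # 64
```

The pool grows as 4^n, which is why each extra base in the extender unit quadruples the number of terminators that need a distinguishable label.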

As a final thought, I don't know how many different fluorophores it is possible to cheaply make and distinguish. I'm hoping the answer is more than 4, but if it isn't as many as 16, all is not lost. In any case, it would be desirable to be able to use the existing sequencing machines, at least for a prototyping and change-over period.
Suppose we label the terminators with more than one fluorophore.
Using only the four current types in combination, there are 16 (2^4) possibilities (which include labelled with all 4 colours, and labelled with no colours).

So the scheme could easily be tested, and if it worked well, could be improved to give better data. Adding only two extra fluorescent marker types would be sufficient to increase the oligo length ('n') by one, since 2^6 = 64 = 4^3. Or one extra marker could make the base peak data more solid, since there would be no need to use every theoretical combination (like unlabelled or 4 labels).
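
The dye arithmetic can be checked with a few lines. This is a sketch that assumes each terminator carries any subset of the available dyes and that every subset is distinguishable (an optimistic assumption). The `skip_extremes` flag is a hypothetical option that drops the "no dye" and "all dyes" combinations mentioned above:

```python
def usable_labels(num_dyes, skip_extremes=False):
    """Number of distinct dye subsets available as labels."""
    total = 2 ** num_dyes
    return total - 2 if skip_extremes else total


def dyes_needed(n, skip_extremes=False):
    """Smallest number of dyes whose subsets can label all 4**n terminators."""
    needed = 4 ** n
    k = 1
    while usable_labels(k, skip_extremes) < needed:
        k += 1
    return k


print(dyes_needed(2))                      # 4
print(dyes_needed(3))                      # 6
print(dyes_needed(2, skip_extremes=True))  # 5
```

So four dyes cover n=2, six cover n=3, and a fifth dye at n=2 leaves 30 usable combinations for 16 terminators, with no need for the unlabelled or fully labelled cases.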

Loris, Aug 04 2005

Sequencing http://en.wikipedia.org/wiki/Sequencing
from Wikipedia [Loris, Aug 04 2005]

Ligase Based Sequencing http://www.sfu.ca/~omp/stuff/lbs.pdf
A Variation on the Theme [Cuit_au_Four, Feb 20 2010]


       As far as I understand, Sanger sequencing breaks the DNA at each nucleotide, a different restriction enzyme batch for each of the four bases. That is, every base gets cut, and that's how every base is sequenced. If you want to cut your strand at every 3 base n-mer, you're going to have to use 63 different reactions (not 64, since you can infer the last one and fill in the gaps).
daseva, Aug 04 2005

       [Daseva] That's not how Sanger sequencing works - it works by replicating the template strand from a common starting point, and terminating synthesis by means of di-deoxy bases. The proportion of di-deoxy NTPs to normal dNTPs in the mix determines the likelihood of the reaction terminating at each step. In the simplest version, four reactions are run, each one containing a different ddNTP. Hence, the first reaction produces a series of fragments, each ending at one of the "A"s in the sequence. The second reaction produces a series ending in C's, etc. In more modern implementations, distinguishable fluorophores are present on the ddNTPs, and the four reactions are run in one tube. No restriction enzymes.
Basepair, Aug 04 2005
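
The four-reaction scheme described above can be sketched in code. This is an idealised illustration only: it assumes every possible termination product is present and perfectly resolved, and it counts fragment lengths from the end of the primer. The function names are hypothetical:

```python
def four_reactions(new_strand):
    """For each ddNTP reaction, list the fragment lengths it produces.

    `new_strand` is the strand synthesised from the primer, 5'->3'.
    A fragment of length L ends in new_strand[L - 1].
    """
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand):
        lanes[base].append(i + 1)
    return lanes


def read_ladder(lanes):
    """Recover the sequence by sorting all fragments by size."""
    ladder = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(ladder[length] for length in sorted(ladder))


lanes = four_reactions("GATTACA")
print(lanes["A"])          # [2, 5, 7]
print(read_ladder(lanes))  # GATTACA
```

With dye-terminators the four lanes collapse into one, but the read-out logic is the same: sort by size, then read the label on each fragment.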

       [Loris] This is not a bad idea (I do this kind of stuff, so I have seen plenty of bad ideas. Are you in the field?). A few potential problems:   

       (1) How do you ensure that the ligating strand grows from a fixed starting point, and doesn't 'self prime' in mid-strand? I presume you've got a long primer to initiate ligation, but given the long length of 'free' template, the lower level of ligation elsewhere may swamp your signal. (The 'primed' ligation is effectively a trimolecular reaction, whilst false ligations requiring two or more short oligos to bind will be >tetramolecular, but I don't think the advantage will be great enough.)

(2) There's still a problem in distinguishing >4 fluorophores. ABI (and others) have several more dyes, but probably not 16 good ones in total. Doubtless this will be addressed at some point, but it'll be a limiting factor. You might be able to use dye combinations, though, to expand the range.

(3) The main limitation on Sanger sequencing (I think) is not the sequencing reaction but the gel/capillary resolution. You will have a considerable advantage if you're resolving 2-base or 3-base steps rather than 1-base increments, but I don't know if you'd double or triple the read length.

That said, it is a bloody good idea. If you want to discuss it further, email me (work address is on my profile page).

[EDIT] Sorry - I just noticed that you already suggested dye combinations in your posting. By the way, I think you should be thinking about single-molecule approaches.
Basepair, Aug 04 2005

       I'm getting clear on this now. Or at least, the thick fog has settled into a fine mist.   

       So he wants to use base pairs (he hey!) and/or triplets for site specific termination? Am I not then correct that he will require 63 different reactions, 63 different dyes for the triplets, or 15 different dyes for the pairs? Will this alleviate the length problem at all?   

       I've done a few PCRs and recombinations, etc. In a lab, it's pretty amazing stuff. I'd love to get into the field, once I get my damn degree!
daseva, Aug 04 2005

       Yes, the essence of the idea is to do Sanger-style reactions, but with two- (or three- or four...) base steps instead of single-base steps, and using ligase to processively add di- (or tri- ....) nucleotides instead of polymerase.

The advantage is that you read two bases at each step so, given a constant relative gel resolution, you could read twice as far. The drawback is the need to have 16 distinguishable dyes (or dye combinations) for 2-base steps (or 64 for 3-base steps etc), and possible unwanted reactions and noise.

I'm not sure if dinucleotides would work well - but I know that tandemly-annealing hexanucleotides have been used to prime conventional sequencing reactions. You might need to use PNAs (DNA analogues which anneal more strongly) instead of short oligos. But basically a bloody good idea.
Basepair, Aug 04 2005
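
The read-out with two- or three-base steps can be sketched the same way as a Sanger ladder. This is an idealised model that assumes extension happens only from the primer, in exact n-base units, with a terminator possible after any unit. None of this has been shown to work in practice, and the function names are hypothetical:

```python
def ligation_ladder(new_strand, n):
    """List (fragment_length, terminal_n_mer) for each possible stopping point.

    Fragments differ in length by n bases, and each one's label
    identifies its final n-mer.
    """
    return [
        (end, new_strand[end - n:end])
        for end in range(n, len(new_strand) + 1, n)
    ]


def read_ligation_ladder(ladder):
    """Sort fragments by size and concatenate the n-mer each label reports."""
    return "".join(unit for _, unit in sorted(ladder))


ladder = ligation_ladder("GATTACAG", 2)
print(ladder)                        # [(2, 'GA'), (4, 'TT'), (6, 'AC'), (8, 'AG')]
print(read_ligation_ladder(ladder))  # GATTACAG
```

Eight bases are recovered from four peaks, each two bases apart, which is where the hoped-for gain in read length at constant relative gel resolution comes from.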

       How will you get all your n-mers in solution? Usually, the nucleotides are just fully denatured to yield individual bases which then assemble via the ligase. You can't control denaturing specifically enough to produce only 2 or 3 base long pieces. Or.. am I off again?   

       Guess you could always centrifuge or something?
daseva, Aug 04 2005

       Ah - I'm not following here. There are two possible reasons: (a) you're off or (b) Bidoli pinot grigio. The oligos (2 or 3 or 4 base) would be synthetic, and they'd dissolve just fine....
Basepair, Aug 04 2005

       [daseva] Maybe with a comet assay, your smaller bits would separate. Toss in a few ligase-coated quantum dots to hasten their departure, and voila?
reensure, Aug 04 2005

       //bloody good idea// from someone who works in the field deserves a bun from me. I barely grasp it though, so forgive me.
Zimmy, Aug 04 2005

       ...which reminds me - [+]
Basepair, Aug 05 2005

       We've been discussing this in the group, and nobody's sure whether you'd get ligation happening elsewhere on the template (effectively 'non-primed' concatenation). But you might be OK - I suspect that ligase needs more than 4 bases to grab hold of, so it probably wouldn't ligate two 2-mers that were sitting on the template, but it possibly *would* ligate a 2-mer to the growing strand. You'd need to suck it and see, and you'd want to play around with different ligases (and eventually perhaps make some modified ligases to do the job better).
Basepair, Aug 05 2005

       Thank you Basepair, I hoped you'd see this.
Yes, I do work in this area.

       Regarding the priming issue, I hope that wouldn't be a problem. If ligase could act on two n-mers bound adjacently at any significant frequency then the idea is sunk. This probably rules out longer extender units. Although optimising the reaction temperature would be useful. One degree can make a dramatic difference to mispriming in PCR, due to the thermal instability of imperfect matches.   

       My major reservation about the practicality of this is whether ligase would actually join on such a small oligonucleotide. Particularly a 2 or 3-mer.
Your suggestion of using more strongly binding DNA analogues is a good one.

       I also wonder whether the reaction would be very efficient, as the number of different types of extender unit increases. Thinking about it some more, perhaps we don't actually need all the different n-mer extenders. Using non-standard bases which pair with several bases, like inosine, might be sufficient. (Maybe only in the middle of an n-mer.)
Then the only combinatorial problem is the terminator units. It may be possible to increase their incorporation despite low concentration by adding extra 'any' bases to the 3' end.
Loris, Aug 05 2005

       As regards mis-priming, we know that ligase will link two consecutively-annealed 8mers (I thought 6-mers, but I'm not sure this is proven), so the limit must be below 8. But I think 2 would be safe, since that would only give the ligase 4bp to sit on if it were mis-priming. For "true" extensions, you've got unlimited length on the upstream side. I don't know if ligase will work to seal a 'nick' that's only 2bp from the 3' end, but it might. I don't think the number of different extenders will impact efficiency - at these short lengths you should have good specificity.

Using generic bases like inosine might also help. There are people here who work on a range of 'universal' base analogues.

Incidentally, if you can get sufficient read-length, you don't necessarily need the complete sequence. If you could read every (say) 10th base, you would create a sequence profile against which shorter shotgun reads could be assembled. The real value in long reads probably kicks in at >10kb, and you might be able to do this if you used 10mers (with specificity only on the last base, hence 4 dyes and 'read one skip ten' sequencing).

Edit - this of course would be silly as the 10mers would mis-prime everywhere. But there are ways around this, I think.
Basepair, Aug 05 2005

       Yes, yes, yes, all very well, but does it mean we'll be able to breed dinosaurs from amber and frogs?
coprocephalous, Aug 05 2005

       Copro - sadly not. Though we're getting mammoth and thylacine sorted.
Basepair, Aug 05 2005

       [Basepair] It seems like you're joking about the mammoth, but now I wonder as it would seem possible to try.
Zimmy, Aug 05 2005

       We (and others) have got some genomic sequence from mammoth, but only bits. There'll be a genome project underway soon to produce a draft sequence. Same goes for thylacine and cavebear. Of course, getting a draft sequence is one thing; getting a decent finished sequence is another. And then turning a sequence into a living animal is still probably 20 years away. But there's no doubt that it'll be doable within 20-50 years at most.

Anything older than mammoth is tricky, though. As far as we know, there is no DNA in anything that old, even under favourable conditions.
Basepair, Aug 05 2005

       Cool idea. (+)   

       /*Incidentally, if you can get sufficient read-length, you don't necessarily need the complete sequence. If you could read every (say) 10th base, you would create a sequence profile against which shorter shotgun reads could be assembled. The real value in long reads probably kicks in at >10kb, and you might be able to do this if you used 10mers (with specificity only on the last base, hence 4 dyes and 'read one skip ten' sequencing).*/   

       Well what about using ten different reactions, using ten different primers, where the i'th primer is one nucleotide longer than the last. That way you could combine ten of your read one skip ten sequences? Perhaps the cost per primer would be reduced that way, because each primer is a subsequence of the longest primer?
Cuit_au_Four, Feb 20 2010
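
The offset-primer suggestion can be sketched as follows. This is an idealised illustration that assumes each reaction reads only the last base of every `step`-long unit, with no mis-priming, and that the primer for each successive reaction is one base longer than the last. The names `skip_read` and `interleave` are hypothetical:

```python
def skip_read(strand, step, offset):
    """Bases seen when only the last base of each `step`-long unit is read.

    `offset` is how many bases longer this primer is than the shortest one;
    `strand` starts immediately after the shortest primer.
    """
    return strand[offset + step - 1::step]


def interleave(reads):
    """Merge reads taken at offsets 0..step-1 into one contiguous sequence."""
    merged = []
    for i in range(max(len(r) for r in reads)):
        for r in reads:
            if i < len(r):
                merged.append(r[i])
    return "".join(merged)


strand = "GATTACAGGCT"
reads = [skip_read(strand, 3, offset) for offset in range(3)]
print(reads)              # ['TCG', 'TAC', 'AGT']
print(interleave(reads))  # TTACAGGCT
```

A step of 3 is used here only to keep the example short; the same functions work with a step of 10. The merged result covers everything from the first read position onward.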

       Something like that might work, but then it's getting too complex to be worthwhile. It would probably be more economical to just get a long but intermittent sequence profile, against which to align short reads.   

       The newer sequencing technologies can give you short reads by the million (literally) - all you need is a longer-range profile against which to assemble them.
MaxwellBuchanan, Feb 20 2010

       One of the new-generation high-throughput sequencing systems - "SOLiD" - does actually use ligase and degenerate oligos, kind of like I was proposing here. They've gone for parallelism rather than length (as have all the new-generation systems.) You get millions of reads, but in ~35 base fragments. There's also some weirdness involved; they only have 4 fluorophores but relate these to two bases per ligation, presumably for experimental reasons. They do try to claim it as an advantage - 'each base is read twice'. (Of course with their system it has to be, just to find out what it is!)
Loris, Feb 20 2010
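
How four colours can stand for sixteen dinucleotides is easier to see in code. This is a sketch assuming the commonly described two-base "colour space" scheme, in which each colour is the XOR of 2-bit codes for the adjacent bases. The numeric colour labels and function names are illustrative:

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = {v: k for k, v in CODE.items()}


def encode(seq):
    """One colour (0-3) per overlapping pair of adjacent bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colours):
    """Recover the bases from the colours, given a known first base."""
    bases = [first_base]
    for colour in colours:
        bases.append(BASE[CODE[bases[-1]] ^ colour])
    return "".join(bases)


colours = encode("GATTACA")
print(colours)               # [2, 3, 0, 3, 1, 1]
print(decode("G", colours))  # GATTACA
```

Each colour is shared by four different dinucleotides, so a single colour never identifies a base on its own. Decoding needs a known starting base plus every colour in between, which fits the remark that each base has to be read twice just to find out what it is.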

       Just out of curiosity, Loris, are you in the field?
MaxwellBuchanan, Feb 20 2010

