Saturday, June 26, 2010

Switching Roles

At Mather House, where I'm staying for the summer, there exists a common room, equipped with a TV, couches, kitchen, table, and ping pong table. Often I had come back late at night after work, hoping to get in a game, but for some reason my card never got me in. I finally found an alternate route on Monday, and got in a couple of good games. Up till now, because of my schedule and my somewhat reserved nature, I had not met many people other than my immediate neighbors, and I didnt know anyone very well. It was nice, then, to meet these people, and I've since gotten into the habit of spending an hour or longer in the lounge after getting back from work. This means I sleep daily after 2, and wake up at 6 or 7, but that's not too much of a loss, and most of it is self inflicted time wasting anyway.

Anyway, one day, relaxing in the common room after a few games of ping pong, I met a student who happened to be working in the same lab as Fiona Wood, one of Lily's roommates for the upcoming year. When I told him that I was doing bioengineering, he asked if I had heard of biocomputing. As it happens, I did know about the field, but this was a huge fluke: as far as I know, there are only a few good books currently out about this work as a whole, and I had come across one of them accidentally. More than anything else, though, I was surprised he had heard of the field, given that he was an EE major. I asked him about it, and he told me that went to USC, and worked with Leonard Adleman during the year.

I'm not sure how many people have heard of Adleman before. Those who have, though, almost exclusively associate him with RSA, the encryption algorithm used extensively today (this algorithm is actually pretty cool, you should all take a look into it). His more recent foray into biology is not as well known, despite the fact that he more or less founded the field of biocomputing. Apparently he now spends his time working on analytic problems in pure math. Talking about Adleman's work, old and new, got my really excited about biocomputing all over again, and since I think most of you haven't read the books I was talking about, I wanted to just write a little about the subject.

Computational Biology is the use of standard computing to simulate and understand biological systems and their underlying processes. Biocomputing is the exact opposite, i.e. the use of biological systems to solve difficult computational problems. The underlying principle here is that biological systems are inherently parallel in nature. It's never just one process going on, there are always thousands proceeding simultaneously. Of course, it doesn't have to be this way, we could easily make a living system with very few processes, but the point is that we can have a massively parallel system of processes that do not get hopelessly tangled up. Since parallel processing can speed up recursive or iterative processes by an arbitrarily large factor, biocomputing, in terms of general notions, seemed to be an excellent approach to vast computational problems. All that remained were the specifics: for a given problem, how to encode information in biological molecules, and how to encode operations on the information that would be performed legally, automatically, and in parallel.

Despite the work that has been done in the field, no system of encoding information and operations has been successful as a general computing framework., and consequently a new system had to be developed to fit every problem the field chose to tackle. Here I'll just describe the way in which biocomputing was used to solve the N-Vertex Hamiltonian Path Problem, i.e. finding a Hamiltonian Path in a graph with N vertices. For each vertex, assign a sequence of nucleotide basepairs of length M, such that 4^M is at least as great as N. Then, for each edge in the graph, assign a sequence of nucleotide basepairs of length L, such that 4^L is at least as great as E, the number of edges in the graph. Now, for each edge, create two DNA fragments:

1. M basepairs for the vertex from which the edge originates followed by the L basepairs for the edge.
2. The L basepairs that are complementary to the sequence for the edge followed by the M basepairs for the vertex at which the edge terminates.

Finally, for each vertex, also create the DNA fragment corresponding to the M basepairs of the vertex. If these strands are all synthesized and then replicated many times over using PCR, then just by putting them in solution, due to complementary basepairing they will automatically recurse through paths in the graph, all in parallel, very quickly, and each recursion will terminate when the strand corresponding to a lone vertex (i.e. no edge) is incorporated into the growing strand corresponding to the path. Finally, by checking the length of each path strand formed, and by ensuring the presence of the DNA corresponding to each vertex, one can very easily find a solution to the Hamiltonian Path Problem in almost linear time.

Anyway, I hope that gave you a little taste for the subject. I did finish writing this at 4:55 in the morning, so if something doesnt make sense, just let me know and I'll rewrite/clarify.

~jnub

4 comments:

  1. Oh man, that is actually really cool. Fiona's working in Radhika Nagpal's lab. I knew she's doing something CS and bio related but not really what (I was probably thinking more along the lines of computational biology than biocomputing). This is pretty much the coolest thing ever.

    ReplyDelete
  2. I approve of the attempt to convert bio into something useful :D

    ReplyDelete
  3. Isn't bio already pretty useful?

    ReplyDelete