Computational

Goals

DNA tiles serve as the building blocks for our crystals. These blocks must 1) assemble into the desired secondary structures and 2) permit the binding of both insulating proteins and target third-party molecules. NUPACK is a software suite developed by researchers at Caltech for designing nucleic acid systems. Among other things, its tools perform rigorous calculations that can optimize nucleic acid sequences to adopt a specified secondary structure. The NUPACK source code is available for download.

Using these tools, our goal was to design programs that process input secondary-structure and strand data and then report whether the input design is valid. If a design is deemed feasible, the output provides users with the strand sequences needed to place an order with a DNA oligo vendor (e.g. Integrated DNA Technologies).

For this project, we used these tools to screen our tile designs. The most favorable candidate was then assessed, ordered, and presented to the experimental team for annealing and crystallization.

Methods

Before crafting the scripts and tools necessary to analyze input nucleic acid data, we designed several candidate DNA tiles. Past work in the Snow Laboratory indicates that the Ets-1 dimer DNA complex (PDB code 2nny) is a viable candidate for crystallization. The proteins in this complex electrochemically insulate the DNA strands and support the structure of the desired tiles and crystalline lattices.

single_unit_endon_black

When designing tiles, we must consider how the DNA aligns within the overall crystal lattice. The PyMOL supercell program helps visualize this alignment.

insulation_black

Visual analysis reveals that many of the DNA helices lie in the same plane, so we can examine exactly how these DNA-protein complexes line up. Below is a look at how four DNA-protein tiles align in the same plane.

full_tile_pro_cartoon.png

For the lattice to be structurally sound, these parallel strands must be linked across the space between them. To accomplish this, a phosphate chain long enough to span the apparent gap (approximately 12 Å) was modeled in PyMOL. This chain is our “linker,” which we use in designing numerous tile secondary structures.

linker

linker_tile

When designing our tiles, we must be aware of where the linker is placed within the strand base sequence. If the linker is in the wrong location, it will not span the gap between “adjacent” strands. This is an important consideration when crafting tile designs.

Knowing how our strands align, as well as how they are linked together, we are ready to begin crafting unique tile designs. We recognize that so long as regions of the designed DNA permit the binding of Ets-1 proteins, we can focus our designs purely on DNA secondary structure and identity. With this in mind, we developed a cartoon that represented a 4-tile block, similar to the one shown earlier. From there, several unique tile blocks were designed.

tielcartoon

With all of these tile designs, we needed a way to determine which ones had potential to form their projected secondary structure and, by extension, crystallize. Additionally, we needed a way to fill in the designs with specific base identities for each strand. This is where NUPACK comes in.

nupack

NUPACK can flesh out input designs by identifying the nucleotide base identities that optimize a given secondary structure. It can also predict the minimum free energy, the most probable secondary structure, and other likely secondary structures that can form when the designed strands are in solution. With this tool, we can complete our tile designs and determine which ones are worth carrying over to the experimental side of the project.

To speed up the process of feeding simplified data into NUPACK and receiving understandable data back, we created Python wrappers for the NUPACK executable files. This way, designs can be analyzed offline and the outputs are easier to interpret.

python
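As a rough illustration of what wrapping an executable involves, the sketch below runs an external program from Python and turns its text output into a dictionary. It is not our actual wrapper code: the function names `run_tool` and `parse_key_values`, the `key: value` output format, and the stand-in command are all assumptions made for this example, and no NUPACK executable names or flags are implied.

```python
import subprocess
import sys


def run_tool(command, input_text=None, timeout=60):
    """Run an external executable and return its standard output as text.

    `command` is a list such as [path_to_executable, arg1, arg2, ...].
    Raises RuntimeError carrying the tool's stderr if the exit code is nonzero.
    """
    result = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


def parse_key_values(output):
    """Collect lines of the form 'key: value' into a dict (hypothetical format)."""
    parsed = {}
    for line in output.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            parsed[key.strip()] = value.strip()
    return parsed


# Stand-in command for demonstration only: the Python interpreter
# printing a fake one-line report in place of a real analysis tool.
DEMO_COMMAND = [sys.executable, "-c", "print('status: ok')"]
```

Calling `parse_key_values(run_tool(DEMO_COMMAND))` exercises the whole path without needing any design software installed. A real wrapper would replace the stand-in command with the tool's own executable and parse that tool's actual output format.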

Designs are submitted to NUPACK in dot-paren notation, in which each character corresponds to one nucleotide base. Open parentheses are paired with closed parentheses to indicate hybridized bases, and periods represent unpaired bases. With the structure defined, we can specify domains that contain groups of characters. These domains also contain a specific base identity for each character (e.g. A, T, G, C, N). In our designs, domains are represented by single letter-number combinations (e.g. a0). These domains are then grouped into strands, represented by capital letters (e.g. A).
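To make the notation concrete, here is a minimal sketch of how a dot-paren string can be turned into a list of base pairs. The function name `parse_dot_paren` is our own for this example. We assume that a `+` character marks a break between strands, as in NUPACK's multi-strand notation, and that it does not count as a nucleotide.

```python
def parse_dot_paren(structure):
    """Return a sorted list of (i, j) base pairs, 0-indexed over nucleotides.

    '(' opens a pair, ')' closes the most recently opened pair,
    '.' is an unpaired base, and '+' is a strand break that is skipped.
    """
    stack = []      # positions of '(' still waiting for a partner
    pairs = []
    position = 0    # index of the current nucleotide
    for char in structure:
        if char == "+":
            continue  # strand break: not a nucleotide, do not advance
        if char == "(":
            stack.append(position)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at nucleotide {position}")
            pairs.append((stack.pop(), position))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r}")
        position += 1
    if stack:
        raise ValueError(f"unmatched '(' at nucleotide {stack[-1]}")
    return sorted(pairs)
```

For example, `parse_dot_paren("((..))")` pairs nucleotide 0 with 5 and nucleotide 1 with 4, and a two-strand duplex written as `"((.+.))"` gives the same pairs because the `+` is ignored when counting positions.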

In our designs, we must specify some of the base identities ourselves, rather than letting NUPACK perform a “perfect” optimization. This is required if we want our protein to bind to specific regions of the DNA. Here, the specified bases are shown in cyan. The protein that binds to these sections is shown in turquoise and gray.

bound_DNA_2
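The mix of fixed and free positions can be written down as domains that are joined into strands. The sketch below is illustrative only: the domain names, their lengths, and the `ACGT` stretch are placeholders we made up for this example, not the binding-site sequences used in our designs. `N` stands for a position left open for the design software to choose.

```python
# Hypothetical domains: every name, length, and sequence here is a placeholder.
DOMAINS = {
    "a0": "NNNNNN",     # fully free: every base is chosen by the design tool
    "b0": "NNACGTNN",   # partly fixed: 'ACGT' stands in for a binding site
}

# Hypothetical strand built from the domains above, in 5' to 3' order.
STRANDS = {"A": ["a0", "b0"]}

VALID_CODES = set("ACGTN")


def build_strand(strand_name, strands=STRANDS, domains=DOMAINS):
    """Concatenate a strand's domains into one sequence constraint string."""
    sequence = "".join(domains[name] for name in strands[strand_name])
    unexpected = set(sequence) - VALID_CODES
    if unexpected:
        raise ValueError(f"unsupported base codes: {sorted(unexpected)}")
    return sequence


def count_free(sequence):
    """Number of positions still left open (written as 'N')."""
    return sequence.count("N")
```

With these placeholder values, `build_strand("A")` gives a 14-character constraint in which 10 positions remain free and 4 are fixed.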

Additionally, if our tiles are going to bind third-party molecules, we must specify another section of the tiles. In our 4-tile block system, this region lies in the center. Here, target molecules for analysis can bind to the tiles and remain in place for X-ray diffraction. The example below demonstrates this hypothesis with an engrailed homeodomain, shown in green (tile is in gray and white).

homeodomain

Together, this information yields designs that can be readily translated into input for the design scripts. An example of a completed tile design and its design script input are shown below.

design11

11_code

The orange sections of specified bases indicate where Ets-1 proteins will bind to the tiles. The purple sections are where third-party molecules will bind. The arrows crossing between the upper and lower tile halves represent the linkers. The design shown would go on to be our experimentally tested tile, though almost all of our designs were translated into both a cartoon like the one shown and a design script input.

In the end, a given tile possesses a total of 88 nucleotide bases that must be specified by NUPACK. In many places, designed bases need to be complementary to ensure favorable hybridization. The number of strands varies with the design, as does the number of linkers involved, though the averages across all designs are six strands and two linkers. Running inputs such as these through NUPACK reveals which of the hypothesized designs are feasible for wet-lab experimentation and crystallization.
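The complementarity requirement can be checked directly once a sequence and its intended base pairs are known. The sketch below is one simple way to do so and is not taken from our design scripts. The function names are our own, the pair list is assumed to use 0-indexed nucleotide positions, and only Watson-Crick pairs (A-T and G-C) are accepted, so a G-T wobble is reported as a mismatch.

```python
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Return the strand that would hybridize perfectly with `sequence`."""
    return "".join(WATSON_CRICK[base] for base in reversed(sequence))


def mismatched_pairs(sequence, pairs):
    """Return every (i, j) in `pairs` whose two bases are not complementary.

    `sequence` is the concatenation of all strands with no separators,
    and `pairs` holds 0-indexed positions into that string.
    """
    return [
        (i, j)
        for i, j in pairs
        if WATSON_CRICK.get(sequence[i]) != sequence[j]
    ]
```

An empty list from `mismatched_pairs` means every intended pair is complementary. For instance, the sequence `GCAAGC` with pairs `(0, 5)` and `(1, 4)` passes, while changing the last base to `T` flags the pair `(0, 5)`.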

Results

For each input design, all 88 bases were successfully specified and complementary. This increases the likelihood that the desired secondary structures form in solution. Secondary structure probability was also determined for each input design, allowing us to identify the designs most likely to form their intended structure and, by extension, most likely to crystallize.

One of our preliminary designs showed the most promise after initial testing of the scripts. This design (at the time, named “Design11”) was moved to the experimental side of our project.

Other proposed designs were also deemed feasible by NUPACK. These designs are being analyzed further to aid in optimization of the design scripts, testing of new functions, and verification of crystallization. Results from these tests may be detailed in future reports.

Discussion

The success of these scripts demonstrates the computational power of NUPACK and these novel Python wrappers. Successful co-crystallization of DNA-protein systems, as shown in the experimental portion of this project, indicates that designs output by the scripts are feasible in wet-lab scenarios. In this way, we accomplished our goal of using computational tools to aid the development of tile designs. Further analysis of the effectiveness of these tools is suggested, yet initial results are encouraging.

That said, many of the designs could not be handled by NUPACK. Linkers cannot be modeled effectively given the limitations of the software suite. These linkers normally result in structures being “pseudoknotted.” In other words, base pairs can cross one another rather than staying strictly nested. Simplified standard hybridization between strands is shown here:

standard

A pseudoknotted example, by contrast, would look like:

pseudoknotted

Current limitations of the scripts prevent pseudoknotted structures from being properly analyzed. Alternative representations or analysis tools are required to better analyze these initially “failed” tiles.
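One way to flag such designs before they reach the analysis step is to look for crossing base pairs. The sketch below uses the standard definition of a pseudoknot, in which two pairs (i, j) and (k, l) cross when i < k < j < l. It is our own illustration and is not part of the current scripts; the function name and the pair-list input format are assumptions.

```python
from itertools import combinations


def find_crossing_pairs(pairs):
    """Return every pair of base pairs that cross one another.

    Each base pair is an (i, j) tuple of nucleotide positions. Two pairs
    (i, j) and (k, l), with i < k, cross when k < j < l. Nested pairs
    (k and l both inside i..j) and side-by-side pairs do not cross.
    """
    # Normalize so the smaller index comes first, then order by opening position.
    ordered = sorted(tuple(sorted(pair)) for pair in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(ordered, 2):
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def is_pseudoknotted(pairs):
    """True if at least one pair of base pairs crosses."""
    return bool(find_crossing_pairs(pairs))
```

A nested helix such as `[(0, 5), (1, 4)]` produces no crossings, whereas `[(0, 4), (2, 6)]` is reported as pseudoknotted. The comparison of all pairs is quadratic in the number of base pairs, which is acceptable for tiles of this size.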

Despite their current limitations, adoption of the Python scripts by labs worldwide could advance understanding in nucleic acid nanotechnology and design. As such, the early versions of the files used in generating the tested design are available on our Downloads page, along with detailed documentation on their use. These files and their documentation will be updated periodically this fall and winter as the scripts are refined.

Future Work

The scripts crafted to optimize the design of given nucleic acid secondary structures are a promising first step in simplifying complex calculations and their outputs. Some aspects of the code require further analysis and editing so that the details of the output information are better understood. For one, overcoming the limitations on pseudoknotted structures will increase the versatility of the current scripts. Such advances will require additional experimentation and potentially other programs for verification (e.g. oxDNA). Finding ways to include linker information in structure design and analysis could help accomplish this. At the least, incorporating linker information into the design input could better optimize the output strands.

Another area of development is detailed temperature analysis. The online NUPACK tools permit graphical representation of structure melting across broad temperature ranges. Functions that perform similar tasks offline, with detailed visual indicators of critical temperatures or defect measures (e.g. structure melting temperature), would give a better understanding of design limits.

Finally, fleshing out external documentation of the existing code would make the scripts easier for users to understand.

Currently, our “Computational Guru,” Alex Frickenstein, is working on an honors thesis that continues the work detailed here. Additionally, he is developing external documentation to download, along with the design scripts, for users worldwide to access.