Wednesday, January 7, 2009

MBB 141 Question 36


36. Comparative genomics offers insights into the relationship between homologous genes and the organization of genomes. When the genome of C. elegans was sequenced, it was striking that some types of sequences were distributed nonrandomly. Consider the data obtained for chromosome V and the X chromosome shown here. The following figure shows the distribution of genes, the distribution of inverted and tandem repeat sequences, and the location of ESTs in C. elegans that are highly similar to yeast genes.

a. How do the distributions of genes, inverted and tandem repeat sequences, and conserved genes compare?

b. Based on your analysis in (a), what might you hypothesize about the different rates of DNA evolution (change) on the arms and central regions of autosomes in C. elegans?

c. Curiously, meiotic recombination (crossing-over) is higher on the arms of autosomes, with demarcations between regions of high and low crossing-over at the boundaries between conserved and nonconserved genes seen in the physical map. Does this information support your hypothesis in (b)?


MBB 141 Questions 27-35


27. BACs I-IV have been aligned in the contig shown in the figure above after screening a BAC library with a series of numbered STS and EST markers.

a) During the construction of BAC libraries, genomic DNA segments from different chromosomal regions sometimes are cloned together into a single BAC clone. This generates a chimeric BAC. How would you verify that none of the BACs in this contig was chimeric?

b) How would you identify the chromosomal region from which the contig derived?

c) How would you identify the orientation of the contig on a chromosome (i.e., which end of the contig is closer to the telomere or centromere)?

d) How would you identify the distance (in bp) of the markers in each BAC from each other and from the end of the insert?



28. When Celera Genomics sequenced the human genome, they obtained 13,543,099 reads of plasmids having an average insert size of 1,951 bp, and 10,894,467 reads of plasmids having an average insert size of 10,800 bp.

a) Dideoxy-chain termination sequencing provides only about 500-550 nucleotides of sequence. About how many nucleotides of sequence did they obtain from sequencing these two plasmid libraries?

b) Why did they sequence plasmids from two libraries with different-sized inserts?

c) They sequenced only the ends of each insert. How did they determine the sequence lying between the sequenced ends?
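For part (a), the arithmetic can be sketched as below. This is only a rough estimate: it assumes every read yields between 500 and 550 usable nucleotides (the range stated in the question), and it borrows the approximate haploid genome size of 3 x 10^9 bp used in other questions in this set to express the result as fold coverage. The variable names are illustrative.

```python
# Read counts quoted in Question 28.
reads_small_inserts = 13_543_099   # plasmids with ~1,951-bp inserts
reads_large_inserts = 10_894_467   # plasmids with ~10,800-bp inserts
total_reads = reads_small_inserts + reads_large_inserts

# Assumption: each read contributes 500-550 nt of usable sequence.
low_estimate = total_reads * 500
high_estimate = total_reads * 550

# Assumption: haploid human genome of about 3 x 10^9 bp.
genome_size = 3_000_000_000

print(f"total reads: {total_reads:,}")
print(f"sequence obtained: {low_estimate:,} to {high_estimate:,} nt")
print(f"approximate coverage: {low_estimate / genome_size:.1f}x "
      f"to {high_estimate / genome_size:.1f}x")
```

Expressing the total as fold coverage is a design choice made here to put the raw nucleotide count in context; the question itself only asks for the total.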

29. Eukaryotic genomes differ in their repetitive DNA content. For example, consider the typical euchromatic 50-kb segment of human DNA that contains the human beta T-cell receptor. About 40 percent of it is composed of various genome-wide repeats, about 10 percent encodes three genes (with introns), and about 8 percent is taken up by a pseudogene. Compare this to the typical 50-kb segment of yeast DNA containing the HIS4 gene. There, only about 12 percent is composed of a genome-wide repeat, and about 70 percent encodes genes (without introns). The remaining sequences in each case are untranscribed and either contain regulatory signals or have no discernible information. Whereas some repetitive sequences can be interspersed throughout gene-containing euchromatic regions, others are abundant near centromeres. What problems do these repetitive sequences pose for sequencing eukaryotic genes? When can these problems be overcome, and how?

30. How has genomic analysis provided evidence that Archaea is a branch of life distinct from Bacteria and Eukarya?

31. Once a genomic region is sequenced, computerized algorithms can be used to scan the sequence to identify potential ORFs.

a) Devise a strategy to identify potential prokaryotic ORFs by listing features assessable by an algorithm checking for ORFs.

b) Why does the presence of introns within transcribed eukaryotic sequences preclude direct application of this strategy to eukaryotic sequences?

c) The average length of exons in humans is about 100-200 base pairs while the length of introns can range from about 100 to many thousands of base pairs. What challenges do these findings pose for identifying exons in uncharacterized regions of the human genome?

d) How might you modify your strategy to overcome some of the problems posed by the presence of introns in transcribed eukaryotic sequences?
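One possible starting point for part (a) is sketched below: a scan of all six reading frames for a start codon followed in frame by a stop codon, keeping only runs above a minimum length. The function names and the `min_codons` threshold are assumptions made for illustration; a realistic ORF finder would also weigh features such as ribosome-binding sites and codon usage, which this sketch ignores. Because it requires an uninterrupted reading frame, it also shows why the approach does not transfer directly to intron-containing genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return (strand, start, end, n_codons) for each ATG...stop run.

    Coordinates are 0-based on the strand that was scanned, and
    n_codons excludes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first in-frame start codon
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None                   # look for the next ORF
    return orfs


# Toy sequence (hypothetical) with a very low threshold, just to show usage.
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
```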

32. One powerful approach to annotate genes is to compare the structures of cDNA copies of mRNAs to the genomic sequences that encode them. However, during the synthesis of cDNA, reverse transcriptase may not always copy the entire length of the mRNA and so a cDNA that is not full-length can be generated. This approach to gene annotation often uses ESTs that are not full-length cDNA copies of mRNA. Recently, a large collaboration involving 68 research teams analyzed 41,118 full-length cDNAs to annotate the structure of 21,037 human genes (see http://www.h-invitational.jp/).

a) What types of information can be obtained by comparing the structures of cDNAs with genomic DNA?

b) Why is it desirable, when possible, to use full-length cDNAs in these analyses?

c) The research teams characterized the number of loci per Mb of DNA for each chromosome. Among the autosomes, chromosome 19 had the highest ratio of 19 loci per Mb while chromosome 13 had the lowest ratio of 3.5 loci per Mb. Among the sex chromosomes, the X had 4.2 loci per Mb while the Y had only 0.6 loci per Mb. What does this tell you about the distribution of genes within the human genome? How can these data be reconciled with the idea that chromosomes have gene-rich regions as well as gene deserts?

d) The research teams were able to map 40,140 cDNAs to the current human genome sequence. Of the 978 cDNAs that could not be mapped, 907 could be roughly mapped to the mouse genome. Why might some (human) cDNAs be unable to be mapped to the current human genome sequence while they could be mapped to the mouse genome sequence? (Hint: consider where errors and limited information might exist.)

33. A central theme in genetics is that an organism’s phenotype results from an interaction between its genotype and the environment. Because some diseases have strong environmental components, researchers have begun to assess how disease phenotypes arise from the interaction of genes with their environments, including the genetic background in which the genes are expressed. (See http://pga.tigr.org/desc.shtml for additional discussion.) How might DNA microarrays be useful in a functional genomic approach to understanding human diseases that have environmental components, such as some cancers?

34. Mutations in the dystrophin gene can lead to Duchenne muscular dystrophy. The dystrophin gene is among the largest known: it has a primary transcript that spans 2.5 Mb, and produces a mature mRNA that is about 14 kb. Many different mutations in the dystrophin gene have been identified. What steps would you take if you wanted to use a DNA microarray to identify the specific dystrophin gene mutation present in a patient with Duchenne muscular dystrophy?

35. Distinguish between structural, functional, and comparative genomics by completing the following exercise. The following list describes specific activities and goals associated with genome analysis. Indicate the area associated with the activity or goal by placing a letter (S, F, C) next to each item. Some items will have more than one letter associated with them.

a) Aligning DNA sequences within databases to determine the degree of matching.

b) Annotation of sequences within a sequenced genome.

c) Characterizing the transcriptome and proteome present in a cell at a specific developmental stage or in a particular disease state.

d) Comparing the overall arrangements of genes and nongene sequences in different organisms to understand how genomes evolve.

e) Describing the function of all genes in a genome.

f) Determining the functions of human genes by studying their homologs in nonhuman organisms.

g) Developing a comprehensive two-dimensional polyacrylamide gel electrophoresis map of all proteins in a cell.

h) Developing a physical map of a genome.

i) Developing DNA microarrays (DNA chips).

j) Identifying homologs to human disease genes in organisms suitable for experimentation.

k) Identifying a large collection of simple tandem repeat or microsatellite sequences to use as DNA markers within one organism.

l) Identifying expressed sequence tags.

m) Making gene knockouts and observing the phenotypic changes associated with them.

n) Mapping a gene in one organism using the lod score method.

o) Sequencing individual BAC clones aligned in a contig using a shotgun approach.

p) Using oligonucleotide hybridization analysis to type an SNP.

MBB 141 Questions 24-26

24. YAC clone contigs can be assembled following STS mapping. First, the locations of STSs on YAC clones are mapped using PCR. Then YAC clones that share STSs are aligned. The first step usually involves repeated screening of a library of YAC clones with different STS markers. In practice, some of the STSs may be well characterized and even localized to specific chromosomal regions or genes, whereas others may be less well characterized. In a typical screen of a YAC library constructed from the genome of a higher eukaryote, most STSs identify only a few YACs. However, a few STSs identify dozens of YACs. What might be the basis of this difference? How does the difference influence the assembly of a contig?

25. The average size of fragments, in base pairs, observed after genomic DNA from 8 different species was individually cleaved with each of 6 different restriction enzymes, is shown below:








Species              ApaI      HindIII   SacI      SspI      SrfI        NotI
(recognition site)   GGGCCC    AAGCTT    GAGCTC    AATATT    GCCCGGGC    GCGGCCGC
Escherichia coli     68000     8000      31000     2000      120000      200000
M. tuberculosis      2000      18000     4000      32000     10000       4000
S. cerevisiae        15000     3000      8000      1000      570000      290000
A. thaliana          52000     2000      5000      1000      No sites    610000
C. elegans           38000     3000      5000      800       1110000     260000
D. melanogaster      13000     3000      6000      900       170000      83000
Mus musculus         5000      3000      3000      3000      120000      120000
Homo sapiens         5000      4000      5000      1000      120000      260000




a) Under the assumption that each genome has equal amounts of A, T, G, and C, and that on average these bases are evenly distributed, what average fragment size is expected following digestion with each enzyme?

b) How might you explain each of the following?

i. There is a large variation in the average fragment sizes when different genomes are cut with the same enzyme.
ii. There is a large variation in the average fragment sizes when the same genome is cut with different enzymes that recognize sites having the same length (e.g., ApaI, HindIII, SacI, and SspI).
iii. Both SrfI and NotI, which each recognize an 8-bp site, cut the Mycobacterium genome more frequently than SspI and HindIII, which each recognize a 6-bp site.

c) Based on this data, which enzymes would be good choices for constructing a restriction map of a chromosome (or a large segment of a chromosome) in each organism? Explain your choices.
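The expectation asked for in part (a) can be sketched numerically. Under the stated assumption of equal, evenly distributed bases, an n-bp site occurs by chance once every 4^n bp. The function below also accepts a different base composition; the 65% G+C composition used in the second call is an illustrative assumption, not a value taken from the table.

```python
def expected_fragment_size(site, base_freq=None):
    """Average spacing (bp) between chance occurrences of a recognition site."""
    if base_freq is None:
        base_freq = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    probability = 1.0
    for base in site:
        probability *= base_freq[base]
    return 1.0 / probability


SITES = {
    "ApaI": "GGGCCC", "HindIII": "AAGCTT", "SacI": "GAGCTC",
    "SspI": "AATATT", "SrfI": "GCCCGGGC", "NotI": "GCGGCCGC",
}

# Hypothetical GC-rich genome (65% G+C), for comparison only.
gc_rich = {"A": 0.175, "T": 0.175, "G": 0.325, "C": 0.325}

for enzyme, site in SITES.items():
    equal = expected_fragment_size(site)
    skewed = expected_fragment_size(site, gc_rich)
    print(f"{enzyme:8s} {site:9s} equal bases: {equal:>8.0f}   65% G+C: {skewed:>10.0f}")
```

Comparing the two columns gives one way to think about part (b): base composition alone shifts the expected spacing of AT-rich and GC-rich sites in opposite directions, although it is not the only factor that could be at work.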

26. STS mapping has been useful to generate clone contig maps. Perform the following exercise to consider the logistics of locating STSs using PCR.

a) A plasmid library contains 500-bp inserts generated from randomly sheared mouse DNA. How would you identify clones harboring STRs with the dinucleotide repeat (AT)N or the trinucleotide repeat (CAG)N?

b) How would you use these STRs as STSs in a mapping experiment?

c) Nusbaum and colleagues generated a YAC-based physical map of the mouse genome by localizing 8,203 STSs onto 960 YAC clones. If each of the STSs were assayed in each of the 960 YAC clones, how many different PCRs would need to be analyzed?

d) Although PCRs can be performed robotically, each reaction consumes time and material resources, and with each there is a certain chance of a false positive result or other error. It is therefore advantageous to reduce the number of PCR reactions by pooling individual YACs together. First, yeast colonies containing the YACs are grown individually in the wells of ten 96-well plates. Suppose the plates are numbered I to X and the wells of each plate are arrayed in an 8-row x 12-column grid. The rows are coded A-H and the columns coded 1-12. This allows the position of a single YAC to be specified uniquely by a code (e.g., II-C6 specifies the YAC clone from plate II, row C, column 6). In one pooling scheme, all YACs from each row of a plate are pooled into one well (e.g., those on plate II, A1-A12 are pooled together into a well designated II-A) and all YACs from each column of a plate are pooled into one well (those on plate II, A7-H7 are pooled together into a well designated II-7).

i. How many different pools would now have to be screened with each STS?
ii. How many PCRs would have to be performed?
iii. If the II-6 and II-F pools had a positive result with STS #6239, what is the code of the YAC containing this STS?
iv. How would you interpret a result where only the IV-3 pool was positive for a particular STS?

e) Construct a YAC contig based on the results in the following table:

STS marker   Positive YAC pools
63           II-6, II-A
210          II-6, II-A, IV-C, IV-3
522          VII-E, VII-12, X-G, I-C, I-8
713          I-C, I-8
714          VII-E, VII-12
719          X-H, X-9, IV-C, IV-3
991          X-H, X-9, VII-E, VII-12
1071         II-6, II-A, IV-C, IV-3, X-H, X-9
2631         II-6, II-A
3097         VII-E, VII-12, I-C, I-8
4630         VII-E, VII-12, I-C, I-8
5192         X-H, X-9, IV-C, IV-3
6193         X-H, X-9, VII-E, VII-12
6892         II-6, II-A, IV-C, IV-3


f) Devise a method to combine the YACs pooled in (c) to further reduce the number of PCRs. In your method, how many pools are there and how many PCRs must be performed?
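The bookkeeping behind parts (c) and (d) can be sketched as follows. The counts come straight from the question (8,203 STSs, 960 YACs, ten plates of 8 rows by 12 columns); the `decode` helper and its name are assumptions introduced here to show how a positive row pool and a positive column pool on the same plate point to one well.

```python
N_STS = 8203
N_YACS = 960
N_PLATES = 10
ROW_LABELS = set("ABCDEFGH")
N_COLUMNS = 12

# One PCR per STS per individual YAC.
unpooled_pcrs = N_STS * N_YACS

# One row pool per row plus one column pool per column, on every plate.
n_pools = N_PLATES * (len(ROW_LABELS) + N_COLUMNS)
pooled_pcrs = N_STS * n_pools


def decode(positive_pools):
    """Pair positive row and column pools from the same plate into well codes."""
    rows, cols = {}, {}
    for pool in positive_pools:
        plate, label = pool.split("-")
        if label in ROW_LABELS:
            rows.setdefault(plate, []).append(label)
        else:
            cols.setdefault(plate, []).append(int(label))
    candidates = []
    for plate, row_hits in rows.items():
        for row in row_hits:
            for col in cols.get(plate, []):
                candidates.append(f"{plate}-{row}{col}")
    return candidates


print(unpooled_pcrs, n_pools, pooled_pcrs)
print(decode(["II-6", "II-F"]))
print(decode(["IV-3"]))  # a column hit with no matching row hit decodes to nothing
```

An empty result flags a pool pattern that cannot be resolved to a single well. If two YACs on the same plate were both positive, this simple pairing would return every row and column combination, so such cases would need individual follow-up.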

MBB 141 Questions 21-23


21. A cDNA library is made with mRNA isolated from liver tissue. When a cloned cDNA from that library is digested with the enzymes EcoRI (E), HindIII (H), and BamHI (B), the restriction map shown above, part (a) is obtained. When this cDNA is used to screen a cDNA library made with mRNA from brain tissue, three identical cDNAs with the restriction map shown in the following figure, part (b) are obtained. When either cDNA is used to synthesize a uniformly 32P-labeled probe and the probe is allowed to hybridize to a Southern blot prepared from genomic DNA digested singly with the enzymes EcoRI, HindIII, and BamHI, an autoradiograph shows the pattern of bands in the figure part (c). When either cDNA is used to synthesize a uniformly 32P-labeled probe and used to probe a northern blot prepared with poly(A) RNA isolated from liver and brain tissues, the pattern of bands in part (d) of the figure is seen. Fully analyze these data and then answer the following questions:

a) Do these cDNAs derive from the same gene?
b) Why are different-sized bands seen on the northern blot?
c) Why do the cDNAs have different restriction maps?
d) Why are some of the bands seen on the whole-genome Southern blot different sizes than some of the restriction fragments in the cDNAs?

22. Draw the pattern of bands you would expect to see on a DNA sequencing gel if you annealed the primer 5’-CTAGG-3’ to the following single-stranded DNA fragment and carried out a dideoxy sequencing experiment. Assume the dNTP precursors were all labeled.

3’-GATCCAAGTCTACGTATAGGCC-5’
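The logic of Question 22 can be sketched in a few lines. The sketch assumes the primer anneals to the 3' end of the template as written, that extension copies the remaining template bases in order, and that each lane contains the products terminated by one dideoxynucleotide. Fragment lengths include the 5-nt primer. The function name is illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

template_3_to_5 = "GATCCAAGTCTACGTATAGGCC"  # written 3'->5', as in the question
primer_5_to_3 = "CTAGG"


def termination_products(template, primer):
    """Return {ddNTP lane: [fragment lengths]} for a dideoxy reaction."""
    n = len(primer)
    # The primer must pair base-for-base with the 3' end of the template.
    expected_primer = "".join(COMPLEMENT[base] for base in template[:n])
    if expected_primer != primer:
        raise ValueError("primer does not anneal to the 3' end of the template")
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for offset, template_base in enumerate(template[n:], start=1):
        added = COMPLEMENT[template_base]   # nucleotide added at this position
        lanes[added].append(n + offset)     # chain length if terminated here
    return lanes


for lane, lengths in termination_products(template_3_to_5, primer_5_to_3).items():
    print(f"dd{lane}TP lane: {lengths}")
```

Reading the lengths from shortest to longest across the four lanes gives the newly synthesized strand in the 5'-to-3' direction, which is the order in which bands would be read from the bottom of the gel upward.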

23. Katrina purified a clone from a plasmid library made using genomic DNA and sequenced a 500-bp long segment using the dideoxy sequencing method. Her clone (twin sister) Marina used PCR with Taq DNA polymerase to amplify the same 500 bp fragment from genomic DNA. Marina sequenced the fragment using the dideoxy sequencing method, and obtained the same sequence as Katrina did. She then cloned the fragment into a plasmid vector, and, following ligation and transformation into E. coli, sequenced several, independently isolated plasmids to verify that she cloned the correct sequence. Most of them have the same sequence as Katrina’s clone, but Marina finds that about 1/3 of them have a sequence that differs in one or two base pairs. None of the clones that differ from Katrina’s clone are identical. Fearing she has done something wrong, Marina repeats her work, only to obtain the same results: about 1/3 of the fragments cloned from the PCR product have single base differences. Explain this discrepancy.

MBB 141 Questions 19 and 20


19. Sara is an undergraduate student who is doing an internship in the research laboratory described above. Just before Sara started working in the lab, the restriction map above was made of the 47-kb NotI restriction fragment containing the prion-protein gene (distances between restriction sites are in kb).
Since smaller DNA fragments cloned into plasmids are more easily analyzed than large DNA fragments cloned into BACs, Sara has been asked to “subclone” the 6.1-, 10.5-, 4.1-, and 8.2-kb BamHI DNA fragments containing the prion-protein gene into the pUC19 plasmid vector. Her mentor gives her some intact pUC19 plasmid DNA, some of the purified 47-kb NotI fragment, and shows her where the lab’s stocks of DNA ligase and BamHI are stored. Describe the steps Sara should take to complete her task. In your answer, address how she will identify plasmids that contain genomic DNA inserts, and how she will verify that she has identified clones containing each of the desired genomic BamHI fragments.

20. Imagine that you have been able to clone the structural gene for an enzyme in a catecholamine biosynthesis pathway from the adrenal gland of rats. How could you use this cloned DNA as a probe to determine whether this same gene functions in the rat brain?

MBB 141 Questions 18-


18. The investigators in Question 17 were successful in purifying a BAC-DNA clone containing the gene for the mouse prion protein. To narrow down which region of the BAC DNA contains the prion-protein gene, they purified the BAC DNA, digested it with the restriction enzyme NotI, and separated the products of the enzymatic digestion by size using gel electrophoresis. Then, they purified each of the relatively large NotI DNA fragments from the gel, digested each individually with the restriction enzyme BamHI, and separated the products of each enzymatic digestion by size using gel electrophoresis. Finally, they transferred the size-separated DNA fragments from the agarose gel onto a membrane filter using the Southern blot technique, and allowed the DNA fragments on the filter to hybridize with a labeled cDNA probe. The figure above shows the results that were obtained: The pattern of DNA bands seen after the BAC DNA fragment is digested with NotI is shown in panel A; the pattern of DNA bands seen after each NotI fragment is digested with BamHI is shown in panel B; and the pattern of hybridizing DNA fragments visible after probing the Southern blot is shown in panel C.

Monday, January 5, 2009

MBB 141 Questions 1-17

  1. A piece of DNA 900 bp long is cloned and then cut out of the vector for analysis. Digestion of this linear piece of DNA with three different restriction enzymes singly and in all possible pairs of enzymes gave the following restriction fragment size data:

Enzymes            Restriction fragment sizes (bp)
EcoRI              200, 700
HindIII            300, 600
BamHI              50, 350, 500
EcoRI + HindIII    100, 200, 600
EcoRI + BamHI      50, 150, 200, 500
HindIII + BamHI    50, 100, 250, 500

Construct a restriction map from these data.
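A candidate restriction map can be checked mechanically against single and double digests. The sketch below is a generic checker, not a solution: the 1,000-bp molecule and the enzymes X and Y are hypothetical, chosen only to show how predicted fragments are compared with observed ones. The function names are illustrative.

```python
def digest(length, cut_sites):
    """Sorted fragment sizes from cutting a linear molecule at cut_sites."""
    positions = [0] + sorted(cut_sites) + [length]
    return sorted(b - a for a, b in zip(positions, positions[1:]))


def check_map(length, site_map, observed):
    """Return {enzyme combination: True/False} for a proposed site map."""
    results = {}
    for enzymes, sizes in observed.items():
        cuts = [pos for enzyme in enzymes for pos in site_map[enzyme]]
        results[enzymes] = digest(length, cuts) == sorted(sizes)
    return results


# Hypothetical data for a 1,000-bp fragment and two made-up enzymes.
observed = {
    ("X",): [300, 700],
    ("Y",): [300, 700],
    ("X", "Y"): [300, 300, 400],
}

good_map = {"X": [300], "Y": [700]}   # sites near opposite ends
bad_map = {"X": [300], "Y": [300]}    # sites at the same position

print(check_map(1000, good_map, observed))
print(check_map(1000, bad_map, observed))
```

In this toy case both maps fit the single digests, and only the double digest tells them apart, which is why the question supplies every pairwise combination.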

  2. The ability of complementary nucleotides to base pair using hydrogen bonding, and the ability to selectively disrupt or retain accurate base pairing by treatment with chemicals (e.g., alkaline conditions) and/or heat is critical to many methods used to produce and analyze recombinant DNA. Give three examples of methods that rely on complementary base pairing, and explain what role complementary base pairing plays in each of these methods.
  3. A new restriction endonuclease is isolated from a bacterium. This enzyme cuts DNA into fragments that average 4,096 base pairs long. Like many other known restriction enzymes, the new one recognizes a sequence in DNA that has twofold rotational symmetry. From the information given, how many base pairs of DNA constitute the recognition sequence for the new enzyme?
  4. E. coli, like all bacterial cells, has its own restriction endonucleases that could interfere with the propagation of foreign DNA in plasmid vectors. For example, wild-type E. coli has a gene, hsdR, that encodes a restriction endonuclease that cleaves DNA that is not methylated at certain A residues. Why is it important to inactivate this enzyme by mutating the hsdR gene in strains of E. coli that will be used to propagate plasmids containing recombinant DNA?
  5. Suppose you have cloned a eukaryotic cDNA and want to express the protein it encodes in E. coli. What type of vector would you use, and what features must this vector have? How would this vector need to be modified to express the protein in a mammalian tissue culture cell?
  6. Suppose you wanted to produce human insulin (a peptide hormone) by cloning. Assume that you could do this by inserting the human insulin gene into a bacterial host where, given the appropriate conditions, the human gene would be transcribed and then translated into human insulin. Which would be better to use as your source of the gene: human genomic insulin DNA or a cDNA copy of this gene? Explain your choice.
  7. You have inserted human insulin cDNA in the cloning vector pUC19 and transformed the clone into E. coli, but insulin was not expressed. Propose several hypotheses to explain why not.
  8. Genomic libraries are important resources for isolating genes and for studying the functional organization of chromosomes. List the steps you would use to make a genomic library of yeast in a plasmid vector. In what fundamental way would you modify this procedure if you were making the library in a BAC vector?
  9. The human genome contains about 3 x 10^9 bp of DNA. How many 20-kb fragments would you have to clone in a BAC library to have a 90% probability of including a particular sequence?
  10. Suppose a researcher wants to clone the genomic sequences that include the human gene for which a cDNA has already been obtained. She has available a variety of genomic libraries that can be screened with a probe made from the cDNA.
    1. Assuming that each library has an equally good representation of the 3 x 10^9 base pairs in a haploid human genome, about how many clones should be screened if the researcher wants to be 95% sure of obtaining at least one hybridizing clone and

i. The library is a plasmid library with inserts that are, on average, 7 kb?

ii. The library is a YAC library with inserts that are, on average, 1 Mb?

    2. What advantages and disadvantages are there to screening these different libraries?
    3. What kinds of information might be gathered from the analysis of genomic DNA clones that could not be gathered from the analysis of cDNA clones?
  11. A researcher interested in the control of the cell cycle identifies three different yeast mutants whose rate of cell division is temperature-sensitive. At low, permissive temperatures, the mutant strains grow normally and produce yeast colonies having a normal size. However, at elevated, restrictive temperatures, the mutant strains are unable to divide and produce no colonies. She has a yeast genomic library made in a plasmid shuttle vector, and wants to clone the genes affected by the mutants. What steps should she take to accomplish this objective?
  12. It’s 3 am. Your best friend has awakened you with yet another grandiose scheme. He has spent the last 2 years purifying a tiny amount of a potent modulator of the immune response. He believes that this protein, by stimulating the immune system, could be the ultimate cure for the common cold. Tonight, he has finally been able to obtain the sequence of the first eight amino acids at the N-terminus of the protein: Met-Phe-Tyr-Trp-Met-Ile-Gly-Tyr. He wants your help in cloning a cDNA for the gene so that he can express large amounts of the protein and undertake further testing of its properties. After you drag yourself out of bed and ponder the sequences for a while, what steps do you propose to take to obtain a cDNA for this gene?
  13. A piece of DNA 5,000 bp long is digested with restriction enzymes A and B, singly and together. The DNA fragments produced are separated by DNA electrophoresis and their sizes are calculated, with the following results:

Digested with:

A           B           A+B
2,100 bp    2,500 bp    1,900 bp
1,400       1,300       1,000
1,000       1,200       800
500                     600
                        500
                        200

Each A fragment is extracted from the gel and digested with enzyme B, and each B fragment is extracted from the gel and digested with enzyme A. The sizes of the resulting DNA fragments are determined by gel electrophoresis, with the following results:

A fragment   Fragments produced by digestion with B   B fragment   Fragments produced by digestion with A
2,100 bp     1,900; 200                               2,500 bp     1,900; 600
1,400        800; 600                                 1,300        800; 500
1,000        1,000                                    1,200        1,000; 200
500          500











Construct a restriction map of the 5,000 bp DNA fragment.

  14. A 10-kb genomic DNA EcoRI fragment from a newly discovered insect is ligated into the EcoRI site of the pUC19 plasmid vector and transformed into E. coli. Plasmid DNA and genomic DNA from the insect are prepared and each DNA sample is digested completely with the restriction enzyme EcoRI. The two digests are loaded into separate wells of an agarose gel, and electrophoresis is used to separate the products by size.
    1. What will be seen in the lanes of the gel after it is stained to visualize the size-separated DNAs?
    2. What will be seen if the gel is transferred to a membrane to make a Southern blot, and the blot is probed with the 10-kb EcoRI fragment? (Assume that the fragment does not contain any repetitive DNA sequence.)
  15. During Southern blot analysis, DNA is separated by size using gel electrophoresis, and then transferred to a membrane filter. Before it is transferred, the gel is soaked in an alkaline solution to denature the double-stranded DNA, and then neutralized. Why is it important to denature the double-stranded DNA? (Hint: Consider how the membrane will be probed.)
  16. A researcher digests genomic DNA with the restriction enzyme EcoRI, separates it by size on an agarose gel, and transfers the DNA fragments in the gel to a membrane filter using the Southern blot procedure. What result would she expect to see if the source of the DNA and the probe for the blot are as described below?
    1. The genomic DNA is from a normal human. The probe is a 2.0-kb DNA fragment obtained by excision with the enzyme EcoRI from a plasmid containing single-copy genomic DNA.
    2. The genomic DNA is from a normal human. The probe is a 5.0-kb DNA fragment that is a copy of a LINE (long interspersed elements) sequence with an internal EcoRI site.
    3. The genomic DNA is from a normal human. The probe is a 5.0-kb DNA fragment that is a copy of a LINE sequence that lacks an internal EcoRI site.
    4. The genomic DNA is from a human heterozygous for a translocation (exchange of chromosome parts) between chromosomes 14 and 21. The probe is a 3.0-kb DNA fragment that is obtained by excision with the enzyme EcoRI from a plasmid containing single-copy genomic DNA from a normal chromosome 14. The translocation breakpoint on chromosome 14 lies within the 3.0-kb genomic DNA fragment.
    5. The genomic DNA is from a normal female. The probe is a 5.0-kb DNA fragment containing part of the testis determining factor gene, a gene located on the Y chromosome.
  17. A molecular genetics research laboratory is working to develop a mouse model for bovine spongiform encephalopathy (BSE, “mad cow”) disease, which is caused by misfolding of the prion protein. As part of their investigation, they want to investigate the structure of the gene for the prion protein in mice. They have a mouse genomic DNA library made in a BAC vector and a 2.1-kb long cDNA for the gene. List the steps they should take to screen the BAC library with the cDNA probe.
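Several questions in this set ask how many clones a genomic library must contain, or how many must be screened, to find a particular sequence with a given probability. A common way to estimate this is N = ln(1 - P) / ln(1 - f), where P is the desired probability and f is the fraction of the genome carried by one insert. The sketch below applies that formula to the insert sizes and probabilities quoted in those questions; it assumes inserts are random and equally represented, and the function name is illustrative.

```python
import math


def clones_needed(genome_bp, insert_bp, probability):
    """Estimate N = ln(1 - P) / ln(1 - f), with f = insert size / genome size."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - f))


HUMAN_GENOME_BP = 3e9  # approximate haploid size quoted in the questions

cases = [
    ("BAC library, 20-kb inserts, 90%", 20e3, 0.90),
    ("plasmid library, 7-kb inserts, 95%", 7e3, 0.95),
    ("YAC library, 1-Mb inserts, 95%", 1e6, 0.95),
]

for label, insert_bp, probability in cases:
    n = clones_needed(HUMAN_GENOME_BP, insert_bp, probability)
    print(f"{label}: about {n:,} clones")
```

Rounding up with `math.ceil` is a choice made here so that the returned number of clones always meets or exceeds the requested probability.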