Wednesday, January 7, 2009

MBB 141 Questions 27-35


27. BACs I-IV have been aligned in the contig shown in the figure above after screening a BAC library with a series of numbered STS and EST markers.

a) During the construction of BAC libraries, genomic DNA segments from different chromosomal regions sometimes are cloned together into a single BAC clone. This generates a chimeric BAC. How would you verify that none of the BACs in this contig was chimeric?

b) How would you identify the chromosomal region from which the contig derived?

c) How would you identify the orientation of the contig on a chromosome (i.e., which end of the contig is closer to the telomere or centromere)?

d) How would you identify the distance (in bp) of the markers in each BAC from each other and from the end of the insert?



28. When Celera Genomics sequenced the human genome, they obtained 13,543,099 reads of plasmids having an average insert size of 1,951 bp, and 10,894,467 reads of plasmids having an average insert size of 10,800 bp.

a) Dideoxy chain-termination sequencing provides only about 500-550 nucleotides of sequence per read. About how many nucleotides of sequence did they obtain from sequencing these two plasmid libraries?

b) Why did they sequence plasmids from two libraries with different-sized inserts?

c) They sequenced only the ends of each insert. How did they determine the sequence lying between the sequenced ends?
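
The arithmetic behind part (a) can be sketched in a few lines of Python. This is only a rough estimate: it assumes every read yields between 500 and 550 nucleotides, as stated in the question, and the variable names are made up for illustration.

```python
# Read counts taken from the question text.
reads_2kb_library = 13_543_099    # plasmids with ~1,951 bp inserts
reads_10kb_library = 10_894_467   # plasmids with ~10,800 bp inserts

# Assumption from part (a): each read gives roughly 500-550 nucleotides.
nt_per_read_low = 500
nt_per_read_high = 550

total_reads = reads_2kb_library + reads_10kb_library
total_nt_low = total_reads * nt_per_read_low
total_nt_high = total_reads * nt_per_read_high

print(f"total reads: {total_reads:,}")
print(f"estimated sequence: {total_nt_low:,} to {total_nt_high:,} nucleotides")
```

Under these assumptions the estimate comes out to roughly 12 to 13 billion nucleotides of raw sequence.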

29. Eukaryotic genomes differ in their repetitive DNA content. For example, consider the typical euchromatic 50-kb segment of human DNA that contains the human beta T-cell receptor. About 40 percent of it is composed of various genome-wide repeats, about 10 percent encodes three genes (with introns), and about 8 percent is taken up by a pseudogene. Compare this to the typical 50-kb segment of yeast DNA containing the HIS4 gene. There, only about 12 percent is composed of a genome-wide repeat, and about 70 percent encodes genes (without introns). The remaining sequences in each case are untranscribed and either contain regulatory signals or have no discernible information. Whereas some repetitive sequences can be interspersed throughout gene-containing euchromatic regions, others are abundant near centromeres. What problems do these repetitive sequences pose for sequencing eukaryotic genes? When can these problems be overcome, and how?

30. How has genomic analysis provided evidence that Archaea is a branch of life distinct from Bacteria and Eukarya?

31. Once a genomic region is sequenced, computerized algorithms can be used to scan the sequence to identify potential ORFs.

a) Devise a strategy to identify potential prokaryotic ORFs by listing features assessable by an algorithm checking for ORFs.

b) Why does the presence of introns within transcribed eukaryotic sequences preclude direct application of this strategy to eukaryotic sequences?

c) The average length of exons in humans is about 100-200 base pairs while the length of introns can range from about 100 to many thousands of base pairs. What challenges do these findings pose for identifying exons in uncharacterized regions of the human genome?

d) How might you modify your strategy to overcome some of the problems posed by the presence of introns in transcribed eukaryotic sequences?
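
As a starting point for thinking about part (a), here is a minimal Python sketch of a naive ORF scan. It is an illustration under simplifying assumptions, not a real gene finder: it looks only for an ATG start codon followed in frame by a stop codon, applies a minimum length cutoff, and ignores other features an algorithm could check (ribosome-binding sites, alternative start codons, codon usage). The function names and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_orfs(seq, min_codons=100):
    """Scan the three reading frames of one strand.

    Returns a list of (frame, start, end) tuples, where start is the index
    of the A in ATG and end is the index just past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # Count codons including the stop codon itself.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # resume looking for the next start codon
    return orfs


def find_orfs_both_strands(seq, min_codons=100):
    """Scan all six frames; minus-strand coordinates refer to the reverse complement."""
    return {
        "+": find_orfs(seq, min_codons),
        "-": find_orfs(reverse_complement(seq), min_codons),
    }
```

For example, `find_orfs("CCATGAAATTTTAGGG", min_codons=3)` finds the single short ORF ATG AAA TTT TAG in the third frame. Because the scan requires an uninterrupted in-frame run from start to stop, it would break down on intron-containing genomic sequence, which is the issue raised in parts (b) through (d).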

32. One powerful approach to annotate genes is to compare the structures of cDNA copies of mRNAs to the genomic sequences that encode them. However, during the synthesis of cDNA, reverse transcriptase may not always copy the entire length of the mRNA and so a cDNA that is not full-length can be generated. This approach to gene annotation often uses ESTs that are not full-length cDNA copies of mRNA. Recently, a large collaboration involving 68 research teams analyzed 41,118 full-length cDNAs to annotate the structure of 21,037 human genes (see http://www.h-invitational.jp/).

a) What types of information can be obtained by comparing the structures of cDNAs with genomic DNA?

b) Why is it desirable, when possible, to use full-length cDNAs in these analyses?

c) The research teams characterized the number of loci per Mb of DNA for each chromosome. Among the autosomes, chromosome 19 had the highest ratio of 19 loci per Mb while chromosome 13 had the lowest ratio of 3.5 loci per Mb. Among the sex chromosomes, the X had 4.2 loci per Mb while the Y had only 0.6 loci per Mb. What does this tell you about the distribution of genes within the human genome? How can these data be reconciled with the idea that chromosomes have gene-rich regions as well as gene deserts?

d) The research teams were able to map 40,140 cDNAs to the current human genome sequence. Of the 978 cDNAs that could not be mapped, 907 could be roughly mapped to the mouse genome. Why might some human cDNAs fail to map to the current human genome sequence even though they can be mapped to the mouse genome sequence? (Hint: consider where errors and limited information might exist.)

33. A central theme in genetics is that an organism’s phenotype results from an interaction between its genotype and the environment. Because some diseases have strong environmental components, researchers have begun to assess how disease phenotypes arise from the interaction of genes with their environments, including the genetic background in which the genes are expressed. (See http://pga.tigr.org/desc.shtml for additional discussion.) How might DNA microarrays be useful in a functional genomic approach to understanding human diseases that have environmental components, such as some cancers?

34. Mutations in the dystrophin gene can lead to Duchenne muscular dystrophy. The dystrophin gene is among the largest known: it has a primary transcript that spans 2.5 Mb, and produces a mature mRNA that is about 14 kb. Many different mutations in the dystrophin gene have been identified. What steps would you take if you wanted to use a DNA microarray to identify the specific dystrophin gene mutation present in a patient with Duchenne muscular dystrophy?

35. Distinguish between structural, functional, and comparative genomics by completing the following exercise. The following list describes specific activities and goals associated with genome analysis. Indicate the area associated with the activity or goal by placing a letter (S, F, C) next to each item. Some items will have more than one letter associated with them.

a) Aligning DNA sequences within databases to determine the degree of matching.

b) Annotation of sequences within a sequenced genome.

c) Characterizing the transcriptome and proteome present in a cell at a specific developmental stage or in a particular disease state.

d) Comparing the overall arrangements of genes and nongene sequences in different organisms to understand how genomes evolve.

e) Describing the function of all genes in a genome.

f) Determining the functions of human genes by studying their homologs in nonhuman organisms.

g) Developing a comprehensive two-dimensional polyacrylamide gel electrophoresis map of all proteins in a cell.

h) Developing a physical map of a genome.

i) Developing DNA microarrays (DNA chips).

j) Identifying homologs to human disease genes in organisms suitable for experimentation.

k) Identifying a large collection of simple tandem repeat or microsatellite sequences to use as DNA markers within one organism.

l) Identifying expressed sequence tags.

m) Making gene knockouts and observing the phenotypic changes associated with them.

n) Mapping a gene in one organism using the lod score method.

o) Sequencing individual BAC clones aligned in a contig using a shotgun approach.

p) Using oligonucleotide hybridization analysis to type an SNP.
