Genomic (mostly chromosomal) genetic maps, i.e., the ordered positions of genes on the different elements of a genome, are an important tool for geneticists, and constructing them is thus an early aim when studying a new species. Progress in this work has been closely tied to the available techniques. Until the mid-1980s, genome mapping relied on the classical concept of recombination linkage and, thus, could be achieved only in strains for which suitable natural in vivo DNA transfer processes were available. Possible drawbacks due to heterogeneity of recombination frequencies could not be avoided. More or less extensive chromosomal maps had thus been constructed for about 15 species. A major breakthrough was the possibility of constructing so-called physical maps, i.e., of positioning landmarks such as restriction sites (restriction, or physical, maps) along the DNA molecules. The first, still incomplete, physical map was published in 1987. Localization of genes along this physical map, i.e., its transformation into a genetic map, can be achieved, more or less precisely, by several methods.
Extant chromosomal maps obtained by in vivo approaches (the case of E. coli is particularly enlightening) or by molecular techniques (gene identification through partial sequencing or via hybridization with, hopefully, conserved genes from gene banks) greatly facilitate the work. Thus, each new map construction is further facilitated by comparison with all the maps or genetic information available. More than 100 such maps, which differ widely in the number of genes positioned and the precision of their localization, are presently available, and, in principle, there is no cultivable strain which cannot be mapped in this way. Therefore, even though the traditional genetic methods remain fully valuable for strain construction or gene function analysis, they are no longer needed for mapping purposes. Whole genome sequencing was the next advance, with the first complete sequence published in 1995. Sixteen fully sequenced genomes from Bacteria and Archaea are now published, and nearly 50 more are in progress. This, however, does not render physical and genetic maps obsolete, since genome sequencing will not, for some time yet, be performed on as many species and strains as mapping, and since such maps quite often constitute useful prerequisites for sequencing.
I. IN VIVO GENETIC MAPPING OF BACTERIAL CHROMOSOMES
As soon as genetic methods became available for bacteria, they were readily used for gene mapping, mostly applied to chromosomes, as opposed to plasmids. Various portions of chromosomes have thus been defined, depending on the techniques available. Fine mapping, used for ordering intragenic mutations or very close loci, was opposed to broad mapping, which allows the treatment of (almost) whole chromosomes. The latter is sometimes taken as the only case of genetic chromosomal mapping. Fine mapping is based on the analysis of recombination data, assuming a direct relationship between the distance separating two markers and the recombination frequency between them (Fig. 45.1). Although it allows a reliable and precise estimation of the order of the markers, this approach is hampered by the heterogeneity of recombination frequencies along the DNA molecule. Recombination events are initiated at particular sites (sequences) along DNA molecules, known as chi (χ) sites, which are usually not distributed regularly; thus, a genetic map built on recombination frequencies may be biased. Fine mapping is essentially performed via transformation (natural or artificial) or transduction. This explains why it applies only to small portions (usually a few percent) of chromosomes. It is, however, available for most species, provided sufficient effort is devoted to finding the relevant DNA transfer tools.
Chromosome mapping in gram-negative bacteria deduces gene order from the relative time required for the transfer of each gene during a sequential (polarized or oriented) transfer of the (whole) chromosome, in so-called interrupted mating experiments (Fig. 45.1). Transfer occurs between two suitably marked bacteria (donor and recipient), the process being controlled by conjugative plasmids integrated in the chromosome and defining the direction of the polarized transfer.
Recombination of the transferred material (i.e., marker exchange) is necessary for its stabilization, a prerequisite for its detection. However, the recombination frequencies are high enough to ensure a high rate of integration, and, thus, this requirement does not bias the overall outcome. Conjugation was first discovered in E. coli, in which plasmid F performs the transfer (see Section VI).
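The interrupted mating principle described above can be illustrated with a minimal sketch. It assumes that, for each marker, the experimenter has recorded the earliest sampling time at which recombinants carrying that marker were detected; the marker names and times below are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Optional


def order_by_entry_time(first_seen_min: Dict[str, Optional[float]]) -> List[str]:
    """Order markers by the earliest time (minutes) at which recombinants
    carrying them were detected. Markers never detected (None) are dropped."""
    seen = {m: t for m, t in first_seen_min.items() if t is not None}
    return sorted(seen, key=lambda m: seen[m])


# Hypothetical entry times, in minutes after mixing donor and recipient.
entry_times = {"lacZ": 18.0, "thr": 8.0, "trp": None, "galK": 25.0, "leu": 10.0}

order = order_by_entry_time(entry_times)
print(order)
```

The deduced order is relative to the origin and direction of transfer of the particular donor strain used, so data from donors with different plasmid integration sites would have to be combined to cover a whole chromosome.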
Plasmid F can function in only a few related gram-negative bacteria. Chromosomal mapping could nevertheless be performed in other species, either via endogenous conjugative plasmids, where they exist (e.g., in Pseudomonas aeruginosa), or via broad host-range conjugative plasmids (often of the incP group), engineered so as to be able to integrate more or less randomly into the desired host chromosome. In gram-positive bacteria, the situation depends on the species. No system for polarized chromosomal transfer is available in species with low GC content, such as Bacillus or Streptococcus. Partial in vivo chromosomal mapping has been performed in B. subtilis, thanks to a very large transducing phage (genome size ≈ 2% of the host chromosome). In Streptomyces, conjugative plasmids are abundant, but there is no evidence for progressive chromosome transfer, and no kinetic mapping is possible. Therefore, final recombination frequencies are used for in vivo mapping. Analyses of the progeny of various four-factor crosses between doubly auxotrophic parents were first performed in S. coelicolor strain A3(2) (Fig. 45.1). A compiled analysis of the resultant partial maps was used to deduce the order of the markers. The resulting circular linkage map consisted of two well-marked regions, separated by two very long “silent” quadrants. Analysis of recombinants issuing from matings between appropriately marked parents showed very limited linkage between the two marked regions, thus defining the unusual length, in terms of crossover units, of these silent quadrants. Their relative lengths were later better estimated by statistical analysis of the heterozygous regions of merodiploids in a population of heteroclones (colonies arising from partially diploid cells). Linkage maps were similarly established for other Streptomyces species.
II. PHYSICAL (MACRORESTRICTION) MAPS
Physical mapping can readily be performed for small plasmids by comparing the sizes of fragments obtained after treatment with sets of restriction enzymes, since only a limited number of such fragments are formed. To obtain the same kind of results on a larger scale, e.g., for a chromosome, one must be able to cut the DNA into a reasonable number of pieces (about 20 is optimal). This implies the formation of large fragments (most of them larger than 50 kbp), for which resolution of their size is not possible via classical agarose gel electrophoresis. Two conditions, i.e., the availability of enzymes with very few cutting sites on whole chromosomes and tools allowing size resolution for large fragments, opened the era of chromosomal physical, or macrorestriction, mapping.
A. Rarely cutting site-specific endonucleases
Physical mapping requires the formation of reproducible fragments and, thus, precludes random breaks due to shearing during extraction procedures, particularly frequent for large DNA molecules, such as bacterial chromosomes (or megaplasmids). Prevention of random breakage is achieved by trapping the cells, before any other treatment, into small plugs of agarose. Cell lysis is performed inside the agarose. Small molecules diffuse out while the DNA remains entrapped. The endonucleases can penetrate into the agarose network and the cleaved DNA fragments are electroeluted from the plug. Several types of enzymes have only a limited number of recognition/cutting sites on whole chromosomes. Ten or so restriction enzymes (all belonging to Class II) are “rare-cutters” for many genomes, due to their 8-bp recognition sequence (Table 45.1). NotI is the most extensively used. But extreme GC contents or reduced representations of particular sets of nucleotides of certain genomes allow adding a few restriction enzymes with six-base recognition sequences to the list of possible rare-cutters (Table 45.1).
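Why recognition-site length and GC content determine whether an enzyme is a rare-cutter can be seen from a rough estimate. The sketch below assumes a random sequence with independent bases and a uniform GC content, which real genomes do not obey (the text itself notes under-represented nucleotide sets), so the numbers are only orders of magnitude. NotI's recognition sequence, GCGGCCGC, is used as the example; the genome size and GC values are illustrative.

```python
def site_probability(site: str, gc: float) -> float:
    """Probability that a given position starts the recognition site,
    assuming independent bases with overall GC fraction `gc`."""
    p = 1.0
    for base in site.upper():
        if base in "GC":
            p *= gc / 2.0
        elif base in "AT":
            p *= (1.0 - gc) / 2.0
        else:
            raise ValueError(f"unsupported base: {base!r}")
    return p


def expected_sites(site: str, genome_bp: int, gc: float) -> float:
    """Expected number of sites in a genome of `genome_bp` base pairs."""
    return genome_bp * site_probability(site, gc)


NOTI = "GCGGCCGC"  # 8-bp recognition sequence of NotI
for gc in (0.35, 0.50, 0.70):  # illustrative GC contents
    print(f"GC={gc:.2f}: about {expected_sites(NOTI, 4_600_000, gc):.1f} NotI sites expected")
```

Under this model an all-GC 8-bp site becomes much rarer in AT-rich genomes and much more frequent in GC-rich ones, which is why the choice of rare-cutter depends on the genome studied.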
Work is in progress to modify the recognition specificity of some frequently cutting restriction endonucleases so as to increase the size of their recognition site (this has recently been achieved for EcoRV).
Proteins which participate in intron processing have a recognition sequence of 18–26 bp and have often proved valuable as rare-cutters. One of these, widely used for bacterial chromosome mapping, is the protein I-CeuI, encoded by a chloroplast intron of Chlamydomonas eugametos. It recognizes a 26-bp sequence, usually present only in bacterial-type 23S rRNA genes. Several devices have been used to protect part of the cutting sites of frequently cutting enzymes. “Peptide nucleic acid clamps” (bis-PNAs), which bind strongly and sequence-specifically to short homopyrimidine stretches, shield overlapping methylation/restriction sites and, thus, reduce the number of accessible sites for the corresponding enzyme. A strategy called “Achilles’ heel cleavage” (AC) consists in introducing into a genome a unique site (such as the phage lambda cos site) that can be cleaved only by a specialized enzyme.
B. Pulsed field gel electrophoresis (PFGE)
In 1982–1984, Schwartz and Cantor devised a method allowing the separation of DNA fragments ranging from 35 to 2000 kbp. The molecules were electrophoresed in an agarose gel subjected to electric fields alternately oriented at roughly right angles; hence the name of the method, “pulsed-field gel electrophoresis,” or PFGE. The rationale of the method is that the longer a DNA molecule, the more slowly it reorients at each change of direction of the electric field. This, accordingly, slows down its overall migration speed along the average direction of migration. Various types of PFGE may be chosen, depending on the sizes of the fragments to be separated. PFGE also allows distinguishing circular plasmids, which migrate in a very special way. The reliability of this method depends on the identification of the complete set of fragments generated by the restriction treatments.
Its main drawbacks are that: (i) nonambiguous resolution of the digestion fragments may be hindered if the fragments are too numerous or if two (or more) have very close lengths; (ii) fragments too small to be detected or so small that they would have eluted from the gel before it was examined, may be generated.
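One simple way to suspect either drawback is to compare the summed fragment sizes obtained with different enzymes on the same genome: a digest whose total falls clearly short of the others may hide co-migrating fragments or fragments lost from the gel. The following sketch is only a sanity check under that assumption; the enzyme names are real rare-cutters, but the fragment sizes and the 5% threshold are hypothetical.

```python
from typing import Dict, List


def flag_short_digests(digests_kbp: Dict[str, List[int]], rel_tol: float = 0.05) -> Dict[str, int]:
    """Return, for each digest whose summed size is more than `rel_tol`
    below the largest total, the apparent deficit in kbp."""
    totals = {enzyme: sum(sizes) for enzyme, sizes in digests_kbp.items()}
    reference = max(totals.values())
    return {
        enzyme: reference - total
        for enzyme, total in totals.items()
        if reference - total > rel_tol * reference
    }


# Hypothetical PFGE fragment sizes (kbp) for two digests of one genome.
digests = {
    "NotI": [1000, 800, 600, 400, 200],
    "SfiI": [1200, 900, 500, 200],
}

flagged = flag_short_digests(digests)
print(flagged)
```

A flagged digest does not say which fragment is a doublet or what was lost; it only indicates that band intensities or a second type of gel run deserve a closer look.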
C. Construction of macrorestriction maps
Several strategies (a few examples will be described) can be used to reconstruct the alignment of the fragments along the chromosome. Only rarely will one strategy be sufficient.
1. The fragments obtained after digestion with one endonuclease are individually digested by a second enzyme, and reciprocally. Comparisons of the sizes of the resulting doubly digested fragments allow locating the different cleavage sites, as for plasmid mapping (Fig. 45.2). The method is optimized by 2D electrophoresis. The main drawbacks, again, are the existence of several fragments with the same length and possible elution of small fragments from the plug during the preparation of the gel. A physical map with large intervals is of limited interest, so further cutting with other enzymes can be pursued. The method is generally accurate if the total number of secondary fragments does not exceed 20–25, thus hampering simultaneous treatment with two enzymes.
2. Other approaches allow obtaining a more precise genomic location of large numbers of fragments, i.e., linking adjacent fragments. A linking clone is a clone which contains a site for a rare-cutting enzyme and overlaps two adjacent macrorestriction fragments. Adjacent segments are called contigs, referring to their contiguous positions on the chromosome. Hybridization of a labeled restriction fragment with a whole DNA library allows the detection of the two contigs flanking the corresponding restriction site. Linking clones can be obtained from a DNA library by selecting clones which contain rare-cutter sites.
The latter can be tagged, for instance, by insertion of a marker. Thus, the NotI sites of the Listeria monocytogenes chromosome were individually labeled with a Kmr cassette (a DNA sequence originating from a transposon, containing a gene for resistance to an antibiotic; in this case, kanamycin); the Kmr clones from the corresponding EcoRI libraries, selected in E. coli, represented the linking clones for the NotI restriction fragments (Fig. 45.2). Known open reading frames (ORFs) can be similarly used as markers (Fig. 45.2). Willems et al. (1998) have sequenced the 58 NotI/Sau3A fragments of the Coxiella burnetii genome. Checking in databases whether chance partial reading frames present at the NotI side of some fragments could correspond to registered ORFs shared by two fragments allowed linking 10 out of the 29 NotI fragments. Amplification by polymerase chain reaction (PCR) was then performed on the whole chromosome, using random pairs of primers directed towards the NotI sites of the remaining NotI/Sau3A fragments. Amplification meant that the two corresponding fragments were adjacent, i.e., were contigs.
3. A method derived from that devised by Smith and Birnstiel (1976) has been applied to Pseudomonas aeruginosa mapping (Heger et al., 1998). Fragments formed after partial digestion by a frequent-cutter are separated by PFGE and hybridized with end probes from each of the rare-cutter fragments of the same genome (Fig. 45.2). Hybridization of a single frequent-cutter fragment with two probes identifies the rare-cutter contigs. In addition, comparison of the sizes of the partial digests can be used to establish a restriction map. The presence of repetitive sequences (multifamily genes) or copies of mobile elements may lead to false alignments biased by erroneous apparent identity of the corresponding regions.
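Once linking clones (or PCR products, or shared probes) have identified which pairs of rare-cutter fragments are adjacent, ordering the fragments amounts to walking through those pairs. The sketch below assumes a circular chromosome, error-free adjacency data, and hypothetical fragment names; repeated sequences, as noted above, would produce false links that this simple walk cannot detect.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def order_fragments(links: List[Tuple[str, str]], start: str) -> List[str]:
    """Walk adjacency pairs from `start` until the circle closes,
    a gap (missing link) is reached, or a fragment repeats."""
    neighbours: Dict[str, List[str]] = defaultdict(list)
    for a, b in links:
        neighbours[a].append(b)
        neighbours[b].append(a)

    order = [start]
    previous, current = None, start
    while True:
        candidates = [n for n in neighbours[current] if n != previous]
        if not candidates:
            break  # gap: no linking clone found beyond this fragment
        step = candidates[0]
        if step == start or step in order:
            break  # circle closed (or inconsistent data)
        order.append(step)
        previous, current = current, step
    return order


# Hypothetical adjacencies, one per linking clone, in no particular order.
links = [("C", "D"), ("A", "B"), ("D", "A"), ("B", "C")]
fragment_order = order_fragments(links, "A")
print(fragment_order)
```

If a link is missing, the walk stops early and returns only the fragments reached, which corresponds to an incomplete map with a gap.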
Other methods, such as fingerprinting assembly (comparison of the restriction patterns of supposedly overlapping clones), have been devised to check previous results or to construct contig charts. The main limit to a list of available approaches is the imagination of the workers, which should be applied to devise any technique or combination of techniques that allow overcoming a suspected problem. The first physical chromosomal map, that of E. coli (strain K12), was published in 1987 (see Section VI). Since then, an exponentially growing number of macrorestriction maps have been issued. These maps, however, are not very precise, and most do not carry any genetic information. A further step was to provide actual ordered cloned libraries (so-called encyclopedias), which are useful tools for subsequent genetic mapping.
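The reciprocal double-digestion strategy listed first among the approaches above can be sketched as a small search problem: find orderings of the single-digest fragments whose combined cut sites reproduce the sizes seen in the double digest. The sketch below assumes a linear molecule, exact fragment sizes, and very few fragments (it tries every ordering); real PFGE data are circular in most cases and carry size errors, so this is an illustration of the reasoning, not a practical tool. The sizes are hypothetical.

```python
from itertools import accumulate, permutations
from typing import List, Sequence, Tuple


def cut_positions(order: Sequence[int]) -> List[int]:
    """Internal cut positions implied by fragments laid end to end."""
    return list(accumulate(order))[:-1]


def double_digest_solutions(
    a_frags: Sequence[int], b_frags: Sequence[int], ab_frags: Sequence[int]
) -> List[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Return every (order of A fragments, order of B fragments) whose
    merged cut sites give exactly the double-digest fragment sizes."""
    total = sum(a_frags)
    target = sorted(ab_frags)
    solutions = []
    for a in set(permutations(a_frags)):
        for b in set(permutations(b_frags)):
            cuts = sorted(set(cut_positions(a)) | set(cut_positions(b)))
            bounds = [0] + cuts + [total]
            pieces = sorted(right - left for left, right in zip(bounds, bounds[1:]))
            if pieces == target:
                solutions.append((a, b))
    return solutions


# Hypothetical sizes (kbp): enzyme A alone, enzyme B alone, and A + B together.
solutions = double_digest_solutions([3, 5, 2], [6, 4], [3, 3, 2, 2])
for a_order, b_order in solutions:
    print("A:", a_order, "B:", b_order)
```

Every map is found together with its mirror image, and fragments of equal length can yield further equivalent solutions, which is the computational counterpart of the ambiguity caused by fragments of the same length mentioned above.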
III. ORDERED CLONED DNA LIBRARIES, OR ENCYCLOPEDIAS
An encyclopedia consists of a library of DNA fragments, cloned into a vector, ordered so as to reconstruct with sufficient overlap the order on the chromosome. The first one produced, again, was for E. coli, in 1987 (see Section VI).
A. Choosing the vector for an ordered cloned library
The available vectors cover a large range of possible sizes of the inserts they carry: λ-based vectors (10–25 kbp), cosmids (5–50 kbp), P1-based vectors (90 kbp), yeast artificial chromosomes (YACs) (75–2000 kbp), and bacterial artificial chromosomes (BACs) (20–100 kbp). Due to their lack of stability in E. coli, YACs have been used for bacterial genome mapping only for B. subtilis, Myxococcus xanthus, and Pseudomonas aeruginosa. The BACs, based on the E. coli F plasmid origin of replication, are much more stable, and have been extensively used for eukaryotic gene libraries. Their first use in bacteria was for the construction of a Mycobacterium tuberculosis encyclopedia in 1998.
B. Assembling the encyclopedia
Constructing an encyclopedia implies finding overlapping clones, in order to define contigs. One must start with a number of clones 10- to 20-fold in excess over the number corresponding to the length of the chromosome. This ratio depends on the portion of the insert required for detection of linkage (the minimal detectable overlap, MDO). The methods used to order the clones are similar to those described for the construction of macrorestriction maps. The work can be strongly facilitated by available knowledge, such as macrorestriction or genetic maps, or sequenced regions. Then, a minimal overlapping map may be proposed. Thus 420 BAC clones (20–40 Mbp) have allowed the construction of the 4.4 Mb map of the circular chromosome of M. tuberculosis, but the minimal overlapping set (or miniset) requires only 68 unique BAC clones (Fig. 45.3) (Brosch et al., 1998).
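Extracting a miniset from a redundant ordered library can be viewed as an interval-covering problem. The sketch below uses the classic greedy approach (at each step, among the clones that start at or before the current position, keep the one reaching furthest). It assumes that clone coordinates are already known, treats the region as linear, and ignores the minimal detectable overlap; the clone names and coordinates are hypothetical and are not taken from the M. tuberculosis study.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, left coordinate, right coordinate) in kbp


def minimal_tiling_path(clones: List[Clone], start: int, end: int) -> List[str]:
    """Greedy selection of a small set of clones covering [start, end]."""
    by_left = sorted(clones, key=lambda c: c[1])
    path: List[str] = []
    position = start
    i = 0
    while position < end:
        best = None
        # Examine every not-yet-seen clone that begins at or before `position`.
        while i < len(by_left) and by_left[i][1] <= position:
            if best is None or by_left[i][2] > best[2]:
                best = by_left[i]
            i += 1
        if best is None or best[2] <= position:
            raise ValueError(f"gap in coverage at {position} kbp")
        path.append(best[0])
        position = best[2]
    return path


# Hypothetical ordered library covering a 120-kbp region.
library = [
    ("c1", 0, 40), ("c2", 10, 60), ("c3", 35, 80),
    ("c4", 55, 100), ("c5", 70, 120), ("c6", 90, 120),
]
miniset = minimal_tiling_path(library, 0, 120)
print(miniset)
```

For a circular chromosome one would first open the circle at an arbitrary clone boundary, and a required overlap could be imposed by demanding that each chosen clone start some distance before the current position.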
IV. CONVERSION OF PHYSICAL MAPS INTO GENETIC MAPS: POSITIONING GENES ALONG THE PHYSICAL MAP
Restriction maps are converted into genetic ones by locating genes with reference to the restriction sites. Localization can be achieved by DNA hybridization or, when sequences are available, by sequence comparison. The limiting factor of this conversion lies in the number of known genes available (i.e., cloned or sequenced) and in the precision of the physical map. In favorable cases (for instance, E. coli), the restriction pattern of a cloned fragment encompassing a given gene is sufficient to localize the gene on the fragment, and thus, on the chromosome. Difficulties may arise if one or several cutting sites are protected as part of an overlapping site for a different modification system in the original host, but no longer so when subcloned and amplified in the cloning host (usually E. coli). The genes used as probes can originate from the host itself or from a heterologous host. In the latter case, the strains should be sufficiently related and possess the same function, so that DNA sequence conservation can be expected to allow efficient annealing. PCR probes using primers covering conserved protein or DNA regions have also been widely used for widely distributed genes. For instance, the cleavage site for I-CeuI, which is specific to 23S rRNA genes, allows localizing these rRNA loci.
A very efficient method using transposons with rare-cutting sites was also developed. When a mutation in a known gene has been obtained with such a transposon, the corresponding macrorestriction pattern displays the replacement of one fragment by two fragments, and subsequently allows a precise location of the transposon, hence, of the gene. The transposon Tn5 naturally harbors a single NotI site, but other transposons have been engineered so as to harbor similar single rare-cutter sites. In addition, transposon insertions can be intraspecifically transferred by conventional genetic methods, thus allowing easy comparison of chromosomal organization of related strains. Comparisons of genetic with physical maps have shown a general good agreement with regard to gene order, but less so to distances between genes. This reflects biases introduced by nonregular distribution of recombination hotspots along the chromosome in the course of genetic mapping. Thus, restriction mapping of the chromosome of Streptomyces showed the estimates obtained by in vivo mapping to have been rather accurate. Surprisingly, however, the Streptomyces chromosome has turned out to be linear instead of circular.
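The transposon approach described at the start of this section lends itself to a simple comparison of fragment lists: the mutant pattern should lack one wild-type fragment and contain two new ones whose sizes add up to it. The sketch below assumes sizes in kbp, a tolerance that absorbs both gel error and the length of the transposon itself, and hypothetical fragment sizes.

```python
from itertools import combinations
from typing import List, Tuple


def _same_sizes(a: List[float], b: List[float], tol: float) -> bool:
    """True if both lists hold the same fragment sizes within `tol`."""
    return len(a) == len(b) and all(
        abs(x - y) <= tol for x, y in zip(sorted(a), sorted(b))
    )


def locate_insertion(
    wild_type: List[float], mutant: List[float], tol: float = 10.0
) -> List[Tuple[float, float, float]]:
    """Return (split fragment, smaller product, larger product) for every
    wild-type fragment that can explain the mutant pattern."""
    hits = []
    for i, lost in enumerate(wild_type):
        rest_wt = wild_type[:i] + wild_type[i + 1:]
        for j, k in combinations(range(len(mutant)), 2):
            p, q = mutant[j], mutant[k]
            if abs(p + q - lost) > tol:
                continue
            rest_mut = [s for n, s in enumerate(mutant) if n not in (j, k)]
            if _same_sizes(rest_wt, rest_mut, tol):
                hits.append((lost, min(p, q), max(p, q)))
    return hits


# Hypothetical NotI patterns (kbp) before and after transposon insertion.
wild_type_pattern = [1000, 750, 400, 250]
mutant_pattern = [1000, 520, 235, 400, 250]
hits = locate_insertion(wild_type_pattern, mutant_pattern)
print(hits)
```

A single digest places the insertion at one of two mirror positions within the split fragment (here 235 kbp from either end); a digest with a second rare-cutter, or knowledge of the fragment's orientation on the map, would be needed to choose between them.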
V. GENOME SEQUENCES
Whole genome sequences are available now and will be even more readily available in the future. Will this render the approaches via physical mapping obsolete, as the genetic methods have mostly become? This does not seem likely, for several reasons. The most obvious is that, even though genome sequencing is becoming cheaper, it is still time consuming. Thus, it does not seem likely that the 4000 or so presently cultivable bacterial species will all have their genome sequenced, whereas a physical map is more easily feasible. It should also be recalled that a physical map, even more so in the form of an ordered library, is often a prerequisite for a genome-sequencing project, mainly for the larger bacterial genomes. Furthermore, sequencing a genome means sequencing the chromosome of a representative strain (and possibly isolate) of a species, whereas there is growing evidence of a large plasticity of genomes. Thus, physical maps and/or encyclopedias will remain the easiest way to approach this problem. For instance, SpeI-restricted fragments of 97 strains of Pseudomonas aeruginosa isolated from clinical and aquatic environments have been hybridized with YACs carrying 100 kb inserts (about 3% of the chromosome) from three chromosomal regions of the well-known strain PAO. At this scale, little genomic diversity was detected in these representatives of the species. In contrast, a study of 21 strains of the same species isolated from cystic fibrosis patients, and analyzed by several rare-cutting endonucleases, revealed that blocks of up to 10% of the genome could be acquired or lost by different strains. The problem in translating a genomic sequence into a genetic map is the identification of the genes, i.e., of the encoded functions. Sequence homology with a known element (at the DNA, RNA, or protein level), be it of prokaryotic or eukaryotic origin, allows postulating a function with some confidence.
However, about 30% of the potential ORFs of all sequenced genomes do not show homology to any known element. Their identification will constitute the main challenge of what is now referred to as the “after-sequencing,” or post-genomic, molecular biology.
VI. THE E. COLI K12 CHROMOSOMAL MAP
The building of the chromosomal map of an E. coli K12 strain represents, historically, a typical textbook example, since all available techniques have successively been applied until the complete molecular information was reached with sequencing. It thus allows summarizing, for just one strain, all the steps in genome mapping. A genetic linkage map showing 99 genes (first edition in 1964), then 166 (1967 edition), was based on conventional mapping procedures, using recombination after conjugation or transduction by phage P1 (Fig. 45.4). This approach also provided the first proof of the circularity of a bacterial chromosome. Further accumulation of information via the same approach during the next 10–15 years led to a more complete map positioning about 1500 genes, representing 20–25% of the whole chromosome. During the early 1980s, the introduction of molecular techniques allowed the cloning and sequencing of one-third of these genes and, thus, also provided their complete restriction maps. In 1987, Kohara and coworkers prepared a genomic library, now called an encyclopedia, using a modified λ phage as cloning vector. The whole library was contained in 1056 clones, each carrying a 15–20-kbp-long insert. From this library, an almost complete physical map showing restriction sites for eight endonucleases was obtained, using PFGE procedures and adapted computer programs (Fig. 45.4). It took another 2304 clones and the use of newly published information on restriction or sequencing of local regions to deal with most of the remaining ambiguities or gaps, yielding a 4700-kbp-long molecule. Simultaneously, Cantor’s group, using similar approaches, proposed a 4600-kbp-long chromosome, of which all ambiguities but one were solved. This whole map was covered by 22 NotI fragments. Correlation between these restriction maps and the known linkage map was excellent in terms of gene order.
Some distortions of the genetically estimated distances, however, were necessary for an optimal alignment with the restriction profile (Fig. 45.4). One cause for these discrepancies probably lies in unequal crossing-over frequencies along the chromosome. Kohara’s current ordered phage λ library covering the whole genome is commercially available as a set of 476 phages, immobilized on a nylon hybridization membrane. A computer program was developed by Danchin’s group to ease the experimental work of mapping a new character, by first working out its most probable location(s) through comparisons of restriction profiles. Another program (1995) allowed localizing a cloned fragment by simply determining the sizes of the hybridizing fragments obtained with the eight restriction endonucleases used by Kohara. This allowed localizing a locus to within 7 kb. The complete sequence, published in 1997, describes a 4,639,221-bp-long chromosome, a figure very close to those reached by the previous physical data. Compilation of all data derived from genetic and molecular approaches, or predicted from sequence analyses or comparisons, has yielded what could be the complete set of information on this chromosome (see Fig. 45.4 for an example of this map). It is interesting to note that, even though the location and precise length of a gene are now defined at the base level, the standard coordinate scale of 0–100 arbitrary units (so-called minutes), derived from the times of transfer via conjugation, has been maintained. The available sequenced genome has now provided a new, easier means to localize any cloned fragment, by sequencing a small part of this fragment and aligning it by computer methods on the whole map. The knowledge available, mostly as restriction data, from various E. coli strains provides growing evidence for a very low level of polymorphism in this species. Most of it may be due to movement of mobile elements (ISs, transposons, prophages, pathogenicity islands).
As a consequence, the K12 map has a wider validity than its use for the specific strain from which it was constructed. This, however, is known not to be a universal situation (e.g., Streptomyces, Neisseria, and, to a lesser extent, Pseudomonas).
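The coexistence of base-pair coordinates with the 0–100 "minutes" scale mentioned in this section can be illustrated by a simple conversion. The sketch below assumes a strictly proportional relationship and a shared zero point between the two scales, which is a simplification made here for illustration; only the chromosome length comes from the text.

```python
GENOME_BP = 4_639_221  # length of the sequenced E. coli K12 chromosome (from the text)


def bp_to_minutes(position_bp: float) -> float:
    """Convert a base-pair coordinate to map minutes (0-100), assuming
    proportionality and a common origin for both scales."""
    return 100.0 * (position_bp % GENOME_BP) / GENOME_BP


def minutes_to_bp(minutes: float) -> int:
    """Inverse conversion, rounded to the nearest base pair."""
    return round((minutes % 100.0) / 100.0 * GENOME_BP)


print(f"1 minute corresponds to about {GENOME_BP / 100 / 1000:.1f} kbp")
print(f"position 1,000,000 bp is at about {bp_to_minutes(1_000_000):.2f} min")
```

On this scale, the 7-kb localization precision quoted above corresponds to a small fraction of one minute.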
VII. CONCLUSION: THE INTEREST OF MAPPING BACTERIAL GENOMES
The publication of the first physical maps of bacterial genomes has started a new field in molecular biology called genomics, i.e., the study of integral genome structures. Although genomics has reached its full significance with the study of genome sequences, physical maps have provided, besides the initial tools for genome sequencing, numerous important results, such as original genomic structures (linear or multiple chromosomes, very large plasmids, and linear plasmids). Comparison of related strains or species has been and will be of paramount importance to study the plasticity of genomes, i.e., their capacities of variations in chromosome organization, gene sequence, gene content per species, etc. Examples of this are the existence of large deletions in Streptomyces spp. genomes, the presence of inversions, deletions, or additions of genes or regions (e.g., in Neisseria, Pseudomonas, and Bacillus). To this purpose, PFGE, gene encyclopedia, or genome sequences of one or a few well-known strains serve to compare other strains of the same or related species.