INTRODUCTION
Our ability to “see” the microbes that surround us
first arose in the 17th century when Anton
van Leeuwenhoek created a microscope, which
provided the first physical evidence of the
diversity and ubiquity of microbes in the world.
Another giant leap occurred in the 19th
century when Koch first demonstrated that bacteria
could be grown in pure culture,
beginning with the analysis of blood from cows
infected with the anthrax agent. He
subsequently became most widely known for the
postulates regarding microbial disease
causation that bear his name. For decades
following, a combination of culture and
microscopy was the only tool available to see
microbes. In the last decade of the 20th
century, however, consensus ribosomal PCR revealed
that a much greater diversity of
bacteria exists beyond what could be cultured, and
that cultured bacteria represented
only ~1% of all bacterial species. Today,
sequencing technology has evolved to the point that
it is feasible to comprehensively define the
collection of microbes present in humans (i.e.,
the human “microbiome”) or any other ecological
niche in a fashion that is completely
independent of culture or microscopy.
A BRIEF HISTORY OF PATHOGEN DISCOVERY
Classic Methods
Classic methods of microbial discovery have relied
heavily on the ability to readily cultivate
or passage the organism in question. Clinical
material associated with a disease thought to
be of infectious origin is used to inoculate diverse
growth media to cultivate the microbe(s)
present in the sample. In the case of suspected
bacterial agents, selective or nonselective
growth media can be utilized, while primary or
immortal cell lines are inoculated for
suspected viruses. In addition, the clinical
specimens can be used to infect animal
models. If a microbe can be cultivated, attempts
to identify it follow. For bacterial
identification, differential stains and growth
conditions are used to categorize and ascertain
the genus or species present. Viral identification
is similarly based on differential growth in
various cell types as well as serological
reactions to a variety of specific antisera. The use of
microscopic techniques, including light and
electron microscopy, is also extremely important
in the process of identification. These classic
tools have been incredibly useful and resulted in
the discovery of many currently accepted human
pathogens, for example, Bacillus anthracis,
Mycobacterium tuberculosis, Yellow fever virus, and Poliovirus. However, there are two
fundamental limitations to this approach: (i)
these methods are dependent on the ability of
the microbe to grow in the substrate provided, and
(ii) even if the microbe can be cultivated,
that fact alone will not necessarily lead to
unambiguous identification of the unknown agent.
Molecular Approaches: Candidate Dependent
In the late 1980s, with the advent of PCR,
scientists could readily use molecular
approaches to detect microbes present in a given
clinical sample, provided that sequence data
already existed for the microbe(s) to be targeted. The
recognition that selecting PCR primers
designed to highly conserved regions in a set of
sequences (e.g., multiple bacteria or several
viruses from a common taxonomic group) could enable
the detection of previously
unsequenced or unidentified microbes provided a
novel approach for the identification of
microbes. These types of approaches are referred
to interchangeably as broad-range or
consensus PCR methods. One of the broadest applications
of this technique has been the
design of primers to the 16S rRNA gene, which
enables the detection of nearly all members
of the bacterial domain. Alternatively, more
specific primers can be selected that are
conserved within a given taxon (family, genus, or
species) to identify a more targeted set of
microbes. The ability to design highly conserved
primers is, of course, predicated on the
existence of sufficient sequence data to identify
the appropriate conserved regions. There are
a large number of microbes that have been
discovered by using PCR in conjunction with the
classical methods of microbial detection and
discovery.
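The idea of placing primers in highly conserved regions can be illustrated with a short sketch. It assumes the sequences have already been aligned to equal length (with "-" for gaps) and simply reports gap-free windows in which every column meets a chosen conservation threshold. The function name `conserved_windows`, its parameters, and the toy sequences are illustrative assumptions, not a published primer design tool; real primer design also weighs melting temperature, degeneracy, and amplicon length.

```python
from typing import List, Tuple


def conserved_windows(aligned: List[str], width: int,
                      min_identity: float = 1.0) -> List[Tuple[int, str]]:
    """Return (start, consensus) for each gap-free window of `width` columns
    in which every column is conserved in at least `min_identity` of sequences."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")

    consensus = []   # most common character per column
    conserved = []   # True if the column passes the threshold and is not a gap
    for column in zip(*aligned):
        # sorted() makes tie-breaking deterministic
        best = max(sorted(set(column)), key=column.count)
        fraction = column.count(best) / len(column)
        consensus.append(best)
        conserved.append(best != "-" and fraction >= min_identity)

    hits = []
    for start in range(length - width + 1):
        if all(conserved[start:start + width]):
            hits.append((start, "".join(consensus[start:start + width])))
    return hits


# Toy alignment (hypothetical sequences): column 8 varies, the rest is conserved.
toy_alignment = [
    "ACGTACGTTTGACCA",
    "ACGTACGTATGACCA",
    "ACGTACGTCTGACCA",
]
candidate_sites = conserved_windows(toy_alignment, width=6)
```

In this toy alignment, candidate primer sites are found on either side of the variable column, which is the situation consensus PCR exploits: conserved flanks for priming and a variable interior that distinguishes the amplified organism once sequenced.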
Two pioneering papers by Relman and coworkers
describe the first examples of using
consensus PCR primers to identify the causative
agents of specific human diseases. Bacillary
angiomatosis was commonly thought to be of
infectious origin, but for many years no
specific microbial agent could be identified. A
putative agent could be visualized in tissue
sections following staining, but efforts to
culture the organism had failed. Sequencing of
amplicons generated by PCR using 16S rRNA gene
consensus primers demonstrated that a
previously uncharacterized rickettsia-like
bacterium was present in tissue samples of patients
with bacillary angiomatosis (67).
The bacterium was later identified as a member of the
genus Bartonella. A similar approach was
subsequently applied to address the etiology of
Whipple’s disease. Whipple’s disease was first
described in 1907 as a rare systemic disorder
that primarily caused malabsorption but could
affect any part of the body. Consensus PCR
using primers targeting the bacterial 16S
ribosomal gene resulted in the identification of an
uncharacterized actinomycete, which would later be
classified as Tropheryma whipplei (68).
These two cases demonstrated the power of
molecular pathogen discovery methods.
The use of conserved PCR primers has also been
applied to the discovery of viruses.
However, since no universally conserved sequence
akin to the 16S rRNA sequences in
bacteria is present in all viruses, consensus
sequences must be identified and consensus
primers designed for each given viral taxon of
interest. A seminal example of using
consensus PCR to identify a viral pathogen
occurred during the emergence of hantavirus
pulmonary syndrome in 1993 (61).
In the course of investigating an unusual outbreak of a
lethal pulmonary disease in otherwise healthy
young adults in the southwestern United
States, extensive testing by classic
microbiological methods ruled out most of the likely
candidates known to cause severe respiratory
disease. Serological tests revealed that patient
sera were cross-reactive with known hantaviruses.
From this lead, PCR primers were
designed to conserved regions of known hantavirus sequences,
which were then used to
amplify nucleic acids extracted from tissue
samples isolated from dying patients. Sequencing
of the amplicon generated by the primers resulted
in the identification of a novel member of
this family, which was ultimately named Sin
Nombre virus.
Since these seminal applications of consensus PCR
for the identification of bacterial and viral
pathogens, there have been many instances of
microbial identification using this strategy.
Consensus PCR, either alone or in conjunction with
classic culture and antigen detection
methods, continues to be of great utility in the
discovery of novel microbes, as illustrated by
the recent discoveries of a new phylogenetic group
of rhinoviruses (47, 49, 51, 52, 59), a
spate of parechoviruses (4, 7, 8, 23, 40, 55, 78),
the arenavirus Chapare virus (19), and
Bundibugyo ebolavirus (70).
However, what is not documented in the literature is the
number of times that broad-range PCR strategies
were applied but failed to identify an
agent. It should also be evident that in order for
this approach to be useful and successful,
the list of potential candidates must be
relatively short. In the successful examples
described above, the authors had a strong
hypothesis regarding the nature of the microbe
(i.e., bacterium versus virus) present or which
specific candidate viral taxon might be
present. However, in many situations, there may
not be a leading candidate(s), thus limiting
the feasibility of using consensus PCR approaches,
especially for viral identification. Thus,
despite these successes, there has been
significant impetus for the development of pathogen
discovery strategies that are broad range and not
candidate dependent.
Molecular Approaches: Candidate Independent
The discoveries of Hepatitis C virus (HCV)
and Human herpesvirus 8 (HHV-8), also called
Kaposi’s sarcoma-associated herpesvirus (KSHV),
represented two breakthroughs in the
application of candidate-independent molecular
methods for pathogen discovery. In 1989,
the identification of HCV in patients with non-A,
non-B (NANB) hepatitis relied upon a library
immunoscreening strategy (17).
A randomly primed cDNA library was made from material
from infected animals and screened using patient
serum from NANB hepatitis patients with
the goal of identifying cDNA clones that generated
peptide sequences recognized by the
patient sera. From over a million clones that were
screened, a single clone reacted
specifically with NANB hepatitis patient sera.
From this initial cDNA clone fragment, the HCV
genome was eventually sequenced. Today, HCV is
recognized as being responsible for the
vast majority of cases of NANB hepatitis. In 1994,
human herpesvirus 8 was discovered in
the lesions of AIDS-associated Kaposi’s sarcoma (13).
The identification of Kaposi’s sarcoma-associated
herpesvirus relied on representational difference
analysis, a subtractive
hybridization-based method, to enrich for and then
identify unique sequences present in
Kaposi’s sarcoma lesions but not in healthy tissue
controls. While these two examples
demonstrate the potential of these methods, there
have been few subsequent success stories
using either of these two methods, most likely due
to technical challenges associated with
both of these strategies. Thus, there remained a
clear need for further improved strategies
for pathogen discovery.
By the end of the 20th century, the classic
culture-based methods for microbial discovery
had been augmented by multiple molecular
approaches, such as consensus PCR, library
immunoscreening, and representational difference
analysis. In parallel, targeted sequencing
of specific microbes was starting to become
feasible, thus setting the stage for the
convergence of pathogen discovery efforts and
microbial sequencing efforts.
MICROBIAL GENOMICS HISTORY
Microbe sequencing in the 20th century relied
exclusively upon Sanger dideoxy sequencing,
the dominant sequencing strategy since its
invention in 1977. From its initial incarnation
using slab gels as a readout, incremental advances
in sequencing capacity evolved as the
readout transitioned to capillary electrophoresis,
and then from single capillaries to
simultaneous analysis of 96 capillaries, which is
still used today.
Formally, the era of microbial genomics began with
the complete sequencing of
the Haemophilus influenzae genome in 1995.
However, it was recognized almost two decades
earlier that an organism’s genomic sequence, as
the ultimate marker of evolution, could
serve to classify and define the relatedness of
both prokaryotic and eukaryotic organisms
(80). rRNA was a molecule with an appropriately broad
distribution which mutated slowly
over time, permitting the detection of
relatedness. With the advent of Sanger sequencing,
entire 16S rRNA genes could be sequenced,
including the Escherichia coli 16S rRNA gene in
1978 (11). Sequencing at the time
and in the ensuing decades was time-consuming and
expensive and was performed to obtain the minimum
amount of data that was needed.
When the H. influenzae genome was
sequenced, this bacterium became the first free-living
organism to have its genome sequenced in its
entirety (31). This was a landmark
achievement, notable also because of the use of a
“shotgun” strategy to assemble the
complete genome. “Shotgun” refers to the random
fragmentation and cloning of DNA
fragments followed by computational assembly of
the overlapping regions to generate a
complete genome sequence. Based on this proof of
principle, genomes of larger microbes
and eukaryotic organisms were subsequently
sequenced in this fashion. The following
year, Saccharomyces cerevisiae was the
first eukaryotic organism to be fully sequenced (35),
and then in 1998, the first multicellular
eukaryotic genome to be sequenced, that
of Caenorhabditis elegans, was published (12).
Since then, the complete genomes of many
human and animal pathogens have been sequenced,
including notable pathogens such
asMycobacterium tuberculosis (2001), Yersinia
pestis (2001), and Plasmodium
falciparum (2002). In 2004, the complete 1.2-Mb genome of mimivirus, the
largest known
virus, was published (65).
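The shotgun strategy described above, random fragmentation followed by computational assembly of overlapping reads, can be sketched with a toy greedy assembler. This is only an illustration under simplifying assumptions (error-free reads, a single strand, no repeats); it is not the assembler used for H. influenzae or any other genome, and the names `overlap` and `greedy_assemble` and the example sequence are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`,
    or 0 if that overlap is shorter than `min_len`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Toy "genome" (hypothetical) and three overlapping fragments, given out of order.
toy_genome = "ATGCCGTAAGCTTGACA"
toy_reads = [toy_genome[9:17], toy_genome[0:8], toy_genome[5:12]]
assembled = greedy_assemble(toy_reads, min_len=3)
```

With these three fragments the greedy merges reconstruct the original toy sequence as a single contig. Real genomes contain repeats and sequencing errors, which is why production assemblers rely on far more elaborate graph-based methods and on high redundancy of coverage.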
The human genome project was first proposed in
1990, and initial sequencing began in
1995. By 2001 two drafts of the human genome had
been published (50, 75). During the
course of this massive project, many technological
refinements in the efficiency of Sanger
sequencing itself, as well as novel tools for the
downstream computational analysis, were
implemented. These developments could naturally be
applied to sequencing of much smaller
microbial genomes and therefore contributed
substantially to the rapid increase in the rate of
microbial sequencing.
Over the past 5 years, a number of new sequencing
modalities that together have been
termed the next generation, or NextGen, of
sequencing have been developed. The three
major platforms in current use are 454 (Roche
Titanium), Solexa (Illumina), and SOLiD
(ABI). Key characteristics of these platforms
include the fact that all of them have
geometrically increased the raw sequence
generation capacity and decreased the cost per
base pair 10- to 100-fold relative to Sanger
sequencing. Although each of these new
platforms utilizes a fundamentally different
sequencing modality, in all cases, clonal
amplification of the template DNA has been moved
from bacteria (thereby eliminating the
need for plasmid cloning and propagation) to an in
vitro setting. In the 454 technology,
approximately 1 million sequence reads averaging
400 bp, or about 400 Mb of total
sequence, are generated per run. For
microbes such as bacteria, one
sequencing run of a 454 instrument is sufficient
to generate greater than 100 times the
coverage of the average 3-Mb bacterial genome. By
comparison, the Illumina platform
currently produces ~20 Gb of sequence with read
lengths up to 100 bp, while the SOLiD can
generate ~30 Gb with an average read length of 35
bp. For a more detailed description of
each of these platforms and capabilities, see
reference 58.
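The coverage figures implied by these yields follow from simple arithmetic: fold coverage is the total number of bases sequenced divided by the genome size. The sketch below applies that ratio to the per-run yields quoted above and the 3-Mb average bacterial genome; the function name `fold_coverage` is an illustrative choice, and the calculation ignores read quality, contamination, and uneven coverage.

```python
def fold_coverage(total_bases: float, genome_size_bp: float) -> float:
    """Average number of times each genome position is sequenced."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


GENOME_BP = 3_000_000  # average bacterial genome size quoted in the text

# Approximate per-run yields quoted in the text, in bases.
run_yield_bases = {
    "454": 1_000_000 * 400,  # ~1 million reads averaging ~400 bp
    "Illumina": 20e9,        # ~20 Gb
    "SOLiD": 30e9,           # ~30 Gb
}

coverage = {platform: fold_coverage(bases, GENOME_BP)
            for platform, bases in run_yield_bases.items()}
```

For the 454 figures this gives roughly 133-fold coverage, consistent with the statement that a single run exceeds 100 times coverage of an average bacterial genome; the short-read platforms give several thousand-fold coverage by the same calculation.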
With these increases in sequencing capacity, the
sequencing of microbial genomes has
become routine. In fact, there are currently
ambitious projects that have been conceived to
comprehensively sequence the microbial diversity
of the human microbiome and the viral
diversity, “the virome,” present in humans (see chapter 13). These efforts will vastly expand
the world of sequenced microbes far beyond the
current 5,900 species of bacteria, fungi,
parasites, and viruses that have been completely
sequenced and whose sequences have
been deposited in GenBank (GenBank Genome Records
11.12.09; http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome).
THE MERGING OF PATHOGEN DISCOVERY AND
MICROBIAL GENOMICS
The onset of the 21st century has seen the
convergence of the fields of pathogen discovery
and microbial genomics (Fig. 1). The goals of systematically defining the microbes present in
a given clinical sample dovetail with the goals of
“pathogen discovery,” namely, to identify
one or more microbes present in a clinical sample
that are responsible for a disease
phenotype. Two new molecular approaches for
massively parallel analysis have recently
emerged that are capable of defining the spectrum
of microbes present in clinical specimens:
microarray and sequencing-based detection. Both of
these strategies have benefited greatly
from the increased focus on human and microbial
sequencing. The vast database of nucleic
acid sequences provides the substrate from which
consensus PCR primers (described above)
and probes for microarrays (described below) have
been designed. Naturally, sequencing-based
approaches have evolved along with the sequencing
capacity of new platforms. In the
first half of this decade, Sanger method-based
sequencing dominated these efforts, and due
to the costs and limited throughput, experimental
strategies were devised to minimize the
extent of sequencing necessary to identify
microbial agents. As the decade progressed, and
more robust sequencing methods evolved, the need
to enrich for microbial sequences
diminished and more brute force strategies have
come to the forefront. For all the
sequencing-based methods, a critical component of
the process is the bioinformatic analysis
of the sequences generated. Typically,
computational pipelines have been established that
compare the resulting sequences against various
public sequence databases to determine the
origin of each sequence read. The “discovery” of a
novel microbial sequence results when a
sequence with only limited similarity to existing microbial
sequences is encountered.
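The logic of such a pipeline can be sketched in a few lines. The example below is a deliberately simplified stand-in: it scores each read against each reference by ungapped identity and bins it as known, candidate novel, or unassigned. The function names, the 0.9 and 0.5 thresholds, and the toy sequences are all assumptions made for illustration; actual pipelines typically rely on dedicated alignment tools, translated (amino acid) searches, and empirically chosen cutoffs.

```python
from typing import Dict, Tuple


def best_ungapped_identity(read: str, ref: str) -> float:
    """Highest fraction of matching positions over all ungapped
    placements of `read` within `ref`."""
    if not read or len(read) > len(ref):
        return 0.0
    best = 0
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        matches = sum(1 for x, y in zip(read, window) if x == y)
        best = max(best, matches)
    return best / len(read)


def classify_read(read: str, database: Dict[str, str],
                  known: float = 0.9,
                  related: float = 0.5) -> Tuple[str, str, float]:
    """Return (label, best reference name, best identity) for one read.
    The thresholds are illustrative assumptions, not established cutoffs."""
    best_name, best_identity = "", 0.0
    for name, sequence in database.items():
        identity = best_ungapped_identity(read, sequence)
        if identity > best_identity:
            best_name, best_identity = name, identity
    if best_identity >= known:
        label = "known"
    elif best_identity >= related:
        label = "candidate novel"  # limited similarity to a known microbe
    else:
        label = "unassigned"
    return label, best_name, best_identity


# Hypothetical one-entry reference database and three toy reads.
toy_database = {"virus_A": "ATGGCGTACGTTAGCC"}
exact_read = "GCGTACGTTA"              # identical to part of virus_A
divergent_read = "ATGGAGTACCTTAGGA"    # 12 of 16 positions match virus_A
unrelated_read = "TTTTTTTTTTTTTTTT"

results = [classify_read(r, toy_database)
           for r in (exact_read, divergent_read, unrelated_read)]
```

The middle category is the one of interest for discovery: a read that resembles a known microbe only weakly is flagged for follow-up, such as genome completion and phylogenetic analysis, rather than being assigned to an existing species.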
Sequencing-Based Approaches
The discovery of Human metapneumovirus (HMPV)
in 2001 combined classic viral culture
with a molecular strategy termed random
arbitrarily primed PCR (73). Efforts to culture
respiratory secretions from children suffering
respiratory tract infections led to the
identification of a putative unidentified virus
that could be passaged in tertiary monkey
kidney (tMK) cells, Vero cells, and, to a lesser
extent, A549 cells. In order to identify the
virus present, random, arbitrary primers were used
to generate PCR amplicons. Differentially
represented amplicons present in infected cells
but absent from control cells were identified
by gel electrophoresis and selectively sequenced.
Multiple fragments having limited sequence
identity to avian pneumoviruses were detected,
indicating that a novel virus, now known as
HMPV, was present in the infected cells.
Seroprevalence studies indicated that by the age of
5 to 10 years, most individuals were antibody
positive, suggesting that this virus is a
common infection acquired in childhood. HMPV can
cause severe respiratory infections,
including pneumonia and bronchiolitis, and is
responsible for 5 to 10% of hospitalizations of
patients with respiratory tract infection (18).
The presentation and case severity are very
similar to those of respiratory syncytial virus (18).
In the same year, a candidate-independent
sequencing strategy for identification of novel
viruses termed DNase-SISPA was described (2).
The experimental strategy relied upon
sequence-independent single primer amplification
(SISPA), wherein an adaptor containing a
primer binding sequence is ligated to both ends of
a cDNA fragment and a single primer is
then used for PCR. To enrich specifically for
viral nucleic acids present in virions, the clinical
sample is first subjected to ultracentrifugation
to pellet the virions and is then treated with
DNase to degrade any cellular nucleic acids that
are not protected within the viral capsids.
Following this enrichment, the sample is then
extracted for DNA or RNA and amplified using
SISPA. The enrichment steps in this protocol are
necessary to increase the chances of
sequencing a virus-derived sequence, given the
labor and costs of performing extensive
Sanger sequencing on the unenriched sample. In
this proof-of-concept study, two novel
bovine parvoviruses were identified (2).
A variation of the DNase-SISPA strategy, called
virus discovery cDNA-AFLP (amplified
fragment length polymorphism), was
utilized in 2004 to identify a novel virus from
the family Coronaviridae, human coronavirus
NL63 (HCoV-NL63), from a child with
bronchiolitis (74). In this protocol,
following the standard ultracentrifugation, DNase
treatment, nucleic acid purification, restriction
digestion, linker ligation, and PCR
amplification, a second set of PCRs was performed
to identify differentially expressed bands
present only in the putatively infected sample.
There were 16 such bands that were cloned
and sequenced. Of these, 13 had limited sequence
similarity to known coronaviruses. Once
completely sequenced, genome analysis demonstrated
that HCoV-NL63 is most closely
related to HCoV-229E, a known respiratory pathogen,
with 65% nucleotide identity.
Subsequent studies have associated HCoV-NL63 with
croup, and infection rates approaching
80% by the age of 6 as defined by serology have
been reported (22). Of note, a third novel
coronavirus, coronavirus HKU1, was identified
using consensus PCR from a patient with
pneumonia (82).
Application of DNase-SISPA to examine plasma
samples from patients with febrile illness
resulted in the identification of sequences with
only limited identity to known members of the
family Parvoviridae and the Anellovirus genus
in 2005 (41). The entire sequence was
obtained for the novel parvovirus, Parvovirus 4 (PARV4),
and phylogenetic analysis revealed
that the greatest similarity was only 24 to 29%
identity to open reading frame 1 (ORF1)
of Adeno-associated virus and Avian
parvovirus. PARV4 has been detected in blood, bone
marrow, and lymphoid tissue from patients with
either HCV or human immunodeficiency
virus/AIDS and in plasma of kidney transplant
patients, and high frequencies of exposure
have been reported for hemophiliacs and injection
drug users. Two novel anelloviruses, SA1
and SA2, were also identified in this study.
Notably, these viruses were highly divergent from
known viruses and shared only 32 to 35% similarity
to TT virus (TTV). TTVs and TTV-related
viruses have been ubiquitously found to infect
humans, but there has been no direct causal
evidence to link these viruses to any specific
disease. The finding of these three novel
viruses with very low amino acid similarity to
known viruses highlights the importance of
sequencing over methods that require nucleic acid
homology for detection, as these viruses
would likely have been missed by those methods.
A similar method was also used to identify several
novel sequences with similarity to
anelloviruses in the blood of healthy donors (9).
Blood samples were subjected to density
centrifugation followed by chloroform and DNase
treatment before nucleic acid extraction.
The nucleic acids were amplified using a
polymerase with strand displacement activity,
randomly sheared, ligated to linkers, and then PCR
amplified before cloning and sequencing.
Using this technique, seven sequences with limited
similarity to known members of the
genus Anellovirus were identified. Detailed
analysis of two of these sequences demonstrated
that one shared 35% amino acid identity to SA2,
which had been discovered only
months earlier, while the second sequence shared
63% amino acid identity to SEN virus,
another known anellovirus. As discussed
previously, the role of anelloviruses in the causation
of disease has not been established.
The discovery of Human bocavirus (HBoV) in
2005 relied on yet another slight variation of
the DNase-SISPA strategy (3).
Pooled respiratory secretions from multiple patients with
unexplained respiratory illness were
ultracentrifuged to concentrate viral particles and DNase
treated. Following nucleic acid extraction, a
random-primer-linker-based amplification
(similar to that used for the previously described
DNA microarray studies) was used.
Amplicons of 600 to 1,500 bp were cloned and
sequenced using high-throughput Sanger
sequencing (one 384-well plate). Sequences were
identified with amino acid similarity to
known Parvoviridae family members, Bovine
parvovirus and Canine minute virus. The original
sample was identified from the pool, and the
complete genome was obtained. Phylogenetic
analysis demonstrated that this novel genome represents a
previously uncharacterized species of the
genus Bocavirus, HBoV. Subsequent studies
of this virus have demonstrated that HBoV is
frequently detected in children with respiratory
tract infection, children with asthma
exacerbation, and children with acute gastroenteritis.
Seroepidemiology studies have
confirmed infection in a Japanese cohort, with
71.1% overall prevalence with exposure by
age 6 (26), while a second study in
Sweden reported a lower rate, 33%, in a cohort of
children with acute wheezing (43).
Using the same method on pooled respiratory
secretions, a novel member of the
family Polyomaviridae, KI polyomavirus (KIV),
was identified in 2007 (1). A single sequence
read of 363 bp was identified that had limited
similarity to the simian virus 40 (SV40) VP1
protein, and using primers to span the circular
genome resulted in the completion of the
genome of 5,040 bp. KIV has approximately 36 to
48% amino acid identity to the BK virus,
JC virus, and SV40 T antigens, the most conserved
proteins of the family. As described in the
initial publication and subsequent publications,
KIV has been identified in both respiratory
secretions and feces. Although no clear disease
association has been demonstrated to date,
seroprevalence rates ranging from 55 to 90% (60)
have been described, indicating that KIV
infection is relatively ubiquitous.
The discovery of the WU polyomavirus (WUV)
utilized a similar strategy of high-throughput
Sanger sequencing analysis of respiratory
secretions, although in this instance, individual
samples rather than a pool were analyzed (34).
Total nucleic acid of the respiratory
secretions of a child with pneumonia was randomly
amplified, and 384 clones were
sequenced. The library contained six sequence
reads that shared 35 to 50% amino acid
identity to JC virus and SV40. The genome of WUV
followed the canonical organization of the
family Polyomaviridae and was 5,229 bp.
Initial experiments as well as numerous subsequent
publications have identified WUV in respiratory
secretions. It has also been found in feces,
blood (whole, plasma, and serum), cerebrospinal
fluid, and lymphoid tissue. Using
seroprevalence as a measure, infection rates with
WUV range from 69 to 98% (60), but no
disease association has been established to date.
Saffold virus, a novel member of the Cardiovirus genus, was discovered
using DNase-SISPA
on a virus cultured from a patient stool sample (42).
The following year, Saffold-like viruses
were found in patients with acute enteritis and in
respiratory secretions from three countries
(24). In parallel, a series of related cardioviruses
were identified using the ViroChip and
subsequent PCR screening (15).
To date, there has been one study examining the
seroepidemiology of Saffold virus. Using a virus
neutralization assay to Saffold virus 3, a
seropositivity rate of 75% was observed by 24
months, which increased to ~90% in older
children and adults (83).
The first application of high-throughput
sequencing to identify viruses in diarrhea patients by
shotgun sequencing resulted in the identification
of a novel species in the
family Astroviridae, Astrovirus MLB1 (28).
The methodology used was essentially identical to
that used in the discovery of WUV except that a
filtration step was implemented to minimize
the recovery and amplification of bacterial
sequences. Sequencing of a stool sample from a
3-year-old child from Australia with acute
diarrhea resulted in the identification of seven
sequences with 67% or less amino acid identity to
known astroviruses. Phylogenetic analysis
of the complete genome demonstrated that it was a
highly divergent astrovirus. Subsequent
studies have described astrovirus MLB1 in 4 out of
254 additional stool samples of children
with diarrhea (29). To date, no case
control studies have been described.
A novel genus, Cosavirus, of the family Picornaviridae
was first proposed in 2008 following
the identification of a handful of novel viruses
with limited similarity to other picornaviruses.
The first isolate, human cosavirus A1 (HCoSV-A1),
was identified from a stool sample of a
child with nonpolio acute flaccid paralysis from
Pakistan (45). Analysis of the complete
genome of 7,634 bp established that this virus
shared 33 to 49% amino acid identity to its
closest known relative, Seneca Valley virus (37),
a member of the genus Cardiovirus.
Subsequent PCR screening of a cohort composed of
57 symptomatic patients and 9 healthy
contacts resulted in 34 positives that could be
classified into four distinct genetic groups (A
to D) based on sequence analysis and phylogeny. A
proposed genetic group E has also been
described following its identification from a
child with acute diarrhea in Australia (38).
Two independent groups using similar Sanger
sequencing-based screening of stool samples
described Human bocavirus 2 (HBoV2), a
putative new species of the Parvoviridae family
(5, 44). In a similar cohort of patients with acute
flaccid paralysis from which human
cosaviruses A to D were discovered, Kapoor et al.
identified HBoV2 from two consecutive
stool samples of a child with acute flaccid
paralysis. The entire genome was sequenced; it
shares 67 to 80% similarity to the corresponding
HBoV proteins (44). In parallel, HBoV2 was
also identified in an Australian cohort of
children with acute gastroenteritis. In this study,
analysis of cases and controls revealed a
statistically significant association between infection
with HBoV2 and acute gastroenteritis (5).
In subsequent PCR-based screens to define the
prevalence of HBoV2, yet another divergent
bocavirus, HBoV3, was identified (5).
These discoveries demonstrate that
sequence-independent amplification followed by limited
Sanger capillary sequencing (typically ≤384
clones) is a robust method for identification of
novel viruses present in clinical specimens. With
the advent of next-generation sequencing
technology, samples could be sequenced in much
greater depth, allowing for detection of
microbes present at lower titers as well as
facilitating the generation of complete genomes of
novel microbes.
The discovery in 2008 of Merkel cell
polyomavirus (MCPyV) was the first study to describe
identification of a novel virus using a
next-generation sequencing platform (Roche/454 FLX
platform) (27).
In this instance, cDNA libraries were made from
Merkel cell carcinoma (MCC) tumors and then
sequenced using the 454 FLX platform. From
~382,000 high-quality sequence reads generated,
one fragment had detectable sequence
similarity to a known polyomavirus. Further
analysis demonstrated that a highly divergent
polyomavirus genome, that of MCPyV, was present in
the majority of the MCC tumors
examined. Subsequent studies have corroborated
this finding, and mapping of integration
sites demonstrated that in several instances the
virus was clonally integrated in the
respective tumors. Given the very low abundance of
MCPyV mRNA sequences in these
samples, the detection of viral transcripts would
not have been possible without the use of
the next-generation platform, which enabled the
samples to be sequenced deeply in a costeffective
fashion. If the same study had been attempted with
Sanger sequencing to the same
depth, the effort would have been prohibitively
expensive.
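The dependence of detection on sequencing depth can be made concrete with a simple probability sketch. If a fraction f of all sequenced fragments derives from the virus and reads are sampled independently, the chance of seeing at least one viral read among N reads is 1 - (1 - f)^N. The example below assumes, purely for illustration, that f equals the observed rate of about one viral read per 382,000 reads, and compares a 384-clone Sanger screen with a run of 382,000 reads; the function name and the independence assumption are not from the original study.

```python
def prob_at_least_one(viral_fraction: float, num_reads: int) -> float:
    """Probability of sampling one or more viral reads, assuming each read
    is an independent draw with probability `viral_fraction` of being viral."""
    if not 0.0 <= viral_fraction <= 1.0:
        raise ValueError("viral_fraction must be between 0 and 1")
    return 1.0 - (1.0 - viral_fraction) ** num_reads


# Assumed abundance: roughly one viral read per 382,000 reads (illustrative).
assumed_fraction = 1 / 382_000

p_sanger_plate = prob_at_least_one(assumed_fraction, 384)      # one 384-clone plate
p_deep_run = prob_at_least_one(assumed_fraction, 382_000)      # one deep run

# Expected number of viral reads at each depth under the same assumption.
expected_sanger = assumed_fraction * 384
expected_deep = assumed_fraction * 382_000
```

Under this assumption a single 384-clone plate would detect the virus only about once in a thousand attempts, whereas the deeper run has a better than even chance, which illustrates why low-abundance transcripts were effectively out of reach of limited Sanger sequencing.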
Next-generation sequencing played a pivotal role
in defining the etiology of a mysterious
case cluster of five patients with undiagnosed
hemorrhagic fever (10). RNAs from two
postmortem liver biopsy samples and one serum
sample were randomly amplified and
sequenced with 454. Analysis of the approximately
300,000 sequences generated yielded
nine fragments with limited sequence similarity to
viruses in the
genus Arenavirus. Phylogenetic analysis of
the novel virus Lujo virus demonstrated that it
branched from the Old World arenavirus complex and
had the greatest identity to Mobala
virus, Lassa fever virus, and Tamiami virus, with
67 to 74% amino acid identity in the
nucleoprotein. Further examination of the
receptor-binding portion of G1 demonstrated that
Lujo virus is equally distant from the Old World
and New World arenaviruses.
The identification of Human klassevirus 1, a
novel picornavirus, also utilized 454 sequencing.
This virus was most similar to members of the
genus Kobuvirus and has been detected in
both human stool specimens and raw sewage (36,
39). Greninger et al. were able to
sequence the complete genome of 7,889 bp
[excluding the poly(A) tail] from an infant with
gastroenteritis of unknown etiology. Subsequent
screening of 751 stool samples identified a
second positive sample, which turned out to be
from the twin sibling of the index case. Holtz
et al. identified a similar virus in an acute
diarrheal sample collected in 1984 from a child in
Australia. Reverse transcriptase PCR (RT-PCR)
screening for klassevirus 1 resulted in the
identification of two slightly divergent isolates,
one from raw sewage collected in Barcelona,
Spain, and one from a pediatric patient with acute
diarrhea (out of 340 pediatric stool
specimens tested). Given the low sequence similarity of human
klassevirus 1 to Aichi virus (34.8 to
43.3% amino acid identity in the P1, P2, and P3
coding regions), a novel
genus, Klassevirus, has been proposed.
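Amino acid identity figures such as those quoted above are computed from a pairwise alignment. The following is a minimal sketch of one common convention, in which identity is the share of matching residues among alignment columns where neither sequence has a gap. Other conventions use a different denominator, and the source does not state which one was used. The function name and the toy sequences are hypothetical and are not taken from any of the viruses discussed.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must already be aligned, and therefore of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (made-up residues, for illustration only).
seq_a = "MKT-LLVAG"
seq_b = "MRTALLIAG"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity")
```

In this toy alignment, eight columns are compared and six match, giving 75% identity.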
In an unexplained outbreak of gastrointestinal
illness, a novel astrovirus, Astrovirus
VA1, was identified by simultaneous Sanger and 454 mass sequencing
efforts (30). The
complete genome of 6,586 bp was sequenced using a
combination of Sanger shotgun
sequencing and targeted RT-PCR and rapid
amplification of cDNA ends. In parallel, 454
sequencing alone generated a contig of 6,581 bp,
demonstrating again the benefits of the
next-generation platforms. In the most conserved
region, ORF1B, VA1 shared 61% amino
acid identity to mink astrovirus and 62% amino
acid identity to ovine astrovirus. RT-PCR
screening of the six samples from the outbreak
demonstrated that three samples were
unequivocally positive, with high copy numbers.
While these initial results support a potential
role for VA1 in this outbreak, further studies are
necessary to explicitly define the
relationship between VA1 and human diarrhea.
ASSESSING THE ROLE OF PATHOGENICITY
From the above examples, it has become
increasingly clear that tremendous microbial
diversity is being uncovered and many more
microbes remain to be discovered. The pace at
which new microbes (and viruses in particular) in
clinical samples from humans are being
discovered is growing geometrically. The challenge
that now faces the scientific community is
how best to define the relevance of the growing
list of new microbes to human disease. This
has long been a challenge in the study of
infectious diseases. In 1890, Robert Koch published
a set of postulates in an attempt to standardize
the evidence needed to demonstrate a
causal role for a microbe in a disease. Koch’s
postulates are well known to this day and,
despite being over 100 years old, still serve as
guidelines for proof of causality. They are as
follows.
1. The parasite occurs in every case of the
disease in question and under circumstances
which can account for the pathological changes and
clinical course of the disease.
2. The parasite occurs in no other disease as a
fortuitous and nonpathogenic parasite.
3. After being fully isolated from the body and
repeatedly grown in pure culture, the parasite
can induce disease anew.
A major challenge in the fulfillment of Koch’s
postulates, especially in a molecular era, is that
many microbes cannot be grown in pure culture.
Another limitation is that microbes that
either have a carrier state or can cause
subclinical infections, such as Neisseria
meningitidis and Mycobacterium tuberculosis, violate Koch’s postulates.
Other scenarios that
limit the applicability of Koch’s postulates
include cases in which coinfection with more than
one microbe causes disease, or situations in which
the host genetic background contributes
to the disease state.
Over the years, various incarnations of Koch’s
postulates have been formulated. Bradford Hill
(1965) and Alfred Evans (1976) proposed broader
criteria for causation, including
epidemiological and immunological data. Most
recently, a guide for disease causality that
accounts for molecular methods of microbial
detection has been proposed by Fredericks and
Relman (33). These revisions of
Koch’s postulates have remained focused on the traditional
concept that disease arises from the presence of a
foreign microbe (and the biological
consequences of its presence). However, in the
genomic era, the concept of a “pathogen”
and how it causes disease must be reimagined.
With the increasing
recognition that humans (and animals) are hosts to
large communities of bacteria and
viruses, more complex models of human disease,
such as those resulting from imbalances or
alterations in the endogenous community of
microbes, must be entertained. For example,
researchers have begun to elucidate the complex
role of microbial communities in obesity by
analysis of the microbiomes of animal models of
human disease, as well as humans
themselves.
In a genetic model of obesity, sequencing of 16S
ribosomal DNA (rDNA) of the distal gut of
genetically obese (ob/ob) mice and their lean
(ob/+) and wild-type (+/+) siblings
demonstrated that the microbial composition in the
ob/ob mice differed in the relative
abundance of Bacteroidetes and Firmicutes
(53). Specifically, ob/ob animals have a 50%
reduction in the number of Bacteroidetes and
a proportional increase in Firmicutes compared
to lean (ob/+) mice. Similar results were obtained
in human studies in which 12 obese
humans were assigned to either a fat- or
carbohydrate-restricted diet, and their gut
composition was analyzed throughout the year by
16S rDNA sequencing (54).
Obese humans have fewer Bacteroidetes than Firmicutes in
comparison with lean controls.
Whether this imbalance is the cause of disease or
whether the imbalance is a consequence of
the disease is currently unclear. Regardless,
these observations demonstrate that a
“pathogenic state,” in this case obesity, can be
associated with the makeup of a microbial
population rather than the presence or absence of
a specific, singular, “causal microbe.”
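The kind of comparison described above can be sketched as a relative-abundance calculation over 16S read counts assigned to each phylum. The counts below are invented for illustration. They are chosen so that Bacteroidetes is halved in the obese sample and are not data from the cited studies. The function and variable names are likewise hypothetical.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon read counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: n / total for taxon, n in counts.items()}


# Invented 16S read counts per phylum (illustrative only, not study data).
lean_counts = {"Bacteroidetes": 400, "Firmicutes": 550, "Other": 50}
obese_counts = {"Bacteroidetes": 200, "Firmicutes": 750, "Other": 50}

lean = relative_abundance(lean_counts)
obese = relative_abundance(obese_counts)

# Fractional reduction in Bacteroidetes relative to the lean sample.
bacteroidetes_reduction = 1 - obese["Bacteroidetes"] / lean["Bacteroidetes"]

# Firmicutes-to-Bacteroidetes ratio, one simple summary of the shift.
lean_ratio = lean["Firmicutes"] / lean["Bacteroidetes"]
obese_ratio = obese["Firmicutes"] / obese["Bacteroidetes"]

print(f"Bacteroidetes reduction: {bacteroidetes_reduction:.0%}")
print(f"Firmicutes/Bacteroidetes: lean {lean_ratio:.2f}, obese {obese_ratio:.2f}")
```

Working with fractions rather than raw counts keeps samples of different sequencing depth comparable. It also means that a drop in one phylum necessarily appears as a rise in the others.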
Thus, in efforts to define the role of the newly
identified microbes in human disease, we must
not limit ourselves to the traditional one-microbe, one-disease
definition of a pathogen.