Insertion sequence finding in oral pathogens

Chi Yang, Chuan-Hsiung Chang, Gary Xie

Introduction

An insertion sequence (IS) is a kind of mobile genetic element. ISs are capable of independent DNA rearrangement, and some are known to regulate their neighboring genes. ISs may therefore contribute to the bacterial phenotype. Several insertion sequence families have been identified based on the relatedness of their transposases, the conservation of the catalytic site and organization, and similar inverted repeats (IRs). The purpose of this study is to identify the ISs in whole bacterial genomes. Based on the typical features of an IS element, blastx and inverted repeat finding are used without relying on current knowledge of IS families and IRs. The annotation is still included as a reference to indicate the quality of the results. As a consequence, hypothetical proteins may be included, and some degenerate or truncated IS elements may also be predicted.

The typical IS element structure

Figure 1, The structure of a typical IS element.

In this analysis, IS finding is based on the structure of the typical IS element (fig. 1). An IS element contains direct repeats (DRs), which are the target site duplication; left and right inverted repeats (IRL and IRR); and open reading frames that may encode a transposase. The features used in this analysis are the IRs and the open reading frames that probably encode a transposase. The IRs are about 10 to 40 bp long, and the total length of an IS element is less than about 2.5 kbp. The DRs, at only 2 to 14 bp, are too short to serve as a feature.
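The size constraints above can be expressed as a simple check on a candidate element. The following is a minimal sketch, not the method used in this analysis: the names `Candidate`, `ir_mismatches`, `looks_like_is`, and the `max_mismatches` tolerance are hypothetical, the thresholds are the approximate values quoted in the text, and the comparison is ungapped, whereas a dedicated tool such as IRF also handles gaps.

```python
from dataclasses import dataclass

# Approximate thresholds quoted in the text (assumed to be inclusive bounds).
MIN_IR_LEN = 10      # bp
MAX_IR_LEN = 40      # bp
MAX_IS_LEN = 2500    # bp, "less than about 2.5 kbp"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Candidate:
    """Hypothetical container for a candidate IS element."""
    sequence: str  # full element, from the first base of IRL to the last base of IRR
    ir_len: int    # assumed length of each terminal inverted repeat


def ir_mismatches(c: Candidate) -> int:
    """Count ungapped mismatches between IRL and the reverse complement of IRR."""
    irl = c.sequence[:c.ir_len].upper()
    irr = c.sequence[-c.ir_len:].upper()
    return sum(a != b for a, b in zip(irl, reverse_complement(irr)))


def looks_like_is(c: Candidate, max_mismatches: int = 2) -> bool:
    """Apply the IR-length and total-length constraints, then compare the IRs."""
    if not MIN_IR_LEN <= c.ir_len <= MAX_IR_LEN:
        return False
    if not 2 * c.ir_len <= len(c.sequence) <= MAX_IS_LEN:
        return False
    return ir_mismatches(c) <= max_mismatches
```

Direct repeats are deliberately left out of the check, in line with the text: at 2 to 14 bp they are too short to be informative on their own.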

Flow chart of the analysis

Figure 2, The flow chart before the manual curation
		A.	Run blastx with the genomic DNA sequences against the ISPEP database
		B.	Extend each hit region by 1000 bp at both ends
		C.	Use Inverted Repeat Finder (IRF) to find inverted repeats
		D.	Attach gene annotations and calculate the D1 and D2 distances
			Filter the records with D1 or D2 larger than 100 bp
			Filter the records with at least one IR region located in an open reading frame
		E.	Perform a second blastx search of the regions within the IRs and delete the following cases:
			1.	Hit regions located in the IR regions
			2.	No blastx hits
			3.	Redundant hits with a smaller IRF score
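Two of the steps above are simple coordinate operations and can be sketched in a few lines: extending each blastx hit by 1000 bp (step B) and discarding redundant hits with a smaller IRF score (step E.3). This is only an illustration under stated assumptions. The names `Region`, `extend_hit`, and `drop_redundant` are hypothetical, coordinates are assumed to be 1-based and inclusive, and "redundant" is assumed here to mean overlapping regions, which the flow chart does not define. The D1 and D2 filters are omitted because their definitions are not given in this text.

```python
from typing import List, NamedTuple, Tuple

FLANK = 1000  # bp added to each end of a hit region (step B)


class Region(NamedTuple):
    """Hypothetical record for a candidate region on the genome."""
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    irf_score: float  # score reported for the inverted repeat pair


def extend_hit(start: int, end: int, genome_len: int,
               flank: int = FLANK) -> Tuple[int, int]:
    """Extend a hit by `flank` bp on both sides, clamped to the genome bounds."""
    # Accept coordinates in either order, so reverse-strand hits also work.
    lo, hi = min(start, end), max(start, end)
    return max(1, lo - flank), min(genome_len, hi + flank)


def drop_redundant(regions: List[Region]) -> List[Region]:
    """Among overlapping regions, keep only the one with the highest IRF score."""
    kept: List[Region] = []
    # Visit regions from best to worst score; keep one only if it does not
    # overlap a region that has already been kept.
    for r in sorted(regions, key=lambda x: x.irf_score, reverse=True):
        if all(r.end < k.start or r.start > k.end for k in kept):
            kept.append(r)
    return sorted(kept, key=lambda x: x.start)
```

For example, `extend_hit(1500, 2400, genome_len=1_000_000)` gives the window `(500, 3400)`, which would then be passed to the inverted repeat search in step C.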
	
Material and Methods
Data presentation

Results
