DNA RESEARCH 7, 31-63 (2000) Short Communication
Structural Analysis of Arabidopsis thaliana Chromosome 5. X. Sequence Features of the Regions of 3,076,755 bp Covered by Sixty P1 and TAC Clones
Shusei SATO, Yasukazu NAKAMURA, Takakazu KANEKO, Tomohiko KATOH, Erika ASAMIZU,
Hirokazu KOTANI, and Satoshi TABATA*
Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan
(Received 24 January 2000)
Abstract
In our ongoing project to deduce the nucleotide sequence of Arabidopsis thaliana chromosome 5, non-redundant P1 and TAC clones have been sequenced on the basis of the fine physical map, and as of January 2000, the sequences of 16.6 Mb representing approximately 60% of chromosome 5 have been accumulated and released at our web site. Along with the sequence determination, structural features of the sequenced regions have been analyzed by applying a variety of computer programs, and we have already predicted a total of 2697 potential protein coding genes in the 11,166,130 bp regions, which are covered by 159 P1 and TAC clones. In this paper, we describe the structural features of the 3,076,755 bp regions covered by 60 newly analyzed P1 and TAC clones. A total of 715 potential protein coding genes were identified, giving an average gene density of 1 gene per 4001 bp. Introns were observed in 80% of the genes, and the average number per gene and the average length of the introns were 4.5 and 147 bp, respectively. These sequence features are nearly identical to those in our latest report, in which the data were compiled based on a new standard of gene assignment including the computer-predicted hypothetical genes. The regions also contained 12 tRNA genes when searched by similarity to reported tRNA genes and with the tRNAscan-SE program. The sequence data and information on the potential genes are available through the World Wide Web database KAOS (Kazusa Arabidopsis data Opening Site) at http://www.kazusa.or.jp/kaos/.
Key words: Arabidopsis thaliana chromosome 5; genomic sequence; P1 genomic library; TAC genomic library; gene prediction
In order to investigate the whole genetic system in higher plants, we have been operating a sequencing project of the genome of a dicot model plant Arabidopsis thaliana. Of five chromosomes that constitute the A. thaliana genome of approximately 120 Mb, we focused our efforts on chromosomes 5 and 3. For precise localization of the clones for DNA sequencing, we constructed the fine physical maps of both chromosomes with clones from YAC, P1, TAC, and BAC libraries.[1,2]
On the basis of the fine physical map information, P1 and TAC clones were selected and assigned on the map by polymerase chain reaction (PCR), and then subjected to sequence analysis. As of January 2000, the regions of 16.6 Mb representing approximately 60% of chromosome 5 have been sequenced and the data are available at our web site KAOS (Kazusa Arabidopsis data Opening Site, http://www.kazusa.or.jp/kaos/). In parallel, potential genes in the sequenced regions have been analyzed using a variety of computer programs for similarity search and gene modeling, and we have so far predicted the potential genes in a total of 11,166,130 bp which are represented by 159 P1 and TAC clones.[3-11] In this paper, we newly investigated the structural features of the 3,076,755 bp regions covered by an additional 60 P1 and TAC clones.
1. Isolation and Sequencing of P1 and TAC Clones
DNA sources and the method of clone isolation were essentially the same as described in the previous paper.[3] The P1 and TAC clones containing the DNA regions which cover a total of 60 DNA markers on chromosome 5 were isolated by screening the Mitsui P1 [12] and TAC [13] libraries by means of PCR using primers designed from the sequence information of DNA markers. The DNA markers and selected clones are listed in Table 1. Relative positions of the markers and the sequenced clones on chromosome 5 are shown in Fig. 1. The relative orientation of each clone and contig on the chromosome has been confirmed by anchoring both ends of the clone to those at the corresponding positions of the contig map.
The nucleotide sequence of each P1 or TAC insert was determined according to the bridging shotgun method described previously.[3] The length of the nucleotide sequence of each P1 or TAC insert finally confirmed is listed together with the accession numbers in Table 1.
Communicated by Mituru Takanami
* To whom correspondence should be addressed. Tel. +81-438-52-3933, Fax. +81-438-52-3934, E-mail: [email protected]
Downloaded from https://academic.oup.com/dnaresearch/article-abstract/7/1/31/389236 by guest on 26 March 2018
2. Assignment of Potential Coding Regions
For assignment of the protein coding regions and gene modeling, similarity search and computer prediction were performed as described in the previous paper.[3] Briefly, similarity search against the non-redundant protein sequence database nr (compiled by NCBI) was carried out using the BLASTX[14] program. In parallel, the positions of potential protein coding regions were predicted with the Grail,[15] GENSCAN[16] and NetGene2[17] computer programs. The transcribed regions were assigned by comparison of the nucleotide sequences with Arabidopsis ESTs[18,19] in the public databases using the BLASTN program.[14] All the results obtained were compiled with the aid of our new web-based tool, named Arabidopsis Genome Displayer (manuscript in preparation), then assignment of the potential protein coding genes was carried out by taking both similarity to known genes and computer prediction into consideration. Therefore, the regions predicted only by the computer programs with no apparent similarity to known genes were also assigned as genes. This standard of gene assignment has been adopted since the analysis in our last report,[11] while such computer-predicted hypothetical genes were not included in the earlier analyses.[3-10] To sum up, 715 potential protein-coding genes as well as 54 partial genes located at the terminal regions of the clones and 43 pseudogenes were assigned in the 3,076,755 bp regions, giving an average gene density of 1 gene per 4001 bp. This value is lower than that in our latest report,[11] in which the data were compiled based on a new standard of gene assignment described above, and is higher than that observed in regions of chromosomes 2[20] and 4.[21] The reason for this inconsistency is thought to be the difference in the ratio of heterochromatic regions within the analyzed sequences.
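The reported density of 1 gene per 4001 bp can be checked with simple arithmetic. The Python sketch below is only an illustration. It assumes (this is a reading of the numbers, not something the text states explicitly) that the density counts both the 715 complete genes and the 54 partial genes; the function name `bp_per_gene` is hypothetical.

```python
# Figures quoted in the text above.
REGION_BP = 3_076_755
COMPLETE_GENES = 715
PARTIAL_GENES = 54


def bp_per_gene(region_bp: int, n_genes: int) -> float:
    """Average number of base pairs per gene."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return region_bp / n_genes


# Assumption: complete and partial genes are both counted.
density_all = bp_per_gene(REGION_BP, COMPLETE_GENES + PARTIAL_GENES)
# For comparison: complete genes only.
density_complete = bp_per_gene(REGION_BP, COMPLETE_GENES)

print(f"{density_all:.0f} bp per gene (complete + partial)")
print(f"{density_complete:.0f} bp per gene (complete only)")
```

Under that assumption the result rounds to 4001 bp per gene, whereas the complete genes alone would give roughly 4303 bp per gene.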
In addition to the protein-coding regions, the RNA coding regions were assigned on the basis of sequence similarity to the reported structural RNAs. For tRNA genes, the prediction by the tRNAscan-SE program[22] was also taken into account. As a result, 12 tRNA genes corresponding to 12 amino acid species and genes for U1, U3 and U4 snRNAs were identified in the 3,076,755 bp regions. Both potential protein and RNA coding genes are denoted by numbers with the clone names followed by sequential numbers from one end to another of the insert, which are listed in the table below the figure, and are also schematically represented in Fig. 2.

Figure 1. Relative locations of the sequenced P1 and TAC clones and the associated markers on the physical map of chromosome 5. The positions of DNA markers used for P1 and TAC isolation and of other major DNA markers were localized on the map on the basis of the YAC tiling path and map information in ref. 1. The vertical open bar represents the entire length of chromosome 5. The names of P1 and TAC clones are given at the right side, and those of markers at the left side. The distance (Mbp) from the telomeric site of the top arm is given in the vertical scale.

Table 1. Information of the sequenced P1 and TAC clones.

Clone name  DNA markers              Confirmed length (bp)  Accession number
K1L20       ends of K2A18 & K1F13    47665                  AB022211
K1O13       ends of MEE6 & MYC6      25275                  AB019225
K2K18       CIC11F1L                 41465                  AB023031
K2N11       ends of MFC19 & MRA19    30340                  AB022213
K5A21       MDJ22 right end          13880                  AB024030
K5F14       CIC5H10R                 31178                  AB022214
K5J14       ends of MJC20 & MDH9     59762                  AB023032
K6A12       MXI22 left end           64136                  AB024031
K6M13       nga129                   77129                  AB023033
K7J8        K21P3 right end          56963                  AB023034
K9E15       ends of K18C1 & MFC19    62052                  AB020744
K9H21       ends of MDC12 & MLE2     15319                  AB023035
K9P8        MPF21 right end          70670                  AB024032
K11I1       MPL12 right end          51529                  AB019223
K14B20      K2A18 left end           40251                  AB018108
K15O15      CIC10E5R                 23026                  AB024026
K16E1       ends of MDH9 & MFO20     33963                  AB022210
K16F13      ends of MCA23 & MDN11    19742                  AB024025
K17N15      CIC11F10                 81293                  AB018109
K17O22      ends of K21C13 & K18C1   67720                  AB019224
K18B18      ends of K19M22 & MNC17   35896                  AB024027
K20J1       ends of K19E20 & K21P3   36243                  AB023028
K21H1       CIC3B1                   74342                  AB020742
K21L19      MCK7 left end            41087                  AB024029
K22G18      ends of MAC9 & MTG10     45453                  AB022212
K22J17      ends of MPA24 & K14B20   11211                  AB020743
K24C1       MDA7 right end           29498                  AB023029
K24M7       CIC11F10                 73999                  AB019226
MAB16       CIC10E5L                 70475                  AB018112
MBM17       ends of MGI19 & MHJ24    52717                  AB019227
MCK7        mi184                    87090                  AB019228
MFB16       CIC11F10                 66087                  AB023037
MGO3        MMN10 right end          43570                  AB019231
MHM17       CIC10B4L                 78423                  AB024035
MIF21       MDN11 right end          59372                  AB023039
MJB24       MSF19 left end           58589                  AB019233
MJE7        K15N18 left end          74298                  AB020745
MJM18       ends of MIO24 & MSG15    16203                  AB025623
MJP23       ends of K19P17 & K18G13  31827                  AB018115
MKN22       ends of MCD7 & MIK19     27229                  AB019234
MMI9        ends of MTG10 & K19B1    81736                  AB019235
MNB8        MXC20 right end          46872                  AB018116
MNI5        K6M13 right end          21011                  AB025627
MPI10       mi69                     29605                  AB020747
MQL5        CIC11B8L                 88398                  AB018117
MQM1        ends of K19M13 & MRO11   81365                  AB025633
MRG21       ends of K19B1 & MQB2     55151                  AB020751
MSD23       ends of MZA15 & MQD22    33479                  AB022221
MSK10       CIC11F1                  81414                  AB024037
MSN2        ends of K1F13 & MUD21    62927                  AB018119
MUD12       ends of MSN9 & MYH19     22601                  AB022222
MUF8        ends of MBK23 & K16L22   13776                  AB025635
MUL3        MJB24 left end           82010                  AB023042
MWD22       K3K7 right end           87180                  AB023044
MWF20       CIC5F12L                 91193                  AB025638
MWJ3        MDF20 right end          42356                  AB018120
MWP19       ends of MXH1 & MIK22     11026                  AB020753
MXK3        CIC4B2L                  81494                  AB019236
MYN8        ends of K19E1 & MNC6     54528                  AB020754
MZN1        K19M22 left end          81672                  AB020755
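Gene identifiers in this series join the clone name and a sequential number within the insert, for example K2K18.4 or MRG21.11. The following Python sketch shows one way such identifiers could be split and ordered. The regular expression and the names `GeneId` and `parse_gene_id` are assumptions made for illustration, not part of the authors' tools.

```python
import re
from typing import NamedTuple


class GeneId(NamedTuple):
    clone: str   # P1 or TAC clone name, e.g. "K2K18"
    serial: int  # sequential number within the insert


# Assumed pattern: clone name (capital letters and digits), a dot, a number.
_GENE_ID = re.compile(r"^([A-Z0-9]+)\.(\d+)$")


def parse_gene_id(identifier: str) -> GeneId:
    """Split an identifier such as 'K2K18.4' into clone name and number."""
    match = _GENE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a clone.serial identifier: {identifier!r}")
    return GeneId(match.group(1), int(match.group(2)))


ids = ["MRG21.12", "K2K18.4", "MRG21.11", "K18B18.9"]
# Sorting the parsed tuples orders genes by clone name, then numerically
# by serial number (so .11 comes before .12, and .9 would precede .10).
ordered = sorted(parse_gene_id(i) for i in ids)
for gene in ordered:
    print(f"{gene.clone}.{gene.serial}")
```

Parsing the number as an integer avoids the usual pitfall of string sorting, where "K18B18.10" would sort before "K18B18.9".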
3. Structural Features of Potential Protein Genes
In this paper, the complete structures of 715 potential protein coding genes were predicted. Structural features of these genes as well as those of 2619 genes including those previously identified are listed in Table 2. The 2619 genes account for approximately 13.1% of the total gene constituents (2 x 10^4 genes) assumed for A. thaliana. Approximately 77% of these protein-coding genes contained introns, and the average number of introns per gene and their average length were 4.0 and 167 bp, respectively.
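The percentages in this section follow directly from the gene counts. The sketch below recomputes them in Python as a consistency check; the intron-containing gene counts (575 and 2012) are taken from Table 2, and the helper name `percent` is hypothetical.

```python
# Counts quoted in the text and in Table 2.
ASSUMED_TOTAL_GENES = 20_000      # 2 x 10^4 genes assumed for A. thaliana
GENES_ALL = 2619                  # genes assigned so far on chromosome 5
GENES_ALL_WITH_INTRONS = 2012
GENES_NEW = 715                   # genes assigned in this study
GENES_NEW_WITH_INTRONS = 575


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


share_of_genome = percent(GENES_ALL, ASSUMED_TOTAL_GENES)
intron_share_all = percent(GENES_ALL_WITH_INTRONS, GENES_ALL)
intron_share_new = percent(GENES_NEW_WITH_INTRONS, GENES_NEW)

print(f"share of assumed gene complement: {share_of_genome:.1f}%")
print(f"genes with introns (2619 genes):  {intron_share_all:.0f}%")
print(f"genes with introns (715 genes):   {intron_share_new:.0f}%")
```

The three values round to 13.1%, 77% and 80%, matching the figures given in this section and in the abstract.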
4. Expression Level of Potential Protein Genes and Gene Segments
The nucleotide sequence of each of the potential protein coding genes was compared with those in the Arabidopsis EST database, and the number of matched Arabidopsis ESTs was counted to monitor the transcriptional level of each gene. Of the 715 complete and 54 partial genes that we have identified in chromosome 5 in this study, 290 carried matched ESTs. Genes hit by 10 or more EST files are likely to belong to a class of highly expressed genes. Their putative products include those showing sequence similarity to multicatalytic endopeptidase complex, proteasome component, alpha subunit in A. thaliana (K2K18.4); xylosidase in Aspergillus niger (K7J8.3); hypothetical protein in A. thaliana (K18B18.8); subtilisin-like proteinase homolog in A. thaliana (K18B18.9); outer membrane lipoprotein Blc precursor in Citrobacter freundii (K21L19.6); 26S protease regulatory subunit 6B homolog in Solanum tuberosum (MCK7.16); unknown protein in A. thaliana (MIF21.5); RNA helicase in A. thaliana (MMI9.2); 40S ribosomal protein S20 in A. thaliana (MMI9.13); tubulin beta-2/beta-3 chain in A. thaliana (MRG21.11 and MRG21.12); cytoplasmic malate dehydrogenase in A. thaliana (MWF20.2); NOI protein in A. thaliana (MWJ3.3); and glutamate synthase precursor in Medicago sativa (MYN8.7).
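The EST-based criterion described above amounts to a simple threshold on the number of matched ESTs per gene. A minimal Python sketch follows. The threshold of 10 and the totals (290 of 715 + 54 genes) come from the text, while the per-gene counts and gene labels in `example_hits` are invented purely for illustration and the function name `highly_expressed` is hypothetical.

```python
from typing import Dict, List

HIGH_EXPRESSION_MIN_ESTS = 10  # threshold quoted in the text


def highly_expressed(est_hits: Dict[str, int],
                     min_ests: int = HIGH_EXPRESSION_MIN_ESTS) -> List[str]:
    """Return, in sorted order, the genes matched by at least min_ests ESTs."""
    return sorted(gene for gene, n in est_hits.items() if n >= min_ests)


# Hypothetical EST counts, NOT taken from the paper.
example_hits = {"GENE.1": 0, "GENE.2": 3, "GENE.3": 10, "GENE.4": 27}
print(highly_expressed(example_hits))

# Share of genes in this study with at least one matched EST,
# using the totals given in the text.
share_with_est = 100 * 290 / (715 + 54)
print(f"{share_with_est:.1f}% of genes carried matched ESTs")
```

With the totals from the text, a little under 38% of the complete and partial genes carried at least one matched EST.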
The sequence data as well as the gene information shown in this paper are available through the World Wide Web at http://www.kazusa.or.jp/kaos/.
Acknowledgements: We thank S. Sasamoto for excellent technical assistance. Thanks are also due to T. Kimura, T. Hosouchi, K. Idesawa, K. Kawashima, M. Matsumoto, A. Matsuno, A. Muraki, N. Nakazaki, S. Shinpo, C. Takeuchi, T. Wada, A. Watanabe, M. Yamada, and M. Yasuda for their excellent teamwork in sequence analysis. We are grateful to A. Tanaka for technical advice, and Mitsui Plant Biotechnology Research Institute and Arabidopsis Biological Resource Center at the Ohio State University for providing the DNA markers and the DNA libraries. This work was supported by the Kazusa DNA Research Institute Foundation.
Table 2. Structural features of potential protein coding genes in A. thaliana chromosome 5.

Features                            715 genes (a)     2619 genes (b)
Gene length (bp) including introns  74-14479 (1993)   62-14479 (1965)
Product length (amino acids)        25-2216 (445)     19-2756 (433)
Genes with introns                  575               2012
Number of introns/gene              0-42 (4.5)        0-42 (4.0)
Exon length (bp)                    3-4473 (245)      2-4473 (260)
Intron length (bp)                  26-1450 (147)     8-5405 (167)
GC content of exons                 43%               43%
GC content of introns               32%               32%

Structural features of the potential protein-coding genes assigned so far are listed. The 715 genes (a) are assigned based on the new standard in this study, and the 2619 genes (b) include previously assigned 1901 potential protein genes. Average values are shown in parentheses.
References
1. Kotani, H., Sato, S., Liu, Y.-G. et al. 1997, A fine physical map of Arabidopsis thaliana chromosome 5: Construction of a sequence-ready contig map, DNA Res., 4, 371-378.
2. Sato, S., Kotani, H., Hayashi, R. et al. 1998, A physical map of Arabidopsis thaliana chromosome 3 represented by two contigs of CIC YAC, P1, TAC and BAC clones, DNA Res., 5, 163-168.
3. Sato, S., Kotani, H., Nakamura, Y. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. I. Sequence features of the 1.6 Mb regions covered by twenty physically assigned P1 clones, DNA Res., 4, 215-230.
4. Kotani, H., Nakamura, Y., Sato, S. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. II. Sequence features of the regions of 1,044,062 bp covered by thirteen physically assigned P1 clones, DNA Res., 4, 291-300.
5. Nakamura, Y., Sato, S., Kaneko, T. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. III. Sequence features of the regions of 1,191,918 bp covered by seventeen physically assigned P1 clones, DNA Res., 4, 401-414.
6. Sato, S., Kaneko, T., Kotani, H. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. IV. Sequence features of the regions of 1,456,315 bp covered by nineteen physically assigned P1 and TAC clones, DNA Res., 5, 41-54.
7. Kaneko, T., Kotani, H., Nakamura, Y. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. V. Sequence features of the regions of 1,381,565 bp covered by twenty one physically assigned P1 and TAC clones, DNA Res., 5, 131-145.
8. Kotani, H., Nakamura, Y., Sato, S. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VI. Sequence features of the regions of 1,367,185 bp covered by 19 physically assigned P1 and TAC clones, DNA Res., 5, 203-216.
9. Nakamura, Y., Sato, S., Asamizu, E. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VII. Sequence features of the regions of 1,013,767 bp covered by sixteen physically assigned P1 and TAC clones, DNA Res., 5, 297-308.
10. Asamizu, E., Sato, S., Kaneko, T. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VIII. Sequence features of the regions of 1,081,958 bp covered by seventeen physically assigned P1 and TAC clones, DNA Res., 5, 379-391.
11. Kaneko, T., Kato, T., Sato, S. et al. 1999, Structural analysis of Arabidopsis thaliana chromosome 5. IX. Sequence features of the regions of 1,011,550 bp covered by seventeen P1 and TAC clones, DNA Res., 6, 183-195.
12. Liu, Y.-G., Mitsukawa, N., Vazquez-Tello, A., and Whittier, R. F. 1995, Generation of a high-quality P1 library of Arabidopsis suitable for chromosome walking, Plant J., 7, 351-358.
13. Liu, Y.-G., Shirano, Y., Fukaki, H. et al. 1999, Complementation of plant mutants with large genomic DNA fragments by a transformation-competent artificial chromosome vector accelerates positional cloning, Proc. Natl. Acad. Sci. USA, 96, 6535-6540.
14. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403-410.
15. Uberbacher, E. C. and Mural, R. J. 1991, Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach, Proc. Natl. Acad. Sci. USA, 88, 11261-11265.
16. Burge, C. and Karlin, S. 1997, Prediction of complete gene structures in human genomic DNA, J. Mol. Biol., 268, 78-94.
17. Hebsgaard, S. M., Korning, P. G., Tolstrup, N. et al. 1996, Splice site prediction in Arabidopsis thaliana DNA by combining local and global sequence information, Nucl. Acids Res., 24, 3439-3452.
18. Newman, T., Bruijn, F. J., and Green, P. 1994, Genes galore: A summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones, Plant Physiol., 106, 1241-1255.
19. Cooke, R., Raynal, M., Laudie, M. et al. 1996, Further progress towards a catalogue of all Arabidopsis genes: analysis of a set of 5000 non-redundant ESTs, Plant J., 9, 101-124.
20. Lin, X., Kaul, S., Rounsley, S. et al. 1999, Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana, Nature, 402, 761-768.
21. Mayer, K., Schuller, C., Wambutt, R. et al. 1999, Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana, Nature, 402, 769-777.
22. Lowe, T. M. and Eddy, S. R. 1997, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence, Nucl. Acids Res., 25, 955-964.
[Figure 2, panels K11I1 (51529 bp) and K14B20 (40251 bp): gene organization maps with tracks for Grail exons, protein database hits, EST database hits and genes on both strands, each followed by a table of deduced genes giving identifier, direction, position (5' and 3'), number of exons, number of ESTs, length, and the sequence ID, overlap, identity and definition of the most similar sequence.]
Figure 2. Gene organization in the 60 P1 and TAC clones. Positions of the identified or predicted genes in each insert of the P1 and TAC clones are schematically represented by color-coded boxes above (rightward) and below (leftward) the wide line in the middle which represents the entire insert sequence. The length of sequenced region in each insert was given in parentheses together with the clone name at the top. The names of the adjacent overlapping clones of which sequences had been reported are shown on the middle bars. Arrowheads indicate the directions of the DNA strands (5' to 3'). Dark and faint blue bars with numbers represent the positions of the assigned potential protein coding genes, and pseudo and partial genes, respectively, and red bars the positions of RNA coding genes. Gray bars indicate the positions of the regions which matched to the Arabidopsis ESTs. The regions which showed similarity to the sequences in the protein database are shown by yellow, orange and red bars, each of which corresponds to BLASTX scores of 70-100, 100-250, and 250 or more, respectively. The green bars indicate the positions of the potential exons predicted by the Grail program. Each of three different colors with increasing depth corresponds to the regions with Grail scores of less than 70, 70-90, and 90 or more, respectively. The potential protein and RNA coding genes assigned as described in the text were listed below each of the figures. In this table, the number of amino acid residues and nucleotide length (in italic) of putative gene products of the respective potential protein and RNA coding genes are indicated.
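The color coding described in the Figure 2 legend is a mapping from score ranges to display classes. The Python sketch below expresses it as two small functions. The treatment of boundary values (a BLASTX score of exactly 100 counted as orange, a Grail score of exactly 70 counted in the middle class) and the shade names "light", "medium" and "dark" are assumptions, since the legend gives only the ranges and says the green deepens with score.

```python
from typing import Optional


def blastx_colour(score: float) -> Optional[str]:
    """Bar colour for a BLASTX score; None if below the lowest range (70)."""
    if score >= 250:
        return "red"
    if score >= 100:      # assumption: 100 itself falls in the 100-250 class
        return "orange"
    if score >= 70:
        return "yellow"
    return None


def grail_shade(score: float) -> str:
    """Shade of green for a Grail exon score (shade names are illustrative)."""
    if score >= 90:
        return "dark green"
    if score >= 70:       # assumption: 70 itself falls in the 70-90 class
        return "medium green"
    return "light green"


for s in (65, 85, 120, 300):
    print(s, blastx_colour(s), grail_shade(s))
```

Checking the upper thresholds first keeps each range mutually exclusive without having to write both bounds in every condition.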
Downloaded from https://academic.oup.com/dnaresearch/article-abstract/7/1/31/389236by gueston 26 March 2018
No. 1]
K15O15 (23026 bp)
S. Sato et al. 37
II
2
inn
II!5
!•..•• J . , 1 - .
3 4
I
GraUexon
Protein db hit
ESTdbhit
Gene
Gene
ESTdbhit
Protein db hitGrailexon
deduced genes
identifie.K15O15K15O15.2K15O15.3K15O15.4
K15O15.5K15O15.G
Infoi i the: ID
0 290 gi|5032279|*b|AAD38227.10 559 gi|30C3e91|einb|CAA18582.1|0 C370 301 ui|4C78207|8b|AAD2C953.1
2 821 gi|4104931|gb|AAD02219.1|0 53C ui|5732068|gb|AAD489C7.1|
rlap Identity781 [partial] (AF1472C4)
.9 (AL022537) putative; a p*eudogene A. thaliai A. tlutluuia
12983573212895
1502719493
110434
1904021815
275 40.2 [paeudo] (AC007134)tmnocriptH*e A. blialiun
774 93.5 (AF042190) auxin re«p271 80.9 (AF147203) contain* «i
cay transacting factor;
nwe factor 8 A. thaliauunilarity to nou.eu.e-mediated mRNA de-
K16F13 (19742 bp)
1 2 3 4
II I
GrailexonProtein db hit
ESTdbhit
Gene
Gene
ESTdbhit
Protein db hitGrailexon
dedi
id«iit
iced genet
lfier D:irectioJ'ocition
0 3"No.Exc
of>u
No. ofEST
LeiiKtlIi IiSi
l i t
eq>niiHti on oi
IDii the i.uost dimlilar ceq
U v euencerlap lde nt:i t y Deri nitioii
K16F13.1K16P13.2K1GF13.3K16F13.4
K16F13.0
20594140 G623
11901 1322810206 17203
18298 19742
000
749420223
Ki|4iy3382
Ki|224489G|emb|CABl0318.1|Ki| 132021|*p|P2576G|
423210
63.483.9
(Z97338) H1
rat.-related
i S27 A.
; protein A. tliuliaiiuRGP1 (GTr-binding reguUtory |
RGP1)[partial] (AE000478) ORF, hypotljeticul protein Bicherichi;
K16E1 (33963 bp)
III IIIH Hi I
I
. .
4 5II
I I II I Ii I
GrailexonProtein db hit
ESTdbhit
Gene
Gene
ESTdbhit
Protein db hitGrailexon
Identity Definition(AF064257) Dhml-Iike protein Hoiuo wpieiut(AF019630) patho^enicity protein Ma^naporthe grisea(AC002131) Contains similarity to BAP31 protein gb|X81816from Mus muyculuf. A. thalinua(Z97338) cytochrome P400 like protein A. th*li*n*(AC002340) putative cytochrome P400 A. th&li*itu(AC002340) putative cytochrome P450 A. tiiaJJana
K16E1.1K16E1.2K16E1.3
K16E1.4K16E1.5K16E1.0
504 61710713 75228521 9957
13907 1048810040 1822024211 20631
220 «i|4102999320 «i|3107943
252 Ki|2244893|einb|CAB10310.1|499 Ki|2880054|Kb| AAC02748.1|497 Hi |2880054|gb| AAC02748.11
697133150
491465465
47.429.947.0
58.300.455.4
Downloaded from https://academic.oup.com/dnaresearch/article-abstract/7/1/31/389236by gueston 26 March 2018
38 Sequencing of Arabidopsis thaliana chromosome 5
K18B18 (35896 bp)
I I I I i l l II II I
i3 5 7
[Vol. 7,
1 2 4 6 8 9
i! ail
i i i i • i
GraiexonProtein db hit
ESTdbhit
Gene
Gene
ESTdbhit
Protein dbNtGraiexon
deduced genes
identifier Identity DefiniteK18B18.1K18B18.2K18B18.3K18B18.4K18B18.5K18B18.0
K18B18.7K18B18.8K18B18.9Kl8Bl8.ll)
911C118921C7721937419640
20405228022729634122
962512344176871944519945
22494235713014435870
10O00
010134
170151220
72102
521135707447
Ki|5921832|,p|Q39005|Ki|5541669|emb|CAB51175.11
AB005781gi|4078207|gb|AAD20953.1
8i|4582486|einb|CAA16923.2|xi|5541071|emb|CAB51177.11l!i|5541675|emb|CAB51181.1|Ki|5541C74[emb|CAB51180.11
12578
72101
433100C96444
100.059.5
100.034.3
43.550.564.867.9
(AC0045C1) unknown proteincopper transporter 1(AL096859) copper trwiwporte protei
tRNA-Asp(GTC)[p*eudo] (AC007134) putative non-LTRtrau*criptide A. tluduuu*(AL0217C8) putufive protein A, th*li*it*(AL0908&9) hypotheticwl protein A. tludit(AL096809) ^ubtili^ii-like pvoU:]n^- hum(AL090859) aubtiliwu-likg prottiiuwf Loin
muloK A,
retrolelcmeut
Ann, A. thalliu>log A. thaJiw
K22J17 (11211 bp)
2 34I
GraiexonProtein db hit
ESTdbhit
Gene
Gene
ESTdbhit
Proton db hitGraiexon
deduced genes
identifierK22J17.1K22J17.2K22J17.3K22J17.4
Dir
-
l'onitiuitti 5"
9660217821
10341
3 'C40
70869731
11177
No.Exon
2
11
No. ofEST
0010
147219637279
Sequence ID«i|2129515|pir
Ki|2827705|emKi|2827704|eu,
||S71174
b|CAA10C78|b|CAA10677|
Overlap
629256
;e
Identity
94.998.4
Definition
(AL021684) predicted protein(AL021684) LRR-like protein
A. tiudijt/m
Downloaded from https://academic.oup.com/dnaresearch/article-abstract/7/1/31/389236by gueston 26 March 2018
No. 1]
K17N15 (81293 bp)
S. Sato et al. 39
•I •II III
I 111 i! mill! I l l
11n 12
I [ I III
II I
I! II, M II I II
I B
13 M e
rii i at n ». .«. ..in
Grail exonProtein db hit
ESTdbhft
Gene
Gene
ESTdbhft
Proton db hitGrail exon
deduced genes
identifiK17N1K17N1
K17N1K17N1K17N1K17N1K17N1K17N1K17N1K17N1K17N1K17N1K17N1K17N1K17N1K17N1K17N1
K17N1K17N1K17N1K17N1
r Direction.1 +.2 +
.3 +
.4 -5 _.0 +.7 +.8.9 +.10.11 +.12 +.13.14.15 -.10 +.17 +
.18 +
.19
.20
.21 +
Position5
8415802
9008122581C507200872455525090300403837144382480805102055441590390415200322
08044739797078178825
3'28047912
117251418118413213902510030077308153938147185498005173057504020550537700740
73272709037782481028
No. ofExon
114
92201
1219
14515
1142
7330
No. ofEST
00
022300000302012
0000
Length
385581
592530540170204
1149800337C80292239419707182
78
1239501200337
1 kfomiatioii on the moot similarSequence IDKi|2944178gi j 2829890
g
gg
gggggggggg
g
K
g
|4454012|emb|CAA23000||0541707|emb|CAB51212.1|5541707|einb|CAB51212.1|
|4454013|emb|CAA23000||5903057|gb|AAD55C10.1|5174507|ref|NT .000923.1|4084340|gb|AAD20141.1|2583120|gb|AAB82029.1||0094555|gb| AAF03497.1|10400801 |gb|AAF13032.1||4454020|emb|CAA23073||4522009|gb| AAD21782.1||400983|»p|P311C4|3249066
|5302803|emb|CAB40044.1|5123925|emb|CAB45513.1|
g |170C101|«u|Q105C9|
sequenceOverlap
384570
533501505
1971135055300050282
-15238748815134
930490
278
Identity100.0G1.5
82.800.0G0.1
03.5C0.231.055.058.907.022.900.442.908.082.9
50.300.2
38.4
Definition(AF007778) trelialo(AC002311) highlyKp|X00033| 18591 A(AL035390) Pollen-(AL0908C0) pectiue(AL090800) pectine
(AL030390) putativ
e-G-phosphate photiplmtrt.se A. tfiaJiaiMiaimihir to nuxin-regulated protein GH3.
thaliatta••pecific protein precursor like A. thtdiiUiHutertwe-like protein A. thulium**ter»*e-like protein A. thulUn*
e protein A. thulimia(AC008010) F0D8.33 A. thalianamitochoudrial inter(AC007127) uukuo.(AC002387) putativ(AC010070) unknow(AF113G1G) iutestii(AL03039G) putativ(AC0070C9) unkno.00S ribosomal prote(AC004473) SimilaESTs gb|F15433A. thalia/ia(Z97342) disease re(AL079350) putativ
[partial] cleavage a
nedijite peptidn^eii protein A. thaJUnue receptor-like protein kinttue A. thuliaiian protein A. thaluttui»t mucin 3 Homo oHpu-m,e protein A. tltnliniinn protein A. thalituiuin L l l . chloropltwt precursor (CL11)
to S. cerevi^e SIKlP protein Kb|984964.Hlid gb|AA390158 come from tin* ge»e.
wtHiice RPP5 like protein A- thaliai,*e protein A. th*li*i,n
id poly»denyUtion specificity factor, 160 kd»ubunit (CPSF 100 kd subunit)
K17O22 (67720 bp)
1 • • = • - :
I Hii l l n • • • II
ii HI
11 O M B
I III
"I•+•
II III II I II • III I I II I
Grail exon
Protein db hit
EST db hit
Gene
Gene
ESTdbhH
Protein db hitGrail exon
deduced g . n «
identifie. ID Ov,,b|CAA18120|Lb[CAB53784.1
rlap Identity n.t,.:t;».[partial] (AL022141) putative dis ^sistance protein A. thaliana
protein rps4-RLD A. thaliana
ity to TMV resistance protein Ngb|U15000 from Jftctuuu giutiuoia. A. thahana(AC007918) Similar to gi|42G3048 TCA13.8 putative En/Spmtransposon protein homolog (mosaic protein) from A. thalianachromosome II sequence gb|AC000250.(AF128394) contains sirniiarity to Petunia PTTA' (GB:AF009510)A. thaliana[pseudo] (AF128394) similar to Antirrhinum majus (garden snap-dragon) TNP2 protein (GB:X07297) A. thalia/ia(AL02448C) lectin like protein A. thaliana(AL024480) putative protein A. thaliana(pseudo] (U27090) Fe(II) transport protein A. thaliana(AL02448C) putative protein A. thaliana[pseudo] (AL024480) putative protein A. thaliana[p»eudo] (AC007259) Hypothetical protein A. thaliana(Z97341) hypothetical protein A. thalianaras-rel»ted protein RHA1
K17O22.1K17O22.2K17O22.3K17O22.4
K17O22.5
K17O22.7
K17O22.8K17O22.9K17O22.10K17O22.11K17O22.12K17O22.13K17O22.14K17O22.15
28348098
12374
2001G8C0
1015814003
10292
18489 22319
230&239 565431125011052770547020179905291
257904053445742521275442158770032710C853
7231100354392
Ki|2901373|.gi|5823585|.gi|4203705|gb|AAD15391.1gi|3335340
305 «i|0272382|gb|AAF00088.1
0 108 gi|4325349|tfb|AAD17347
0 1277 ui|4325301|gb|AAD17349|
gi|3250093|emb|CAA19701.1gi|3250079|emb(CAA19087.1Ki 13532GGgi|3200070|emb|CAA19C83.1|gi|3200C74lemb|CAA19082.11gi|5734730[gb|AAD50001.1gi|2245012|emb|CAB10432.18i|400970|sp|P31582|
0000000
332294877593001
1308491200
7141112212380
292
840
1002931890124221320390199
05.053.1
90.2
40.2G8.0GO.372.301.044.138.0
100.0
(AJ249203) dis.(AC000223) putative dis.
Downloaded from https://academic.oup.com/dnaresearch/article-abstract/7/1/31/389236by gueston 26 March 2018
40 Sequencing of Arabidopsis thaliana chromosome 5 [Vol. 7,
K1L20 (47665 bp)
i iiiii i
I ilti! Ill1 2 34 6
• II I12
III
•i linn II ii ii i
Grail exon
Proton db hit
ESTdbhft
Gene
Gene
ESTdbnK
Protein db hitGrail exon
deduced get
identifier
l e e
b l r ectioiPosition
:i 5 3 'No.Exo
of No. ofEST
Lengthi Iyiifoneque
nationnee ID
on the iuost f imiktr sequelOverl*
• ICC
LP Idei.itity Ii -h mtiyliK1L20.1K1L20.2
K1L20.3K1L20.4K1L20.5K1L20.CK1L20.7K1L20.8K1L20.9K1L20.10K1L20.11
K1L20.12K1L20.13K1L20.14
00317800
10091130101400110089221003279237110
390834129840817
09048236
12147140971C022178282C8803389038308
397314272447427
224103
1421
324
10033001
1
000
1 0 0328
30699
4 4 9
3 0 1
2022 9 2
1037339418
89331147
K 6002270gi[1086086
„
«
|4400819303681016002270[0319186
J2901370006226710062200
euib|CAB62640.1|
Kb|AAD20127.1|eml>|CAA18000|enib|CAB62C40.1J8b|AAF07199.1
emb|CAA18122emb|CAB02637.1eijjb|CAB02030.1
|46C2629|t!l,jAAD26901.1|
49298031Kb, AAD34102.1 [12029680 Kbl AAC62808.il
270
30484
427360
2 8 09 2 4
282322
330-03
00.131.7
03.868.244.2
100.0
00.702.400.930.0
70.200.0
] (AL132980) hypothetical pro
(U41007) similar to G beta repeats (PROSITE:PS00670) Caenorhabditis elegans
(AC006201) unknown protein A. thaliana
(AL022373) putative auxin-induced protein A. thaliana
(AL132980) hypothetical protein A. thaliana
(AF 1901 40) GDP-D-mannose 4,6-dehydratase
(AL022141) NAM like protein A. thaliana
(AL132980) putative protein A. thaliana
(AL132980) transcription factor-like protein A. thaliana
(AC0072C7) putative leucine-rich repeat disease resistance protein A. thaliana
(AF102000) putative ainc fingt[partial] (AC002030) putativA. tlWi.™
!iu SHI A. tietl,yladeno
K1O13 (25275 bp)
[Figure: gene map of K1O13. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifieK1O13.K1O13.K1O13.K1O13.K1O13.K1O13-K1O13.K1O13.K1O13.K1O13.K1O13.
r Direction
-
-
-
+0 *1
0
6023742000073090086362237180621841010932726
3' Exc3008002969008991
13006130931010916810190802218820270
i i
8371
10132228
EST20204000000
090270242061487
72326280287287408
Sequence IDKi|4218144lenib|CAAl 0602.1Ki|6033849|nb|AAF19708.1Ki|4218144|emb|CAA10002.1Ki|0C33849|xb|AAF19708.1Ki|6633801|Kb| AAF19710.ilX72896
Ki|040C168|Kb|AAF09106.1Ki|6033800|Kb|AAF19719.1Ki|04001C8|Kb|AAF09100.1iKi|C456171|iibiAAF091O9.1
Overlap243
0G241484480
72
233233221407
Identity88.002.798.806.484.090.6
46.242.347.370.9
Definition
(AJ132398) glutathione transferase, GST 10b A. thaliana
(AC008047) F2K11.17 A. thaliana
(AJ132398) glutathione transferase, GST 10b A. thaliana
(AC008047) F2K11.17 A. thaliana
(AC008047) F2K11.13 A. thaliana
tRNA-Cys(GCA) A. thaliana
(AC011022) unknown protein A. thaliana
(AC008047) F2K11.4 A. thaliana
(AC011022) unknown protein A. thaliana
[partial] (AC011022) kinesin-like protein A. thaliana
K20J1 (36243 bp)
[Figure: gene map of K20J1. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
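deduced genes: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length
Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition
K20J1.1 K20J1.2 K20J1.3 K20J1.4
K20J1.5 K20J1.6
K20J1.7 K20J1.8 K20J1.9 K20J1.10 K20J1.11 K20J1.12 K20J1.13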
128411103100
53308069
12000140002310020200302063344035500
44921054497
71018580
13242202742308425G18318743432130243
1 96 X159330 166 X531750 350 gi|0067172|dbj|BAA88308.1!0 464 gi|5001734|6b|AAD37122.1
2 G09 gij0731761lemb!CAB525C2.10 116 Ki|304708G
Ki|6573707|gb|AAF17G87.1Ki|1871179|gb|AAB03539.1*i|4972060|enib|CAB43928.1Si[3757510|)!b|AACC4218.1gi|4850290|en,b|CAB43052.1
2000000
39C1487
150139502294210
96160102422
47100
70112484291190
100.0
41.1
50.8
83.3
09.2
4C.5
09.0
39.8
40.8
54.5
[partial] U4 snRNA P'&uuVI snRNA A. thalian*(AB028860) mDjlO M"« /(AF129M1) very-loiim-clidA. th*li*na(AL109819) extensin-like(AF058914) Minila.script Jact.himii. sec
•"fatty
: 72.31) A. thai™
(AC009243) F28K19.24 A. thaliana
[pseudo] (U90439) hypothetical protein
(AL078470) putative protein A. thaliana
(AC005107) putative disease resistance
[partial] (AL049870) RPP1-WsA-like c
A. thaliana
K21H1 (74342 bp)
[Figure: gene map of K21H1. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length
Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition
K21H1.1 K21H1.2 K21H1.3 K21H1.4 K21H1.5 K21H1.6 K21H1.7 K21H1.8 K21H1.9 K21H1.10 K21H1.11
K21H1.12
K21H1.13 K21H1.14 K21H1.15 K21H1.16 K21H1.17 K21H1.18 K21H1.19 K21H1.20 K21H1.21
777290101422215010184902083129329332353550837379
39199
412444301800034053920051501775640020595373176
8494100021020317817205972783431233352223070938722
40000
432504543051180080510140503422650740959474339
130
0•>.
00100100
2411293447361492210272300228448
Ki|82777|pir(|A34959Ki|0523044|emb|CAB62312.1Ki|0050412|(!b|AAF02876.1gi(0523042|emb;CAB02310.1gi|1086249|pir||S52709Ki|0010010|sp|O48603|gi|4000880|enib|CAB10798.1t!i[2911008iemb;CAA17068Ki|4006878|errib|CAB10790.18i|5091501|dbj|BAA78730.1gi!6523039|emb|CAB62307.1
434 Ki|0023039|emb|CAB62307.1
382302184051317423492782388
gil0300263|gblAAD41995.1*i|1351940|sp|P47927«i|0023037|e,nb|CAB02305.gi|0523034|emb|CAB62302.«i IG523033 lemb ICABG2301.(!i|C3235G0|ref|Nrj013G31.1Ki|400G882|emb|CABlG800gi|6522931|emb|CAB62118KiiG022929)einb[CABG2110
239833430901193
11686200120442
51.7
45.2
70.2
45.6
00.2
59.8
42.5
06.7
04.3
57.3
131330163G33310298454423
03.0
34.850.008.9G7.881.045.206.454.062.7
(AL13297C) putative protein A. thaliana
(AC009520) Unknown protein A. thaliana
(AL132970) protein kinase-like protein A. thaliana
subtilisin-like protease - Alnus glutinosa
DNA polymerase alpha, catalytic subunit
(Z99707) putative protein A. thaliana
(AL021961) putative protein A. thaliana
(Z99707) MAP3K-like protein kinase A. thaliana
(AB023482) Hypothetical protein Oryza sativa
(AL13297C) anthranilate N-hydroxycinnamoyl/benzoyltransferase-like protein A. thaliana
(AL132976) anthranilate N-hydroxycinnamoyl/benzoyltransferase-like protein A. thaliana
(AC006233) unknown protein A. thaliana
floral homeotic protein APETALA2
(AL13297C) putative protein A. thaliana
(AL132970) receptor protein kinase-like protein A. thaliana
(AL13297C) putative protein A. thaliana
YmlOSOwp
(Z99707) UDP-glucuronyltransferase-like protein A. thaliana
(AL132978) hypothetical protein A. thaliana
[partial] (AL132978) putative protein A. thaliana
K21L19 (41087 bp)
[Figure: gene map of K21L19. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
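deduced genes: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length
Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition
K21L19.1 K21L19.2 K21L19.3 K21L19.4 K21L19.5 K21L19.6 K21L19.7 K21L19.8 K21L19.9 K21L19.10 K21L19.11 K21L19.12 K21L19.13 K21L19.14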
12473080871C
12492147301600018424197742204329490317073540838986
8012G987905
116011434010371183C518992214082907530691351393798140926
88862622
23644r.
1004
19
a0038030
287192
1189703199186632100477932196
1046748403
Ki|0063236|gb| AAF17212.1|Bi|4909712|«b|AAD34459.1|8i|4836896|(fb|AAD30099.1|Bi|4455192|euib|CAB36015.1|«i|4097561Ki|2497702|«p|Q46036|t!i|2827643|emb|CAA16097.1|
gi|0223646|8b| AAF05860.1|
Uil4186184lmblAAD09623.llgi|640G158|Kb|AAF09146.1|8i|4322670lgb|AAD16120|gi|539M42||jb|AAC27»3.2|
27.7 (AF111168) unknown Homo sapiens
990 650 (AC011622) putative disease resistance protein A. thaliana
473 20.0 (AF094008) dentin phosphoryn Homo sapiens
402 96.8 [partial] (AF053941) non phototropic hypocotyl 1-like A. thaliana
K22G18 (45453 bp)
[Figure: gene map of K22G18. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Adjacent label: MTG10.]
deduced genes
identifier
K22G18.1 K22G18.2 K22G18.3 K22G18.4 K22G18.5 K22G18.6 K22G18.7 K22G18.8 K22G18.9 K22G18.10
K22G18.11
Direction
__+_+
+_
+
jti 0'
8294098899C
1301714435159151G342217532719137107
42659
3"19317337
11928138121585210211209012C2773144241263
45453
Sqgi| 1707016|gb|AACC9127.11Ki|2342690xi|2702273|gbj AAB91976.1|«i|4204281d!i|5042416|gb| AAD38255.1|
ni|59O3057|i(b| AAD55616.1|ni|5903057|n;b|AAD5561G.l|gi|2443329|dbj|BAA22374|gi|932|«<ib|CAA37773|
Identity Definition
36.9 (U78721) putative AP2 domain transcription factor A.
60.3 (AC000106) Similar to Homo copine I (gb|U83246). A.
47.7 (AC003033) unknown protein A. thaliana
24.6 (AC004146) Hypothetical protein A. thaliana
40.8 (AC006193) Unknown protein A. thaliana
64.3 (AC008016) F6D8.33 A. thaliana
02.4 (AC008016) F6D8.33 A. thaliana
88.5 (D8C122) Mei2-like protein A. thaliana
32.3 (X53744) 68kDa subunit of signal recognition
40.3 [partial] (AJ130878) GCN4-complementing
24857877623839799
11341112884005389 (GCP1)
K24M7 (73999 bp)
[Figure: gene map of K24M7. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length
Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition
K24M7.1 K24M7.2 K24M7.3 K24M7.4
K24M7.5
K24M7.6 K24M7.7 K24M7.8 K24M7.9 K24M7.10 K24M7.11 K24M7.12 K24M7.13
K24M7.14 K24M7.15 K24M7.16 K24M7.17 K24M7.18 K24M7.19 K24M7.20 K24M7.21 K24M7.22 K24M7.23
K24M7.24 K24M7.25 K24M7.26
20778920
12050
10835
1932022523200322955529881322823324030005
4407047702510305353854817580176070004099CC01108042
705787200173296
82751112215044
18340
2145325010293302903030357329243478236894
47227510115296854263507000033163470654796822769483
719577268473994
535 gi|320776|pir| |S30782
912
1009017710
24.9 [partial] i «0
047«2
130142208190
530790239242438200486331308241
Ki|1170839|.p|Q04980|Kijll70840|»p|Q00738|
002 «i3335303
Bi 2341042 Kb AAB70440gi|4087550lgb|AAD25781.1gi|4587000|Kb|AAD25781.1X72897Ki|2980791|emb|CAA18167.1|
Ki|101752Ki|2129950|pir||S02700
Ki 4204278Ki| 1052971 |dbj|BAA17888|Ki|4335737|nb|AADl 7415.1
Ki|2980788|emb|CAA18104.1|Ki|4894914|Kb| AAD32C52.1|Ki|3337307|Kb|AAC27412.1Ki|2060G60Kil6003C81lKblAAF00M2.llKi|5732OO0|Kbj AAD489C5.1|
Ki|4914450|eXI4902
b|CAB43094.1
365003408
82129
212107
514244210
404259400270
24184
83 285.4
70.4
03.830.730.0
100.090.8
28.601.9
30.534.747.9
60.784.276.441.099.603.6
75.694.0
low-temperature-induced 65 kd protein
low-temperature-induced 78 kd protein (desiccation-responsive protein 29B)
(AC004512) Similar to cytochrome P450 gb|X90458 from A. thaliana. A. thaliana
(AC000104) F19IM9.26 A. thaliana
(AC006577) EST gb|R64848 comes from this gene. A. thaliana
(AC006077) EST gb|R04848 comes from this gene. A. thaliana
tRNA-Ser(TGA)
(AL022197) actin depolymerizing factor-like protein A. thaliana
(L03710) cnjB Tetrahymena thermophila
photoassimilate-responsive protein PAR-1c precursor - common
(AC004146) putative Cytochrome P450 protein A. thaliana
(D90910) hypothetical protein Synechocystis sp.
[pseudo] (AC00C248) unknown protein A. thaliana
(AL022197) putative protein A. thaliana
(AF139188) HCF106 A. thaliana
(AC004481) hypothetical protein A. thaliana
(AC002342) hypothetical protein A. thaliana
(AF187871) fibrillarin homolog A. thaliana
(AF147203) contains similarity to Medicago truncatula N7 protein (GB:Y17013) A. thaliana
(AL050400) fibrillarin-like protein A. thaliana
tRNA-Leu(CAA)
K2K18 (41465 bp)
[Figure: gene map of K2K18. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier
K2K18.1
K2K18.2 K2K18.3 K2K18.4
K2K18.5 K2K18.6 K2K18.7
K2K18.8
Direction
-
+
_
0"1461
74851330110777
214812771030523
39220
3'4740
103521520519125
230092970037302
41321
Exo.9
4
9
19
4
EST0
00
15
000
0
662
825404246
409CGI
1400
570
Uverltip Idei«i|4890180|8b|AAD32773.1| 407 63.0
Ki|2505011|gb|AAB81881| 052 36.9Ki|4006829|Kb|AAC95171.1| 4C3 73.9Ki|2511088|emb|CAA74020.1| 244 100.0
Ki|4193320 402 08.0Ki|4086021|Kb|AAD20640.1| 77 79.0Ki|4263831|Kb|AAD10474.1| 373 81.8
Ki|4203830|KblAAD10473.1 383 02.3
(AC0070C1) tfiinilnrA. tliaJuum(AC002983) putative MuDR-A-like tIAC005970) uutHtive protein kiiiH*e(Y13C91) uiulticHtHlytic eiidopeptidnponent. ttlpIiH subunlt A. tltalittiut(AF045473) histone dencetylaae Zea tnayu[pt-eudo] (AC007170) cytoplasm^ wtoiiitnte hydrtit[pseudo] (AC006067) putative retroeleinent poA. thaliuia|p»eudo| (AC000067) liypotbetical protein A. th«li»
from Nkvtituut IHIHUUIH
ponuu protein A. tlmlitutu
•mplex. prote»»o..ie com-
1
! A. tlUllUt
K24C1 (29498 bp)
[Figure: gene map of K24C1. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length
Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition
K24C1.1 K24C1.2 K24C1.3 K24C1.4 K24C1.5 K24C1.6 K24C1.7
29007108
11058123782120827839
46001007G1101010318241002833C
gi|1786140[dbj|BAA19113| 346509 Ki|2982461|emb|CAA18225.1] 424973 Ki|34:>10C9|emb|CAA2046S.l| 918180 «i|3953470 1079498111G6 gi|56C8608|Kb|AAD45979.1| 165
28.0 [partial] (AB000454) PEThy
40.0 (AL022223) putative protein A.
37.1 (AL03132C) hypothetical protein
39.2 (AC002328) F22O2.21 A. thaliana
34.3 (AF110334) MenG
K2N11 (30340 bp)
[Figure: gene map of K2N11. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
Information on the most similar sequence
identifier
K2N11.1 K2N11.2 K2N11.3 K2N11.4
K2N11.5
K2N11.6 K2N11.7
K2N11.8 K2N11.9
Direction
+
+
0"1
66871049611380
10481
1839020913
2244020302
3 '01309907
1090813947
17032
2061321801
2404929303
Sequeqi|2832632|emb|CAAl 0761.1
921 gi|0791481|eiiib|CAB53025.1|104 gi|4836929|gb|AAD30031.1797 gi!0732431|gb|AAD49099.1
308 8i|4830700l8b|AAD30233.1
262313 gi!4044460|gb|AAD22308.1
2 268 ui|4400818|gb|AAD20126.10 1334 Ki|0734736lKb|AAD00001.1
701 731) (AL021711) hypothetical protein A. thaliana
834 80.1 (AL110116) putative protein A. thaliana
90 77.1 [pseudo] (AC006085) Hypothetical protein A. thaliana
043 90.0 (AF177030) contains similarity to maize transposon MuDR (GB:M76978) A. thaliana
189 43.2 (AC007202) Contains similarity to gb|AB01709tor (WERBP-1) from Nicotiana tabacum. ESTs gb|T41870, gb|H38232 and gb|N38320 come from A. thaliana
310 07.9 [pseudo] (AC006092) putative non-LTR retroelement reverse transcriptase A. thaliana
163 61.6 (AC006201) unknown protein A. thaliana
1310 43.1 (AC007209) Hypothetical protein A. thaliana
Kb|H39299.
K5A21 (13874 bp)
[Figure: gene map of K5A21. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
No. of Exon, No. of EST, Overlap, Identity, Definition
K5A21.1
K5A21.2
1 7141
9218 13792
22
18
3
4
773
621
Ki]113336|»p|ri7427
Ki|113337|«p|r>18484
709
566
41.4
38.3
[partial] alpha-adaptin C (clathrin assembly protein complex 2 alpha-C large chain) (100 kd coated vesicle protein C) (plasma membrane adaptor HA2/AP2 adaptin alpha C subunit)
[partial] alpha-adaptin C (clathrin assembly protein complex 2 alpha-C large chain) (100 kd coated vesicle protein C) (plasma membrane adaptor HA2/AP2 adaptin alpha C subunit)
K5F14 (31178 bp)
[Figure: gene map of K5F14. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length
Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition
K5F14.1 K5F14.2 K5F14.3 K5F14.4
K5F14.5 K5F14.6 K5F14.7 K5F14.8 K5F14.9
K5F14.10 K5F14.11
7451901913280
140211735522479232C824101
2882430195
402191711190013078
1509020081230332371120985
2931831178
74G gi|1170021|sp|P4G870234 gi|2244797|euib|CAB10220.1|030 ni|4539330|eiiib|CAB37483.1109 gi|703400
428 gi|491433G|gb|AAD32884.1|002 gi|491433GJgb|AAD32884.1|185 gi|491433G|t!h|AAD32884.1123 «iil402902|eiiib|CAAGG907|703 gi|377G5G7
Ki|112737[sp|IM5457*i|2190559
201434-81
32204012100
509
15785
37.038.934.1
43.344.932.871.237.2
48.705.1
n-like protei(Z97330) hypothetical prob(AL030039) putative proteiIL20329) multiple banded i
In A. tluilia/utI A. tlwluuui•itigeii Urmplmiu, nJj-tk-UIL
[pseudo] (AC005489) F14N23.22 A. thaliana
(AC000489) F14N23.22 A. thaliana
(AC005489) F14N23.22 A. thaliana
[pseudo] (X98323) peroxidase A. thaliana
(AC005388) Strong similarity to F21B7.33 gi|28092C4 from A. thaliana BAC gb|AC002500. EST gb|NC5119 comes from this gene. A. thaliana
2S seed storage protein 1 precursor (2S albumin storage prot
[pseudo] (AC001229) F5I14.1C A. thaliana
K5J14 (59762 bp)
[Figure: gene map of K5J14. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier
K5J14.1 K5J14.2 K5J14.3 K5J14.4 K5J14.5
K5J14.6 K5J14.7 K5J14.8 K5J14.9 K5J14.10
K5J14.11 K5J14.12
Direction 0*"~+ 0320+ 11437+ 20293+ 272014- 32031
+ 3720G+ 41203
44050+ 50244
52108
+ 5440457198
No.5 ^ Exo
753915990224082947230991
3877043G43400015057352370
0G77909389
of
1710109
10311
5
10
No. ofEST
0
004
20025
00
Length
204902409473390
007704080110
73
080370
Information on the most sinSequence IDgi|C098300[gb|AAF18094.1gi|3983139gi|4733981 |gb| AAD280C2.11Ki|4733981|Kb| AAD28CC2.1|gi|913445|bbs|lG0507
gi|3128187|gb|AAC10091.1|gi|5734790|gb|AAD50000.1gi|42042G9
gi|1820G40
gi|3289002|gb| AAC25099.ilgi|378C502
ilar sequencOverlap
183133404402374
490084032
71
022244
Identity38.038.873.478.870.9
80.381.203.7
70.4
04.720.5
Definition
(AF098011) Scythe Xenopus laevis
(AC007208) putative serine carboxypeptidase II A. thaliana
(AC0072G8) putative serine carboxypeptidase II A. thaliana
(S75487) alcohol dehydrogenase ADH=alcohol dehydrogenase homolog EC 1.1.1.1 Lycopersicon esculentum
(AC004521) putative beta-glucosidase A. thaliana
(AC007980) ATP-dependent metalloprotease A. thaliana
(AC005223) 04111 A. thaliana
(U88173) weak similarity to A. thaliana ubiquitin-like protein 8
(AF073522) CRP1 Zea mays
(AF098994) similar to zinc carboxypeptidases (Pfam:Zn_carbOpept.hmm, score: 259.73) Caenorhabditis elegans
K6A12 (64136 bp)
[Figure: gene map of K6A12. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier
K6A12.1 K6A12.2
K6A12.3
K6A12.4 K6A12.5 K6A12.6 K6A12.7 K6A12.8 K6A12.9 K6A12.10
K6A12.11 K6A12.12 K6A12.13 K6A12.14 K6A12.15 K6A12.16
PositionDirection 5 ^
1+ 10553
+ 15871
4- 2583727949
+ 31333+ 37524
4079243356
+ 45972
47401+ 50320+ 52208+ 53959+ 56535
01210
No5 ^ Exc
109613300
20242
27G01298823225340424419214550747179
486805158053591562225801402850
ofn
59
9
6727244
433231
No. ofEST
11
0
0022000
110000
Length
251713
1027
340215210718357515269
289361423723335547
Information on the moat similaSequence IDgi|6175145 gb|AAF04872.1gi|2462833
gi 13850588
Ki|2129C98|pir||S01760«i4938485 enib|CAB43844.1|Ki|4220518|emb|CAA22991|Ki|3025124gi|5903000
sp]P74523|gb|AAD55615.1
lii|3122952|,p|O15730|gi|2499569|>plQ42539|
Ki| 1070305 pir||S53492Ki|2944440gi|0061974|emb(CAB62440.1gi|3004555|gb| AAC09028.ilgi|4914341|gb|AAD32889.1gi|6016698|gb| AAF01525.1
r sequenceOverlap
207080
1016
33313820713034944G13G
288360412480201511
Identity51.931.1
45.1
70.946.857.735.942.932.472.3
70.271.538.024.932.852.3
Definition
[partial] (AC010790) unknown protein A. thaliana
(AF000657) highly similar to froha and frohb. potential frolic A. thaliana
(AC005278) Contains similarity to gb|AB011110 KIAA0538 protein from Homo sapiens brain and to phospholipid-binding domain C2 PF|00168. ESTs gb|AA585988 and gb|T04384 come from this gene. A. thaliana
protein kinase ATN1 (EC 2.7.1.-) - A. thaliana
(AL078404) putative protein A. thaliana
(AL035356) hypothetical protein A. thaliana
hypothetical 17.7 kd protein slr1419
(AC008010) F6D8.29 A. thaliana
tipd protein
protein-L-isoaspartate O-methyltransferase (protein-beta-aspartate methyltransferase) (PIMT) (protein L-isoaspartyl methyltransferase) (L-isoaspartyl protein carboxyl methyltransferase)
RNA-binding protein cp31 precursor - A. thaliana
(AF05075G) cysteine endopeptidase precursor Ricinus communis
(AL132979) putative protein A. thaliana
(AC003673) putative salt-inducible protein A. thaliana
(AC005489) F14N23.27 A. thaliana
(AC009991) hypothetical protein A. thaliana
K6M13 (77129 bp)
[Figure: gene map of K6M13. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier
K6M13.1 K6M13.2 K6M13.3 K6M13.4 K6M13.5 K6M13.6
K6M13.7 K6M13.8 K6M13.9 K6M13.10 K6M13.11 K6M13.12
K6M13.13
K6M13.14
K6M13.15 K6M13.16
K6M13.17 K6M13.18
Directio,+
+-++
_
-_
+
-
Tositionti 5"
15957
10473127041587420035
301943607939876418504610947219
58971
02745
0388105925
6659672639
3 '2G940436
11327153511759828009
303533938440754457474693051270
61958
03257
0469006437
67G7274529
No.E x t m
1011903
1133
171
17
9
1
21
13
No. ofEST
070G00
00512'•>
0
0
00
00
Length
380100285497195399
ICO689159647274721
G95
171
128171
359307
Information on the most similar secSequence ID 0"vgi|6553906|gb| AAF16572.1gi J2352828gi|2191196gi|1711510|»p|r49966|Ki|4507873|ref|Nr>j003363.1gi|2275204|gb| AAB63820.il
X53175Ki|6049274|Kb|AAF02535.1
gi|2749982gi|4835230|emh|CAB42914.1gi|4049518|emb|CAA21253|
gi|3152572
gi|6598399|gb|AAD03505.2|
Ki|42G2175|Kb|AAD14492|Ki|4337175|Kb|AAD18096|
gi|0598344|gb| AAF18592.1gi|3941510
erlap379148120490102179
50402
543262298
462
161
77142
280211
Identity83.2
100.034.794.044.247.2
91.525.8
50.065.840.5
39.5
51.9
94.941.3
25.689. G
Definition
[partial] (AC012503) putative protein kinase A. thaliana
(AF009228) NaCl-inducible Ca2+-binding protein A. thaliana
(AF007271) contains a MADS domain A. thaliana
signal recognition particle 54 kd protein 2 (srp54)
von Hippel-Lindau binding protein 1
(AC002337) putative WRKY-type DNA binding protein
U1 snRNA A. thaliana
(AF151390) Sex-lethal interactor Drosophila melanogaster
(AF030705) Similar to phytoene desaturase
(AL049862) putative protein A. thaliana
(AL031852) conserved hypothetical protein Schizosaccharomyces
(AC002986) Contains homology to DNAJ heatshock protein gb|U32803 from Haemophilus influenzae. A. thaliana
(AC003952) putative non-LTR retroelement reverse transcriptase
(AC005508) 12894 A. thaliana
(AC000416) ESTs gb|T20589, gb|T04048, gb|AA597906, gb|T04111, gb|R84180, gb|RG5428, gb|T44439, gb|T7G570, gb|R9O004, gb|T45020, gb|T42457, gb|T20921, gb|AA0427G2 and gb|AA720210 come from this gene. A. thaliana
(AC002335) hypothetical protein A. thaliana
(AF062909) putative transcription factor A. thaliana
K7J8 (56963 bp)
[Figure: gene map of K7J8. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier
K7J8.1 K7J8.2 K7J8.3 K7J8.4 K7J8.5 K7J8.6 K7J8.7 K7J8.8 K7J8.9 K7J8.10 K7J8.11 K7J8.12 K7J8.13 K7J8.14 K7J8.15 K7J8.16
Directi._--+
++--+
+_+
J:I 5'1
19924290
10110176732011023247250092078529449389314102044253471505192155724
3 '114630868767
16635191832210524080258562827837562390024150844687503005372050959
No. ofExou
No. olEST
Lens,l,rnr~ Overlap Identity
(AC002292) Hypothetical protein A. thaliana
(AP000492) hypothetical protein Oryza sativa
(Z84377) xylosidase Aspergillus niger
(AL049558) hypothetical protein Schizosaccharomyces
(AL049558) hypothetical protein Schizosaccharomyces
(AF010448) No definition line found Caenorhabditis
(AC000267) putative polyprotein A. thaliana
(AC002535) putative WD-40 repeat protein A. thaliana
tRNA-His(GTG)
(AL023094) bZIP transcription factor ATB2 A. thaliana
(AJ243483) ATP citrate lyase Cyanophora paradoxa
[pseudo] reverse transcriptase - A. thaliana retrotran
[partial] (AC012563) putative protein kinase A. thaliana
3022987741182184222801453081570
090300
72181145608002348
ei|2402745l<i|5922608|dl>j|BAA84009.1|gi|2181180|etiib|CAB06417|(!ii4581502|einli|CAB40161.1|«i |4581502|emb|CAB40161.1|*i|2315451
«i|4689454|glijAAD27902.1|gi|0598380|gb|AAC02845.2|X153C1
(•it3096928|emh|CAA18838.1|«i |5304837|emb|CAB40077.1|niJ2129709|pir||S65612Ki|6553900|Kb| AAF10572.1
201104066
94164131
72792
72
128603487347
38.135.437.928.530.3
90.450.1
100.0
38.003.980.168.4
ispoayn T a l l - 1
K9E15 (62052 bp)
[Figure: gene map of K9E15. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Adjacent label: K18C1.]
deduced genes: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length
Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition
K9E15.1 K9E15.2 K9E15.3 K9E15.4 K9E15.5 K9E15.6 K9E15.7 K9E15.8 K9E15.9 K9E15.10 K9E15.11 K9E15.12 K9E15.13
3375773512815
1593720160
283203150035094408344381545102
74801176513951
1807822010
309433355038151415934451148002
12610
1217 gi|5459305|eniLi|CAB50708.1|1187 Ki|2901373|euib|CAA18120|
101 8i |3033379|gb[AAC12823.1|
478
391546
229198
gi|3080375|emb|CAA18632.1[gi|2191180
Ki[3080371|euib|CAA18028.1Ki|6552736|Kb|AAF16535.1|Kii33866O0JsbJAAC28536.11
gi|4510373|gb| AAD21401.ilH7496\d\i\B\M3l
5043801908
5230761985
463 Ki|3040815|26
mb|CAA10713.1
1160 52.8 (AL022141) putative disease resistance protein
129 58.5 (AC004238) putative WRKY-type DNA
A. thaliana
508 69.9 [pseudo] (AL022580) putative protein A. thaliana
252 30.2 [pseudo] (AF007271) contains similarity to tropomyosin
nesin A. thaliana
387 76.5 (AL022580) putative pectinacetylesterase protein A. thali.
60 40.9 (AC013482) T26F17.19 A. thaliana
578 49.4 (AC004605) putative beta-amylase A. thaliana
133 23.1 (AC007017) putative harpiii-induced protein A.\V, V>& WJ\«S-| iuita W Sintarour)i» tm.i
SWISS- rROT Acce»»ion Number T4S978 S
A. tliniiubinding
thalUuaSCD6
protein
and ki-
p88.9 (AL021087) cytochr P450 A. t/iaiiana
K9H21 (15319 bp)
[Figure: gene map of K9H21. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length
Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition
K9H21.1
K9H21.2
K9H21.3 K9H21.4 K9H21.5 K9H21.6 K9H21.7 K9H21.8
415058077192
110131193014657
575060448097
114571370515248
0
201(100
009
477172302104012140
gi| 1033072
Ki|6010737|Kb|AAF01003.1
ni|0023082]enib|CAB02340.11
Ki|6382035|Kb|AAF078)7.1|Ki|1003724
329
370
281
547133
23.9
38.0
58.9
01.057.5
dipho*phate kil(NDP kiniue II) (NDPK II)(U52004) Herpe.viru. «aimiri ORF73 liomoluij K«po«iV
(AC009325) hypotheticsl prote
(AL133315) hypothetical protei
(AC011020) putative protei[partial] (U50846) 4-coun
A. tlialiu
A. thaJitt
K9P8 (70670 bp)
[Figure: gene map of K9P8. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
Identity Definition
K9P8.2 K9P8.3 K9P8.4 K9P8.5 K9P8.6 K9P8.7 K9P8.8 K9P8.9
K9P8.10 K9P8.11 K9P8.12 K9P8.13
1220018452271983343030000382234542449834
54130091730438307827
1773222103319793025237543420904938953012
07047028070049170040
107
1082
147
14
12133
12
0
02030040
0130
001
720779928718288
1080084037
824030019002
gi |4220458
Ki|441C407|Kb|AAD20309|Ki|100003CleiKi|4539009|eiKi|5051770|eiKi|5541007|eiKil3859083|eiKi|5051775|e,
inb|CAA70310|inb|CAB39030.1mb|CAB40063.1mb|CAB01173.1ii,b|CAA22020|.iib|CAB450C8.1|
gi|0091014|>!u|AAD39002.1
Ki|80783|uir||JL0032l!i|3237304Ki|0051781|e,Kil4249382lK
mb|CAB45074.1|blAAD14479
582
028778839710141074408525
270215008498
07.9
24.088.240.088.043.729.901.053.0
20.404.084.069.7
(AC00621G) Similar to protein homolog from A. thaliana BAC gb|AC002294. Arabidopsis thaliana
(AF123318) mitotic checkpoint protein Homo sapiens
(Y09095) chloride channel A. thaliana
(AL049481) putative protein A. thaliana
(AL078037) hsp 70-like protein A. thaliana
(AL090859) putative protein A. thaliana
(AL033503) conserved hypothetical protein Candida albicans
[Ac] (AL078037) putative protein A. thaliana
(AC007454) Contains a PF|00501 alpha/beta hydrolase fold domain. A. thaliana
hypothetical 31.7K protein (aphE region) - Streptomyces griseus
(U915C1) pyridoxine 5'-phosphate oxidase Rattus norvegicus
(AL078037) transport inhibitor response-like protein A. thaliana
[partial] (AC005966) Strong similarity to gi|3337350 F13P17.3 putative permease from A. thaliana BAC gb|AC004481. Arabidopsis
MAB16 (70475 bp)
[Figure: gene map of MAB16. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Adjacent label: MEE13.]
deduced genes: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length
Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition
MAB16.1
MAB16.2 MAB16.3 MAB16.4 MAB16.5 MAB16.6 MAB16.7 MAB16.8 MAB16.9 MAB16.10 MAB16.11 MAB16.12 MAB16.13 MAB16.14
955514190179902050923125295413442837347460495839GG12180433068244
10400150851878G22742243353302734943383915055460201631066752168078
0000000001200
1531412 3 04 7 7
2 1 3
11401 4 0
3187294 2 04 5 0512145
gi|5732430|«b| AAD49098.1|
si|3892701|emb|CAA22150.1 73 33.8Ki|G587850|gb| AAF18539.1 115 73.3«i|4640194|gb|AAD20867.1| 211 40.1Bi|4512G70|Kb|AAD21724.1| 432 31.48i|1001C8G|jbj|BAA10421| 01 45.2gi|99721|pir||S05465 1104 62.4Ki[4512670|gb|AAD21724.1| 138 42.4Kii5915851|np|Q42569| 277 29.5gi[4836882|8b|AAD30585.1| 722 76.8Ki|4512C51|8b|AAD2170C.l| 408 49.9gi|lG530G5|dbj|BAA18577| 277 68.7Sil0598853isb|AAF18707.1| 440 90.78i|6561951|emb|CABG2455.1| 87 47.7
(AF177030) contains similarity to maize transposon MuDR (GB:M76978) A. thaliana
(AL033545) hypothetical protein A. thaliana
(AC006551) Hypothetical protein A. thaliana
(AC007230) T23K8.1 A. thaliana
(AC006931) putative cytochrome P450 A. thaliana
(D64002) hypothetical protein Synechocystis sp.
retrovirus-related polyprotein - A. thaliana retrotransposon Ta1-3
(AC006931) putative cytochrome P450 A. thaliana
CYTOCHROME P450 90A1
(AC007260) ldlpttj.eq No definition line found A. thaliana
(AC007048) putative tyrosine aminotransferase A. thaliana
(D90915) peptide chain release factor Synechocystis sp.
(AC010556) putative serine carboxypeptidase A. thaliana
(AL132964) hypothetical protein A. thaliana
MBM17 (52717 bp)
[Figure: gene map of MBM17. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length
Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition
MBM17.1 MBM17.2 MBM17.3 MBM17.4 MBM17.5
MBM17.6 MBM17.7 MBM17.8
MBM17.9 MBM17.10 MBM17.11 MBM17.12 MBM17.13
6569129
1437718280
257203311536125
3951042787449304666051088
7350125121734924528
327363503937922
4185644598403024815752717
2 42
102 0
30107
88034
66920
1102705
1053
Ki|2501242|s'p|Q13472Bi|2924777|(!b|AAC04906.1Bi|3540207Bi|4185142|gb|AAD08945.1
1081 gi|3913525|»p|O48901|387 Ki|601673C|gb|AAF01562.1353 gi|3913518|«p|Q42546|
353 «i|2765607|emb|CAB05889347 gi|2705607|emb|CAB05889246390 Ki|3236254|gb|AAC23042.1402 8i[33373Cl|t!b|AAC27406.1
8351060681608
1044333332
337346
395401
46.259.650.034.6
81.360.288.0
64.295.4
75.044.8
[Partial]
DNA topoisomerase III
(AC002334) putative receptor-like protein kinase A. thaliana
(AC004260) Putative protein kinase A. thaliana
(AC005724) putative SNF2/RAD54 family DNA repair and recombination protein A. thaliana
DNA polymerase delta catalytic chain
(AC009325) unknown protein A. thaliana
3'(2'),5'-bisphosphate nucleotidase (3'(2'),5-bisphosphonucleoside 3'(2')-phosphohydrolase) (DPNPase)
(Z83312) 3'(2'),5'-bisphosphate nucleotidase A. thaliana
(Z83312) 3'(2'),5'-bisphosphate nucleotidase A. thaliana
(AC004084) unknown pro[partial] (AC004481) unk
A. t
MCK7 (87090 bp)
[Figure: gene map of MCK7. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier
MCK7.1 MCK7.2 MCK7.3 MCK7.4 MCK7.5 MCK7.6 MCK7.7 MCK7.8 MCK7.9 MCK7.10 MCK7.11 MCK7.12 MCK7.13 MCK7.14
MCK7.15 MCK7.16 MCK7.17 MCK7.18 MCK7.19 MCK7.20 MCK7.21 MCK7.22 MCK7.23 MCK7.24 MCK7.25 MCK7.26 MCK7.27 MCK7.28
Direction
_
4.
_
_4--4-
+
_4__
4._
___
51
33477071
13392171391950222733260852767729240314803268G3420635290
4046G42283459494825850850531415573558033632106648670223728867848882791
3"31185701
125901622018282224882509727170289893111932299334973505740160
4179044241479904913052766554155730660806640976833971059743728009086992
Ext i n
171
13867734G434
18
60222
123718233
10
EST00202060102431
015
100
04000000
512780
1307750199552254283324424100211209728
337408654203589442466571296406404316325931
Sequence IUgi|5391442|gb|AAC27293.2|gi|4115383|gb|AAD03384.1gi|4559347|gb|AAD23008.1gi|4400192|emb|CAB30515.1gi|4097561gi|4204205gi|1001650|dbj|BAA10381gi|135532|sp|r23253|gil4455704|einb|CAB36617.1|gi|3122387|sp|O22407|gi|4003719|ref|Nrj»2003.11gi|G130546|sp|r72777gi|1653230|dbj|BAA18145|gi|5734021|dbj|BAA83352.1
gi|1709798|sP|r54778|gi|2700839|gb|AAB95307.11gi|2245024|emb|CAB10444.1gi|19463C9|gb| AAB03087.1|gi|462579|sp|r2152S|
Overlap Identity Definition
[partial] (AF053941) non phototropic hypocotyl 1-like A. thaliana
(AC005967) putative receptor-like protein kinase A. thaliana
(AC00C&8S) hypothetical protein A. thaliana
(AL035440) putative protein A. thaliana
(U64918) ATGP1 A. thaliana
(AC005223) 45043 A. thaliana
(D64002) hypothetical protein Synechocystis sp.
sialidase (neuraminidase) (NA) (major surface antigen)
(AL03M78) hypothetical protein SC2G&.30 Streptomyces coelicolor
WD-40 repeat protein MSI1
fragile histidine triad gene
YCF54-like protein
(D90912) hypothetical protein Synechocystis sp.
(AP000391) ESTs AU067992(C11433), AU077424(C11433) correspond to a region of the predicted gene.
26S protease regulatory subunit 6B homolog
(AC003100) putative receptor-like protein kinase A. thaliana
(Z97341) cyanohydrin lyase like protein A. thaliana
(U93215) unknown protein A. thaliana
malate dehydrogenase, chloroplast precursor (NADP-MDH)
gi|6580145|emb|CAB63149.:gi|1940370|gb|AABC 3094.1gi|0225108|>p|Q9ZG89|gi|3250035|euib|CAA74046|gi|5381253|dbj|BAA82306.1gi|5381253Jdbj|BAA82306.1
51170048369719827020113214942310197
141491
400594244544394
53464
129437299313
100.000.601.047.080.447.041.137.030.797.047.448.045.174.4
92.848.448.630.785.0
52.907.740.958.064.366.2
(AL1329G8) MAr kinas.(U93215) unknown proteinGTP-bindinK protein CGP/(Y14274) putative *erine/tli(AB027752) peroxidase Nk,(AB027702) peroxidase Nio
A. thuiianui> A. thaJia.
* t*}jiu;u[n
e Sorglmm biculor
MFB16 (66087 bp)
[Figure: gene map of MFB16. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
identifier
MFB16.1 MFB16.2 MFB16.3 MFB16.4 MFB16.5 MFB16.6 MFB16.7 MFB16.8 MFB16.9 MFB16.10 MFB16.11
MFB16.12 MFB16.13 MFB16.14 MFB16.15 MFB16.16 MFB16.17 MFB16.18 MFB16.19
n EST Sequence IU OverlapI 0 229^ gi |6598507[gb|AAF18020.1 12fJ 3 O ~3 2 540 gi |6091732|gb |AAF03444.1 | 528 49.0
14 0 548 gi |4581150|gb|AAD24034.1 | 440 59.08 0 372 gi |322598|pir | |S28004 345 70.21 1 67 gij0523097|einb|CAB62355.1| 61 43.53 1 359 gi |5931083|ei i ib |CAB50595.1| 179 43.3
10 2 320 gi |4455237|enib |CAB36736.1 | 316 81.16 0 303 gi |5M1703|emb|CAB01208.1j 295 50.36 5 349 gi[5541703!einb|CAB51208.1| 209 48.11 0 1851 0 135 gi |5732004|gb|AAD48903.1[ 114 84.3
3 0 107 gi |4400239lemb|CAB3C738.1| 100 09.21 0 139 gi|4097547 39 70.03 1 162 gi|4097547 1213 7 364 gi|2645971 3361 2 183 gil3090944lemblCAA18854.il 580 0 342 gi |0541703|euib |CAB51208.1 | 311
19 1 838 gi |4455240|einb|CAB36739.1 | 5246 fj 289 gi |1619002|emb|CAAC9976| 220
Identity Definition
(AC006053) putative DnaJ protein A. thaliana
(AC010797) unknown protein A. thaliana
(AC006919) hypothetical protein A. thaliana
Stl2p protein - A. thaliana
(AL133315) putative protein A. thaliana
(AJ011643) squamosa promoter binding protein-like 0 A. thaliana
(AL035523) ubiquitin activating enzyme-like protein A. thaliana
(AL096860) putative protein A. thaliana
(AL096860) putative protein A. thaliana
[pseudo] (AF147263) contains similarity to transposases A. thaliana
(AL035523) abscisic acid-induced-like protein A. thaliana
(U64906) ATFP3 A. thaliana
(U64906) ATFP3 A. thaliana
(AF034255) reversibly glycosylated polypeptide-3 A. thaliana
(AL023094) putative protein A. thaliana
(AL096860) putative protein A. thaliana
(AL035523) putative protein A. thaliana
(Y08726) MtN3 Medicago truncatula
71719368
1435910722200472298925477274093119135993
3791740133412904521348927509405441060011
55039070
12411100781G922212802488726782291423174536398
3841040549419944061449475528435902461974
MGO3 (43570 bp)
[Figure: gene map of MGO3. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier
MGO3.1 MGO3.2 MGO3.3 MGO3.4 MGO3.5 MGO3.6
MGO3.7
MGO3.8 MGO3.9
MGO3.10 MGO3.11
2921383499374548377
1227273212825177451958522320
2730031090
210212408144001880C2171224051
25227 20867
290443207G
37290 4020C4101C 42254
272
Ki|5902002|ref|NPJ)0898G.lKi|0308787|ifb|AAF07 308.1Ki|0598300|Kb|AAF18098.1Ki|4887754|gb|AAD32290.1Ki|0498404|dbj|BAA87853.1]
398 gi|3367520
495 gi|3510253|«b|AAC33497.1327 8i|5203312|Bb|AAD41414.1|
485 gi|4510402|gb|AAD21489.1!320 Ki|5541C82|emb|CAB51188.1[
1350 49.5 polymerase (RNA) III (DNA directed) (155kD)
448 43.7 (AC010852) hypothetical protein A. thaliana
224 33.3 (AC002354) hypothetical protein A. thaliana
519 50.4 (AC006533) ankyrin-like protein A. thaliana
293 36.7 (AP000816) EST AU030604(E51294) corresponds to a region of the predicted gene.
301 39.4 (AC004392) Similar to protein kinase APK1A, tyrosine-serine-threonine kinase gb|D12522 from A. thaliana. A. thaliana
413 30.0 (AC005310) hypothetical protein A. thaliana
200 54.3 (AC007727) Contains 3 PF|00806 Pumilio-family RNA binding domains (PUF). A. thaliana
444 41.1 (AC000087) putative AP2 domain transcription factor A. thaliana
132 45.1 (AL096859) putative protein A. thaliana
MHM17 (78423 bp)
[Figure: gene map of MHM17. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Clone label shown in the map: MUL3.]
No. ofEST
Length Info]Overlap Identity Definitic
MHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMl
.1 -2 +.3 +.4.5 +.0 4-.7.8.9
012 -3 +4 +5 +0 +7 +89 +
.20 +
422474079471
1242310350170892317729418392424104845544483445207855979599220180905214085077219175435
71418470
102331503010008197072004032710393154218440782490745407159045012230428008287090877377170204
104273
115514
48
132
117444
00002800020120300420
093224227075135374400523
74299224187458441370524795197423140
Ki[1723495Ki 12244927Ki 12244927Ki 15089465Ki 13090931Ki 14538940
sp|Q10414|emb|CAB10349.emb|CAB10349.dbj|BAA83010.1emb|CAA18841.emb|CAB39C7C.
Ki|3894198|gb| AAC78547.1[Kil5123568X54513Ki 14538942
Ki|1100355Ki|3070398Ki|4538939
emb|CAB45334.
emb|CAB39C78.
KH AAC14530.ilemb|CAB39C70.
Ki|3212870|Kb| AAC23421.1|Ki|1399181Ki|6593498|Kb|AAD10106.2Ki|59O2371|Kb|AAD00473.1Ki|3914239Ki|4538935
sp|O04719emb|CAB39071.
535| 44I 75
731 1 "I 373
71| 402
G7290
107451439345523545109422123
25.942.228.939.258.007.938.953.188.050.8
31.050.980.975.190.259.233.097.270.0
hypothetical 03.2 kd protein C1F3.09 in chromosome I
(Z97339) hypothetical protein A. thaliana
(Z97339) hypothetical protein A. thaliana
(AB028987) KIAA1004 protein Homo sapiens
(AL023094) putative ribosomal protein S16 A. thaliana
(AL049483) nucleosome assembly protein I-like protein A. thaliana
(AC000002) hypothetical protein A. thaliana
(AL079344) cytokinin oxidase-like protein A. thaliana
tRNA-Val(CAC)
(AL049483) uncharacterized protein A. thaliana
(U33058) UNC-89 Caenorhabditis elegans
(AC004484) unknown protein A. thaliana
[pseudo] (AL049483) Col-0 casein kinase I-like protein A. thaliana
(AC004005) putative N-myristoyltransferase A. thaliana
(U50738) lycopene epsilon cyclase A. thaliana
[pseudo] (AC005917) putative protein kinase A. thaliana
(AC009322) Hypothetical protein A. thaliana
PROTEIN PHOSPHATASE 2C ABI2 (PP2C)
(AL049483) putative protein A. thaliana
MIF21 (59372 bp)
[Figure: gene map of MIF21. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
MIF21.1
MIF21.2
MIF21.3MIF21.4MIF21.5MIF21.CMIF21.7MIF21.8MIF21.9MIF21.10MIF21.:
No. ofExon
No. ofEST ferlap Identity DeHnition
309 2T2 (U70559) DNA repair/trt ion protein Min.l9p Sactha-
33.2 (AC009917) Contains a bZIP transcription factor PF|00170 domain. ESTs gb|R30400, gb|AA000904, gb|AI994521 come from this gene. A. thaliana
79.2 (AJ003130) polygalacturonase A. thaliana
00.7 (AC007211) putative SCARECROW gene regulator A. thaliana
76.3 (AC009176) unknown protein A. thaliana
31.1 (AL078037) putative protein A. thaliana
05.3 (AC009176) unknown protein A. thaliana
80.3 (AC009803) unknown protein A. thaliana
MIF21.MIF21.MIF21.MIF21.MIF21.MIF21
109
7900
100251411519037243473290130094308704007441545
6389
10002
120271000821430248173380335781388824038243058
441885091053337548205030258082
471585270554728553925732759259
025 gi|0034702|gb|AAF19742.1
011
091124
4905741 5 73 0 11 0 22 1 01 0 3
3 7 9
gi|2982083|emb|CAA05892| 384gi|4585920|gb|AAD20580.1| 408gi|0400900|8b|AAF13095.1| 573gi|5051703|emb|CAB45050.1| 121si|C4C0955|gb|AAF13090.1| 292gi|0041838|gb|AAF02147.1| 101
480339373100322130
ui|4587010|gb|AAD20838.1|
gi|1542941|emb|CAA55006|gi|4929099|Bb|AAD34110.1|gi|040C948|gb|AAF13083.1|
gi|4580245|emb|CAB40980.1| 305
305 04.2 (AC00C951) putati
402 89.3 (X7811C) Acetoacetyl-coenuyine138 30.7 (AF101873) CGI-110 protein Homo «*372 07.8 (AC00917C) unknown protein A. tha/i;
indole-3-glyterul phosphate
A thiola.e lUpljaiiun n
syntha
58.5 (AL049C40) putative protein A.[ ]
MJB24 (58589 bp)
[Figure: gene map of MJB24. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Clone label shown in the map: MUL3.]
deduced genes
MJB24.1 MJB24.2 MJB24.3
MJB24.4 MJB24.5 MJB24.6 MJB24.7 MJB24.8 MJB24.9 MJB24.10 MJB24.11 MJB24.12 MJB24.13 MJB24.14 MJB24.15 MJB24.16
+-
+
---
--
+++-
Position
17345129
11949
14010174871900022735250002857131090327403500830728424305200055249
40307743
13710
15020193382268524311271433002031873340493500541084440995312908089
N o . of
1453
001
289282
2 2
82
13
No. ofEST
001
0001
105000048
Length
5 9 17 3 74 9 1
100379
10125024262891 2 0
288102902312250777
Information on the most similarSequence ll>
«i|4914419|gi|44G8809
xi|4539303|Ki|CC30404|Ki|5915831Ki |4539305|Ki|6319895|gi]022C017|
emb|CAB39C59.1emb|CAB43070.1e.nb|CAB38210
emb|CAB39C0C.lgb|AAF19502.1•PJO64718Iemb|CAB39C08.1ref|NP_009970.1sp|P06724|
gi|4511988|gh|AAD21548.1
gi|3093294|gi |20010001
Ki|l 14339|.
emb|CAA73320|sp|Q40784|
p|P20431
Overlap5900104 8 4
3707005003452801 1 9
258
901302
770
Identity00.354.070.3
74.828.170.377.540.002.551.7
08.871.9
97.0
Definition
(AL049483) predicted protein destination factor A. thaliana
(AL050352) putative protein A. thaliana
(AL035601) cytochrome P450 monooxygenase (CYP91A2) A. thaliana
(AL049480) putative protein A. thaliana
(AC007190) F23N19.4 A. thaliana
CYTOCHROME P450 71B9
(AL049480) putative protein A. thaliana
Protein carboxyl methylase
60S acidic ribosomal protein P3 (P1/P2-like)
(AF088896) ubiquinone methyltransferase Zymomonas mobilis
(Y12782) putative villin A. thaliana
possible apospory-associated protein C
[partial] plasma membrane ATPase 3 (proton pump)
MJM18 (16203 bp)
[Figure: gene map of MJM18. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
identifier
MJM18.1 MJM18.2 MJM18.3 MJM18.4 MJM18.5 MJM18.6 MJM18.7
Direction 5'66
3329+ 7080
8857113751239814890
3'181750148403
10020121451327616203
No.E x i in
574321
1
No. of
EST0001000
LenKth
157207250322242293436
Information on the most similarSequence IDKi|29795CC|Kb|AAC00175.1Ki|2979500||!b|AACOC175.1
Ki[2956707|emb|CAA70370|Ki[5042160|emb|CAB44685.11gi|2979559|Kb|AAC00108.1Ki|3292823|emb|CAA19813.1|
Overlap156202
281238264393
Identity52.248.3
65.254.450.945.4
Definition
[partial] (AC003680) MADS-box protein (AGL20) A. thaliana
(AC003680) MADS-box protein (AGL20) A. thaliana
(Y16778) peroxidase Spinacia oleracea
(AL078020) cytochrome P450-like protein A. thaliana
(AC003680) putative PCF2-like DNA binding protein A. thaliana
[partial] (AL031018) putative protein A. thaliana
MJP23 (31827 bp)
[Figure: gene map of MJP23. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
Identity Definition
MJP23.1 MJP23.2
MJP23.3 MJP23.4
MJP23.5 MJP23.6 MJP23.7 MJP23.8 MJP23.9 MJP23.10 MJP23.11
2803
C2279910
1129011818141841993C24204287982974C
4200
700010987
11005139001844822398202022902331174
393 Ki|5302776|emb|CAB400:>4.1|468 gi|4409008|emb|CAB38269|
227331
215461795401295192232
Ki|1755066gi|729774|»p|P41152|
X580088i|4098647si|3914083|»p|P73025|gi|4469009|emb|CAB38270|gi| 1800147
yi[3258570
314 65.7 [partial]^(Z97337) hypothetical protein A452 54.7 IAL035G02) UDP rhamnose ai
rhamno.yltrausferase-like protein A. thali118 44.5 (U63012) lectin precursor Sophora japonic250 42.4 heat shock factor protein HSF30 (heat si
30) (HSTF 30) (heat stress transcription216 100.0 U3 suRNA A. lhaliana448 99.8 (U80008) homogentisate l,2-dioxy|(enase743 31.9 muU2 protein343 79.1 (AL035602) putative protein A. tliaJia/ia294 70.9 (U83055) membrane associated protein A
227 59.0 (U89959) Unknown protein A. tiiahi.na
thucyanidin-3-gluco.ide
factor)
A. tlmJia
MKN22 (27229 bp)
[Figure: gene map of MKN22. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier
MKN22.1
MKN22.2 MKN22.3 MKN22.4 MKN22.5 MKN22.6 MKN22.7 MKN22.8
Direction
+
+++
_
Position 5'
1
710010GOO1481018077200102128420108
3"2397
9273117301087018800207542193020800
No. ofExon
12
0271423
No. ofEST
4
0117020
Length
481
004144420
GO
132172403
Information on the most dinSequence IDgi|2000277|tfp|P08927|
Ki|6087804|Kb|AAF180&0.1|
Ki|3128108|Kl>| AAC1CO72.1|Ki|4039428|emb|CAB38 961.1Ki|2032100|emb|CAB114C9|
Ki|3193321
lilar sequenceOverlap
479
290
340| 09
80
200
Identity89.0
52.4
GG.350.054.0
37.8
Definition
(60 kd clmperc(AC012680) p.
(AC004521) ui(AL049171) P>
(Z98700) HrKhi
(AF009299) N
>nin bet* »ubujtntive RNA-b
iknown proteiiitHtive proteinlyl-tRNA yyntl
u definition lin
nit)iutlii
i A.A. (
lettu
,e fo
(CPN-00 beta)UK protein A. thuiia
thalintia
'•.haliixim
.•:. A. thaJJana
uud A. thaJittuu
nit precursor
MMI9 (81736 bp)
[Figure: gene map of MMI9. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Clone label shown in the map: MTG10.]
identifier Identity Definition
MMI9.1 MMI9.2 MMI9.3 MMI9.4 MMI9.5 MMI9.6
MMI9.7 MMI9.8 MMI9.9 MMI9.10 MMI9.11 MMI9.12 MMI9.13 MMI9.14 MMI9.15 MMI9.16 MMI9.17 MMI9.18 MMI9.19 MMI9.20 MMI9.21 MMI9.22 MMI9.23 MMI9.24
1995565079749489
17883
2399427206307813307738015410184284144855503385304055052
621496335472134738307000777758
532971248927
1145923577
2001929431327953693339432424194300449801512365305155069
627576029973305750207705681502
226
U10
1000
003040
1001104000400
327071231223549938
327511404420236229117
1168245204206202203982348440214811
Ki]2979555|Kb| AAC06104.ilKi| 1488521 |eu]b|CAA08194|Ki|3241945|Kl>|AAC23732.1!Ki |3335171 |Kb|AAC27073.ljKi|4512098|Kb| AAD21751.ilgi|5734642|dbj|BAA83373.1
Ki|3292811|emb|CAA19801.1ni|4522009|Kbj AAD21782.1|Ki|4454020|einb|CAA23073|Ki|3401815|Kb| AAC32909.1|Ki|2583118|Kb|AAB82027.1|Ki| 1000971 |dbj|BAA05009|Ki|135095G|s>p|P49200|Ki|4589590idbj|BAA70817.1l!i|6224938|Kb|AAFO6022.1Ki|5541705|enib|CAB51210.11Ki|5541705|emb|CAB51210.11Ki|5541705|einb|CAB51210.1|Ki|1871577|enib|CAA72315|Ki|0630464|Kb|AAF19552.1Ki|2901375|tnnb|CAA18122|gi|3702325|«b|AAC62882.1
lii|0522529leiub|CAB01972.1
324033165119394911
1964922182232201241104532441G6171201187835315105
810
44.088.252.401.742.309.6
30.530.747.930.831.732.899.141.989 835.334.961.939.420.151.328.3
70.8
(AC003680) unknown protein A. thaliana
(X99938) RNA helicase A. thaliana
(AC004025) unknown protein A. thaliana
(AF067858) embryo-specific protein 3 A. thaliana
(AC000S69) unknown protein A. thaliana
(AP000391) ESTs C22007(S0014), C22650(S0014) correspond to a region of the predicted gene.
(AL031018) hypothetical protein A. thaliana
(AC007009) unknown protein A. thaliana
(AL035396) putative protein A. thaliana
(AC004138) hypothetical protein A. thaliana
(AC002387) hypothetical protein A. thaliana
(D26076) chloride channel Oryctolagus cuniculus
40S ribosomal protein S20 (S22)
(AB023190) KIAA0973 protein Homo sapiens
(AF199026) putative transcription factor A. thaliana
(AL096860) putative protein A. thaliana
(AL096860) putative protein A. thaliana
(AL096860) putative protein A. thaliana
(Y11553) putative 21kD protein precursor Medicago sativa
(AC007190) F23N19.4 A. thaliana
(AL022141) NAM like protein A. thaliana
(AC005397) hypothetical protein A. thaliana
lP»rt
MNB8 (46872 bp)
[Figure: gene map of MNB8. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Clone label shown in the map: MXC20.]
identifier
MNB8.1 MNB8.2 MNB8.3 MNB8.4 MNB8.5 MNB8.6 MNB8.7 MNB8.8 MNB8.9 MNB8.10 MNB8.11 MNB8.12 MNB8.13 MNB8.14
Overlap Identity32
7313101581370699138
1217523380288003045134018388364231844105
4841747449671238484
1135020018257132980731429369134101943G3340326
932
3
4032200003000
170179222272413
109572024525C438000217498
gi|1652779|dbj|BAA17698|gi|1652387|dbj|BAA17309(
|4972115|etnb|CAB43972.1| 271|S107033|gb|AAD39930.1 4124914414|emb|CAB43665.1 8971408192 4834972112|emb|CAB43969.1 216|6539250|gb|AAF15920.1 96
i|3342450 1475080792|gb|AAD39302.1 417
,iG22C013|.p|Q9ZEA4| 15412645229 306
40.7 (D90908) hypothetical protein Synechocystis sp.
47.1 (D90905) hypothetical protein Synechocystis sp.
55.9 (AL078579) putative protein A. thaliana
86.7 (AF133708) PP2A regulatory subunit A. thaliana
42.4 (AL050352) Ca2+-transporting ATPase-like protein A. thaliana
22.1 (U59294) myosin heavy chain Placopecten magellanicus
42.4 (AL078579) hypothetical protein A. thaliana
48.5 (AC011765) hypothetical protein A. thaliana
33.8 (AF071233) lipolytic enzyme Sulfolobus acidocaldarius
30.1 (AC007576) Unknown protein A. thaliana
23.9 50S ribosomal protein L9
23.5 (U78597) kinesin light chain Plectonema boryanum
MJE7 (74298 bp)
[Figure: gene map of MJE7. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier
MJE7.1 MJE7.2 MJE7.3 MJE7.4 MJE7.5 MJE7.6 MJE7.7
No. tExou
ngth InfoSequel Overlap Identity Definition
(AC000348) T7N9.20 A. tlujia4890
1160315730217842331425354
1945107791406218693222092453128007
6181337807847142406459
gi|2213600 593 40.6gi|2244847|emb|CAB10269.1| 354 27.0Ki|6041847|sb|AAF02156.1| 819 55.9Ki|G041847|j<b| AAF02150.1 787 55.3
gi|285741|dbj|BAA03413 398 30.3*i!3915984|tfp|P33642| 229 28.3
(Z97337) hydri(AC009803) unknown(AC009853) unknown
(D14550) EDGr precuhypothetical 39.5 kd(ORFZ)
i A. Italia;i A. Mialia.
otein houioloK A. t/iaJia
.or Daucu»,ddc-reducta: fiint 3'regiun (DADA")
MJE7.8MJE7.MJE7.MJE7.MJE7-MJE7.MJE7MJE7.MJE7.MJE7.MJE7.MJE7-
01
23
6789
2871731647
+ 37022+ 37229+ 40233
43354->- 48654+ 53499+ 56826
6399204518
+ 71047
310313478437093394334107443056492755429861063042490030073301
814
17
21
217
1
1
7
0401
3
0
00001
544654
72364166101151224671
86203200
Ki|4725941|emb|CAB41712.1Ki|281122GZ11880gi|3873500|embjCAA22127|
K'.j 1420887Ki|4733902|Kb|AAD28645.1Ki|2601311|Kb|AAB87091.1)Ki|2792304l!i|0041842|i!b|AAF02151.1Ki|3643271Ki|3915958|*p|Q08270|
531003
64-188
7233
201422
64241
53
61.898.392.227.0
39.776 561.919.150.938.000.0
(AL049730) putative(AF042609) fimbrintRNA-Lyx(TTT)(AL033534) neriue-ri
(U34334) nou-,pecifiIAC0072C11 unknowIAC002336) hypothe(AF040964) unknow(AC009853) unknow(AF090872) 33 kDa[pceudo] hypothetic^
pollen-.2 A. thah
h protei
lipid tr
ical prol protein, proteinecretoryHIT-lik
jecific protein A. thaJiunaaria
L StiiraoBatc/iarornyt.* pornb*-
*n»fer-lik« protein Plia«ro!u» vulgar-i.
ein A. U.aliariaIT1 Hvinti sapiensA. 6ha/ianaprotein Orywi tativu
e protein MJ0800
MNI5 (21011 bp)
[Figure: gene map of MNI5. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Clone label shown in the map: K6M1.]
deduced genes
identifier
MNI5.1 MNI5.2 MNI5.3 MNI5.4
Direction
-
-
I W ' ™ . '
7009433
1091419420
3'44579042
1358721008
E x i , » '
61
112
EST0123
48170
563452
S.Ki
K>
,ouence ID1769887]emb;CAA05051
4827060|ref|NPJ)05099.1|6500764|Kb|AAF10704.1
Overlap480
507439
Identity97.3
45.744.1
Definition(X95730) amiiiu H<
xyluloki inure (if. it[partial] (AC01010
,id penm
iSutatzHt?)•$) F3ML
:ase 0 A. t/iaJiaua
liomoloK8.12 A. tliaJiaija
MPI10 (29605 bp)
[Figure: gene map of MPI10. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier Direction
MPI10.1 MPI10.2 MPI10.3 MPI10.4 MPI10.5
Position 5'
13709
183602321924880
N o .
10875427
224832309529304
J !
33
171
13
No. ofEST
10
0
Length
302398724159
1087
InforinatioSequence 1
v;i |4538944*i| 4538943
Ki|4612705
i on the must simiD
emb|CAB39680.1|emb|CAB39679.1|
KbjAAD21758.1!
ar sequenceOverlap
397723
404
Identity
42.580.5
02.7
DetiniSptirtij.{AL04(AL04
i o n
1]9483) put9483) put
1] {AC00C
utive
669)
trdiiwcription fbetii-^HlHctyyii
putwtive protei
H*e A. thuiiiuiit
i kiiitwe A. thaliaim
MQL5 (88398 bp)
[Figure: gene map of MQL5. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced gene
identifier BMQL5.1
6
llrectiou 5 ^ -0073
3'0317
Exon EST1 0 4 1 0
Sequence IDgi|4314371|Kb|AADl 0082.1
O v erlap262
Identity42.6
Definition
(AC006340) similar to mammalian MHC III region protein G9a
MQL5.2 MQL5.3 MQL5.4 MQL5.5 MQL5.6 MQL5.7
MQL5.8
MQL5.9
G8539729
12777102291756720368
73021160714488170571935421090
22829 24170
28100 29313
032
70
0
1
150220280159357243
329
329
Ki|4895238|Kb|AAD32823.1| 183Ki|4539454|emb|CAB39934.1 107«i|2245111|emb|CAB10033.1 150«i|2240110|emb|CAB10032.1 300l<i|3434969|dbj|BAA32419| 242
8i|4330720|«b|AAD17398.1| 278
«i|3434970|dbj|BAA32422| 299
40.748.1
MQL5.10 MQL5.11
MQL5.12
MQL5.13 MQL5.14 MQL5.15 MQL5.16 MQL5.17 MQL5.18 MQL5.19 MQL5.20 MQL5.21 MQL5.22 MQL5.23 MQL5.24 MQL5.25 MQL5.26 MQL5.27 MQL5.28 MQL5.29
+-
-
++++-__-t-
+-t-
-
3246934390
37753
4002641499403804046749067020470001207849610900279004827098177534377504805578135084104
3380030918
40407
4085543898407734771401013032900700009473026570422000989728137688079801809258319188220
01
1
10
2144
76714
11331
1013
00
0
000002
22105030001
2 9 5
843
8 8 0
1 1 0623
90410240244314293310477283098300000123208804
gi|2980800gi|3C00040
gi|3000040
«i|1703219gi|4490297Ki|6119525gi|6522919gi|224G108gi|730688|*gi|2245107gi|2245107Ki|2245107
euib|CAA18182.1
»p|P54120|enib|CAB38788.1Kb|AAF04169.1emb|CAB0210C.lemb|CAB10G30.1p|r39097|euib|CAB10029.1emb|CAB10029.1emb|CAB10029.1
gi|4836917|Kb|AAD30619.1|Ki|11701C9 sp P4CC01gi|3152613|gb|AACl 7092.1(!i|10766C0Jpir||S51839
gi|2245101gi|2245100
enib|CAB10523.1emb|CAB10522.1|
203839
8 3 2
72601
88371220211233213230304198347304
2 1 4
510
49.242.5
41.9
49.304.038.201.371.499.066.766.409.120.979.934.253.4
70.748.5
(AC007609) putative VAMP-associated protein A. thaliana
(AL049000) contains EST gb:AA728416 A. thaliana
(Z97343) GTP-binding RAB1C like protein A. thaliana
(Z97343) hypothetical protein A. thaliana
(AB008104) ethylene responsive element binding factor 2
(AC006248) putative non-LTR retroelement reverse transcriptase A. thaliana
(AB008107) ethylene responsive element binding factor 5 A. thaliana
(AL022197) putative protein A. thaliana
(AF080119) similar to A. thaliana disease resistance protein RPS2 (GB:U14108) Arabidopsis thaliana
(AF080119) similar to A. thaliana disease resistance protein RPS2 (GB:U14108) Arabidopsis thaliana
AIG1 protein
(AL035678) putative protein A. thaliana
(AC011560) hypothetical protein A. thaliana
(AL132978) putative protein A. thaliana
(Z97343) EREBP-4 like protein A. thaliana
40S ribosomal protein S19, mitochondrial precursor
(Z97343) thioesterase like protein A. thaliana
(Z97343) thioesterase like protein A. thaliana
(Z97343) thioesterase like protein A. thaliana
(AC007153) 80099 A. thaliana
homeobox-leucine zipper protein HAT2 (HD-ZIP protein 2)
(AC004482) hypothetical protein A. thaliana
D13F(MYBST1) protein - potato
(Z97343) hypothetical protein A. thaliana
[partial] (Z97343) DNA-binding protein
MSD23 (33479 bp)
[Figure: gene map of MSD23. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length, Sequence ID, Overlap, Identity, Definition
MSD23.1 MSD23.2
MSD23.3 MSD23.4 MSD23.5 MSD23.6
40997031
12094143251784720192
02319580
13987157751890726907
0
00
290208293258
i|1465368|emb|CAsi|3875770|emb|CAA92093.1
8i|185054Ggi|2245131|emb|CAB10552.1(•i|4510429|gb|AAD21515.1|
248289223
33.5 (Z08297) SimilaritySAP02 (TIR Ace. No A47C55)
93.2 (U88045) syntaxin related protein AtVam3p A. thaliana
72.8 (Z97344) hypothetical protein A. thaliana
51.8 [pseudo] (AC006929) putative non-LTR retroelement reverse transcriptase A. thaliana
MQM1 (81365 bp)
[Figure: gene map of MQM1. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Clone label shown in the map: K13M13.]
deduced genes
identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length, Information on the most similar sequence (Sequence ID, Overlap, Identity, Definition)
MQM1.1 MQM1.2
MQM1.3 MQM1.4 MQM1.5 MQM1.6 MQM1.7 MQM1.8 MQM1.9 MQM1.10 MQM1.11
MQM1.12 MQM1.13 MQM1.14
MQM1.15 MQM1.16 MQM1.17 MQM1.18 MQM1.19
MQM1.20 MQM1.21 MQM1.22 MQM1.23 MQM1.24 MQM1.25
5095
58981185215855172812084G28519312993381334071
4250445CSC47379
498825227750444C083C02979
05191C794271019731887594978088
0784
94041405110739204202091730370325033405341550
440714032148512
5192450702080940239804043
004440874072721700497818281361
00
003607003
031
09013
000000
8 4 0230
497499295489
72285337
651179
464212349
490083025175308
109208430206306286
tO|4100903|enib|CAA77232|gi|4176538|einb|CAA22872.1<
Ki|4309972|Kb|AAB81870|Ki|3328838|Kb| AAC68007.1|Ki|6033999|dbj|BAA09700.2|Ki|4990890|emb|CAB4431G.lD00935Ki|3747111gi|3330378|gb| AAC27179.il
gi|3874214|cmb|CAB00083.1
Kil4220491|Kb|AAD12714.1K1J2583121 |K1>| AAB82030.1gi J2384910
Kij2129548|pir||S71190Ki|4502897|ref|NP_001285.1Ki|4900074|Kb|AAD34008.1Ki|4200249|emb|CAA22897|gi!G030089|dbj|BAA88530.1
Ki|6225984|sp|Q9ZCRG|gi|5487873|gb|AAD04946.2
Ki|0017097|Kb|AAF01580.1Ki|C017097|nb|AAF01580.1Ki|0017097|Kb|AAF01580.1
1G0211
37219901
488W
284170
1144
50207203
489035109127307
90220
9490
123
33.129.2
33.233.537.183.490.3
100.043.8
42.5
43.995.230.3
93.144.024.540.687.7
51.530.6
51.040.233.1
[partial] (Y18C20) DtfPTPl protein A. thxliu(AL030260) dna-directed rim polymerase iii suromycea pombe(AC002983) hypothetical protein A. thaiiana(AE001314) PolyA Polymerase Clilnmydia. trmh(D03479) KIAA0145 protein Homo wpirno(AJ242C59) o«rinii palmitoyltrHiinfertwt! SohuiumtRNA-Gln(TTG)IAF095041) MTN3 hotnolog A. thalUim(AC003028) putative MYB fnmily trtu»criptiun
E1-E2 AT
fjtctor A. tluil'uuui
Puse YEL031W(Z83217) Similarity to{SW:YEDl_YEAST)(AC00C069) hypothetical protein A- tltalitui*EAC002387) unknown protein A. tlmlinim{AF022982) contain* »iiiiil»rity to a DNAJ-Hke domain CtMxwrlmfj-ditw f/fga.n»calcium-dependent protein kiiiase (EC 2.7.1.-) - A. th»li«nacleft lip *iiid palate associated traii^iiiembrane protein 1(AF149049) M protein precursor Strcptoco<<UH pyognit*(AL030297) hypothetical protein Homo mpientt(AP0009C9) ESTy D39011(R0009). AU032023(R3215) correspondto a region of the predicted gene.50s ribo*omal protein L24(AF110333) PrMC3 rinut. radiate
(AC009895) hypothetical protein A. thaliana
(AC009895) hypothetical protein A. thaliana
[partial] (AC009895) hypothetical protein A. thaliana
MRG21 (55151 bp)
[Figure: gene map of MRG21. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Clone label shown in the map: K19D1.]
deduced genes
identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length, Information on the most similar sequence (Sequence ID, Overlap, Identity, Definition)
MRG21.1 MRG21.2 MRG21.3
MRG21.4 MRG21.5 MRG21.6
MRG2MRG2MRG2MRG2MRG2MRG2
29070923
129341300919374
226032011029282302200134704288
43879548
130061017722138
250002720234741370090328055041
91
21
00I
41690020
379950010450221
gi|0520231|dbj|BAA87957.1| 227 44.3gi|5882743|gb|AAD00290.1 000 67.3
gi|0903050|gb|AAD55615.1| 37 71.1gi|5882745|Kb|AAD55298.1| 618 62.0gi|1255871 ' 90 33.0
Ui|2C23300|Kb|AAB80452.1| 320 49.5gi|4078333|euib|CAB41144.1| 903 84.5gi|4C78332|euib|CAB41143.1| 013 73.8gi|207073|«p|r29512| 429 90.78i|207073|«p|r29512| 220 93.7
IAB028232) helin-loop-lielix protein liouiolog A. ij.aliaiIAC008203) EST» gb|H30134 «nd gb|H30132 come from
[p.eudo] (AC008010) F0D8.29 A. tlialiana(AC008203) F20A4.24 A. tttaliaiia(U53341) .l.ort region of weak nimiUrity to bovine me.ceptor pO3 (PIR:S28503) Citeitorliabditi* eJegau«
(AC002409) unkn(AL049C08) H+-t(AL049C08) putative peptide tr»itubulin bet«-2/beta-3 chain[partial] tubuliu beta-2/bet»-3 cli
protein A. «;»luuuiportinB ATPum-like protein A.
MSK10 (81414 bp)
[Figure: gene map of MSK10. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier Identity Definition
MSK10.1 MSK10.2 MSK10.3 MSK10.4 MSK10.5 MSK10.6 MSK10.7
MSK10.8 MSK10.9 MSK10.10 MSK10.11 MSK10.12
MSK10.13
MSK10.14
MSK10.15 MSK10.16 MSK10.17
MSK10.18 MSK10.19
MSK10.20 MSK10.21 MSK10.22
MSK10.23 MSK10.24 MSK10.25 MSK10.26 MSK10.27
4084469574278890
1011211072
1780423430245082538226246
4017705881199579
10C0115536
2193824017248372589727258
27694 30523
33826 36842
377044207847382
5045552783
592045999900859
0493367193730047698279942
387604407349456
5111158870
597216005662767
65295670177410377G7981414
932 «i|6623976|gb|AAF19229.1| 643178 gi|46C2646|sb|AAD2C916.1| 119488 gi|4262158|gb|AAD14458| 243231 Ki|5068781|Kb|AAD4G007.1| 230230 8i|3252818|Kb|AAC24188.1| 202113 Ki|3252818|gb|AAC24188.1| 97
1488 *i|3047071 028
961 *i|3047072 960142 Ki|3047070 14090 Ki|3805759|gb|AAC69115.1] -79
147 gi|30470C9 146248 «i|4773911|Kb|AAD29781.1| 247
800 gi|3047008 805
748 *i|3047073 588
321 Ki|5032274|«b|AAD38222.1364 Ki|4585912|Kb|AAD25573.1691 gi|6539553|dbj|BAA88170.205 gi|40380CC|gb| AAC97247.1| 91
1307 Ki|4263544|Kb|AAD15358.1| 738
205 gi|4038000|gb|AAC97247.1679 gi|30470CC
380 «i|4080179|sb|AAD27547.1| 191133414 gi|3047001 413
84.335.8
97.5
90.3100.040.0
100.091.9
90.4
udo] (AC007505) Similar to Athila ORF 1 A. thaliaIAC000429) hypothetical protein A. tliahana[pseudo] (AC005275) hypothetical protein A. tljaliana[paeudo] (AC007894) F21H2.6 A. thaliana(AC004705) hypothetical protein A. thalituia(AC004705) hypothetical protein A. H.«li»l:a[pseudo] (AF058825) tiimilar to inaine traiicpo^o(GB:M7C978) A. tl,«Ji«n»(AF058825) No definition line found A. tlmlMnu(AF058825) No definition line found A. l);«!i»/;a(AC005693) hypotheticIAF058825) No definition lii(AF147259) contni™ »imil»thetical proteins(AF058825) .i.nilar to III»A. lli«Jian»[p.eudo) (AF058825) conti>proteins A. tlialimia[pfeudoj (AF147264) No definifn
e found Aty to a fa
uila
ly of A. thaliaua hypo-
u*on MuDR (GB:M76978)
ity to retrotran
id A. thalia
i-like
line f(AC006298) hypothetical protein A. thali
51.7 [p«eudo] (AP000836) Sin.ilarto Oyza au.fralra.diB retrotran.posonRIRE1 (D85597) O I J . . « I ™
58.7 [pyeudo] (AC005897) hypotlietical protein A. tdaJiana64.1 [pceudo] (AC000250) putative Atliila retroelement ORFl protein
A. thaliana
37.9 [pseuJo] (AC005897) hypothetical protein A. thaliana100.0 (AF058825) contain* similarity to retrovirus-related TOL polypro-
teius A. thaliana
38.0 [p»eudo] (AF111709) polyprotein Orj.ua mtiva «ulj.p. indici
93.7 [partial] (AF058825) .imilar to A. thaliaua retrotrau«po»oi1 |GB:(L47193) ArakiJopaia thaliaii*
MUF8 (13776 bp)
[Gene map of the MUF8 region; the clone label MBK23 appears in the map. Tracks: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genesStltK
[pnrti«l] (Y12776) hypotlietk(AC011622) puttttive di^we(AC005107) puUtive dis.e^e[pseudo] (AC000170) putMtiv
i A l l
1 A. thaliaMUF8.1MUF8.2MUF8.3MUF8.4
30 22C338M 73509160 12C421333C 13770
71110181008
147
Ki|G456160|xb|AAF09148 1|K'I J3757516 Kb AAC04218.1Ki|3738337|i(b|AAC63678.1
509872988
70
51.946.850.162.0
MSN2 (62927 bp)
[Gene map of the MSN2 region; the clone label MUD21 appears in the map. Tracks: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced gene identifier
MSN2.1 MSN2.2 MSN2.3 MSN2.4 MSN2.5 MSN2.6 MSN2.7
MSN2.8
MSN2.9 MSN2.10 MSN2.11 MSN2.12 MSN2.13 MSN2.14 MSN2.15 MSN2.16
Direction
+--_--
+
+-_++
5*1
52238G12
14716108822070023248
30500
3981142014444234732253130544505807059181
3"109271679662
10909181052197620980
32002
4116443966409004928853960582715874602620
Exoi
82111
13
1
2G442
1G1
10
No. of No. ofEST
Info theSeque ID Definition
[partial] (Z99708) putative protein A. thaliana
(Z99708) putative protein A. thaliana
(AC004482) hypothetical protein A. thaliana
(AC005724) unknown protein A. thaliana
(AC005724) unknown protein A. thaliana
[partial] (AC005724) unknown protein A. thaliana
dolichyl-diphosphooligosaccharide-protein glycotransferase (EC 2.4.1.119) 50kD subunit - human
(AF077407) contains similarity to UDP-glucoronosyl and UDP-glucosyl transferase. (Pfam: UDPGT.hmm, score: 85.94) A. thaliana
(Z99708) homeodomain protein A. thaliana
(AL132979) protein kinase ATN1-like protein A. thaliana
(Z97341) hypothetical protein A. thaliana
(AL132979) zinc finger protein A. thaliana
(AC007202) T8K14.10 A. thaliana
(AF143940) SWI2/SNF2-like protein A. thaliana
tRNA-Glu(TTC)
(AJ001809) succinate dehydrogenase flavoprotein alpha subunit A. thaliana
0 240 gi|400C88C|emb|CAB10810.10 451 gi|4000880|emb|CABlC81C.l0 321 gi|3152605|gb|AAC17084.1|0 398 gi|4185129|gb|AAD08932.10 408 gi|4185129|gb|AAD08932.13 405 gi|4185129|gb|AAD08932.15 442 gi|C27424|pir||A44054
5 481 gi|3319344|gb|AAC20233.1
228 gi|4000894|emb|CABlC824.1[405 «i|C501975|emb|CAB02441.1[414 (•i|530279C|emb|CAB40038.1|500 gi|C501973|emb|CAB02439.1(247 s i |4835707|gb|AAD30234.1|704 gi|4720079|gb|AAD28303.1[
72 K00193034 gi|3C00471|emb|CAA05025
413309380334372390
51.240.853.937.039.145.845.1
1003402854972427C3
U033
59.055.108.949.447.794.095.C92.1
UDP-85.94)
MUD12 (22601 bp)
[Gene map of the MUD12 region. Tracks: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes identifier  Direction  Position 5'  3'  No. of Exon  No. of EST  Length  Information on the most similar sequence: Sequence ID  Overlap  Identity  Definition
MUD12.1 MUD12.2 MUD12.3 MUD12.4 MUD12.5 MUD12.6
MUD12.7
14315120729097921539717322
348454018755147001073318098
19180 22550
499114302115294459
|1200250|emb)CAA02470gi|0225913|«p|O24415|giJ4507210jgb| AAD23031.1|gi[5791483|emb|CAB53527.11gi|0319759|ref|Nr_009841.1|j!i|5903074|gb| AAD55032.1|
1017 Ki|5903073|gb|AAD56031.1|
04278895
52387
53.839.154.243.439.4
(X90990) s tpkl protein kinase So/ariu00. acidic ribo«om»l protein T2B(AC007113) hypothetical protein A. tlujiuia(ALllOllf i) putative protein A. thallium.Mitochoiidrii.1 ribosomal protein MRPL27 (Yin(AC008017) Similar to part of downy mildewR P P 5 A. tluluiui[partial] (AC008017) Similar to di .ea.e reA. tiuJiaru
MUL3 (82010 bp)
[Gene map of the MUL3 region. Tracks: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier  Overlap  Identity  Definition
MUL3.1 MUL3.2 MUL3.3 MUL3.4 MUL3.5 MUL3.6 MUL3.7
MUL3.8 MUL3.9 MUL3.10 MUL3.11 MUL3.12 MUL3.13 MUL3.14
50076813
12614159452881841570
51339559745840901201682027444177582
54131061714488229053044845388
55440574155933867736691887080881412
G07G2GGO39010993301271
102839722G1184115435615
Ki|3377007 599 90.0Ki|C143887[gb|AAF04433.1| 281 20.0Ki|4914414|einb|CAB43060.1 1009 70.2si|434700|dbj|BAA04803 112 42.0Ki|4388818|gb|AAD19773.1 049 74.9
Ki|491441G|emb|CAB43667.1| 1024 44.1«i|3022900|Kb|AAC34232.1! 344 47.0Ki|4914417|euib|CAB43668.1 210 03.0gi|4604997|ref|Nrj002303.1| 774 31.0Ki|5809758|einl)|CAA41032.1j 113 59.0Ki|4538928|enib|CAB39064.11 369 53.2ei|453892G|e.nb|CAB39062.1 604 83.8
(AF050020) auxin transport protein EIR1 A. thaliana
(AC010718) unknown protein A. thaliana
(AL050352) Ca2+-transporting ATPase-like protein A. thaliana
(D21262) ORF Homo sapiens
[pseudo] (AC006028) putative retroelement pol polyprotein A. thaliana
(AL000352) putative protein A. thaliana
(AC004411) hypothetical protein A. thaliana
(AL000352) putative protein A. thaliana
DNA ligase IV
(X58827) AT-LS1 product A. thaliana
(AL049483) putative protein A. thaliana
(AL049483) putative phosphatidylserine decarboxylase A. thaliana
MWD22 (87180 bp)
[Gene map of the MWD22 region. Tracks: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier  Overlap  Identity  Definition
MWD22.1 MWD22.2 MWD22.3 MWD22.4 MWD22.5
MWD22.6 MWD22.7 MWD22.8 MWD22.9 MWD22.10 MWD22.11 MWD22.12 MWD22.13 MWD22.14 MWD22.15
MWD22.16 MWD22.17 MWD22.18 MWD22.19 MWD22.20 MWD22.21 MWD22.22 MWD22.23
MWD22.24
MWD22.25
MWD22.26
290000037123
12082
13523152001804322797260432707231123340513749838089
53934550475791CG150GG450GG59880890375219783C7
83G47
8GC20
4438G9579398
13143
1511717GC120550252552744129093330893513638298531G8
544G656888G0853G3104G5609G7229720257740581920
8C058
87180
0
530316131305220
2G7379
442
0 3600 2500 2670 1837
1413913882233G8257819
gi!H69544|t-p!P42762|
gi|1172494:t<p|P43335|
k;i|2136800,pir||S598G3gi|3947013|eml.|CAA1946513 i 11175381 |*p|Q09709|^i|5GC8787|yb|AAD4G013.1gi|4220480|j<l->|AADl 2703.1
gi|3540182|<ijl20849?|dbi|BAA07323Ki|5732055|^b|AAD48954.Ki|1504030|dbj|BAA13214,
Ki|26GG93|*p!r29525|i=!;i|4559310|Kb|AAD22979.1|(<i|47G8996|gb|AAD29711.1gi|4185499gij5042171|emb|CAB44090j!!i|4455258|eiIib|CAB36757gi:4455259|cinb|CAB30758.gi 4490304|tmb|CAB38795.
xi G572330|emb|CAB62977
Kil3451321kmbjCAA20438|
gil3445238 einb:CAA18481.1|
124
235113
1271| 25C
347232190
349129204
1161
138169123142
I1 365250796580370
40.4
65.729.8
53.137.038-526.232.5
49.456.245.;22.8
49.G22.947.640.C50.872.057.780.434.2
] ERD1 protein precursorh y p o t h e t i c ! 39.2 kd protein RV2228C
AF094831) iron ^.terin-4-fxlphii-ciirb.ydroxy-tetrnhydr
protein) (PCD)ityA binding protein II - bovine
AL023828) LDNA EST EMBL:M89008 come;- fix.ypothetitcd 44.9 kd protein C18B1102C in cliroiAC007894) F21H2.12 A. th*liMutAC00G069) unknown protein A. thallium
[AC004122) Unknown protein A. tlmlinimD38125) EREBP-4 Nkotim.H tabm-uni
48925..49138] fD8G978ot-mid K12D12(Z49OG9JLEOSINAF129131) put.Ht.ive ZAF140498) hypotheticAF090095) fertilizatioiAL078620) putative pr
AL035G23) putative SeAL035678) putative p[AL09G766) dA59Hl8.2east, worm and plantAL031323) putative troiiiycex pott thepartial] (AL022347) pi
) Hm
c3 bi1 pro-hideot.ein
r/Th
Inuvpred
tlli-LT
ativ
Ltlit e ipe l
A.
r VA.
elctepti
P
o a C.wpiciis
ig prote, Oryx*dent. ;-ethHliHiia
oteiii kthalinitHroteiii
d) proteon or s[
oteiu A
l^fititn j
n
vtttivaed 2 p.x
iinilwrin.) Hoicing frt
tlutlittti
teir
th*.
o hino nttol
MWF20 (91193 bp)
[Gene map of the MWF20 region. Tracks: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes identifier  Direction  Position 5'  3'  No. of Exon  No. of EST  Length  Information on the most similar sequence: Sequence ID  Overlap  Identity  Definition
MWF20.1 MWF20.2 MWF20.3 MWF20.4 MWF20.5 MWF20.6 MWF20.7 MWF20.8 MWF20.9 MWF20.10 MWF20.11
MWF20.12
MWF20.13 MWF20.14
MWF20.15
MWF20.16
MWF20.17 MWF20.18
MWF20.19 MWF20.20 MWF20.21 MWF20.22 MWF20.23
00048703
15023246203117C3078838040412874648900110
00190
66892C8708
70401
72420
7427770109
7823082003830768014089503
7048102001674520313328973092839973432014010100507
60988
6801670210
71710
73816
7434976091
8170082745839688852889077
332516524021024327043CKi
Ki
K'KlH<
«l
|2341034|Kb|AAB70434||3809190|dbj|BAA34390|10024282780347|dbj|BAA24281|2780348]dbj|BAA24282||2780349|dbj|BAA24283|
8i|3770980|emb|CAA09190gi|1903308!Kb[AAB70439
1933 ei|4203G56|gb|AAD15377.1
370201
305
362
73154
131096
gi|1903309|gb|AAB70441iKi|1706714l»[i|ro3070
Biil903360|gb|AAB70442|
Ki|1903300|Kb|AAB70442[
AC000106gi |2341039|eb |AAB70443|
908 gi|3901294
gi|0042037|t;bjAAF20218.1gi|0324710|ref|Nr.014784.1i
479331
523320
113
303
327250
361
357
73137
127383
70.690.794.096.697.390.8
94.004.0
62.060.3
03.0
61.7
94.087.0
00.920.0
(AC000104) F19P19.10 A. thaliana
(AC000104) F19P19.13 A. thaliana
(AB005746) inorganic phosphate transporter A. thaliana
(U62330) phosphate transporter A. thaliana
(AB000094) inorganic phosphate transporter A. thaliana
(AB000094) inorganic phosphate transporter A. thaliana
(AB000094) protein phosphatase 1 catalytic subunit A. thaliana
[pseudo] (AJ010406) RNA helicase
(AC000104) Similar to Nicotiana EREBP-3 (gb|D38124).
[pseudo] (AC006136) putative non-LTR retroelement reverse transcriptase A. thaliana
(AC000104) F19P19.21 A. thaliana
electron transfer flavoprotein beta-subunit (beta-ETF) (electron transfer flavoprotein small subunit) (ETFSS)
(AC000104) Similar to Arabidopsis 2A6 (gb|X83096). EST gb|T76913 comes from this gene. A. thaliana
(AC000104) Similar to Arabidopsis 2A6 (gb|X83096). EST gb|T76913 comes from this gene. A. thaliana
tRNA-Ala(AGC)
(AC000104) Similar to Nicotiana lesion-inducing ORF (gb|U66269). A. thaliana
(AF089711) rpp8 A. thaliana
(AC012390) unknown protein A. thaliana
actin-related protein
MWJ3 (42356 bp)
[Gene map of the MWJ3 region. Tracks: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes identifier  Direction  Position 5'  3'  No. of Exon  No. of EST  Length  Information on the most similar sequence: Sequence ID  Overlap  Identity  Definition
MWJ3.1
MWJ3.2 MWJ3.3 MWJ3.4 MWJ3.5 MWJ3.6 MWJ3.7 MWJ3.8 MWJ3.9
li
32818500
100291702318400239902702333027
73799730
1704C1819422877200872872030242
04000000
1274 K79 K
649 K
224 K1491 K306 K366 K
1072 K
100304641^1 AAF19002.1112642210i|2240080|einb|CABl0002.1i
i297|emb|CAB30832.1:i|3779020|ii!b|AAC07205.1
»297|emb|CAB36832.1j|4450297|emb|CAB36832.1|:i|4012630|8b|AAD21099.1
91578
548131
1490342342
1002
27.778.521.953.880.900.000,606.8
trtial] (AC004C84) putaA. tiuduuw(AC007190) F23N19.4 A. tluliuu(AF03038C) NOI protein A. tliali.(Z97343) uiyuoin beavy chain like(AL030028) bypotbetical protein(AC005171) putative retroelemen(AL030028) hypothetical protein(AL035528) hypothetical protein(AC004793) Contains reverseTFI00078. A. tliWiaiui
:eptor-like protei
A. tliattit pol pc
A. tlulii
.lyproteir
;riptas
MWP19 (11026 bp)
[Gene map of the MWP19 region. Tracks: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
identifier  Direction  Position 5'  3'  No. of Exon  No. of EST  Length  Sequence ID  Overlap  Identity  Definition
MWT19.1  +  1244  6437  1  0  1731  gi|4309763|gb|AAD15532.1|  290  30.1  [pseudo] (AC006217) putative retroelement pol polyprotein A. thaliana
MWT19.2  +  7984  11024  1  0  1013  gi|3779026|gb|AAC67205.1|  611  63.0  [pseudo] (AC005171) putative retroelement pol polyprotein A. thaliana
MXK3 (81494 bp)
[Gene map of the MXK3 region. Tracks: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
identifier
MXK3.1 MXK3.2 MXK3.3
MXK3.4 MXK3.5 MXK3.6 MXK3.7 MXK3.8
MXK3.10 MXK3.11 MXK3.12 MXK3.13 MXK3.14 MXK3.15 MXK3.16 MXK3.17 MXK3.18 MXK3.19 MXK3.20 MXK3.21 MXK3.22 MXK3.23 MXK3.24
MXK3.25 MXK3.26 MXK3.27 MXK3.28 MXK3.29
Direction
+
+
_-_++
—
_
++
-
—
++
_-
Position
4834185750
1133712249142941927022410
271142908831981345883823541076432234645151222528325614458285609000246164998
6655970013712507303575202
3'18913735
11211
1185813847170281957922481
286433120032677352714144642677456565005852394557595777800335621270376905947
0901070085728867304075085
No.Exo
31
14
20721
2222
1414
181
1249051
131512
EST000
00400
2001
0001031
0000
00400
106651
145380092
8572
479171109
92487334504795391513428344200262316
57873
3172021 1 3
Sequence ID
gi|0091736|gb|AAF03448.1
Ki|0324551|ref|NP_014620.1|ni| 105 2892|<lbj|BAAl 7810Ki|6469125|emb|CAB61744.1AC004392*i|544184|.p|Q06801
gi|3851530
gi|5929906|gb|AAD50036.1
Ki|0466953|xb|AAF13088.1Ki|0502304!emb|CAB02602.1|Ki|1806140|emb|CAA05979|Ki|0587800|Kb| AAF18552.1Ki|4263515|Kb|AAD15341Ki|4507198|nb|AAD23614.1Ki|4903004|dbj|BAA77841.1|«i|4522005|gb|AAD21778.1
ni|3859036M2C108gi|4959108|i!b| AAD34237.1|
Ki|G573707|t<b|AAF17687.1|
ir sequenceOverlap
441
46
341533
2772
400
319
576353447363
02198224310
57774
316
100
Identity57.0
48.9
27.251.782.198.0
64.0
74.7
43.727.475.000.863.576.925.335.4
92.798.6
100.0
44.0
Definition
(AC011098) putative beta-1,3-glucanase A. thaliana
(AC010797) putative WRKY-like transcriptional regulator protein
Yol022cp
(D90909) ABC transporter Synechocystis sp.
(AJ275310) hypothetical protein Cicer arietinum
tRNA-Leu(AAG)
...tionating enzyme) (D-enzyme)
(AF005435) nodulin Glycine max
(AF102150) COP1-interacting protein CIP8 A. thaliana
(AC009176) unknown protein A. thaliana
(AL133421) putative protein A. thaliana
(X97314) cdc2MsC Medicago sativa
(AC012680) putative mitochondrial carrier protein A. thaliana
(AC004044) hypothetical protein A. thaliana
(AC007168) putative GTP-binding protein A. thaliana
(AB021981) UDP-N-acetylglucosamine transporter Homo sapiens
[pseudo] (AC007069) putative non-LTR retroelement reverse transcriptase
(AF095453) asparagine synthetase A. thaliana
tRNA-Arg(ACG)
(AF083914) annexin A. thaliana
(AC009243) F28K19.24 A. thaliana
MYN8 (54528 bp)
[Gene map of the MYN8 region; the clone label K19E1 appears in the map. Tracks: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes
identifier  Information on the most similar sequence: Sequence ID  Identity  Definition
MYN8.1 MYN8.2 MYN8.3 MYN8.4 MYN8.5 MYN8.6 MYN8.7 MYN8.8
8700144441838424443299054127901258
8804162272392428731330005038753149
1320C
1
4040
10n
35205
10401181070
2216338
Ki|4972120|emb|CAB43977.11si|3860270|t!bjAAC73044.1gi|4450349|eiiib|CAB30759.11«i|4455350jemb|CAB3C700.1|(Ci|123530|»p|r04929|8i|23170C|»p|r29018|gi|417073|»p|Q034C0|Ki|4972109|emb|CAB439CC.l|
34262976
1752201
332
88.005.857.3
30.778.272.4
(AC005824) unknown protein A. thaliana
(AL035524) putative protein A. thaliana
(AL035524) putative protein A. thaliana
cell division control protein 2 homolog 1
glutamate synthase precursor (NADH-GOGAT)
(AL078579) putative acyl-CoA binding protein A. thaliana
MZN1 (81672 bp)
[Gene map of the MZN1 region; the clone label K19M22 appears in the map. Tracks: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]
deduced genes identifier  Direction  Position 5'  3'  No. of Exon  No. of EST  Length  Information on the most similar sequence: Sequence ID  Overlap  Identity  Definition
MZN1.1 MZN1.2 MZN1.3 MZN1.4 MZN1.5 MZN1.6 MZN1.7 MZN1.8 MZN1.9 MZN1.10 MZN1.11
MZN1.12 MZN1.13
MZN1.15 MZN1.16 MZN1.17
MZN1.18 MZN1.19 MZN1.20 MZN1.21 MZN1.22 MZN1.23 MZN1.24 MZN1.25 MZN1.26
11607140191715318M02095428557330173454830107
3905842848
47745
024380407709017
000370348400342089947030072907740017820880G33
124301031217338202772079030377339283409738397
413494C028
50050
040900707209922
013490404108204700907118173708758057891981207
833517308248
62402
109560710450
352
13008704gi|1053859|dbj|BAAl8709|«i|4185131|gb|AAD08934.1|
Ki|3420050|gb| AAC31851.il«i|3935138|gb| AAC80581.1|*i|4587989|gb| AAD25930.1|Ki|5541697|emb|CAB51202.
gi|6522600|enib|CAB61965.1|
561 gi |902923 |dbj |B AA075471
909 gi| 1771381 |emb|CAA65127|
597 gi|1771381|emb|CAA65127|
204 gi|6180043|gb|AAF05700.1|519 gil3201477lemblCAA06808.ll302 gi|3695403
386 Ki|&051764|etnb|CAB45057.1|576 gi|2134979|pir||I38909310 gi|3242704|«b|AAC23756.1|225 Kij3242704|^b| AAC23756.1|226 «i|3242704|gb| AAC23756.1|302 «i|3242704|gb| AAC23756.1|101 «iJ0522597|eiiib|CAB019G2.1|188 gi|5031275|xb|AAD38143.1|
153153101
343490092103
30.034.4
100.0
09.030.247.979.8
18400009
309322220224220270100182
100.080.447.1
37.031.047.082.773.004.907.702.3
(AF049230) unknown A. thaliana
(D90917) hypothetical protein Synechocystis sp.
(AC005724) putative RING zinc finger protein A. thaliana
(U62742) Ran binding protein 1 homolog A. thaliana
(AC004680) hypothetical protein A. thaliana
(AC005106) T25N20.2 A. thaliana
(AF085279) hypothetical Cys-3-His zinc finger protein A. thaliana
(AL096800) putative protein A. thaliana
(AL133292) 1-aminocyclopropane-1-carboxylic acid oxidase-like protein A. thaliana
(D38544) phosphoinositide specific phospholipase C A. thaliana
(X95877) phosphoinositide-specific phospholipase C Nicotiana rustica
(X95877) phosphoinositide-specific phospholipase C Nicotiana rustica
(AF192490) cyclophilin A. thaliana
(AJ000021) putative PRL1 associated protein A. thaliana
(AF090373) contains similarity to the pfkB family of carbohydrate kinases (Pfam: PF00294: E=1.6e-75) A. thaliana
(AL078637) putative protein A. thaliana
damage-specific DNA binding protein 2 - human
(AC003040) hypothetical protein A. thaliana
(AC003040) hypothetical protein A. thaliana
(AC003040) hypothetical protein A. thaliana
(AC003040) hypothetical protein A. thaliana
(AL133292) RNA binding-like protein A. thaliana
(AF139496) unknown Prunus armeniaca