GENEVESTIGATOR. Arabidopsis Microarray Database and Analysis Toolbox¹[w]
Philip Zimmermann², Matthias Hirsch-Hoffmann², Lars Hennig, and Wilhelm Gruissem*
Institute of Plant Sciences, Swiss Federal Institute of Technology and Zurich-Basel Plant Science Center, ETH Center, CH–8092 Zurich, Switzerland (P.Z., M.H.-H., L.H., W.G.); and Functional Genomics Center Zurich, UNI Irchel, Y32 H52, CH–8057 Zurich, Switzerland (W.G.)
High-throughput gene expression analysis has become a frequent and powerful research tool in biology. At present, however, few software applications have been developed for biologists to query large microarray gene expression databases using a Web-browser interface. We present GENEVESTIGATOR, a database and Web-browser data mining interface for Affymetrix GeneChip data. Users can query the database to retrieve the expression patterns of individual genes throughout chosen environmental conditions, growth stages, or organs. Conversely, mining tools allow users to identify genes specifically expressed during selected stresses, growth stages, or in particular organs. Using GENEVESTIGATOR, the gene expression profiles of more than 22,000 Arabidopsis genes can be obtained, including those of 10,600 currently uncharacterized genes. The objective of this software application is to direct gene functional discovery and design of new experiments by providing plant biologists with contextual information on the expression of genes. The database and analysis toolbox is available as a community resource at https://www.genevestigator.ethz.ch.
A major challenge in biology today is the large-scale determination of gene function (Boyes et al., 2001). First, the establishment of standards and controlled vocabularies facilitates the integration of experimental data into a computational framework, thereby allowing structured and systematic processing of information (Ashburner et al., 2000; Brazma et al., 2001). Second, structured databases and data querying tools provide the means to assign putative functional information to genes.
The complete sequencing of the Arabidopsis genome, achieved in 2000 (The Arabidopsis Genome Initiative, 2000), enables us to monitor gene expression of this flowering plant on a genome scale using microarrays. In situ synthesis of high-density oligonucleotides on glass slides (Lockhart et al., 1996) has become a powerful tool to rapidly integrate the sequence knowledge into expression profiling platforms, such as the ATH1 full genome array developed by Affymetrix and The Institute for Genomic Research (TIGR), which represents approximately 23,750 genes from Arabidopsis (Redman et al., 2004). The availability of a full-genome array and the complete technical environment provided by the Affymetrix system led to a wide use of the GeneChip technology in the plant community. Thousands of arrays have since been processed, of which a significant number are publicly available through services and repositories such as Nottingham Arabidopsis Stock Centre Transcriptomics Service (NASCArrays; Craigon et al., 2004), ArrayExpress at the European Bioinformatics Institute (EBI; Brazma et al., 2003), or Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI; Edgar et al., 2002).

¹ This work was supported by ETH, Strategic Excellence Project 2–74213–02/TH–8/02–2, and by the Functional Genomics Center Zurich.
² These authors contributed equally to the paper.
* Corresponding author; e-mail wilhelm.gruissem@ipw.biol.ethz.ch; fax 41–1–632–10–79.
[w] The online version of this article contains Web-only data.
www.plantphysiol.org/cgi/doi/10.1104/pp.104.046367.
The exploitation of large-scale gene expression datasets, mainly from Saccharomyces cerevisiae and Escherichia coli, has already led to the discovery of global structures governing metabolic and regulatory networks (Lee et al., 2002; Ravasz et al., 2002; Stelling et al., 2002; Ihmels et al., 2004). Multiple-genome comparisons have also yielded interesting observations on the modularity and connectivity distributions of gene expression data (Bergmann et al., 2004). Nevertheless, the combination of multiple datasets still raises a number of questions concerning their compatibility, in particular when comparing data from different platforms and organisms. While analyses revealing global properties of networks or modules may not necessarily require full compatibility of expression datasets, the details are often noisy (Friedman, 2004) and the comparative search for the function of individual genes requires a more stringent selection.
The Affymetrix platform provides a standardized system with a high degree of reproducibility (Hennig et al., 2003; Redman et al., 2004). Although data from different experiments may not be pooled for a rigorous expression profiling analysis, one can assume that the large-scale combination and analysis of expression data from a single organism using a single platform like the Affymetrix system allows the identification of biologically meaningful expression patterns of individual genes. To date, few tools have been developed for biologists to query large gene expression databases. The Yeast Microarray Global Viewer (yMGV) is a database providing online tools for the analysis of transcriptional profiles of yeast genes among 82 different expression datasets (Lelandais et al., 2004). In the plant community, NASCArrays (Craigon et al., 2004) provides a repository for Arabidopsis microarray data and some simple "gene-centric" data mining tools.

Here, we describe a novel online tool called GENEVESTIGATOR, comprising a gene expression database and a number of querying and analysis functionalities developed to facilitate gene functional discovery. GENEVESTIGATOR allows the data to be presented in the context of plant development, plant organ, and environmental conditions, both for individual genes and for families of genes, thereby answering questions such as "in which growth stage is my gene of interest expressed?" or "which genes are specifically expressed in roots?" The main objective of the software is to assign contextual information to gene expression data, directing the design of new experiments and gene functional discovery.

Plant Physiology, September 2004, Vol. 136, pp. 2621–2632, www.plantphysiol.org © 2004 American Society of Plant Biologists

RESULTS
Database Concept and Software Design
GENEVESTIGATOR was conceived as a user-friendly online tool for large-scale expression data analysis. It consists of a MySQL relational database and a Web server application programmed in the PHP (PHP Hypertext Preprocessor) scripting language. The database works as a "data warehouse" containing experimental and annotation data, as well as diverse preprocessed tables for analysis and for control of workflow (Fig. 1).

Figure 1. Concept and design of GENEVESTIGATOR. The experimenter submits RNA profiling data to the database curator, who processes the data and uploads it to the database. The data warehouse contains raw signal intensity and P values, as well as preprocessed tables. A Web server application acts as an interface between users and the GENEVESTIGATOR database.

Raw experimental data from users is processed using Affymetrix MAS5.0 software to a target value (TGT) of 1,000 (Liu et al., 2002). Signal intensities and P values are collected for each hybridized GeneChip array. Alternatively, Affymetrix data and annotation can be imported from public repositories such as ArrayExpress (Brazma et al., 2003) and GEO (Edgar et al., 2002). The assignment of array elements (probe sets) to their Arabidopsis locus identifiers (AGI codes) and annotations is based on regularly updated data obtained from the Arabidopsis Information Resource (TAIR) ftp server (ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/; currently as of April 5, 2004, based on the final Arabidopsis genome annotation release from TIGR [version 5.0, January 2004]). In addition to probe sets representing unique genes (ending "_at"), the ATH1 and AG GeneChip arrays include nonunique probe sets representing two or more closely related genes (ending "_s_at") or multiple cross-hybridizing probe sets (ending "_x_at"; for details, see Redman et al., 2004). Although these probe set types represent two or more genes, only one locus identifier is displayed per probe set. These ambiguous probe sets are highlighted in GENEVESTIGATOR to draw the attention of the user to this issue.
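The probe set naming convention described above lends itself to a simple suffix check. The following Python sketch is illustrative only and is not taken from the GENEVESTIGATOR code base (which is written in PHP); the function names and the example probe set identifiers are hypothetical.

```python
def classify_probe_set(probe_set_id: str) -> str:
    """Classify an Affymetrix probe set ID by its suffix.

    Returns 'cross_hybridizing' for '_x_at', 'shared' for '_s_at',
    'unique' for a plain '_at' suffix, and 'other' for anything else.
    """
    # Test the longer suffixes first: '_x_at' and '_s_at' also end in '_at'.
    if probe_set_id.endswith("_x_at"):
        return "cross_hybridizing"
    if probe_set_id.endswith("_s_at"):
        return "shared"
    if probe_set_id.endswith("_at"):
        return "unique"
    return "other"


def is_ambiguous(probe_set_id: str) -> bool:
    """True for probe sets that may represent two or more genes."""
    return classify_probe_set(probe_set_id) in {"shared", "cross_hybridizing"}


# Hypothetical identifiers, used only to exercise the function.
for example_id in ("250000_at", "250001_s_at", "250002_x_at"):
    print(example_id, classify_probe_set(example_id), is_ambiguous(example_id))
```

A flag of this kind would be one way to decide which probe sets to highlight in an output table.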
The experiment annotation is curated, structured, and entered in either unique (e.g. growth stage), hierarchical (e.g. plant organs), or multi-select form (e.g. environmental condition). The software has been designed for easy addition of new annotations in any of these annotation formats and for rapid creation of the corresponding tools to analyze and visualize the data. The annotation of arrays was based on the information provided by users or public repositories. Missing information does not impact the results, as the corresponding arrays are not included in the respective calculations. Ambiguous or unsuitable annotations were likewise ignored. For example, arrays from RNA extracted from whole adult plants (including roots, rosette leaves, and inflorescence) are unsuitable for calculations relating to plant organ specificity (Gene Atlas) and are therefore not included in the corresponding tools, but may be suitable for use in other tools such as the Gene Chronologer. Each tool therefore accesses the respective best available sources of data for processing, while the unsuitable data is ignored.

Data from the ATH1 and AG arrays are processed separately. Different sets of oligonucleotide sequences are used to probe identical target genes on the two array types, and thus different hybridization efficiencies of target to probe and cross-hybridization of nontarget to probe make a direct comparison of signal intensities impossible. Although a high degree of reproducibility was found for most target genes probed by both the ATH1 and the AG arrays, 300 pairs of probe sets for identical target genes yielded strongly differing results (Hennig et al., 2003).
As of July 2004, the database contained publicly available data from 750 ATH1 and 121 AG arrays covering 81 public experiments from the Gruissem Laboratory (http://www.pb.ethz.ch; Menges et al., 2003; Hennig et al., 2004; Kleffmann et al., 2004), the Functional Genomics Center Zurich (http://www.fgcz.ethz.ch), NASCArrays (http://ssbdjc2.nottingham.ac.uk/narrays/experimentbrowse.pl; Craigon et al., 2004), ArrayExpress at EBI (http://www.ebi.ac.uk/arrayexpress/; Brazma et al., 2003), and from GEO at NCBI (http://www.ncbi.nlm.nih.gov/geo/; Edgar et al., 2002).

GENEVESTIGATOR is freely accessible to all academic institutions. Since the database contains both publicly available as well as confidential data, we have at present implemented a dual user profile management system for public and private users. All users are therefore asked to register once and to log in for each session. We limit the collection and use of personal information to what is necessary to administer the database and to improve the utility of GENEVESTIGATOR. Personal information is not shared with third parties.

Analysis Tools
The GENEVESTIGATOR tools generally contain two types of queries: a gene-centric approach reporting signal intensity values for individual genes, and a genome-centric approach providing lists of genes fulfilling chosen criteria. The results obtained from any tool are based on all available signal intensity values and the corresponding annotations. In some cases, the present/absent call information as defined by the MAS5.0 algorithm is indicated (see below).

The first tool, Digital Northern, will retrieve the signal intensity values of input genes for a chosen selection of GeneChip experiments. An elaborate selection tool (Fig. 2A) allows the user to choose exactly those experiments that fit single or multiple criteria such as anatomy, growth stage, or environmental factors. Up to 10 probe sets can be processed simultaneously, displayed in several colors, shapes, and fillings, revealing both signal intensity values and present call (closed symbols) and absent call (open symbols) information (Fig. 2B).

The Gene Correlator allows comparing the signal intensity values of two genes throughout all chosen experiments (Fig. 2C; identical selection tool as for the Digital Northern). Each spot represents a GeneChip and can be identified by mouse-over or by linking to the annotation database. The Pearson's correlation coefficient is given as a measure of the relationship between the expression signals of the two genes. Present call information is additionally visualized by color coding (Fig. 2C).

Because the objective of the software was to provide contextual information for the expression of genes, we focused on relating gene expression to three main annotation groups: plant organ, developmental stage, and environmental stress.
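As an illustration of the measure reported by the Gene Correlator, the following Python sketch computes the Pearson correlation coefficient between the signal values of two genes across the same set of arrays. It is a minimal sketch under stated assumptions, not the tool's implementation: the function name is hypothetical, and whether GENEVESTIGATOR applies the coefficient to raw or transformed signal values is not stated in the text.

```python
import math


def pearson_correlation(signals_a, signals_b):
    """Pearson correlation between two equally long lists of signal values.

    Each position is assumed to hold the signal of gene A and gene B
    measured on the same array.
    """
    if len(signals_a) != len(signals_b):
        raise ValueError("both genes need one value per array")
    n = len(signals_a)
    if n < 2:
        raise ValueError("at least two arrays are required")

    mean_a = sum(signals_a) / n
    mean_b = sum(signals_b) / n

    # Sums of cross-products and squared deviations from the means.
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(signals_a, signals_b))
    ss_a = sum((a - mean_a) ** 2 for a in signals_a)
    ss_b = sum((b - mean_b) ** 2 for b in signals_b)

    if ss_a == 0 or ss_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cross / math.sqrt(ss_a * ss_b)
```

Values close to 1 indicate that the two genes rise and fall together across the selected arrays, and values close to -1 indicate opposite behavior.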
The Gene Atlas tool similarly provides the average signal intensity values of a gene of interest in all organs or tissues annotated in the database (Fig. 2D). Conversely, GENEVESTIGATOR can output lists of genes in which signal intensities exceed a chosen threshold for selected organs versus a baseline choice of organs (Fig. 2E). This allows users to find genes expressed preferentially in certain organs or tissues, such as roots, young leaves, or stamina. The anatomy annotation was based on standard anatomy terms as defined by the Plant Ontology Consortium (http://www.plantontology.org/) that we classified into six main groups (callus, cell suspension, seedling, inflorescence, rosette, and roots) and the corresponding subgroups. These categories cover all tissues that can currently be isolated for expression analysis, but can easily be extended as tissue and cell separation techniques become more precise (Birnbaum et al., 2003).

The Gene Chronologer tool, based on the Boyes growth stage ontology (Boyes et al., 2001), possesses two main features. First, it outputs the average signal intensities (or expression levels) and SEs of a gene of interest for 10 representative sections of the life cycle of Arabidopsis (Fig. 2F). Second, users can query the database to output all genes expressed above a given threshold at chosen growth stages. For example, all genes can be selected for which the average signal intensity at the seedling stage exceeds 90% of the sum of all average signal intensity values for each category, measured for this gene throughout the life cycle of the plant (Fig. 2G).
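The stage-specificity query described above can be read as a simple ratio test: the average signal of a gene in one growth stage category divided by the sum of its averages over all categories. The Python sketch below follows that reading. The function names, the gene labels, and the numbers are hypothetical, and the exact query logic used by the Gene Chronologer may differ.

```python
def stage_fraction(stage_means, stage):
    """Fraction of a gene's summed per-stage average signal found in one stage.

    stage_means maps a growth stage category (e.g. 'A' to 'J') to the
    average signal intensity of the gene in that category.
    """
    total = sum(stage_means.values())
    if total <= 0:
        raise ValueError("summed signal must be positive")
    return stage_means[stage] / total


def genes_specific_to_stage(profiles, stage, min_fraction=0.9):
    """Return the sorted gene names whose fraction in `stage` exceeds `min_fraction`."""
    return sorted(
        gene
        for gene, stage_means in profiles.items()
        if stage_fraction(stage_means, stage) > min_fraction
    )


# Made-up profiles over three categories, for illustration only.
example_profiles = {
    "gene_1": {"A": 950.0, "B": 25.0, "C": 25.0},
    "gene_2": {"A": 300.0, "B": 300.0, "C": 400.0},
}
print(genes_specific_to_stage(example_profiles, "A"))
```

Lowering `min_fraction` relaxes the criterion and returns genes that are preferentially, rather than almost exclusively, expressed at the chosen stage.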
The Response Viewer tool provides the same functionalities as Gene Atlas and Gene Chronologer, based on stress response annotations (Fig. 2, H and I). For each stress condition, one or several representative experiments were chosen. Each stress factor is given with the corresponding control from these experiments, allowing direct comparison.

The Meta-Analyzer utility has been designed to study simultaneously the gene expression profiles of several genes in the context of environmental stresses, of organs, and of growth stages (Fig. 2, J–L). Lists of genes can be entered in diverse formats (comma-, semi-colon-, CRLF [carriage return, line feed], or space-separated, or directly copied from a spreadsheet). The output is a heat map of normalized signal intensity values (see Documentation section on our Web page) clustered by either single, average, or complete linkage hierarchical clustering. This tool is especially useful to compare members of gene families and to identify clusters of similarly expressed genes.
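Accepting gene lists in the diverse formats listed above amounts to splitting the input on any run of commas, semicolons, and whitespace. The Python sketch below shows one way to do this. It is an assumption about how such input could be handled, not the Meta-Analyzer's own code, and the upper-casing and de-duplication steps are illustrative choices (AGI codes appear in both forms in this article, e.g. At3g51810 and AT3G51810).

```python
import re

# Any run of commas, semicolons, or whitespace (spaces, tabs, CR, LF).
_SEPARATORS = re.compile(r"[,;\s]+")


def parse_gene_list(text):
    """Split pasted text into a list of unique, upper-cased gene identifiers.

    The order of first appearance is preserved.
    """
    genes = []
    seen = set()
    for token in _SEPARATORS.split(text.strip()):
        if not token:
            continue  # an empty input yields a single empty token
        identifier = token.upper()
        if identifier not in seen:
            seen.add(identifier)
            genes.append(identifier)
    return genes


pasted = "At1g71692, At4g11880;At2g22630\r\nAT1G71692\tAt1g55570"
print(parse_gene_list(pasted))
```

Because tabs and line breaks count as whitespace, a column or row copied from a spreadsheet is handled by the same pattern.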
Finally, the Database and Documentation sections provide users with annotation information about experiments in the database, as well as technical information (Fig. 2, M and N). Since GENEVESTIGATOR was conceived to be an analysis tool and not a data repository, a reduced set of annotations is stored locally. The full MIAME (Minimum Information About a Microarray Experiment; Brazma et al., 2001) compliant annotations are available by linking to the corresponding repository sites from which the experiments were downloaded.

Figure 2. Screenshots of some of the features of GENEVESTIGATOR. Top left, Logo and available tools. A, Chip Selection tool; B, Digital Northern; C, Gene Correlator; D and E, Gene Atlas (relates to plant anatomy); F and G, Gene Chronologer (relates to the plant growth stages); H and I, Response Viewer (relates to environmental factors); J to L, Meta-Analyzer (multiple gene analysis with respect to anatomy, growth stage, and environmental factors); M and N, Database tool for viewing experiment and array annotation, and Documentation section for user information.
General Approach and Validation
The database contains expression data from a high diversity of experiments covering different tissues, ages, and treatments (Table I). The general hypothesis in our approach is that as the number of experiments per category (e.g. growth stage 5.10) increases, individual effects are averaged out and global trends become visible. As a measure of confidence for the expression of genes in different categories, we indicate the respective number of GeneChips and the SE of the mean for each category.
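The per-category summary described above (number of GeneChips, mean signal, and SE of the mean) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name and the example values are hypothetical, and the SE is computed as the sample standard deviation divided by the square root of the number of arrays, which is the usual definition but is not spelled out in the text.

```python
import math
from collections import defaultdict


def summarize_by_category(measurements):
    """Summarize (category, signal) pairs as {category: (n, mean, se)}.

    n is the number of arrays in the category, mean is the average signal,
    and se is the standard error of the mean (NaN when n is 1, because the
    sample variance is undefined for a single array).
    """
    groups = defaultdict(list)
    for category, signal in measurements:
        groups[category].append(signal)

    summary = {}
    for category, values in groups.items():
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            variance = sum((v - mean) ** 2 for v in values) / (n - 1)
            se = math.sqrt(variance / n)
        else:
            se = float("nan")
        summary[category] = (n, mean, se)
    return summary


# Made-up signal values for one gene, for illustration only.
example = [("roots", 100.0), ("roots", 200.0), ("roots", 300.0), ("stamen", 50.0)]
print(summarize_by_category(example))
```

Reporting n next to the SE makes sparsely populated categories visible, which matters because the hypothesis above relies on averaging over many experiments per category.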
To validate our hypothesis, we checked whether strongly populated categories yield results that are consistent with the literature. In a first step, we selected a number of marker genes with preferential expression in particular organs, at specific growth stages, or in response to certain stresses and then analyzed their expression patterns generated by GENEVESTIGATOR. Marker genes were chosen from the literature.
First, using Gene Atlas, three AGAMOUS-like genes known to be preferentially expressed in roots as measured by reverse transcription-PCR (AGL12 [At1g71692], AGL14 [At4g11880], and AGL17 [At2g22630]; Parenicova et al., 2003) in fact showed strong expression in roots and radicle, but weaker signals in all other organs (Fig. 3, A–C). Two genes associated with pollen tube growth (At1g55570, Albani et al., 1992; and At2g25600, Mouline et al., 2002) were also identified as being specific to stamina (and by extension to the categories "flower" and "inflorescence") in our expression database (Fig. 3, D and E). Furthermore, two genes involved in photosynthesis (chlorophyll a/b binding proteins, At1g19150 and At3g040) were found to be abundantly expressed in green plant tissues (rosette, cauline leaf, stem, node, flower, cotyledon, and hypocotyl), but weakly expressed in photosynthetically inactive tissues (roots, stamen, and seeds; Fig. 3, F and G). This pattern was observed for all genes from the chlorophyll a/b binding family except for one gene (TAIR; http://www.arabidopsis.org/info/genefamily/Chloroplast.html; see Supplemental Table II, available at www.plantphysiol.org).

Second, to verify the reliability of the Gene Chronologer tool, we looked for genes annotated as being developmentally regulated. Two genes involved in seed germination and seedling development (encoding the embryonic abundant protein ATEM1 [At3g51810, Vicient et al., 2000] and a gene involved in apical hook development [At4g37580, Lehman et al., 1996]) showed highest expression during mature seed and germination stages (Fig. 3, H and I), but lower levels in all other stages. In contrast, two genes involved in flowering (APETALA1 [At1g69120, Pelaz et al., 2001] and FLOWERING LOCUS T [At1g680, Ruiz-Garcia et al., 1997]) were shown to be most abundantly expressed in the flowering stages (Fig. 3, J and K).

Third, the Response Viewer tool was used for several genes known to be responsive to particular stresses (Fig. 3, L–Q). GENEVESTIGATOR correctly showed the expression pattern of a light-induced gene encoding a light-harvesting chlorophyll a/b binding protein (At4g14690, Jansson et al., 2000) and of the light-repressed protochlorophyllide reductase A gene (At5g190, Runge et al., 1996; Fig. 3, L and M, respectively). Similarly, four genes reported to be responsive to ethylene (ERF1 [At3g23240]; AtERF1 [At4g17500]; AtERF2 [At5g47220]; and AtERF13 [At2g44840]) were correctly found by the software to be responsive to ethylene and to the pathogen Pseudomonas syringae, as reported by the authors (Onate-Sanchez and Singh, 2002; Fig. 3, N–Q).

Table I. Annotation categories incorporated in GENEVESTIGATOR as of July 2004

Plant Tissues/Organs
0 Callus
1 Cell suspension
2 Seedling
  21 Cotyledons
  22 Hypocotyl
  23 Radicle
3 Inflorescence
  31 Flower
    311 Carpel
    312 Petal
    313 Sepal
    314 Stamen
    315 Pedicel
  32 Silique
  33 Seed
  34 Stem
  35 Node
  36 Shoot apex
  37 Cauline leaf
4 Rosette
  41 Juvenile leaf
  42 Adult leaf
  43 Petiole
  44 Senescent leaf
5 Roots
  51 Primary root
  52 Lateral root
  53 Root hair
  54 Root tip
  55 Elongation zone

Developmental Stages
10 categories based on the Boyes key ontology:
A) 0.10–0.70
B) 1.00–1.02
C) 1.03–1.05
D) 1.06–1.08/3.20
E) 1.09–1.12/3.50
F) 1.13/1.14/3.70/5.10
G) 3.90/6.00/6.10
H) 6.30/6.50
I) 6.90/8.00
J) 9.70

Environmental Factors
Hormones: Ethylene, Auxin, Abscisic acid, Gibberellin
Atmosphere: Ozone, Carbon dioxide
Illumination
  Light intensity: Light, Dark
  Light quality: Far-red, Blue, UVA, UVB, Visible
Biotic interactions: Pseudomonas syringae, Gigaspora rosea, Agrobacterium tumefaciens, Heterodera schachtii, Erysiphe cichoracearum
Programmed cell death
Senescence
Heat
Cold
Nutrients/heavy metals: Phosphate, Nitrate, Sulfate, Potassium, Water, Suc/Glc, Lead, Zinc

Figure 3. Validation of the quality of data generated by GENEVESTIGATOR. A to G, Expression of organ or tissue-specific marker genes used for testing the Gene Atlas tool (A, AGL12, At1g71692; B, AGL14, At4g11880; C, AGL17, At2g22630; D, At1g55570; E, At2g25600; F, At1g19150; G, At3g040). H to K, Expression of growth stage specific marker genes used to validate the Gene Chronologer tool (H, ATEM1, At3g51810; I, At4g37580; J, APETALA1, At1g69120; K, FLOWERING LOCUS T, At1g680). L to Q, Expression of environmental factor specific marker genes to validate the Response Viewer tool (L, At4g14690; M, At5g190; N, ERF1, At3g23240; O, AtERF1, At4g17500; P, AtERF2, At5g47220; Q, AtERF13, At2g44840).

Table IIA. Representative samples of genes expressed in specific tissues or at particular growth stages
Table IIB.
Table IIC.
This first validation step confirms that global trends can be detected in the expression profiles of individual genes by combining numerous normalized expression datasets using the same technical platform, i.e. the Affymetrix system. Based on this information, we performed a second validation step, in which we tested whether GENEVESTIGATOR can identify genes with known expression profiles. Using Gene Atlas, 72 genes were identified to be expressed in pollen. Of these, 9 had been identified by Honys and Twell (2003) as well as Becker et al. (2003) to be pollen-specific using 8K Arabidopsis Genome Arrays (see Table IIA; Supplemental Table II). Of the remaining genes, several could be functionally associated with pollen based on annotations such as "self-incompatibility protein," "pollen coat protein-related," or "allergen." Further, 14 genes were annotated as "expressed protein," revealing the potential of GENEVESTIGATOR to identify novel genes related to particular organs. A similar analysis was performed to identify genes expressed specifically in siliques (Table IIB, compare with Hennig et al., 2004), roots, photosynthetically active tissues, leaves, senescent leaves, stem and node, carpel, petal, sepal, and shoot apex (see Supplemental Table II) and at specific developmental stages such as seedling stage (Table IIC) or early flowering stage (Table IID; Supplemental Table II). We conclude that with the current set of data, GENEVESTIGATOR generates high-quality results. Moreover, we expect that this quality will continue to rise as the size of the dataset increases.
DISCUSSION
Public repositories such as GEO and ArrayExpress provide tools for submission, storage, and retrieval of heterogeneous datasets. In contrast, GENEVESTIGATOR contains a coherent dataset from a single organism generated on a common hybridization platform. Despite the high diversity of experiments represented in the database, the validation steps we carried out demonstrate that the underlying hypothesis is valid and that biologically meaningful results can be obtained using GENEVESTIGATOR. The software generally performs primary level analysis and displays results either as graphs or as numeric data, which can easily be combined, exported, or further analyzed with other data analysis and visualization tools.

Table IID.
Genes expressed preferentially (A) in stamina and pollen, (B) in seeds and siliques, (C) during seedling stage, and (D) during early flowering stage. For the description of growth stage groups (labeled A–J), see Table I. See also Supplemental Table II, which provides lists of genes expressed preferentially in roots, green tissues, photosynthetically active leaves, senescent leaves, stem and node, carpel, petal, sepal, and shoot apex.
The complexity of multicellular life requires the proper context-dependent expression of genes, which is achieved by highly interconnected transcriptional networks. The inference of such module networks may require the use of many data types such as gene expression, protein abundance, protein interaction, metabolite abundance, affinity precipitation, synthetic lethality, etc. (Troyanskaya et al., 2003). Nevertheless, the analysis of gene expression data can reveal significant patterns of such networks (Segal et al., 2003). In contrast to many other tools, GENEVESTIGATOR uses experiment annotation to yield contextual information that can be brought into understanding gene networks. The identification of genes exhibiting similar tissue localization and stress response attributes facilitates modeling of gene networks using network inference tools (Wille et al., 2004) by reducing the number of testable candidates. Thus, the combined gene-centric and genome-centric approaches make it a powerful tool for targeted functional genomics efforts.
Critical issues in using the GENEVESTIGATOR tools are (1) the questions being addressed by queries and (2) the interpretation of output data. First, GENEVESTIGATOR allows queries at a high level of detail and in a large variety of combinations specifying organ, developmental stage, or treatment. Although GENEVESTIGATOR currently contains information from more than 750 publicly available full genome arrays, some combinations at a very detailed level may not yet have sufficient data support to yield robust results. The quality of the results therefore depends strongly on the level of granularity the user chooses and the number and types of underlying experiments. Second, care must be taken not to over-interpret output data computed by GENEVESTIGATOR. To facilitate data interpretation, the number of samples per category and the SEs of the means are indicated. Nevertheless, when working at a detailed level of granularity, a post-verification of individual genes using the Digital Northern tool is advised to confirm the origin of the effects observed.

CONCLUSION

Both the forward and reverse validation of GENEVESTIGATOR revealed that the combination of annotated data from various sources using the same technology platform is a valid approach to reveal contextual information about elements of the dataset. In our case, expression profiles of more than 22,000 genes from Arabidopsis can be generated in the context of plant organ, plant development, and environmental stress. Although not all annotated categories are currently well covered in terms of number of arrays, and the output obtained from these categories may therefore be somewhat biased, the general quality of results using GENEVESTIGATOR is high. The permanent submission of new datasets is expected to constantly improve the quality of the output. The resulting information can be used to confirm previous hypotheses or generate new hypotheses about gene expression networks and genetic regulatory network structures, resulting in the design of more precise and targeted experiments.

ACKNOWLEDGMENTS
We thank Eva Vranová and Franziska Humair for feedback on the use of the software in development. We are also grateful to the Functional Genomics Center Zurich for providing support and the Affymetrix platform for GeneChip experiments, as well as all public repositories for providing data.

Received May 14, 2004; returned for revision July 12, 2004; accepted July 16, 2004.
LITERATURE CITED

The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815
Albani D, Sardana R, Robert LS, Altosaar I, Arnison PG, Fabijanski SF (1992) A Brassica napus gene family which shows sequence similarity to ascorbate oxidase is expressed in developing pollen. Molecular characterization and analysis of promoter activity in transgenic tobacco plants. Plant J 2: 331–342
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29
Becker JD, Boavida LC, Carneiro J, Haury M, Feijo JA (2003) Transcriptional profiling of Arabidopsis tissues reveals the unique characteristics of the pollen transcriptome. Plant Physiol 133: 713–725
Bergmann S, Ihmels J, Barkai N (2004) Similarities and differences in genome-wide expression data of six organisms. PLoS Biol 2: E9
Birnbaum K, Shasha DE, Wang JY, Jung JW, Lambert GM, Galbraith DW, Benfey PN (2003) A gene expression map of the Arabidopsis root. Science 302: 1956–1960
Boyes DC, Zayed AM, Ascenzi R, McCaskill AJ, Hoffman NE, Davis KR, Gorlach J (2001) Growth stage-based phenotypic analysis of Arabidopsis: a model for high throughput functional genomics in plants. Plant Cell 13: 1499–1510
Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, et al (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29: 365–371
Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky M, Kemmeren P, Lara GG, et al (2003) ArrayExpress--a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 31: 68–71
Craigon DJ, James N, Okyere J, Higgins J, Jotham J, May S (2004) NASCArrays: a repository for microarray data generated by NASC's transcriptomics service. Nucleic Acids Res (Database issue) 32: D575–D577
Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30: 207–210
Friedman N (2004) Inferring cellular networks using probabilistic graphical models. Science 303: 799–805
Hennig L, Gruissem W, Grossniklaus U, Köhler C (2004) Transcriptional programs of early stages of plant reproduction. Plant Physiol 135: 1765–1775
Hennig L, Menges M, Murray JA, Gruissem W (2003) Arabidopsis transcript profiling on Affymetrix GeneChip arrays. Plant Mol Biol 53: 457–465
Honys D, Twell D (2003) Comparative analysis of the Arabidopsis pollen transcriptome. Plant Physiol 132: 0–652
Ihmels J, Levy R, Barkai N (2004) Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nat Biotechnol 22: 86–92
Jansson S, Andersson J, Kim SJ, Jackowski G (2000) An Arabidopsis thaliana protein homologous to cyanobacterial high-light-inducible proteins. Plant Mol Biol 42: 345–351
Kleffmann T, Russenberger D, von Zychlinski A, Christopher W, Sjolander K, Gruissem W, Baginsky S (2004) The Arabidopsis thaliana chloroplast proteome reveals pathway abundance and novel protein functions. Curr Biol 14: 3–362
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, et al (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298: 799–804
Lehman A, Black R, Ecker JR (1996) HOOKLESS1, an ethylene response gene, is required for differential cell elongation in the Arabidopsis hypocotyl. Cell 85: 183–194
Lelandais G, Le Crom S, Devaux F, Vialette S, Church GM, Jacq C, Marc P (2004) yMGV: a cross-species expression data mining tool. Nucleic Acids Res (Database issue) 32: D323–D325
Liu WM, Mei R, Di X, Ryder TB, Hubbell E, Dee S, Webster TA, Harrington CA, Ho MH, Baid J, Smeekens SP (2002) Analysis of high density expression microarrays with signed-rank call algorithms. Bioinformatics 18: 1593–1599
Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, et al (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14: 1675–1680
Menges M, Hennig L, Gruissem W, Murray JA (2003) Genome-wide gene expression in an Arabidopsis cell suspension. Plant Mol Biol 53: 423–442
Mouline K, Very AA, Gaymard F, Boucherez J, Pilot G, Devic M, Bouchez D, Thibaud JB, Sentenac H (2002) Pollen tube development and competitive ability are impaired by disruption of a Shaker K+ channel in Arabidopsis. Genes Dev 16: 339–350
Onate-Sanchez L, Singh KB (2002) Identification of Arabidopsis ethylene-responsive element binding factors with distinct induction kinetics after pathogen infection. Plant Physiol 128: 1313–1322
Parenicova L, de Folter S, Kieffer M, Horner DS, Favalli C, Busscher J, Cook HE, Ingram RM, Kater MM, Davies B, et al (2003) Molecular and phylogenetic analyses of the complete MADS-box transcription factor family in Arabidopsis: new openings to the MADS world. Plant Cell 15: 1538–1551
Pelaz S, Gustafson-Brown C, Kohalmi SE, Crosby WL, Yanofsky MF (2001) APETALA1 and SEPALLATA3 interact to promote flower development. Plant J 26: 385–394
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297: 1551–1555
Redman JC, Haas BJ, Tanimoto G, Town CD (2004) Development and evaluation of an Arabidopsis whole genome Affymetrix probe array. Plant J 38: 5–561
Ruiz-Garcia L, Madueno F, Wilkinson M, Haughn G, Salinas J, Martinez-Zapater JM (1997) Different roles of flowering-time genes in the activation of floral initiation genes in Arabidopsis. Plant Cell 9: 1921–1934
Runge S, Sperling U, Frick G, Apel K, Armstrong GA (1996) Distinct roles for light-dependent NADPH:protochlorophyllide oxidoreductases (POR) A and B during greening in higher plants. Plant J 9: 513–523
Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N (2003) Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 34: 166–176
Stelling J, Klamt S, Bettenbrock K, Schuster S, Gilles ED (2002) Metabolic network structure determines key aspects of functionality and regulation. Nature 420: 190–193
Troyanskaya OG, Dolinski K, Owen AB, Altman RB, Botstein D (2003) A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). Proc Natl Acad Sci USA 100: 8348–8353
Vicient CM, Hull G, Guilleminot J, Devic M, Delseny M (2000) Differential expression of the Arabidopsis genes coding for Em-like proteins. J Exp Bot 51: 1211–1220
Wille A, Zimmermann P, Vranová E, Bleuler S, Fürholz A, Hennig L, Laule O, Prelić A, von Rohr P, Thiele L, et al (2004) Sparse graphical Gaussian modeling for genetic regulatory network inference. Genome Biol (in press)