Subject Classification
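  • Abstract: Data access delay has become the prominent performance bottleneck of high-end computing systems. The key to reducing data access delay in system design is to reduce data stall time. Memory locality and concurrency are the two essential factors that determine the performance of modern memory systems. However, because the impact of memory concurrency on overall memory system performance is not well understood, existing studies on reducing data stall time rarely focus on exploiting data access concurrency. In this study, a pair of novel data stall time models is presented: the L-C model, for the combined effect of locality and concurrency, and the P-M model, for the effect of pure misses on data stall time. The models provide a new understanding of data access delay and suggest new directions for performance optimization. Based on these models, a summary table of advanced cache optimizations is presented. Of its entries, 38 are contributed by data concurrency and only 21 by data locality, which shows the value of data concurrency. The L-C and P-M models, together with the associated results and opportunities introduced in this study, are important for the future design of data-centric architectures and algorithms in modern computing systems.

  • Tags: data access, concurrency, stall time, memory system, evaluation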
  • Abstract: Spatial applications will become highly complex as the volume of spatial data increases rapidly. A suitable data processing and computing infrastructure for spatial applications needs to be established. Over the past decade, the grid has become a powerful computing environment for data-intensive and compute-intensive applications. Integrating grid computing with spatial data processing technology, the authors designed a spatial data processing grid (called SDPG) to address the related problems. This paper examines the requirements of spatial applications and describes the architecture of SDPG. The key technologies for implementing SDPG are discussed with emphasis.

  • Tags: SDPG, grid computing, spatial data processing, GIS, application software
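  • Abstract: Symantec recently announced a unique converged solution that helps enterprises standardize on consistent infrastructure software across heterogeneous applications, databases, servers, and storage platforms.

  • Tags: FOUNDATION, Symantec, CENTER, DATA, applications, infrastructure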
  • Abstract: With massive amounts of data stored in databases, mining information and knowledge from databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and online services over the Internet, also call for various data mining and knowledge discovery techniques to better understand user behavior, to improve the services provided, and to increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems. In conclusion, the article also lists some problems and challenges for further research.

  • Tags: database, knowledge discovery, machine learning, data mining
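  • Abstract: ETL (Extract-Transform-Load) usually comprises three phases: extraction, transformation, and loading. In building a data warehouse, it serves as the data injection step and is the most time-consuming activity, so improving ETL performance is essential. This paper proposes a new approach, TEL (Transform-Extract-Load). TEL uses virtual tables to carry out the transformation phase before the extraction and loading phases, outside the data staging area or staging database that stores the raw data extracted from each of the disparate source systems. The TEL approach reduces the data transmission load and improves the performance of queries from the access layer. Experimental results on our proposed benchmarks show that the TEL approach is feasible and practical.

  • Tags: ETL, virtualization, staging database, data warehouse, raw data, data transfer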
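A minimal sketch of the TEL idea, assuming a "virtual table" can be modeled as a lazily evaluated view: the transformation runs as rows are read from the source, and no transformed copy is materialized in a staging area. The names `virtual_table`, `load`, `to_dollars`, and the sample rows are hypothetical illustrations, not the paper's implementation.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

def virtual_table(source: Iterable[Row], transform: Callable[[Row], Row]) -> Iterator[Row]:
    """A lazy view over raw source rows: each row is transformed only when it
    is read, so no transformed copy is stored in a staging area."""
    for row in source:
        yield transform(row)

def load(rows: Iterable[Row], target: List[Row]) -> int:
    """Load step: append rows to the target table and return how many were loaded."""
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count

# Hypothetical raw rows from one source system (amounts stored in cents).
raw_orders: List[Row] = [{"id": 1, "amount_cents": 1250}, {"id": 2, "amount_cents": 990}]

def to_dollars(row: Row) -> Row:
    """Hypothetical transformation rule: convert cents to dollars."""
    return {"id": row["id"], "amount": row["amount_cents"] / 100}

warehouse: List[Row] = []
loaded = load(virtual_table(raw_orders, to_dollars), warehouse)
print(loaded, warehouse)
```

Because the view is a generator, transformation cost is paid only for rows that are actually pulled through it.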
  • Abstract: This research takes the view that the modelling of temporal data is a fundamental step towards capturing the semantics of time. The problems inherent in the modelling of time are not unique to database processing; the representation of temporal knowledge and temporal reasoning arise in a wide range of other disciplines. This paper gives an account of a technique for modelling the semantics of temporal data and its associated normalisation method. It discusses techniques for processing temporal data using a Time Sequence (TS) data model. It presents a number of strategies used to classify the different properties of temporal data, goes on to develop the model of temporal data, and addresses issues of temporal data application design by introducing the concept of temporal data normalisation.

  • Tags: relational database, data storage, time sequence model
  • Abstract: Page-based software DSM systems suffer from false sharing caused by their large sharing granularity, and they support only one-dimensional Block or Cyclic-block data distribution schemes. As a result, applications running on them suffer from poor data locality and can exploit parallelism only when using a large number of processors. This paper presents a way to support flexible data distribution (FDD) on software DSM systems. Small, granularity-tunable blocks, whose size can be set by the compiler or the programmer, are used to overlap the working data sets distributed among processors. FDD was implemented on a software DSM system called JIAJIA. Experiments show that, compared with the Block/Cyclic-block distribution schemes used by most current DSM systems, the proposed flexible data distribution is more effective, and the performance of the applications used in the experiments is significantly improved.

  • Tags: software development, DSMS, data distribution
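To illustrate what a tunable block size buys over fixed one-dimensional schemes, here is a sketch of a generic 2-D block-cyclic owner mapping in which the block shape is a parameter. This is a standard textbook distribution used as a stand-in; the functions `owner_2d` and `ownership_map` are hypothetical and are not JIAJIA's interface.

```python
from typing import List, Tuple

def owner_2d(i: int, j: int, block: Tuple[int, int], grid: Tuple[int, int]) -> int:
    """Rank that owns element (i, j) under a 2-D block-cyclic distribution.
    block = (rows, cols) per block, tunable; grid = (proc_rows, proc_cols)."""
    (br, bc), (pr, pc) = block, grid
    if min(br, bc, pr, pc) < 1:
        raise ValueError("block and grid dimensions must be positive")
    return ((i // br) % pr) * pc + (j // bc) % pc

def ownership_map(n: int, block: Tuple[int, int], grid: Tuple[int, int]) -> List[List[int]]:
    """Owner rank of every element of an n x n array."""
    return [[owner_2d(i, j, block, grid) for j in range(n)] for i in range(n)]

# Same 2 x 2 processor grid, two block granularities.
for row in ownership_map(4, (2, 2), (2, 2)):
    print(row)
for row in ownership_map(4, (1, 1), (2, 2)):
    print(row)
```

With `grid=(p, 1)` and a block height of `n // p`, the mapping degenerates to a row-wise Block distribution, which is the kind of fixed scheme the abstract contrasts with.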
  • Abstract: This paper presents Squeezer, a new efficient algorithm for clustering categorical data that produces high-quality clustering results while scaling well. Squeezer reads each tuple t in sequence and either assigns t to an existing cluster (initially there are none) or creates a new cluster containing t, depending on the similarities between t and the existing clusters. Because of these characteristics, the algorithm is well suited to clustering data streams, where, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence seen so far using little memory and time. Outliers can also be handled efficiently and directly in Squeezer. Experimental results on real-life and synthetic data sets verify the superiority of Squeezer.

  • Tags: data processing, clustering categorical data, efficient algorithm, KDD
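The single-pass assign-or-create loop described above can be sketched as follows. The similarity used here (for each attribute, the fraction of the cluster's tuples sharing t's value, summed over attributes) and the `threshold` parameter are assumptions for illustration; check them against the paper's exact definitions before relying on this.

```python
from collections import Counter
from typing import List, Sequence

class Cluster:
    """Cluster summary: size, per-attribute value frequencies, member indices."""
    def __init__(self, n_attrs: int) -> None:
        self.size = 0
        self.freq = [Counter() for _ in range(n_attrs)]
        self.members: List[int] = []

    def add(self, idx: int, t: Sequence[str]) -> None:
        self.size += 1
        self.members.append(idx)
        for a, v in enumerate(t):
            self.freq[a][v] += 1

    def similarity(self, t: Sequence[str]) -> float:
        # Assumed measure: per attribute, the share of this cluster's tuples
        # that have t's value, summed over all attributes.
        return sum(self.freq[a][v] / self.size for a, v in enumerate(t))

def squeezer(data: Sequence[Sequence[str]], threshold: float) -> List[List[int]]:
    """One pass: join the most similar cluster if its similarity reaches
    `threshold`, otherwise start a new cluster with the tuple."""
    clusters: List[Cluster] = []
    for idx, t in enumerate(data):
        best = max(clusters, key=lambda c: c.similarity(t), default=None)
        if best is None or best.similarity(t) < threshold:
            best = Cluster(len(t))
            clusters.append(best)
        best.add(idx, t)
    return [c.members for c in clusters]

rows = [("red", "small"), ("red", "small"), ("blue", "large"), ("red", "large")]
print(squeezer(rows, threshold=1.5))
```

Only per-cluster frequency tables are kept, which is why a single pass with small memory suits streaming input.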
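  • Abstract: Data deduplication for file communication across wide area networks (WANs), in applications such as file synchronization and mirroring in cloud environments, usually achieves significant bandwidth savings at the cost of significant time overhead. This overhead includes the time required for deduplication at two geographically distributed nodes (e.g., disk access bottlenecks) and the duplicate query/answer operations between the sender and the receiver, since each query or answer introduces at least one round-trip time (RTT) of latency. In this paper, we present a data deduplication system across WANs with metadata feedback and metadata utilization (MFMU) to reduce the time overhead associated with deduplication. In MFMU, selective metadata feedback from the receiver to the sender is introduced to reduce the number of duplicate query/answer operations. In addition, to contain the metadata-related disk I/O operations at the receiver, as well as the bandwidth overhead introduced by metadata feedback, a metadata utilization component based on a hysteresis hash re-chunking mechanism is introduced. Our experimental results show that MFMU achieves an average deduplication speedup of 20%–40%, without the metadata feedback reducing the bandwidth-saving ratio, compared with the baseline content-defined chunking (CDC) used in LBFS (Low-Bandwidth Network File System) and the existing state-of-the-art deduplication solution based on the Bimodal chunking algorithm.

  • Tags: data deduplication, metadata, WAN, feedback, I/O operations, time overhead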
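The effect of metadata feedback on duplicate queries can be sketched with a toy sender and receiver: fingerprints the receiver has already reported are skipped, so only unknown chunks need a query/answer exchange. Here the feedback simply ships the receiver's whole fingerprint set; MFMU's feedback is selective, and its selection rule and hysteresis re-chunking are not reproduced. `Receiver`, `send`, and SHA-1 fingerprints are hypothetical choices for illustration.

```python
import hashlib
from typing import List, Set, Tuple

def fp(chunk: bytes) -> str:
    """Chunk fingerprint (SHA-1 is an assumption made for this sketch)."""
    return hashlib.sha1(chunk).hexdigest()

class Receiver:
    def __init__(self) -> None:
        self.stored: Set[str] = set()

    def query(self, fps: List[str]) -> List[bool]:
        """Duplicate query/answer: which fingerprints are already stored here?"""
        return [f in self.stored for f in fps]

    def store(self, chunk: bytes) -> None:
        self.stored.add(fp(chunk))

def send(chunks: List[bytes], rx: Receiver, feedback: Set[str]) -> Tuple[int, int]:
    """Send one file; return (fingerprints queried, chunks transferred).
    Chunks whose fingerprints arrived as feedback skip the query entirely."""
    unknown = [c for c in chunks if fp(c) not in feedback]
    answers = rx.query([fp(c) for c in unknown])
    missing = [c for c, dup in zip(unknown, answers) if not dup]
    for c in missing:
        rx.store(c)
    return len(unknown), len(missing)

v1 = [b"alpha", b"beta", b"gamma"]
v2 = [b"alpha", b"beta", b"delta"]  # edited version with one new chunk

plain = Receiver()
send(v1, plain, feedback=set())
no_fb = send(v2, plain, feedback=set())

fb_rx = Receiver()
send(v1, fb_rx, feedback=set())
with_fb = send(v2, fb_rx, feedback=set(fb_rx.stored))
print(no_fb, with_fb)
```

Both runs transfer the same single new chunk, but with feedback only one fingerprint needs a query instead of three.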
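  • Abstract: Popular current systems such as Hadoop and Spark cannot achieve satisfactory performance when running iterative big data applications, because they overlap computation and communication inefficiently. Pipelining computation, data movement, and data management plays a key role in current distributed data computing systems. In this paper, we first analyze the overhead of the shuffle operation in Hadoop and Spark when running the PageRank workload, and then propose an event-driven pipeline and in-memory shuffle design with better overlap of computation and communication, as DataMPI-Iteration, an MPI-based library for iterative big data computing. Our performance evaluation shows that DataMPI-Iteration achieves 9X–21X speedup over Apache Hadoop and 2X–3X speedup over Apache Spark for PageRank and K-means.

  • Tags: data computing, iteration, PageRank, MPI, Apache, computing system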
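The overlap of computation and communication can be illustrated with a producer/consumer pipeline: each partition's result is handed to a sender thread as soon as it is computed, instead of after all partitions finish. Threads and an in-process queue stand in for MPI messaging here; `compute_partition` and `pipelined_shuffle` are hypothetical and are not DataMPI's API.

```python
import queue
import threading
from typing import Dict, List

def compute_partition(part: List[int]) -> Dict[int, int]:
    """Hypothetical map task: count values by key (value mod 3)."""
    counts: Dict[int, int] = {}
    for x in part:
        counts[x % 3] = counts.get(x % 3, 0) + 1
    return counts

def pipelined_shuffle(partitions: List[List[int]]) -> Dict[int, int]:
    """Compute partitions on the calling thread while a sender thread drains
    finished results, so computation and communication overlap."""
    outbox: queue.Queue = queue.Queue()
    merged: Dict[int, int] = {}

    def sender() -> None:
        # Event loop: handle each finished partition as soon as it arrives.
        while True:
            result = outbox.get()
            if result is None:  # end-of-stream marker
                return
            for key, count in result.items():  # stands in for send + remote merge
                merged[key] = merged.get(key, 0) + count

    worker = threading.Thread(target=sender)
    worker.start()
    for part in partitions:
        outbox.put(compute_partition(part))  # hand off without waiting
    outbox.put(None)
    worker.join()  # `merged` is only read after the sender has finished
    return merged

print(pipelined_shuffle([[0, 1, 2], [3, 4], [5]]))
```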
  • Abstract: This paper introduces ARMiner, a data mining tool based on association rules. Beginning with the system architecture, its characteristics and functions are discussed in detail, including data transfer, concept hierarchy generalization, mining rules with negative items, and re-development of the system. An example application of the tool is also shown. Finally, some issues for future research are presented.

  • Tags: ARMiner, data mining tool, machine learning
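As background for "rules with negative items", here is a sketch of the standard support and confidence measures, extended so that an item can appear negated (meaning "absent from the transaction"). The `(item, present)` encoding and the sample transactions are assumptions for illustration, not ARMiner's data format.

```python
from typing import FrozenSet, List, Tuple

ItemLit = Tuple[str, bool]  # (item, present); present=False means "not item"

def holds(tx: FrozenSet[str], lits: List[ItemLit]) -> bool:
    """True if every positive item is in tx and every negative item is absent."""
    return all((item in tx) == present for item, present in lits)

def support(db: List[FrozenSet[str]], lits: List[ItemLit]) -> float:
    """Fraction of transactions that satisfy all literals."""
    return sum(holds(tx, lits) for tx in db) / len(db)

def confidence(db: List[FrozenSet[str]], lhs: List[ItemLit], rhs: List[ItemLit]) -> float:
    """conf(lhs => rhs) = support(lhs and rhs) / support(lhs)."""
    s_lhs = support(db, lhs)
    return support(db, lhs + rhs) / s_lhs if s_lhs else 0.0

db = [frozenset(t) for t in (["bread", "milk"], ["bread", "butter"],
                             ["bread", "milk", "butter"], ["milk"])]
print(support(db, [("bread", True)]))
print(confidence(db, [("bread", True)], [("milk", False)]))  # bread => not milk
```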
  • Abstract: A major overhead in software DSM (Distributed Shared Memory) is the cost of remote memory accesses, both those required by the protocol and those induced by false sharing. This paper introduces a dynamic prefetching method, implemented in the JIAJIA software DSM, that reduces the system overhead caused by remote accesses. For each cached page, the method records the interleaved string of INV (invalidation) and GETP (getting a remote page) operations, and analyzes the periodicity of this string when the page is invalidated at a lock or barrier. A prefetch request is issued after the lock or barrier if the periodicity analysis indicates that GETP will be the next operation in the string. Multiple prefetch requests to the same host are merged into one message. Performance evaluation with eight well-accepted benchmarks on a cluster of sixteen PowerPC workstations shows that the scheme significantly reduces page fault overhead, achieving a performance increase of 15%–20% in three benchmarks and around 8%–10% in another three. The average extra traffic caused by useless prefetches is only 7%–13% in the evaluation.

  • Tags: software development, data prefetching, DSMS
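The periodicity test and request merging described above can be sketched as follows. The abstract does not give the exact periodicity analysis, so this version assumes it finds the smallest exact period of the recorded history and predicts the next operation from it; `smallest_period`, `should_prefetch`, and `merge_by_host` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

def smallest_period(history: List[str]) -> Optional[int]:
    """Smallest p (at most half the history length) such that the recorded
    INV/GETP string repeats with period p; None if there is no such p."""
    n = len(history)
    for p in range(1, n // 2 + 1):
        if all(history[i] == history[i - p] for i in range(p, n)):
            return p
    return None

def should_prefetch(history: List[str]) -> bool:
    """Prefetch if the periodic pattern predicts GETP as the next operation."""
    p = smallest_period(history)
    return p is not None and history[len(history) - p] == "GETP"

def merge_by_host(requests: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group (page, home_host) prefetch requests so each host gets one message."""
    batches: Dict[int, List[int]] = defaultdict(list)
    for page, host in requests:
        batches[host].append(page)
    return dict(batches)

print(should_prefetch(["INV", "GETP", "INV", "GETP", "INV"]))
print(merge_by_host([(10, 1), (11, 2), (12, 1)]))
```

If the history shows no period, the sketch does not prefetch, which keeps useless prefetch traffic down at the cost of missed opportunities.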
  • Abstract: Approximate query processing has emerged as an approach to dealing with huge data volumes and complex queries in data warehouse environments. In this paper, we present a novel method that provides approximate answers to OLAP queries. The method builds a compressed (approximate) data cube using a clustering technique and answers queries directly from this compressed cube, which improves query performance. We also give the algorithm for the OLAP queries and the confidence intervals of query results. An extensive experimental study with the OLAP Council benchmark shows the effectiveness and scalability of our cluster-based approach compared with sampling.

  • Tags: OLAP, data processing, decision support system
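The general idea of answering aggregate queries from a compressed cube can be shown in one dimension: adjacent cells with similar values are grouped, each group keeps only its range and mean, and a range SUM is estimated from those summaries. This greedy grouping is a simple stand-in for the paper's clustering technique, and the confidence intervals are not reproduced; `compress`, `approx_sum`, and `tol` are hypothetical.

```python
from typing import List, Tuple

Bucket = Tuple[int, int, float]  # (first cell, last cell, mean value), inclusive

def compress(values: List[float], tol: float) -> List[Bucket]:
    """Greedily group adjacent cells whose values stay within `tol` of the
    group's first value; keep only each group's range and mean."""
    buckets: List[Bucket] = []
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or abs(values[i] - values[start]) > tol:
            seg = values[start:i]
            buckets.append((start, i - 1, sum(seg) / len(seg)))
            start = i
    return buckets

def approx_sum(buckets: List[Bucket], lo: int, hi: int) -> float:
    """Estimate SUM over cells lo..hi, treating each cell as its bucket's mean."""
    total = 0.0
    for first, last, mean in buckets:
        overlap = min(last, hi) - max(first, lo) + 1
        if overlap > 0:
            total += overlap * mean
    return total

cells = [10.0, 11.0, 10.0, 50.0, 52.0, 49.0]
summary = compress(cells, tol=2.0)
print(summary)
print(approx_sum(summary, 1, 4), sum(cells[1:5]))  # estimate vs exact
```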
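  • Abstract: By adopting a data-driven rather than location-driven communication model, Named Data Networking (NDN) provides better support for network-layer dataflow. However, because there is no suitable transport-layer protocol on top of the network layer, application developers have to handle complex tasks such as data segmentation, packet validation, and flow control themselves. In this study, we design a dataflow-oriented programming interface that provides transport strategies for NDN and greatly improves the efficiency of application development. The interface introduces retrieval strategies according to the publishing mode, and controls dataflow based on current network status and data production for two kinds of application data units (ADUs) with different data rates, using an adaptive ADU pipelining algorithm. The interface also provides network measurement strategies to monitor a number of critical metrics that affect application performance. We verify the functionality and performance of the interface by implementing a video streaming application on the worldwide NDN testbed, which spans 11 time zones. Our experiments show that the interface can efficiently support the development of efficient, dataflow-driven NDN applications.

  • Tags: programming interface, named data networking, application, data-driven, flow control, transport-layer protocol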
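The abstract does not specify the adaptive ADU pipelining algorithm. As a stand-in, this sketch uses AIMD (additive increase, multiplicative decrease) window control, a common way to adapt how many requests are kept in flight. The `AimdWindow` class and its default parameters are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AimdWindow:
    """Number of ADU requests allowed in flight (hypothetical parameters)."""
    cwnd: float = 1.0
    min_cwnd: float = 1.0
    max_cwnd: float = 64.0
    decrease: float = 0.5

    def on_data(self) -> None:
        # Additive increase: about +1 per window's worth of delivered ADUs.
        self.cwnd = min(self.max_cwnd, self.cwnd + 1.0 / self.cwnd)

    def on_timeout(self) -> None:
        # Multiplicative decrease when a request times out.
        self.cwnd = max(self.min_cwnd, self.cwnd * self.decrease)

    def can_send(self, in_flight: int) -> bool:
        """Whether another request may be issued given `in_flight` outstanding ones."""
        return in_flight < int(self.cwnd)

w = AimdWindow()
for _ in range(3):
    w.on_data()
print(round(w.cwnd, 2))
w.on_timeout()
print(round(w.cwnd, 2), w.can_send(1))
```

A producer with a higher data rate would simply let the window grow further before timeouts pull it back.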
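  • Abstract: Big data processing is becoming an increasingly prominent part of data center computing. However, recent studies have shown that big data workloads cannot fully utilize modern memory systems. We find that the dramatic inefficiency of big data processing stems from the huge number of cache misses and from stalls on dependent memory accesses. In this paper, we introduce two optimizations to tackle these problems. The first is a slice-and-merge strategy, which reduces the cache miss rate of the sort procedure. The second, direct-memory-access, reorganizes the data structure used in key/value storage. These optimizations are evaluated with micro-benchmarks and with the real-world benchmark HiBench. The micro-benchmark results clearly demonstrate the effectiveness of our optimizations in terms of hardware event counts, and the HiBench results show an average speedup of 1.21X at the application level. Both results show that careful hardware/software co-design can improve the memory efficiency of big data processing. Our work has been integrated into the Intel Distribution for Apache Hadoop.

  • Tags: data processing, memory system, direct memory access, benchmarking, Apache, cache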
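The structure of a slice-and-merge sort can be sketched as follows: the input is cut into slices (in practice sized to fit in cache, which is an assumption about how `slice_len` would be chosen), each slice is sorted independently, and the sorted runs are k-way merged. Python will not show the cache effect itself; the sketch only shows the algorithmic shape, and `slice_and_merge_sort` is a hypothetical name.

```python
import heapq
from typing import List

def slice_and_merge_sort(records: List[int], slice_len: int) -> List[int]:
    """Sort fixed-size slices independently, then k-way merge the sorted runs."""
    if slice_len < 1:
        raise ValueError("slice_len must be positive")
    runs = [sorted(records[i:i + slice_len]) for i in range(0, len(records), slice_len)]
    return list(heapq.merge(*runs))

data = [5, 3, 9, 1, 7, 2, 8, 6, 4, 0]
print(slice_and_merge_sort(data, slice_len=3))
```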
  • Abstract: Due to the dramatically increasing amount of information published in social networks, privacy issues have given rise to public concern. Although differential privacy provides privacy protection with theoretical foundations, the trade-off between privacy and data utility still demands further improvement. However, most existing studies do not consider the quantitative impact of the adversary when measuring data utility. In this paper, we first propose a personalized differential privacy method based on social distance. Then, we analyze the maximum data utility when users and adversaries are blind to each other's strategy sets. We formalize all the payoff functions in the differential privacy sense, and then establish a static Bayesian game. The trade-off is calculated by deriving the Bayesian Nash equilibrium with a modified reinforcement learning algorithm. The proposed method achieves fast convergence by reducing the cardinality from n to 2. In addition, the in-place trade-off can maximize the user's data utility if the action sets of the user and the adversary are public while the strategy sets are unrevealed. Our extensive experiments on a real-world dataset show that the proposed model is effective and feasible.

  • Tags: personalized privacy protection, game theory, trade-off
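The "personalized" part can be illustrated with the standard Laplace mechanism, where the privacy budget ε depends on the requester's social distance. The rule ε = eps_max / distance is a hypothetical choice made only for this sketch; the paper's actual mapping and its Bayesian-game trade-off are not reproduced.

```python
import random

def epsilon_for(distance: int, eps_max: float = 1.0) -> float:
    """Hypothetical personalization rule: the privacy budget shrinks with
    social distance, so more distant requesters receive noisier answers."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return eps_max / distance

def laplace_release(value: float, sensitivity: float, epsilon: float,
                    rng: random.Random) -> float:
    """Laplace mechanism: add Laplace(0, sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

rng = random.Random(42)
true_count = 120  # hypothetical private value, e.g. a friend count
print(laplace_release(true_count, 1.0, epsilon_for(1), rng))  # close friend
print(laplace_release(true_count, 1.0, epsilon_for(3), rng))  # distance-3 user
```

With sensitivity 1, the expected absolute noise equals 1/ε, so a distance-3 requester sees roughly three times the noise of a direct friend under this rule.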
  • Abstract: This paper describes an immersive system, called 3DIVE, for interactive volume data visualization and exploration inside the CAVE virtual environment. Combining interactive volume rendering with virtual reality provides a natural immersive environment for volumetric data visualization. More advanced data exploration operations, such as object-level data manipulation, simulation, and analysis, are supported in 3DIVE by several new techniques. In particular, volume primitives and texture regions are used for the rendering, manipulation, and collision detection of volumetric objects, and the region-based rendering pipeline is integrated with 3D image filters to provide an image-based mechanism for interactive transfer function design. The system has recently been released as public domain software for CAVE/ImmersaDesk users and is actively used by various scientific and biomedical visualization projects.

  • Tags: medicine, diagnostic imaging, data processing, 3D data display
  • Abstract: With several rice genome projects approaching completion, gene prediction/finding by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNAs to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences containing 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENSH, and BGF. The predictions were compared at the nucleotide, exon, and whole-gene-structure levels using commonly accepted measures and several new measures. The results show performance improving in chronological order. At the same time, the complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene finders.

  • Tags: programming, computer, SGP-2, SLAN
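As an example of a nucleotide-level comparison, the sketch below computes sensitivity (Sn = TP / (TP + FN)) and specificity in the gene-finding sense (Sp = TP / (TP + FP)) over coding positions. That these are among the "commonly accepted measures" used in the study is an assumption, and the exon coordinates are made up for illustration.

```python
from typing import Iterable, Set, Tuple

Exon = Tuple[int, int]  # inclusive (start, end) coordinates

def coding_positions(exons: Iterable[Exon]) -> Set[int]:
    """All nucleotide positions covered by the given exons."""
    return {p for start, end in exons for p in range(start, end + 1)}

def nucleotide_sn_sp(annotated: Iterable[Exon], predicted: Iterable[Exon]) -> Tuple[float, float]:
    """Sn = TP / (TP + FN), Sp = TP / (TP + FP), counted per coding nucleotide."""
    real = coding_positions(annotated)
    pred = coding_positions(predicted)
    tp = len(real & pred)
    sn = tp / len(real) if real else 0.0
    sp = tp / len(pred) if pred else 0.0
    return sn, sp

# The prediction misses half of the second exon but adds no false coding bases.
print(nucleotide_sn_sp([(1, 100), (201, 300)], [(1, 100), (251, 300)]))
```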
  • Abstract: As users increasingly befriend others and interact online through their social media accounts, online social networks (OSNs) are expanding rapidly. Confronted with the big data generated by users, it is imperative that data storage be distributed, scalable, and cost-efficient. Yet one of the most significant challenges in this area is determining how to minimize cost without degrading system performance. Although many storage systems use a distributed key-value store, it cannot be applied directly to OSN storage systems: because users' data are highly correlated, hash-based storage leads to frequent inter-server communication, and the high inter-server traffic cost reduces the scalability of the OSN storage system. Previous studies proposed network partitioning and data replication based on social graphs. However, data replication increases storage costs and affects traffic costs. Here, we consider how to minimize costs from the perspective of data storage by combining partitioning and replication. Our cost-efficient data storage approach supports scalable OSN storage systems. It co-locates frequently interacting users by performing partitioning and replication simultaneously while meeting load-balancing constraints. Extensive experiments on two real-world traces show that our approach achieves lower cost than state-of-the-art approaches. We therefore conclude that our approach enables economical and scalable OSN data storage.

  • Tags: online social network, inter-server traffic, cost
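The storage-versus-traffic trade-off between partitioning and replication can be made concrete with a toy cost model: each replica costs storage, and each interaction whose data is not co-located on the initiating user's server costs traffic. This cost model, the `total_cost` function, and the sample graph are hypothetical; the paper's actual model and its load-balancing constraints are not reproduced.

```python
from typing import Dict, List, Set, Tuple

def total_cost(edges: List[Tuple[str, str]], master: Dict[str, int],
               replicas: Dict[str, Set[int]], storage_cost: float,
               traffic_cost: float) -> float:
    """Hypothetical model: each replica costs `storage_cost`; an interaction
    (u, v) served from u's server costs `traffic_cost` unless v's data is on
    that server, either as its master copy or as a replica."""
    n_replicas = sum(len(servers) for servers in replicas.values())
    remote = 0
    for u, v in edges:
        server = master[u]
        if master[v] != server and server not in replicas.get(v, set()):
            remote += 1
    return storage_cost * n_replicas + traffic_cost * remote

edges = [("a", "b"), ("a", "c"), ("c", "d"), ("b", "c")]
master = {"a": 0, "b": 0, "c": 1, "d": 1}
print(total_cost(edges, master, {}, storage_cost=0.5, traffic_cost=1.0))
print(total_cost(edges, master, {"c": {0}}, storage_cost=0.5, traffic_cost=1.0))
```

Replicating user c on server 0 turns two remote interactions into local ones for the price of one replica, which pays off only when traffic is more expensive than storage.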