Paper Search and Retrieval - 中国期刊网 (China Journal Network)

Reevaluating Data Stall Time with the Consideration of Data Access Concurrency

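Abstract: Data access delay has become the prominent performance bottleneck of high-end computing systems. The key to reducing data access delay in system design is to reduce data stall time. Memory locality and concurrency are the two essential factors that influence the performance of modern memory systems. However, because the impact of memory concurrency on overall memory system performance is not well understood, few existing studies on reducing data stall time focus on exploiting data access concurrency. In this study, a pair of novel data stall time models is presented: the L-C model, for the combined effect of locality and concurrency, and the P-M model, for the effect of pure misses on data stall time. The models provide a new understanding of data access delay and new directions for performance optimization. Based on these models, a summary table of advanced cache optimizations is presented. In it, 38 entries are contributed by data concurrency and only 21 by data locality, which shows the value of data concurrency. The L-C and P-M models, together with their associated results and opportunities, are important and necessary for the future data-centric architecture and algorithm design of modern computing systems.
Tags: data access; concurrency; stall time; memory system; evaluation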

SDPG: Spatial Data Processing Grid

Abstract: Spatial applications will become highly complex as the volume of spatial data increases rapidly. A suitable data processing and computing infrastructure for spatial applications needs to be established. Over the past decade, the grid has become a powerful computing environment for data-intensive and computing-intensive applications. Integrating grid computing with spatial data processing technology, the authors designed a spatial data processing grid (called SDPG) to address the related problems. Requirements of spatial applications are examined and the architecture of SDPG is described in this paper. Key technologies for implementing SDPG are discussed with emphasis.
Tags: SDPG; grid computing; spatial data processing; GIS; application software

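Symantec Launches Data Center Foundation

Author: Zhang Yu (张羽)
Subject: Automation and Computer Technology > Computer Science and Technology
Created: 2006-12-22
Source: 《中国信息化》, 2006, Issue 14

Abstract: Symantec recently announced a one-of-a-kind integrated solution that helps enterprises standardize on a consistent layer of infrastructure software across heterogeneous applications, databases, servers, and storage platforms.
Tags: FOUNDATION; Symantec; CENTER; DATA; application; infrastructure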

An Overview of Data Mining and Knowledge Discovery

Abstract: With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques, and introduces some real application systems as well. In conclusion, the article also lists some problems and challenges for further research.
Tags: database; knowledge discovery; machine learning; data mining

A New ETL Approach Based on Data Virtualization

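Abstract: ETL (Extract-Transform-Load) usually includes three phases: extraction, transformation, and loading. In building a data warehouse, it plays the role of data injection and is the most time-consuming activity. It is therefore necessary to improve the performance of ETL. In this paper, a new ETL approach, TEL (Transform-Extract-Load), is proposed. The TEL approach uses virtual tables to realize the transformation stage before the extraction and loading stages, without the data staging area or staging database that would otherwise store the raw data extracted from each of the disparate source data systems. The TEL approach reduces the data transmission load and improves query performance from the access layer. Experimental results based on our proposed benchmarks show that the TEL approach is feasible and practical.
Tags: ETL; virtualization; staging database; data warehouse; raw data; data transmission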
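
The transform-before-extract idea in this abstract can be illustrated with an ordinary SQL view standing in for the virtual table. The following is only a minimal sketch under that assumption, not the authors' implementation: the table names, columns, and the use of SQLite are all hypothetical and chosen for illustration.

```python
import sqlite3


def tel_sketch():
    # Hypothetical source system and warehouse, both in-memory for illustration.
    source = sqlite3.connect(":memory:")
    warehouse = sqlite3.connect(":memory:")

    source.execute(
        "CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, country TEXT)"
    )
    source.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, 1250, "cn"), (2, 300, "us"), (3, 990, "cn")],
    )

    # Transform first: a view (virtual table) describes the cleaned, aggregated
    # shape without materialising a staging copy of the raw rows.
    source.execute(
        """
        CREATE VIEW sales_by_country AS
        SELECT UPPER(country) AS country_code,
               SUM(amount_cents) / 100.0 AS total
        FROM raw_orders
        GROUP BY UPPER(country)
        """
    )

    # Extract through the view, then load: only transformed rows are moved.
    rows = source.execute(
        "SELECT country_code, total FROM sales_by_country ORDER BY country_code"
    ).fetchall()
    warehouse.execute("CREATE TABLE fact_sales (country_code TEXT, total REAL)")
    warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    warehouse.commit()

    result = warehouse.execute(
        "SELECT country_code, total FROM fact_sales ORDER BY country_code"
    ).fetchall()
    source.close()
    warehouse.close()
    return result
```

In this sketch two aggregated rows cross from the source to the warehouse instead of three raw ones, which is the kind of transmission saving the abstract describes.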

The Modelling of Temporal Data in the Relational Database Environment

Abstract: This research takes the view that the modelling of temporal data is a fundamental step towards the solution of capturing the semantics of time. The problems inherent in the modelling of time are not unique to database processing. The representation of temporal knowledge and temporal reasoning arises in a wide range of other disciplines. In this paper an account is given of a technique for modelling the semantics of temporal data and its associated normalisation method. It discusses the techniques of processing temporal data by employing a Time Sequence (TS) data model. It shows a number of different strategies which are used to classify different data properties of temporal data, and it goes on to develop the model of temporal data and addresses issues of temporal data application design by introducing the concept of temporal data normalisation.
Tags: relational database; data storage; time series; model

Supporting Flexible Data Distribution in Software DSMs

Abstract: Page-based software DSM systems suffer from false sharing caused by the large sharing granularity, and only support one-dimensional Block or Cyclic-block data distribution schemes. Thus applications running on them will suffer from poor data locality and will be able to exploit parallelism only when using a large number of processors. In this paper, a way towards supporting flexible data distribution (FDD) on software DSM systems is presented. Small granularity-tunable blocks, the size of which can be set by the compiler or programmer, are used to overlap the working data sets distributed among processors. The FDD was implemented on a software DSM system called JIAJIA. Compared with the Block/Cyclic-block distribution schemes used by most DSM systems now, experiments show that the proposed way of flexible data distribution is more effective. The performance of the applications used in the experiments is significantly improved.
Tags: software development; DSMs; data distribution

Squeezer: An Efficient Algorithm for Clustering Categorical Data

Abstract: This paper presents a new efficient algorithm for clustering categorical data, Squeezer, which can produce high-quality clustering results and at the same time achieve good scalability. The Squeezer algorithm reads each tuple t in sequence, either assigning t to an existing cluster (initially none), or creating t as a new cluster, which is determined by the similarities between t and clusters. Due to its characteristics, the proposed algorithm is extremely suitable for clustering data streams, where given a sequence of points, the objective is to maintain consistently good clustering of the sequence so far, using a small amount of memory and time. Outliers can also be handled efficiently and directly in Squeezer. Experimental results on real-life and synthetic datasets verify the superiority of Squeezer.
Tags: data processing; clustering; categorical data; efficient algorithm; KDD
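
The one-pass procedure described in this abstract can be sketched as follows. The abstract does not define the similarity measure, so the one used here (for each attribute, the fraction of cluster members that share the tuple's value, summed over attributes) is an assumption, as are the function names and the `threshold` parameter.

```python
from collections import Counter


def similarity(summary, size, row):
    # Assumed measure: per attribute, the share of cluster members that
    # carry the same value as the row, summed over all attributes.
    return sum(counts[value] / size for counts, value in zip(summary, row))


def squeezer_sketch(rows, threshold):
    # Each cluster keeps only its size and one value counter per attribute,
    # so memory does not grow with the number of tuples already seen.
    clusters = []
    labels = []
    for row in rows:
        best_idx, best_sim = None, float("-inf")
        for idx, cluster in enumerate(clusters):
            sim = similarity(cluster["summary"], cluster["size"], row)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        # No cluster yet, or none similar enough: the tuple starts a new cluster.
        if best_idx is None or best_sim < threshold:
            clusters.append({"size": 0, "summary": [Counter() for _ in row]})
            best_idx = len(clusters) - 1
        cluster = clusters[best_idx]
        cluster["size"] += 1
        for counts, value in zip(cluster["summary"], row):
            counts[value] += 1
        labels.append(best_idx)
    return labels
```

Because each tuple is read once and compared only against cluster summaries, the same loop works on a stream; raising `threshold` yields more, tighter clusters.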

Metadata Feedback and Utilization for Data Deduplication Across WAN

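Abstract: Data deduplication for file communication across a wide area network (WAN), in applications such as file synchronization and mirroring of cloud environments, usually achieves significant bandwidth savings at the cost of significant deduplication time overheads. The time overheads include the time required for data deduplication at two geographically distributed nodes (e.g., the disk access bottleneck) and the duplication query/answer operations between the sender and the receiver, since each query or answer introduces at least one round-trip time (RTT) of latency. In this paper, we present a data deduplication system across WAN with metadata feedback and metadata utilization (MFMU), in order to harness the time overheads related to data deduplication. In the proposed MFMU system, selective metadata feedback from the receiver to the sender is introduced to reduce the number of duplication query/answer operations. In addition, to harness the metadata-related disk I/O operations at the receiver, as well as the bandwidth overhead introduced by the metadata feedback, a metadata utilization component based on a hysteresis hash re-chunking mechanism is introduced. Our experimental results demonstrate that MFMU achieves an average deduplication acceleration of 20%-40%, with the bandwidth saving ratio not reduced by the metadata feedback, compared with the baseline content-defined chunking (CDC) used in LBFS (Low-bandwidth Network File System) and with existing state-of-the-art data deduplication solutions based on Bimodal chunking algorithms.
Tags: data deduplication; metadata; wide area network; feedback; I/O operations; time overhead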

Accelerating Iterative Big Data Computing Through MPI

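Abstract: The current popular systems, Hadoop and Spark, cannot achieve satisfactory performance when running iterative big data applications because of the inefficient overlapping of computation and communication. The pipeline of computing, data movement, and data management plays a key role in current distributed data computing systems. In this paper, we first analyze the overhead of the shuffle operation in Hadoop and Spark when running the PageRank workload, and then propose an event-driven pipeline and in-memory shuffle design with better overlapping of computation and communication, called DataMPI-Iteration, an MPI-based library for iterative big data computing. Our performance evaluation shows that DataMPI-Iteration can achieve a 9X-21X speedup over Apache Hadoop and a 2X-3X speedup over Apache Spark for PageRank and K-means.
Tags: data computing; iteration; PageRank; MPI; Apache; computing system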

Tencent and Facebook Data Validate Metcalfe's Law

Abstract:
Tags: law; Tencent; data validation; Metcalfe; network size; network effect

ARMiner: A Data Mining Tool Based on Association Rules

Abstract: In this paper, ARMiner, a data mining tool based on association rules, is introduced. Beginning with the system architecture, the characteristics and functions are discussed in detail, including data transfer, concept hierarchy generalization, mining rules with negative items, and the re-development of the system. An example of the tool's application is also shown. Finally, some issues for future research are presented.
Tags: ARMiner; data mining; tool; machine learning

Dynamic Data Prefetching in Home-Based Software DSMs

Abstract: A major overhead in software DSM (Distributed Shared Memory) is the cost of remote memory accesses necessitated by the protocol as well as induced by false sharing. This paper introduces a dynamic prefetching method implemented in the JIAJIA software DSM to reduce the system overhead caused by remote accesses. The prefetching method records the interleaving string of INV (invalidation) and GETP (getting a remote page) operations for each cached page and analyzes the periodicity of the string when a page is invalidated on a lock or barrier. A prefetching request is issued after the lock or barrier if the periodicity analysis indicates that GETP will be the next operation in the string. Multiple prefetching requests are merged into the same message if they are to the same host. Performance evaluation with eight well-accepted benchmarks on a cluster of sixteen PowerPC workstations shows that the prefetching scheme can significantly reduce the page fault overhead and as a result achieves a performance increase of 15%-20% in three benchmarks and around 8%-10% in another three. The average extra traffic caused by useless prefetches is only 7%-13% in the evaluation.
Tags: software development; data prefetching; DSMs
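
The prefetch decision described in this abstract can be sketched as a small periodicity check on a page's operation history. The abstract does not say how periodicity is analyzed, so the smallest-period test below is an assumption; the one-letter encoding ("I" for INV, "G" for GETP) and the function names are also hypothetical.

```python
def predict_next(history, min_repeats=2):
    # history is a string such as "IGIGI" recorded for one cached page.
    # Try the smallest period that the whole string obeys and that has had
    # room to repeat at least min_repeats times; return the symbol that
    # period implies comes next, or None if no such period exists.
    n = len(history)
    for period in range(1, n // min_repeats + 1):
        if all(history[i] == history[i - period] for i in range(period, n)):
            return history[n - period]
    return None


def should_prefetch(history):
    # Prefetch only when the detected pattern says a GETP follows.
    return predict_next(history) == "G"
```

For a page whose history is "IGIGI" at the moment of invalidation, the detected period is 2 and the predicted next operation is GETP, so a prefetch request would be issued; for "IIGIIGI" the next operation is predicted to be another INV, so no request is sent.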

Compressed Data Cube for Approximate OLAP Query Processing

Abstract: Approximate query processing has emerged as an approach to dealing with the huge data volume and complex queries in the data warehouse environment. In this paper, we present a novel method that provides approximate answers to OLAP queries. Our method is based on building a compressed (approximate) data cube by a clustering technique and using this compressed data cube to provide answers to queries directly, so it improves the performance of the queries. We also provide the algorithm of the OLAP queries and the confidence intervals of query results. An extensive experimental study with the OLAP council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling.
Tags: OLAP; data processing; decision support system

A Dataflow-Oriented Programming Interface for Named Data Networking

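Authors: Li-Jing Wang; Yong-Qiang Lv; Ilya Moiseenko; Dong-Sheng Wang
Subject: Automation and Computer Technology > Computer Science and Technology
Created: 2018-01-11
Source: 《计算机科学技术学报：英文版》 (Journal of Computer Science and Technology, English edition), 2018, Issue 1

Abstract: Inheriting a data-driven communication pattern rather than a location-driven one, named data networking (NDN) offers better support for network-layer dataflow. However, application developers have to handle complex tasks, such as data segmentation, packet verification, and flow control, due to the lack of suitable transport-layer protocols over the network layer. In this study, we design a dataflow-oriented programming interface that provides transport strategies for NDN, which greatly improves the efficiency of application development. The interface introduces two application data unit (ADU) retrieval strategies according to the publishing patterns of different data, and controls the dataflow based on the current network status and the data generation rate by adopting an adaptive ADU pipelining algorithm. The interface also provides network measurement strategies to monitor many critical metrics that affect application performance. We verify the functionality and performance of our interface by implementing a video streaming application spanning 11 time zones over the worldwide NDN testbed. Our experiments show that the interface can efficiently support the development of high-performance, dataflow-driven NDN applications.
Tags: programming interface; data networking; application; data-driven; flow control; transport-layer protocol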

Using Memory in the Right Way to Accelerate Big Data Processing

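Abstract: Big data processing is becoming a standout part of data center computation. However, recent research has shown that big data workloads cannot make full use of modern memory systems. We find that the dramatic inefficiency of big data processing comes from the enormous number of cache misses and the stalls of dependent memory accesses. In this paper, we introduce two optimizations to tackle these problems. The first is the slice-and-merge strategy, which reduces the cache miss rate of the sort procedure. The second is direct-memory-access, which reforms the data structure used in key/value storage. These optimizations are evaluated with both micro-benchmarks and the real-world benchmark HiBench. The results of our micro-benchmarks clearly demonstrate the effectiveness of our optimizations in terms of hardware event counts, and the additional HiBench results show a 1.21X average speedup at the application level. Both results illustrate that careful hardware/software co-design will improve the memory efficiency of big data processing. Our work has already been integrated into the Intel distribution for Apache Hadoop.
Tags: data processing; memory system; direct memory access; benchmark; Apache; cache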
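
The abstract names the slice-and-merge strategy without giving its details. One plausible reading, sketched here purely as an assumption, is to sort slices small enough to stay cache-resident and then merge the sorted slices in a single streaming pass. The function name and the `slice_size` parameter are hypothetical, and Python itself will not expose the cache behaviour; the sketch only shows the control flow.

```python
import heapq


def slice_and_merge_sort(values, slice_size):
    # Sort fixed-size slices independently; in a native implementation the
    # slice size would be chosen so that one slice fits in cache.
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    slices = [
        sorted(values[start:start + slice_size])
        for start in range(0, len(values), slice_size)
    ]
    # Merge the sorted slices in one sequential pass over each of them.
    return list(heapq.merge(*slices))
```

For example, `slice_and_merge_sort([5, 2, 9, 1, 7, 3], 2)` sorts the slices `[5, 2]`, `[9, 1]`, and `[7, 3]` separately and then merges them into one ordered list.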

Improving Data Utility Through Game Theory in Personalized Differential Privacy

Authors: Lei Cui; Youyang Qu; Mohammad Reza Nosouhi; Shui Yu; Jian-Wei Niu; Gang Xie
Subject: Automation and Computer Technology > Computer Science and Technology
Created: 2019-02-12
Source: 《计算机科学技术学报：英文版》 (Journal of Computer Science and Technology, English edition), 2019, Issue 2

Abstract: Due to the dramatically increasing amount of information published in social networks, privacy issues have given rise to public concerns. Although differential privacy provides privacy protection with theoretical foundations, the trade-off between privacy and data utility still demands further improvement. However, most existing studies do not consider the quantitative impact of the adversary when measuring data utility. In this paper, we first propose a personalized differential privacy method based on social distance. Then, we analyze the maximum data utility when users and adversaries are blind to the strategy sets of each other. We formalize all the payoff functions in the differential privacy sense, which is followed by the establishment of a static Bayesian game. The trade-off is calculated by deriving the Bayesian Nash equilibrium with a modified reinforcement learning algorithm. The proposed method achieves fast convergence by reducing the cardinality from n to 2. In addition, the in-place trade-off can maximize the user's data utility if the action sets of the user and the adversary are public while the strategy sets are unrevealed. Our extensive experiments on a real-world dataset show that the proposed model is effective and feasible.
Tags: personalized privacy protection; game theory; trade-off

3DIVE: An Immersive Environment for Interactive Volume Data Exploration

Abstract: This paper describes an immersive system, called 3DIVE, for interactive volume data visualization and exploration inside the CAVE virtual environment. Combining interactive volume rendering and virtual reality provides a natural immersive environment for volumetric data visualization. More advanced data exploration operations, such as object-level data manipulation, simulation, and analysis, are supported in 3DIVE by several new techniques. In particular, volume primitives and texture regions are used for the rendering, manipulation, and collision detection of volumetric objects; and the region-based rendering pipeline is integrated with 3D image filters to provide an image-based mechanism for interactive transfer function design. The system has been recently released as public domain software for CAVE/ImmersaDesk users, and is currently being actively used by various scientific and biomedical visualization projects.
Tags: medical imaging; diagnosis; data processing; 3D data display

Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome

Authors: 李恒; 刘劲松; 徐昭; 金蛟; 方林; 高雷; 李余动; 邢自兴; 高绍根; 刘涛; 李海红; 李雁; 谢惠民; 郑伟谋; 郝柏林
Subject: Automation and Computer Technology > Computer Science and Technology
Created: 2005-04-14
Source: 《计算机科学技术学报：英文版》 (Journal of Computer Science and Technology, English edition), 2005, Issue 4

Abstract: With several rice genome projects approaching completion, gene prediction/finding by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNA to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences with 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENSH and BGF. The predictions were compared on nucleotide, exon, and whole gene structure levels using commonly accepted measures and several new measures. The test results show progress in performance in chronological order. At the same time, the complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene-finders.
Tags: programming; computer; SGP-2; SLAN

A Cost-Efficient Approach to Storing Users' Data for Online Social Networks

Authors: Jing-Ya Zhou; Jian-Xi Fan; Cheng-Kuan Lin; Bao-Lei Cheng
Subject: Automation and Computer Technology > Computer Science and Technology
Created: 2019-01-11
Source: 《计算机科学技术学报：英文版》 (Journal of Computer Science and Technology, English edition), 2019, Issue 1

Abstract: As users increasingly befriend others and interact online via their social media accounts, online social networks (OSNs) are expanding rapidly. Confronted with the big data generated by users, it is imperative that data storage be distributed, scalable, and cost-efficient. Yet one of the most significant challenges about this topic is determining how to minimize the cost without deteriorating system performance. Although many storage systems use the distributed key-value store, it cannot be directly applied to OSN storage systems. And because users' data are highly correlated, hash storage leads to frequent inter-server communications, and the high inter-server traffic costs decrease the OSN storage system's scalability. Previous studies proposed conducting network partitioning and data replication based on social graphs. However, data replication increases storage costs and impacts traffic costs. Here, we consider how to minimize costs from the perspective of data storage, by combining partitioning and replication. Our cost-efficient data storage approach supports scalable OSN storage systems. The proposed approach co-locates frequently interacting users by conducting partitioning and replication simultaneously while meeting load-balancing constraints. Extensive experiments are undertaken on two real-world traces, and the results show that our approach achieves lower cost compared with state-of-the-art approaches. Thus we conclude that our approach enables economic and scalable OSN data storage.
Tags: online social network; inter-server traffic; cost
