Abstract: Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem, since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we give a systematic introduction to and presentation of the pattern-growth methodology, and we study its principles and extensions. We first introduce two interesting pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance in large databases is presented and analyzed. Several extensions of these methods are also discussed in the paper, including mining multi-level, multi-dimensional patterns and mining constraint-based patterns.
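The pattern-growth idea behind PrefixSpan can be illustrated with a minimal sketch. It assumes that each sequence is a list of single items (the full algorithm also handles sequences of itemsets) and that support is the number of sequences containing a pattern. The function name `prefixspan` is hypothetical. The sketch does not generate and test candidates. Instead, it extends each frequent prefix by one item and recurses on the projected database of suffixes.

```python
from collections import defaultdict


def prefixspan(db, min_support):
    """Simplified PrefixSpan over sequences of single items (no itemsets).

    db: list of sequences, each a list of hashable items.
    min_support: minimum number of sequences that must contain a pattern.
    Returns a dict mapping pattern tuples to their support counts.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence, start index of the suffix after the prefix).
        counts = defaultdict(int)
        for seq, start in projected:
            # Count each item at most once per sequence.
            for item in set(seq[start:]):
                counts[item] += 1
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep the suffix after the first occurrence of item.
            new_projected = []
            for seq, start in projected:
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((seq, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(seq, 0) for seq in db])
    return results
```

For example, `prefixspan([["a", "b", "c"], ["a", "c"], ["a", "b", "c", "b"], ["b", "c"]], 2)` finds seven patterns, including `("a", "b", "c")` with support 2.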
Abstract: Geological Prospecting and Mining in Tibet. DONDUI NAMGYI. September 1, 1995 marked the 30th anniv...
Abstract: Huainan Coal Mining Bureau, an extra-large coal enterprise and a state key coal production base, is situated in the central-northern part of Anhui Province. The area, well known as "the coal capital of East China", is rich in coal resources. Its proven coal reserves are estimated at up to 70 billion tons, with complete varieties and superior quality. By 2010, the annual production capacity will reach 30 million tons. The area offers an excellent investment environment and convenient communication and transportation.
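Abstract: In daily life, people often repeat regular routes during certain periods. In this paper, a mining system is developed to discover individuals' sequential route patterns from their trips. To cope with the variability of individuals' movement positions, the mining system adopts adaptively recorded GPS data, and five data filters ensure that the data are free of errors. The mining system uses a client/server architecture to protect personal privacy and to reduce the computational burden. The server performs the main mining process but receives too little information to recover the individuals' real routes. To improve the scalability of sequential pattern mining, a novel pattern mining algorithm, Continuous Route Pattern Mining (CRPM), is proposed. This algorithm can tolerate different kinds of disturbances in real routes while extracting frequent patterns. Experimental results based on the trips of nine individuals show that CRPM can extract route patterns more than two times longer than those extracted by traditional route pattern mining algorithms.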
Abstract: With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and online services over the Internet, also call for various data mining and knowledge discovery techniques. These techniques help to better understand user behavior, improve the services provided, and increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques, and it also introduces some real application systems. In conclusion, the article lists some problems and challenges for further research.
Abstract: The data used in the process of knowledge discovery often include noise and incomplete information, and the boundaries between different classes of these data are blurred and indistinct. When these data are clustered or classified, we often obtain coverings instead of partitions, which usually makes our information system insecure. In this paper, optimal partitioning of incomplete data is studied. First, the relationship between set cover and set partition is discussed, and the distance between a set cover and a set partition is defined. Second, the optimal partitioning of a given cover is studied with a combining-and-splitting method, and obtaining the optimal partition from three different partition set families is discussed. Finally, the corresponding optimal algorithm is given. Real wireless signals often contain a lot of noise, and clustering these data with traditional methods produces many errors at the boundaries. In our experiment, the proposed method greatly improves the correct rate, and the experimental results demonstrate the method's validity.
Abstract: Land resources face a crisis of misuse, especially in intersection areas between town and country, so land control has to be enforced. This paper presents a data mining method for land control. For data cleaning, the prerequisite of data mining, a vector-match method is proposed. It handles both character and numeric data by vectorizing character strings and matching numbers. A rough-set minimal decision algorithm is used to discover the knowledge hidden in the data warehouse. To monitor land use dynamically and accurately, the paper suggests setting up a real-time land control system based on GPS, digital photogrammetry, and online data mining. Finally, the method is applied in the intersection area between town and country in Wuhan city, and a set of knowledge about land control is discovered.
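Abstract: Mining causality is essential for providing diagnoses. This research aims to extract causality that exists within multiple sentences or EDUs (Elementary Discourse Units). The research emphasizes the use of causative verbs, because they make the consequent events of a cause explicit in a certain way. For example: "Aphids suck the sap from rice leaves. Then, the leaves will shrink. Later, they will become yellow and dry." A verb can also serve as the causal-verb link between cause and effect within a single EDU. For example: "Aphids suck the sap from rice leaves, causing the leaves to shrink" ("causing" is equivalent to a causal-verb link in Thai). The research faces two main problems: identifying the interesting causality events in documents and identifying their boundaries. We then propose mining on verbs with two different machine learning techniques, Naive Bayes and Support Vector Machine. The resulting mining rules will be used to identify and extract causality spanning multiple EDUs in text. Our multiple-EDU extraction achieves 0.88 precision with 0.75 recall using Naive Bayes, and 0.89 precision with 0.76 recall using the Support Vector Machine.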
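The verb-based classification step can be sketched with a generic multinomial Naive Bayes classifier. This minimal sketch rests on assumptions that the abstract does not state:

- Each candidate EDU pair is represented only by its verbs.
- The labels `"causal"` and `"other"` and the class name `NaiveBayes` are hypothetical.
- Add-one (Laplace) smoothing is used.

The paper's actual features and its Support Vector Machine variant are not reproduced.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical setup: each example is a list of verbs from a candidate
    EDU pair, labeled e.g. "causal" or "other".
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum over tokens of log P(token | class), smoothed.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For instance, after `NaiveBayes().fit([["suck", "shrink"], ["grow", "harvest"]], ["causal", "other"])`, the model labels `["suck", "shrink"]` as `"causal"`. These toy verb lists are illustrative only.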
Abstract: This paper presents a fault-detection method for complex electronic systems based on phase space reconstruction and data mining. The phase space of the chaotic time series is reconstructed with an algorithm that combines multiple autocorrelation and the Γ-test, which yields a quasi-optimal embedding dimension and time delay. The data mining algorithm calculates the radius of gyration of unit-mass points around their centre of mass in the phase space. This lets it distinguish the fault parameter from the chaotic time series output by the system under test. The experimental results show that this fault-detection method can correctly detect fault phenomena in electronic systems.
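The two stages can be sketched as follows. A time-delay embedding reconstructs phase-space points from a scalar series. The radius of gyration is then the root-mean-square distance of those unit-mass points from their centre of mass. In this minimal sketch, the embedding dimension `dim` and the delay `tau` are supplied by hand, standing in for the multiple-autocorrelation and Γ-test selection described above. The function names are hypothetical.

```python
import math


def delay_embed(series, dim, tau):
    """Time-delay embedding: point i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau]).

    dim and tau are given directly here; the paper selects them with
    multiple autocorrelation and the Gamma-test.
    """
    n_points = len(series) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("series too short for the chosen dim and tau")
    return [tuple(series[i + k * tau] for k in range(dim)) for i in range(n_points)]


def radius_of_gyration(points):
    """Root-mean-square distance of unit-mass points from their centre of mass."""
    n = len(points)
    dim = len(points[0])
    centre = [sum(p[k] for p in points) / n for k in range(dim)]
    sq = sum(sum((p[k] - centre[k]) ** 2 for k in range(dim)) for p in points)
    return math.sqrt(sq / n)
```

A fault could then be flagged when the radius of gyration of the system output departs from that of a known-good run. The comparison rule and any threshold are assumptions, not details from the paper.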
Abstract: Outlier mining is an important aspect of data mining, and outlier mining based on Cook's distance is the most commonly used approach. However, when the data exhibit multicollinearity, the traditional Cook method is no longer effective. Given the advantages of principal component estimation, we use it in place of least squares estimation and then give a Cook's distance measure based on principal component estimation, which can be used in outlier mining. We have also studied related theories and application problems.
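For reference, the classical Cook's distance for observation $i$ is written out below, followed by one hedged reading of the principal-component substitution. The symbols are:

- $\hat\beta_{(i)}$: the estimate with observation $i$ deleted.
- $p$: the number of regression coefficients.
- $s^2$: the residual variance estimate.
- $V_k$ and $\Lambda_k$: the eigenvectors and eigenvalues of the $k$ largest eigenvalues of $X^\top X$.

Under multicollinearity, $X^\top X$ is nearly singular, and dropping its small-eigenvalue components stabilizes the estimate. The abstract does not give the paper's exact normalization for the principal-component version, so keeping $p\,s^2$ is an assumption.

```latex
\begin{aligned}
D_i &= \frac{(\hat\beta_{(i)} - \hat\beta)^\top X^\top X\,(\hat\beta_{(i)} - \hat\beta)}{p\,s^2},
  \qquad \hat\beta = (X^\top X)^{-1} X^\top y,\\
X^\top X &= V \Lambda V^\top, \qquad
  \hat\beta_{\mathrm{PC}} = V_k \Lambda_k^{-1} V_k^\top X^\top y,\\
D_i^{\mathrm{PC}} &= \frac{(\hat\beta_{\mathrm{PC},(i)} - \hat\beta_{\mathrm{PC}})^\top X^\top X\,(\hat\beta_{\mathrm{PC},(i)} - \hat\beta_{\mathrm{PC}})}{p\,s^2}.
\end{aligned}
```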
Abstract: A semi-structured document carries more structural information than an ordinary document, and the relations among semi-structured documents can be fully utilized. To exploit the structure and link information in semi-structured documents for better mining, this paper presents a structured link vector model (SLVM). In SLVM, a vector represents a document, and the vector's elements are determined by terms, document structure, and neighboring documents. For brevity and clarity, text mining based on SLVM is described within the K-means procedure, which has two steps: calculating document similarity and calculating cluster centers. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model, and its F value increases from 0.65-0.73 to 0.82-0.86.
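The two K-means steps named above can be sketched on sparse vectors. This is a minimal sketch, not the SLVM definition. It assumes the following:

- Each feature key pairs a structural element with a term (e.g. `"title:mining"`).
- Link information is added as the mean of neighboring documents' vectors, scaled by a hypothetical weight `alpha`.
- Similarity is cosine similarity.

All function names are hypothetical.

```python
import math
from collections import Counter


def slvm_vector(sections, neighbors=(), alpha=0.5):
    """Hypothetical SLVM-style document vector.

    sections: dict mapping a structural element (e.g. "title") to its terms.
    neighbors: vectors (Counters) of linked documents.
    alpha: hypothetical weight for link information.
    """
    vec = Counter()
    for element, terms in sections.items():
        for term in terms:
            # The same term in different elements becomes a different dimension.
            vec[f"{element}:{term}"] += 1.0
    for nb in neighbors:
        for key, value in nb.items():
            # Add the neighbors' mean vector, scaled by alpha.
            vec[key] += alpha * value / len(neighbors)
    return vec


def cosine(u, v):
    """Document similarity step: cosine of two sparse vectors."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Cluster center step: component-wise mean of the member vectors."""
    total = Counter()
    for vec in vectors:
        for key, value in vec.items():
            total[key] += value
    return Counter({key: value / len(vectors) for key, value in total.items()})


def assign(docs, centers):
    """K-means assignment: index of the most similar center for each document."""
    return [max(range(len(centers)), key=lambda j: cosine(d, centers[j])) for d in docs]
```

Keying features by element means that the same term in a title and in a body counts as two different dimensions. This is one simple way to let document structure affect similarity.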