Fermilab Distributed Monitoring System (NGOP)

Abstract: A Distributed Monitoring System (NGOP) that will scale to the anticipated requirements for Run II computing has been under development at Fermilab. NGOP [1] provides a framework to create Monitoring Agents for monitoring the overall state of computers and the software running on them. Several Monitoring Agents are available within NGOP that are capable of analyzing log files and checking the existence of system daemons, CPU and memory utilization, etc. NGOP also provides customizable graphical hierarchical representations of these monitored systems. NGOP is able to generate events when serious problems have occurred, as well as to raise alarms when potential problems have been detected. NGOP allows performing corrective actions or sending notifications, and it provides persistent storage for collected events, alarms, and actions. A first implementation of NGOP was recently deployed at Fermilab. This is a fully functional prototype that satisfies most of the existing requirements. For the time being, the NGOP prototype is monitoring 512 nodes. During its first few months of running, NGOP has proved to be a useful tool. Multiple problems such as node resets, offline CPUs, and dead system daemons have been detected. NGOP provided system administrators with information required for better system tuning and configuration. The current state of deployment and future steps to improve the prototype and to implement some new features will be presented.
Affiliation: Not specified
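Publication date: January 11, 2001 (the date the paper first went online on the China Journal Net platform; this does not represent the paper's actual publication date)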