Abstract
A Distributed Monitoring System (NGOP) that will scale to the anticipated requirements for Run II computing has been under development at Fermilab. NGOP [1] provides a framework to create Monitoring Agents for monitoring the overall state of computers and the software running on them. Several Monitoring Agents are available within NGOP that are capable of analyzing log files and checking the existence of system daemons, CPU and memory utilization, etc. NGOP also provides customizable graphical hierarchical representations of these monitored systems. NGOP is able to generate events when serious problems have occurred, as well as raise alarms when potential problems have been detected. NGOP allows performing corrective actions or sending notifications, and it provides persistent storage for collected events, alarms, and actions. A first implementation of NGOP was recently deployed at Fermilab. This is a fully functional prototype that satisfies most of the existing requirements. For the time being, the NGOP prototype is monitoring 512 nodes. During its first few months of running, NGOP has proved to be a useful tool. Multiple problems such as node resets, offline CPUs, and dead system daemons have been detected. NGOP provided system administrators with the information required for better system tuning and configuration. The current state of deployment and future steps to improve the prototype and to implement some new features will be presented.
Publication date
January 11, 2001 (date first posted online on the 中国期刊网 platform; this does not represent the paper's publication date)