Although the protein sequence-structure gap continues to widen owing to the development of high-throughput sequencing tools, the protein structure universe appears to be approaching completeness, as few proteins with novel structural folds have been deposited in the Protein Data Bank (PDB) recently. In this work, we identify a protein structural dictionary (Frag-K), composed of a set of backbone fragments ranging from 4 to 20 residues, that serves as structural "keywords" capable of effectively distinguishing between major protein folds. We first apply randomized spectral clustering and random forest algorithms to construct representative and sensitive protein fragment libraries from a large set of high-quality, non-homologous protein structures available in the PDB. We analyze the impact of clustering cut-offs on the performance of the fragment libraries. Then, the Frag-K fragments are employed as structural features to classify protein structures into major protein folds defined by SCOP (Structural Classification of Proteins). Our results show that a structural dictionary of ~400 Frag-K fragments, each 4 to 20 residues long, is capable of classifying major SCOP folds with high accuracy.
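To illustrate how a fragment dictionary can act as a set of structural "keywords", the following is a minimal sketch, not the Frag-K pipeline itself. It assumes a single fixed fragment length, describes each fragment by its pairwise C-alpha distances, and assigns every sliding-window fragment of a protein to its nearest library entry by Euclidean distance. The result is a normalized "bag of keywords" histogram that could be passed to a classifier such as a random forest. The helper names (`fragment_descriptor`, `sliding_fragments`, `keyword_histogram`) and the toy two-entry library are hypothetical. The randomized spectral clustering used to build the actual libraries is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]  # one C-alpha position (x, y, z), in angstroms


def fragment_descriptor(ca: Sequence[Coord]) -> List[float]:
    """Hypothetical descriptor: all pairwise C-alpha distances (i < j).

    Distances are invariant to rotation and translation, so two fragments
    with the same local backbone shape get the same descriptor.
    """
    return [math.dist(ca[i], ca[j])
            for i in range(len(ca)) for j in range(i + 1, len(ca))]


def sliding_fragments(trace: Sequence[Coord], length: int) -> List[Sequence[Coord]]:
    """All overlapping fragments of `length` consecutive residues."""
    return [trace[s:s + length] for s in range(len(trace) - length + 1)]


def keyword_histogram(trace: Sequence[Coord],
                      library: List[List[float]],
                      length: int) -> List[float]:
    """Fraction of a protein's fragments whose nearest library entry is each keyword.

    `library` holds descriptors (e.g. cluster representatives) for fragments
    of the same `length`; nearest-neighbour assignment is an assumption here.
    """
    counts = [0] * len(library)
    for frag in sliding_fragments(trace, length):
        d = fragment_descriptor(frag)
        nearest = min(range(len(library)), key=lambda k: math.dist(d, library[k]))
        counts[nearest] += 1
    total = sum(counts)
    # Normalize so proteins of different lengths give comparable feature vectors.
    return [c / total for c in counts] if total else [0.0] * len(library)


if __name__ == "__main__":
    # Toy 4-residue library (hypothetical): an extended strand and a compact turn.
    library = [
        [3.8, 7.6, 11.4, 3.8, 7.6, 3.8],  # straight chain, 3.8 A between neighbours
        [3.8, 5.4, 5.0, 3.8, 5.4, 3.8],   # compact, helix-like geometry
    ]
    straight = [(i * 3.8, 0.0, 0.0) for i in range(6)]
    print(keyword_histogram(straight, library, 4))  # [1.0, 0.0]
```

A straight six-residue chain yields three 4-residue windows, all closest to the extended entry. Distance-based descriptors were chosen here only because they avoid a superposition step. An RMSD-based comparison after optimal superposition would be a common alternative.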