您好,欢迎来到华拓科技网。
搜索
您的当前位置:首页A Practical Methodology for Defining Histograms for Predictions and Scheduling

A Practical Methodology for Defining Histograms for Predictions and Scheduling

来源:华拓科技网
APracticalMethodologyforDefiningHistogramsfor

PredictionsandScheduling

JenniferM.Schopf

ComputerScienceDepartmentNorthwesternUniversity

jms@cs.nwu.edu

Abstract

Currentdistributedparallelplatformscanprovidetheresourcesrequiredtoexecuteascientificapplicationefficiently.However,whentheseplatformsaresharedbymultipleusers,theperfor-manceoftheapplicationsusingthesystemmaybeimpactedindynamicandoftenunpredictableways.Performancepredictionbecomesincreasinglydifficultduetothisdynamicbehavior.Evenperformancemodelingtechniquesthatarebuiltspecificallyfordistributedparallelsystemsoftenrequireparameterizationbysingle(point)values.However,insharedenvironments,pointvaluesmayprovideaninaccuraterepresentationofapplicationbehaviorduetovariationsinresourceperformance.

Thispaperaddresstheuseofpracticalhistogramstochasticvaluestoparameterizeper-formancemodels.Whereasapointvalueprovidesasinglevaluerepresentationofaquantity,astochasticvalueprovidesasetofpossiblevaluestorepresentarangeoflikelybehavior.Inpreviousworkweinvestigatedusingeithernormaldistributionsoranupperandlowerboundtorepresentstochasticdata.Inthisworkweexamineusingacombinationofthosemethods,namelyhistograms,torepresentthisdata.Wedefineapracticalapproachtousinghistogramsinapro-ductionsetting,andthengiveexperimentalresultsforasetofapplicationsunderdifferentloadconditions.

1Motivation

Currentdistributedparallelplatformscanprovidetheresourcesrequiredtoexecuteascientificap-plicationefficiently.However,whentheseplatformsaresharedbymultipleusers,theperformanceoftheapplicationsusingthesystemmaybeimpactedindynamicandoftenunpredictableways.Inordertoobtaingoodperformance,accurateperformancepredictionmodelsfordistributedparallelsystemsareneeded.Mostperformancepredictionmodelsuseparameterstodescribesystemandapplicationcharacteristicssuchasbandwidth,availableCPU,messagesize,operationcounts,etc.Modelparametersaregenerallyrepresentedasasinglelikelyvalue,whichwerefertoasapointvalue.Forexample,apointvalueforbandwidthmightbe7Mbits/second.

Inpractice,pointvaluesareoftenabestguess,anestimateunderidealcircumstances,oravaluethatisaccurateonlyforagiventimeframe.Insomesituationsitmaybemoreaccuratetorepresentsystemandapplicationcharacteristicsasarangeofpossiblevalues;forexample,bandwidthmightbereportedasvaryingbetween6and8Mbits/second.Werefertosuchvaluesasstochasticvalues.Whereasapointvaluegivesasinglevalueforaquantity,astochasticvaluegivesasetofvalues,possiblyweightedbyprobabilities,torepresentarangeoflikelybehavior[TK93].

1

Onewaytorepresentstochasticvaluesisbyusinghistograms.Histograms,alsocalledVU-lists[Lue98]orfrequencydistributions[Bla80],consistofasetofintervalswithassociatedprob-abilities.Inthispaperwepresentapracticaldefinitionforhistogramsanddemonstratetheirusewhenpredictingtheperformanceofparalleldistributedapplications.

Thispaperisorganizedasfollows:Section2givesabriefoverviewofamodelingtechniquecalledstructuralmodelingthatcanbeparameterizedbystochasticinformation.Section3detailsapracticaldefinitionforhistograms.WedescribethearithmeticneededtocombinehistogramsinSection4.Section5presentsexperimentalresults,andsummary.

2OverviewofStructuralModeling

Inordertopredictanapplication’sbehavioronsharedresources,weneedbothgoodmodelsoftheimplementationandaccurateinformationabouttheapplicationandsystemwithwhichtoparam-eterizethemodels.Inpreviouswork[Sch98]wedevelopedatechniquecalledstructuralmod-elingthatusesthefunctionalstructureoftheapplicationtodefineasetofequationsthatreflectthetime-dependent,dynamicmixofapplicationtasksoccurringduringexecutioninadistributedparallelenvironment.Astructuralmodelconsistsofatop-levelmodel,componentmodels,andinputparameters.Atop-levelmodelisaperformanceequationconsistingofcomponentmodelsandcompositionoperators.Eachcomponentmodelisalsodefinedtobeaperformanceequation,parameterizedbyplatformandapplicationcharacteristics,suchasbenchmarks,dynamicsystemmeasurements,problemsize,orthenumberofiterationstobeexecuted.Theoutputofastruc-turalmodelisapredictedexecutiontime.Structuralmodelscanalsobedevelopedforalternativeperformancemetrics,suchasspeeduporresponsetime.

Agoodperformancemodel,likeagoodscientifictheory,isabletoexplainavailableobserva-tionsandpredictfutureevents,whileabstractingawayunimportantdetails.Structuralmodelsarerepresentedbyequationsthatallowustoabstractawaydetailsintothetoplevelmodelsandtheparameters,butstillaccuratelymodeltheperformancebyshowingtheinter-relationofapplicationandsystemfactorsthroughthecomponentmodelsandtheirparameterizations.Weparameterizethesemodelswithstochasticvaluesrepresentedashistograms,whichrequiresnotonlycompute-efficientdefinitionsofhistogramsbuttheneededarithmeticaswell.Thesearediscussedinthefollowingsections.

3DefiningHistograms

Astochasticvalue

maybespecifiedusingaone-dimensionalhistogram

asfollows:

(1)

entryinthedefinitionofisapairconsistingofaninterval,andanas-Each

sociatedprobability,.Oneoftheprimarydifficultiesindefininghistogramsfordynamicsystemvaluesisthedeterminationoftheappropriateintervalsandprobabilitiesforeachhistogram.Atradeoffexistsbetweenaddingdetailtothehistogramdescription,whichmaybemoreaccurate,butincreasesthecomputationalcomplexity.Aswillbeseeninthenextsection,arithmeticoverextendedhistogramsinvolvescombiningallpossiblesetsofprobabilisticchoices,whichcanbequitelarge.

2

3.1PreviousHistogramDefinitionsandRelatedWork

Thereareanumberofguidelinesfordefininghistogramsintheliterature[MSW81,Bla80,Cra68,Bai72,Stu26,Sco79].These“rulesofthumb”areusedbyresearchersfordefininghistograms,al-thoughinmostcasestheseconcentrateonwaystodisplaytheinformation,asopposedtousingtheinformationasasummarytechniqueinapracticalsetting,anddonotprovidearigorousapproachtochoosingthenumberofintervalsinahistogram.Thecollectiveexperienceoftheseresearchersindicatesthatthe“best”numberofintervalstoselectisdata-dependent,application-dependentandsystem-dependent,aswellasdependentonhowthehistogramwillbeusedandthegranularityofmeasurement[LMH96].

Inthemostcloselyrelatedworktoourresearch[LH95],histogramsaredefinedinitiallybasedonthenumberofintervalsprovidedbytheuser.Theintervalsarethensplit,usingintervalsplit-ting[MR95,Ske74]andausersuppliedproceduretoproducemoresub-intervalsforintervalswithhighprobabilitiesandfewerforintervalsofthesamewidthbutlowerprobability.Thisprocedureisiteratedforeachdefinedhistogramuntilsplittingtheintervalfurtherdoesnotchangethere-sultingpredictedvalue.Fortheirpurposes,“Computationalcomplexityisreasonablegiventhatthenumberofparametersspecifiedashistogramsisnottoohigh”[LMH96].However,thisisnotpracticalforoursettingaswedonothaveagoodsplittingroutineorthecomputetimetospendonaniterativemethod.

3.2PracticalHistograms

Whendefininghistogramsforuseinmakingpredictionstodetermineschedulesatrun-time,weneedanapproachthatusesfew,ifany,usersuppliedparameters,iscomputationallyefficient,andyetcanadequatelycapturethebehaviorofthesystembeingdescribed.Furthermore,sincethenumberofpartialresultsforarithmeticfunctionsoverhistogramscanbequitelarge(asdescribedinthenextsection)weneedtolimitthisfactoraswell.

Inourapproachweusedatwo-passmethodandlimitedthenumberofintervalsinthehistogramaswellasthepartialresults.Thefirstpassoverthedatawasusedtoevaluateanupperboundandalowerbound,andtodeterminethesizeoftheindividualintervalbins.Thesecondpasswasusedtosortthedataintotheintervalbins,anddefinethehistogramforthestochasticdata.

Toselectthenumberofintervals,weexperimentallyevaluatedarangeofintervalsupto25onapreliminarysetofdata.TheprimarysourceofvaryingbehaviorforthisdatawastheavailableCPUvaluesforthecontendedsystem.ThisdatawassuppliedbytheNetworkWeatherService(NWS)[Wol97],anonlinetoolthatprovidesatimeseriesofdataforCPUavailability,bandwidthandlatency.Inpreviousexperiments[Sch98]wehaddeterminedtheamountofdatafromthetimeseriestouseinourpredictions,namelya5minutewindow.Thishadbeenshowntobenoworsethanusinglargerorsmallerwindowsofdata.

Weexperimentallyexaminedtheaccuracyofthehistogramforarangeofintervalsizesandmeasuredboththetimetocomputetheprediction,whichincludedthetimetodeterminethehis-togramsfortheinputparameters,andtheaccuracyoftheprediction.Forloadcharacteristicscommonlyseenonnetworksofworkstations[Sch98],wedeterminedthatthepredictiveaccurateofmorethanfiveintervalsdidnotincreasesignificantly,althoughthecomputationalexpensegrewquitelargeatthatpoint.Therefore,inthefollowing,weusedfiveintervalsdividedevenlyovertherangeunlessstatedotherwise.

3

3.3ArithmeticoverHistograms

Oneoftheprimarydifficultiesindefininghistogramsfordynamicsystemvaluesisthedetermi-nationoftheappropriateintervalsandprobabilitiesforeachhistogram.Sinceaddingdetailtothehistogramdescriptionincreasesthecomputationalcomplexity.Arithmeticoverextendedhis-oftogramsinvolvescombiningallpossiblesetsofprobabilisticchoices,whichgiventheset

stochasticvaluesrepresentedashistograms,eachwithintervals,thisresultsinforall

.Thisvaluecanbequitelarge,dependingonhowwedefinethenumberofintervalsthe

histogramforagivenstochasticvalueshouldhave.Givenasetofstochasticvaluedparameters,,representedbyhistograms,assumethatparameterhasintervalswith,i.e.:

(2)

Sincehistogramsarejustsetsofintervals,theintervalarithmeticrules[Neu90]beusedtocombinethemarithmetically.However,everyintervalhasanassociatedprobability.Hence,whencalculatingforsomearithmeticfunctionandhistogramvaluesand,itisneces-sarytocalculateintermediateresultsforallpossiblecombinationsofintervalsforeachhistogram.Forexample,giventhehistogramspicturedinFigure1:

(3)

(4)

Thisgivesintermediateresults(depictedinFigure2):

(5)

Oncetheintermediateresultshavebeencalculated,theymustbeaggregatedtoformtheresulthistogram,aspicturedinFigure3.AfulldefinitionofPDF’soverhistogramsisgivenin[Lue98].Inthiscase,theresultinghistogramis:

(6)

0.8Histogram for DHistogram for E0.80.80.70.70.70.60.60.60.50.5probabty012345671011121314150.4probabtyprobabty0.50.40.40.30.30.30.20.20.20.10.10.10012345670001234567101112131415Figure1.HistogramsforsampleparametersDandEFigure2.HistogramofintermediateresultsforD+E

Figure3.ResultofD+E.

4

4PredictionsUsingHistograms

Givenastructuralmodelforanimplementationparameterizedbypointvalueandstochasticvaluesystemandapplicationcharacteristics,wecancalculateapredictionforapplicationsrunningincontentiousenvironments.Thissectiondetailsourexperimentalresults,andcomparesusingahistogramtorepresentstochasticvaluestoothermethods.

WeexaminedthreeapplicationsonadistributednetworkofSparcworkstationslocatedintheUCSDParallelComputationLab.Themachinesareconnectedwithamixof10Mbit(slow)and100Mbit(fast)ethernet.BoththeCPUandthenetworkweresharedwithotherusers.Foreachapplicationweranaseriesofexperimentsunderdifferentloadconditions.Toshowtheexecutiontimetrends,weshowtheresultsoftheactualexecutiontimeswiththeupperboundandlowerboundofthehistogrampredictions,andinTable5givethepercentageofvalueswhichfellintoeachprobabilityintervalofthehistogrampredictionsforthatgraph.Figure4isasamplepredictionforasinglerunoftheSORcode.

Intheexperiments,thestochasticCPUinformationusedinthemodelswassuppliedbytheNetworkWeatherService[Wol97].Incategorizingthemultiplebackgroundworkloadsseenforeachexperimentset,wedefineamachinewithalowvariationinCPUavailabilityashavingavariationof0.25orless,onascalefrom0to1,amediumvariationwhentheCPUavailabilityvariedmorethan0.25,butlessthan0.5,andahighvariationwhentheCPUvaluesvariedoverhalftherange.

40.0Histogram PredictionActual ValueProbabty vaue s n range (0% to 100%)30.02nd20

SOR2Figure7GA2Figure9

40.050.0Execution time (sec)60.070.04th321

14

1520

0

70

8

Outside4250

20.018

28

33

26

42

10.00.030.0LU2Figure11

Figure4.SampleHis-togramPrediction.

70.0Actual Execution TimesInterval Stochastic Value Prediction60.0150.0100.0Execution Time (sec)50.040.0Execution Time (sec)50.030.0Actual Execution TimesInterval Stochastic Value Prediction20.0910253500.0910254000.0910254500.0Time Stamps910255000.0910255500.00.0910255000.0910255500.0910256000.0910256500.0910257000.0910257500.0910258000.0Time StampsFigure6.SOR1-stochasticvaluepredictionfortheSOR.Figure7.SOR2-stochasticvaluepredictionfortheSOR.

4.2Application2-GeneticAlgorithmCode

Weimplementedageneticalgorithm(GA)fortheTravelingSalesmanProblem(TSP)[LLKS85].OurdistributedimplementationofthisGAusesaglobalpopulationandsynchronizesbetweengenerations[Bha96].AlloftheSlavesoperateonaglobalpopulation(eachmemberofthepopu-lationisasolutiontotheTSPforagivensetofcities)thatisbroadcasttothem.EachSlaveworksinisolationtocreateaspecifiednumberofchildren(representingnewtours),andtoevaluatethem.OnceallthesetsofchildrenarereceivedbytheMaster,theyaresorted(byefficiencyofthetour),somepercentagearechosentobethenextgeneration,andthecyclerepeats.

GA1,Figure8,showsthestochasticvaluepredictionsandtheactualtime,runwhenthetwomachinesontheclusterhadahighlyvariableavailability,andtwohadlowvariability.GA2,Figure9,showsthestochasticvaluepredictionsandtheactualtimerunwhenthePCLclusterhadtwohighlyvariablemachinesandtwolowvariabilitymachines.

60.0150.040.0100.0Execution Time (sec)20.0Execution Time (sec)50.0Actual Execution TimesInterval Stochastic Value PredictionActual Execution TimesInterval Stochastic Value Prediction0.0910328000.0910328500.0910329000.0Time Stamps910329500.0910330000.00.0910329500.0910330000.0910330500.0Time Stamps910331000.0910331500.0Figure8.GA1-stochasticvaluepredictionfortheGAcode.Figure9.GA2-stochasticvaluepredictionfortheGAcode.

4.3Application3-LUBenchmark

TheLUbenchmarkisasimulatedCFDapplicationthatsolvesablock-lower-triangular/block-upper-triangularsystemofequations,andisoneoftheNASParallelBenchmarks(NPB)[BHS95].Thissystemofequationsistheresultofanunfactoredimplicitfinite-differencediscretizationof

6

theNavier-Stokesequationsinthreedimensions.TheLUbenchmarkfindslowertriangularanduppertriangularmatrixessuchthatforanoriginalmatrix.Ithasthefeaturethatitteststhecommunicationaspectsofthesystemwellbysendingarelativelylargenumberof5wordmessages.

LU1,Figure10,showsthestochasticvaluepredictionsandtheactualtime,runwhentwoofthemachinesshowedahighvariabilityinavailability,andtwohadalowvariability.LU2,Figure11,showsthestochasticvaluepredictionsandtheactualtime,runwhenthePCLclusteronehighvariabilitymachine,onemediumvariabilitymachineandtwolowvariabilitymachines.

200.0180.0160.0200.0150.0140.0Execution Time (sec)120.0100.080.060.0Execution Time (sec)Actual Execution TimesInterval Stochastic Value Prediction100.050.040.020.00.0910395000.0Actual Execution TimesInterval Stochastic Value Prediction910396000.0910397000.0Time Stamps910398000.0910399000.00.0910398000.0910399000.0910400000.0Time Stamps910401000.0Figure10.LU1-stochasticvaluepredictionforLU.Figure11.LU2-stochasticvaluepredictionforLU.

4.4SummaryofExperiments

Insummary,forthemajorityoftheexperiments,weachievedpredictionsthatcapturedtheexe-cutionbehaviorsusinghistogramrepresentationsforthestochasticinformation.Inaddition,theshapeandpercentagesofthehistogramintervalswereclosetotheactualexecutionvalues,andseemedtoaddvaluableinformationtotheprediction.Futureworkinvolvesexamininghowthesevaluesaffectschedulingdecisionsaswellasadaptivemethodsfordefininghistograms.

4.5ComparisonofStochasticRepresentations

Thisworkhasgivenadetailedapproachtousinghistogramstorepresentstochasticvalues.Inpreviouswork,werepresentedstochasticvaluesasdistributions[SB97]andasintervals[SB99].Eachapproachhasadvantagesanddisadvantages.Intervalsarethemosteasilydefinedsincetheminimumandmaximumvalueinadatasetforastochasticvalueareeasytodetermine.However,outliersinthedatacanaffectthesizeoftheinterval,andnodetailsabouttheshapeofthedataareincludedinasimplerange.Histogramsallowtheshapeofastochasticvaluetobeelucidated,andlendthemselvestothegroupingofvalues,butevenwithoutapproach,canbedifficulttoaccuratelydefineinapracticalsetting.Distributionscanbedefinedusingwellunderstoodmetricsforthedata,butinordertobetractablearithmetically,wemustassumethattheassociateddatafitsawell-definedandcomputationallyefficientfamilyofdistributions,suchasthefamilyofnormaldistributions,whichisnotalwaysavalidassumption.

7

References

[Bai72][Bha96]

A.Baines.Introduction.InO.DaviesandP.Goldsmith,editors,StatisticalMethodsinResearchandProduction,PublishedforImperialChemicalIndustriesLimitedbyOliverandBoyd,1972.KaranBhatia.Personalcommunication,1996.

[BHS95]D.Bailey,T.Harris,W.Saphir,R.vanderWijngaart,A.Woo,andM.Yarrow.Thenasparallel

benchmarks2.0.TechnicalReportReportNAS-95-020,December1995.[Bla80][Cra68][LH95]

LelandBlank.StatisticalProceduresforengineering,management,andscience.McGraw-HillBookCompany,1980.

J.M.Craddock.StatisticsintheComputerAge.TheEnglishUniversitiesPressLimited,1968.J.L¨uthiandG.Haring.Meanvalueanalysisforqueuingnetworkmodelswithintervalsasinputparameters.TR-950101,Instituteofappliedscience,UniversityofVienna,July1995.

[LLKS85]Lawler,Lenstra,Kan,andShmoys.TheTravelingSalesmanProblem.Wiley&Sons,1985.[LMH96]J.L¨uthi,S.Majumdar,andG.Haring.Meanvalueanalysisforcomputersystemswithvariabili-tiesinworkload.InProc.ofIPDS’96,1996.[Lue98]

JohannesLuethi.Histogram-basedcharacterizationofworkloadparametersanditsconse-quencesonmodelanalysis.InProceedingsoftheWorkshoponWorkloadCharacterization

inHigh-PerformanceComputingEnvironments,jointlywithMASCOTS’98,1998.

S.MajumdarandR.Ramadoss.Interval-basedperformanceanalysisofcomputingsystems.InModelAnalysisandSimulationofComputerandTelecommunicationSystems,1995.

[MR95]

[MSW81]WilliamMendenhall,RichardL.Scheaffer,andDennisD.Wackerly.MathematicalStatistics

withApplications,pages4–6.DuxburyPress,1981.[Neu90][SB97][SB99][Sch98][Sco79][Ske74][Stu26][TK93][Wol97]

ArnoldNeumaier.Intervalmethodsforsystemsofequations,BasicPropertiesofIntervalArith-metic.CambridgeUniversityPress,1990.

JenniferM.SchopfandFrancineBerman.Performancepredictioninproductionenvironments.InProceedingsofIPPS/SPDP’98,1997.

JenniferM.SchopfandFrancineBerman.Usingstochasticintervalstopredictapplicationbehavioroncontendedresources.InProceedingsofISPAN99,1999.

JenniferM.Schopf.PerformancePredictionandSchedulingforParallelApplicationsonMulti-UserClusters.PhDthesis,UniversityofCalifornia,SanDiego,1998.D.W.Scott.Onoptimalanddata-basedhistograms.Biometrika,66,1979.StigSkelboe.Computationofrationalintegerfunctions.BIT,14,1974.H.A.Sturges.JournalofAmericanStatisticalAss.,21,1926.

H.TaylorandS.Karlin.AnIntroductiontoStochasticModeling.AcademicPress,1993.R.Wolski.Dynamicallyforecastingnetworkperformancetosupportdynamicschedulingusingthenetworkweatherservice.InProc.HighPerformanceDistributedComputing,August1997.

8

因篇幅问题不能全部显示,请点此查看更多更全内容

Copyright © 2019- huatuo6.cn 版权所有 赣ICP备2024042791号-9

违法及侵权请联系:TEL:199 18 7713 E-MAIL:2724546146@qq.com

本站由北京市万商天勤律师事务所王兴未律师提供法律服务