dc.contributor.advisorÁvila, Alfonso
dc.contributor.advisorDieck, Graciano
dc.contributor.advisorSalgado, Ricardo
dc.contributor.advisorPeña, Raúl
dc.creatorFreire Bermudez, Luis Alberto
dc.date.accessioned2019-01-03T15:58:31Z
dc.date.available2019-01-03T15:58:31Z
dc.date.created2016-10
dc.identifier.citationFreire, L. (2018). GPGPU Workload Characterization Using Memory Bottleneck Detection and Hierarchical Clustering Analysis (master's thesis). Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Méxicoen_US
dc.identifier.urihttp://hdl.handle.net/11285/632729
dc.description.abstractThe use of Graphics Processing Units (GPUs) for general-purpose computing (GPGPU) has become increasingly common in recent years. In this type of processor, memory bottlenecks are a critical issue, and the way data are assigned to the memory partitions can cause several requests to stall behind each other while waiting for resources. This thesis presents a methodology to characterize GPGPU kernels based on their likelihood of creating bottlenecks in the GPGPU memory hierarchy. A GPGPU simulator is used to obtain unique fingerprints from more than 100 workloads, which are then classified using Hierarchical Clustering Analysis. The thesis also shows that optimizations made to the kernels affect their run-time memory bottleneck generation and that this behavior is successfully detected by the methodology. Two major groups of kernels were defined, naïve and optimized, and a set of exploration kernels was classified within those groups with an effectiveness rate of over 75% for the two groups. The thesis also discusses how different levels of optimization can be identified by the clustering engine and how those results could be used by subsequent approaches to predict bottleneck-related issues in new kernels added to the cluster. Overall, a simple and transparent methodology to study bottleneck generation in GPGPU kernels is proposed, which proves useful for future applications such as a static characterizer and a static predictor.en_US
dc.description.tableofcontentsContents
Chapter 1. Introduction, p. 1
  1.1 Motivation, p. 2
  1.2 Research Questions, p. 3
  1.3 Solution Overview, p. 3
  1.4 Main Contributions, p. 4
  1.5 Thesis Organization, p. 4
Chapter 2. Background, p. 5
  2.1 State of the art, p. 5
    A. Performance Models, p. 5
    B. GPU simulators, emulators and compilers, p. 7
    C. Workload Characterizers and Learning guided techniques, p. 7
  2.2 Analysis of Areas of Opportunity, p. 8
  2.2 Theoretical Background, p. 12
    A. GPU, p. 12
    B. Hierarchical Clustering Analysis, p. 14
Chapter 3. Methodology, p. 18
  3.1 Obtaining Memory Queues Behavior History, p. 19
    A. GPGPU-Sim Overview, p. 19
    B. Extending GPGPU-Sim, p. 20
  3.2 Obtaining Kernel signatures, p. 29
    A. Overview, p. 29
    B. Single Queue Peakness Coefficient, p. 30
    C. General Stage-wise Peakness Vector, p. 32
    D. The coefficient of Variation and Mean factors, p. 32
    E. Final Kernel Signature, p. 33
  3.3 Hierarchical Clustering Analysis, p. 33
Chapter 4. Results, p. 35
  4.1 Introduction, p. 35
  4.2 First Study, p. 35
    A. Exploration Kernels, p. 36
    C. Dendrogram, p. 37
  4.3 Final Study, p. 40
    A. Dendrogram, p. 40
    B. Number of clusters, p. 40
    C. Optimal Number of Clusters, p. 43
    D. Final Radial Dendrogram, p. 45
  4.4 Detecting Exploration kernels, p. 45
  4.5 Discussion, p. 46
Chapter 5. Conclusion, p. 48
  5.1 Summary, p. 48
  5.2 Contributions, p. 48
  5.3 Strengths and Limitations, p. 49
  5.4 Future work, p. 50
Bibliography, p. 51
Appendix A: Acronyms and Abbreviations Definitions, p. 55
Appendix B: Variables and symbols, p. 56
Appendix C: Configuration Options Fermi Architecture, p. 57
Appendix D: CSV files produced, p. 59
Appendix E: Added and modified GPGPU-Sim code, p. 63
Appendix F: Matlab Signature Calculation Code, p. 94
Appendix G: Different Variable Selection, p. 96en_US
dc.format.extent100en_US
dc.format.mediumTexten_US
dc.language.isoengen_US
dc.publisherInstituto Tecnológico y de Estudios Superiores de Monterreyesp
dc.relation.ispartof266632-CONACYT-SENER-S0019201401en_US
dc.rightsOpen Accessen_US
dc.rights.urihttp://creativecommons.org/licenses/by-nc/3.0/us/*
dc.subject7 INGENIERÍA Y TECNOLOGÍAen_US
dc.titleGPGPU workload characterization using memory bottleneck detection and hierarchical clustering analysisen_US
dc.typeTesis de Maestría / Master Thesisen_US
dc.contributor.mentorCampuzano, Gabriel
dc.publisher.institutionInstituto Tecnológico y de Estudios Superiores de Monterreyen_US
dc.subject.keywordMemory Hierarchyen_US
dc.subject.keywordOptimizationen_US
dc.subject.keywordParallel Processingen_US
dc.subject.keywordPerformance Analysis and Design Aidsen_US
dc.contributor.institutionCampus Monterreyen_US
dc.contributor.institutionCampus Monterreyen_US
dc.contributor.institutionCampus Monterreyen_US
dc.subject.disciplineIngeniería y Ciencias Aplicadas / Engineering & Applied Sciencesen_US
dc.description.degreeMaster of Science with a specialization in Electronic Engineeringen_US
dc.audience.educationlevelInvestigadores/Researchersen_US
refterms.dateFOA2019-01-03T15:58:32Z
dc.relation.impreso2018-12-03


Open Access
Except where otherwise noted, this item's license is described as Open Access