dc.contributor.advisor | Ávila, Alfonso | |
dc.contributor.advisor | Dieck, Graciano | |
dc.contributor.advisor | Salgado, Ricardo | |
dc.contributor.advisor | Peña, Raúl | |
dc.creator | Freire Bermudez, Luis Alberto | |
dc.date.accessioned | 2019-01-03T15:58:31Z | |
dc.date.available | 2019-01-03T15:58:31Z | |
dc.date.created | 2016-10 | |
dc.identifier.citation | Freire, L. (2018). GPGPU Workload Characterization Using Memory Bottleneck Detection and Hierarchical Clustering Analysis (master's thesis). Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, México | en_US |
dc.identifier.uri | http://hdl.handle.net/11285/632729 | |
dc.description.abstract | The use of Graphics Processing Units (GPUs) for general-purpose computing (GPGPU) has become increasingly common in recent years. In this type of processor, memory bottlenecks are a critical issue, and the way data are distributed across the partitions can cause several requests to stall behind one another while waiting for resources.
In this thesis, a methodology is presented to characterize GPGPU kernels based on their likelihood of creating bottlenecks in the GPGPU memory hierarchy. A GPGPU simulator is used to obtain unique fingerprints from more than 100 workloads, which are then classified using Hierarchical Clustering Analysis.
The thesis also shows that optimizations made to the kernels affect their run-time memory bottleneck generation and that this behavior is successfully detected by the methodology. Two major groups of kernels were defined, naïve and optimized, and a set of exploration kernels was characterized within those groups with an effectiveness rate of over 75% for the two groups. A discussion is also held about how different levels of optimization can be identified by our clustering engine and how those results could be used by subsequent approaches to predict bottleneck-related issues in new kernels added to the cluster. Overall, a simple and transparent methodology to study bottleneck generation in GPGPU kernels is proposed, which proves useful for future applications such as a static characterizer and statics predictor. | en_US |
dc.description.tableofcontents | Contents
Chapter 1
Introduction
1.1 Motivation
1.2 Research Questions
1.3 Solution Overview
1.4 Main Contributions
1.5 Thesis Organization
Chapter 2
Background
2.1 State of the art
A. Performance Models
B. GPU simulators, emulators and compilers
C. Workload Characterizers and Learning guided techniques
2.2 Analysis of Areas of Opportunity
2.2 Theoretical Background
A. GPU
B. Hierarchical Clustering Analysis
Chapter 3
Methodology
3.1 Obtaining Memory Queues Behavior History
A. GPGPU-Sim Overview
B. Extending GPGPU-Sim
3.2 Obtaining Kernel signatures
A. Overview
B. Single Queue Peakness Coefficient
C. General Stage-wise Peakness Vector
D. The coefficient of Variation and Mean factors
E. Final Kernel Signature
3.3 Hierarchical Clustering Analysis
Chapter 4
Results
4.1 Introduction
4.2 First Study
A. Exploration Kernels
C. Dendrogram
4.3 Final Study
A. Dendrogram
B. Number of clusters
C. Optimal Number of Clusters
D. Final Radial Dendrogram
4.4 Detecting Exploration kernels
4.5 Discussion
Chapter 5
Conclusion
5.1 Summary
5.2 Contributions
5.3 Strengths and Limitations
5.4 Future work
Bibliography
Appendix A
Acronyms and Abbreviations Definitions
Appendix B
Variables and symbols
Appendix C: Configuration Options Fermi Architecture
Appendix D: CSV files produced
Appendix E: Added and modified GPGPU-Sim code
Appendix F: Matlab Signature Calculation Code
Appendix G: Different Variable Selection | en_US |
dc.format.extent | 100 | en_US |
dc.format.medium | Text | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Instituto Tecnológico y de Estudios Superiores de Monterrey | esp |
dc.relation.ispartof | 266632-CONACYT-SENER-S0019201401 | en_US |
dc.rights | Open Access | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/3.0/us/ | * |
dc.subject | 7 ENGINEERING AND TECHNOLOGY | en_US |
dc.title | GPGPU workload characterization using memory bottleneck detection and hierarchical clustering analysis | en_US |
dc.type | Tesis de Maestría / Master Thesis | en_US |
dc.contributor.mentor | Campuzano, Gabriel | |
dc.publisher.institution | Instituto Tecnológico y de Estudios Superiores de Monterrey | en_US |
dc.subject.keyword | Memory Hierarchy | en_US |
dc.subject.keyword | Optimization | en_US |
dc.subject.keyword | Parallel Processing | en_US |
dc.subject.keyword | Performance Analysis and Design Aids | en_US |
dc.contributor.institution | Campus Monterrey | en_US |
dc.subject.discipline | Ingeniería y Ciencias Aplicadas / Engineering & Applied Sciences | en_US |
dc.description.degree | Maestro en Ciencias con especialidad en Ingeniería Electrónica | en_US |
dc.audience.educationlevel | Investigadores/Researchers | en_US |
refterms.dateFOA | 2019-01-03T15:58:32Z | |
dc.relation.impreso | 2018-12-03 | |