Theses
Permanent URI for this community: https://hdl.handle.net/11285/345119
Collection of theses and degree works (final research project reports, dissertations, or other academic works distinct from a thesis, subject to review and acceptance by an examining committee) submitted by students to obtain an academic degree from Tecnológico de Monterrey.
To submit your academic work to RITEC, you can consult this infographic, which outlines the general steps for depositing your thesis in RITEC.
Search Results
- A generalist reinforcement learning agent for compressing multiple convolutional neural networks (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12-11) González Sahagún, Gabriel; Conant Pablos, Santiago Enrique; emipsanchez; Ortíz Bayliss, José Carlos; Cruz Duarte, Jorge Mario; Gutiérrez Rodríguez, Andrés Eduardo; School of Engineering and Sciences; Campus Monterrey
Deep Learning has achieved state-of-the-art accuracy in multiple fields. A common practice in computer vision is to reuse a pre-trained model for a completely different dataset of the same type of task, a process known as transfer learning, which reduces training time by reusing the filters of the convolutional layers. However, while transfer learning can reduce training time, the reused model may have more parameters than the new dataset requires. As models now achieve near-human performance or better, there is a growing need to reduce their size to facilitate deployment on devices with limited computational resources. Various compression techniques have been proposed to address this issue, but their effectiveness varies depending on hyperparameters. To navigate these options, researchers have worked on automating model compression. Some have proposed using reinforcement learning to teach a deep learning model how to compress another deep learning model. This study compares multiple approaches for automating the compression of convolutional neural networks and proposes a method for training a reinforcement learning agent that works across multiple datasets without the need for transfer learning. The agents were tested using leave-one-out cross-validation, learning to compress a set of LeNet-5 models and testing on another LeNet-5 model with different parameters. The metrics used to evaluate these solutions were accuracy loss and the number of parameters of the compressed model. The agents suggested compression schemes that were on or near the Pareto front for these metrics.
Furthermore, the models were compressed by more than 80% with minimal accuracy loss in most cases. These results suggest that scaling this methodology to larger models and datasets could yield an AI assistant for model compression similar to ChatGPT, potentially revolutionizing model compression practices and enabling advanced deployments in resource-constrained environments.
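The Pareto-front criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes each compression scheme is summarized by an (accuracy loss, parameter count) pair, both to be minimized; the candidate values and the names `dominates` and `pareto_front` are hypothetical and do not come from the thesis.

```python
from typing import List, Tuple

# (accuracy_loss, parameter_count); both metrics are minimized.
Scheme = Tuple[float, int]


def dominates(a: Scheme, b: Scheme) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(schemes: List[Scheme]) -> List[Scheme]:
    """Keep only the schemes that no other scheme dominates."""
    return [s for s in schemes if not any(dominates(o, s) for o in schemes)]


# Hypothetical candidates for illustration only, not results from the thesis.
candidates = [(0.010, 12000), (0.020, 8000), (0.015, 15000), (0.050, 8000)]
front = pareto_front(candidates)
```

A scheme that is "near" the front, in this reading, would be one that is dominated only by a small margin on either metric.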
- Machine translation for suicide detection: validating Spanish datasets using machine and deep learning models (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-11) Arenas Enciso, Francisco Ariel; Zareel, Mahdi; emipsanchez; García Ceja, Enrique Alejandro; Roshan Biswal, Rajesh; School of Engineering and Sciences; Sede EGADE Monterrey
Suicide is a complex health concern that affects not only individuals but society as a whole. The application of traditional strategies to prevent, assess, and treat this condition has proven inefficient in a modern world in which interactions are mainly made online. Thus, in recent years, multidisciplinary efforts have explored how computational techniques could be applied to automatically detect, from textual input, individuals who desire to end their lives. Such methodologies rely on two main technical approaches: text-based classification and deep learning. Further, these methods rely on datasets labeled with relevant information, often sourced from clinically-curated social media posts or healthcare records, and more recently, public social media data has proven especially valuable for this purpose. Nonetheless, research focused on the application of computational algorithms for detecting suicide or its ideation is still an emerging field of study. In particular, investigations on this topic have recently considered specific factors, like language or socio-cultural contexts, that affect the causality, rationality, and intentionality of an individual’s manifestation, to improve the assessment made on textual data. Consequently, the lack of data from non-Anglo-Saxon contexts that computational techniques could exploit for detecting suicidal ideation remains an open problem. Thus, this thesis addresses the limited availability of suicide ideation datasets in non-Anglo-Saxon contexts, particularly for Spanish, despite its global significance as a widely spoken language.
The research hypothesizes that machine-translated Spanish datasets can yield comparable results (within a ±5% performance range) to English datasets when training machine learning and deep learning models for suicide ideation detection. To test this, multiple machine translation models were evaluated, and the two best-performing models were selected to translate an English dataset of social media posts into Spanish. The English and translated Spanish datasets were then processed through a binary classification task using SVM, Logistic Regression, CNN, and LSTM models. Results demonstrated that the translated Spanish datasets achieved scores close to those of the original English set across all classifiers, with variations in accuracy, precision, recall, F1-score, ROC AUC, and MCC remaining within the hypothesized ±5% range. For example, the SVM classifier on the translated Spanish sets achieved an accuracy of 90%, closely matching the 91% achieved on the original English set. These findings confirm that machine-translated datasets can serve as effective resources for training ML and DL models for suicide ideation detection in Spanish, thereby supporting the viability of extending suicide detection models to non-English-speaking populations. This contribution provides a methodological foundation for expanding suicide prevention tools to diverse linguistic and cultural contexts, potentially benefiting health organizations and academic institutions interested in psychological computation.
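The comparison described above can be sketched as follows. This is only an illustration: it assumes the ±5% range is read as an absolute gap of 0.05 on a 0-to-1 score (the abstract does not say whether the range is absolute or relative), and the helper names `mcc` and `within_tolerance` are hypothetical. The MCC formula itself is the standard one for binary confusion-matrix counts.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def within_tolerance(reference: float, candidate: float, tol: float = 0.05) -> bool:
    """Assumption: the +/-5% range is an absolute gap of 0.05 on a 0-1 score."""
    return abs(reference - candidate) <= tol


# SVM accuracy figures quoted in the abstract: 0.91 (English), 0.90 (Spanish).
accuracy_ok = within_tolerance(0.91, 0.90)
```

Under a relative reading of the range, the check would instead compare `abs(reference - candidate) / reference` against the tolerance.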
- Object detection-based surgical instrument tracking in laparoscopy videos (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Guerrero Ramírez, Cuauhtemoc Alonso; Ochoa Ruiz, Gilberto; emipsanchez; González Mendoza, Miguel; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; School of Engineering and Sciences; Campus Monterrey; Medina Pérez, Miguel Ángel
Minimally invasive surgery (MIS) has transformed surgery by offering numerous advantages over traditional open surgery, such as reduced pain, minimized trauma, and faster recovery times. However, endoscopic MIS procedures remain highly operator-dependent, demanding significant skill from the surgical team to ensure a positive postoperative outcome for the patient. The implementation of computer vision techniques such as reliable surgical instrument detection and tracking can be leveraged for applications such as intraoperative decision support, surgical navigation assistance, and surgical skill assessment, which can significantly improve patient safety. The aim of this work is to implement a Multiple Object Tracking (MOT) benchmark model for the task of surgical instrument tracking in laparoscopic videos. To this end, a new dataset is introduced, m2cai16-tool-tracking, based on the m2cai16-tool-locations dataset, specifically designed for surgical instrument tracking. This dataset includes both bounding box annotations for instrument detection and unique tracking ID annotations for multi-object tracking. This work employs ByteTrack, a state-of-the-art multiple-object tracking algorithm that follows the tracking-by-detection paradigm. ByteTrack predicts tool positions and associates object detections across frames, allowing consistent tracking of each instrument. The object detection step is performed using YOLOv4, a state-of-the-art object detection model known for real-time performance.
YOLOv4 is first trained on the m2cai16-tool-locations dataset to establish a baseline performance and then on the custom m2cai16-tool-tracking dataset, allowing the detection performance on the custom dataset to be compared with that on an existing object detection dataset. YOLOv4 generates bounding box predictions for each frame in the laparoscopic videos. The bounding box detections serve as input for the ByteTrack algorithm, which assigns unique tracking IDs to each instrument to maintain their trajectories across frames. YOLOv4 achieves robust object detection performance on the m2cai16-tool-locations dataset, obtaining a mAP50 of 0.949, a mAP75 of 0.537, and a mAP50:95 of 0.526, with a real-time inference speed of 125 fps. However, detection performance on the m2cai16-tool-tracking dataset is slightly lower, with a mAP50 of 0.839, mAP75 of 0.420, and mAP50:95 of 0.439, suggesting that differences in data partitioning impact detection accuracy. This lower detection accuracy for the tracking dataset likely affects the tracking performance of ByteTrack, reflected in a MOTP of 76.4, MOTA of 56.6, IDF1 score of 22.8, and HOTA score of 23.0. Future work could focus on improving the object detection performance to enhance tracking quality. Additionally, including appearance-based features in the tracking step could improve the association accuracy of detections across frames and help maintain consistent tracking even in challenging scenarios like occlusions. Such improvements could enhance tracking reliability to better support surgical tasks.
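Two quantities that underlie the tracking-by-detection pipeline described above can be sketched briefly: the intersection over union (IoU) between two boxes, which trackers of this kind commonly use to associate detections across frames, and the standard MOTA definition. The box format `(x1, y1, x2, y2)` and the function names are assumptions for illustration, and MOTA is returned as a fraction rather than the percentage reported in the abstract.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mota(false_negatives: int, false_positives: int,
         id_switches: int, ground_truth_objects: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / ground_truth_objects
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates an overlap of 1 over a union of 7.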
- Pre-diagnosis of diabetic retinopathy implementing supervised learning algorithms using an ocular fundus Latin-American dataset for cross-data validation (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-02) De la Cruz Espinosa, Emanuel; FUENTES AGUILAR, RITA QUETZIQUEL; 229297; Fuentes Aguilar, Rita Quetziquel; emipsanchez; García González, Alejandro; Ochoa Ruiz, Gilberto; Abaunza González, Hernán; School of Engineering and Sciences; Campus Monterrey
Diabetes is a disease with a worldwide presence and a high mortality rate, causing a large social and economic impact. One of its major negative effects is visual loss due to diabetic retinopathy (DR). To prevent this condition, it is necessary to identify referable patients by screening for DR and complementing the screening with Optical Coherence Tomography (OCT), another study for the early detection of blindness that performs several longitudinal scans at a series of lateral locations to generate a map of reflection sites in the sample and displays it as a two-dimensional image, achieving transmission images in turbid tissue. Regrettably, the number of ophthalmologists and OCT devices is not enough to provide adequate health care to the diabetic population. Although there are AI systems capable of DR screening, they do not focus the assessment specifically on the macular area considering visible and proliferated anomalies, which are signs of high damage and late intervention. This work presents three supervised machine learning algorithms: a Random Forest (RF) classifier, a Convolutional Neural Network (CNN) model, and a transfer learning (TL) pretrained model, able to sort fundus images into the three classes with which an exclusive fundus image database is labeled. Processing techniques such as channel splitting, color space transforms, histogram- and spatial-based filters, and data augmentation are used to detect the presence of diabetic retinopathy.
The stages of this work are: debugging of a publicly available dataset, macular segmentation and cropping, data pre-processing, feature extraction, model training, and test and validation performance evaluation with an exclusive Latin-American dataset, considering accuracy, sensitivity, and specificity as metrics. The best results achieved are 61.22% accuracy, 86.67% sensitivity, and 89.47% specificity.
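The three metrics named in this abstract have standard definitions in terms of confusion-matrix counts, sketched below for the binary case. The thesis classifies into three classes, so this is a simplification (per-class, one-vs-rest counts would be needed there); the function name and the counts in the usage note are hypothetical and are not the thesis results.

```python
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity (recall), and specificity from binary counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity: share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical counts for illustration only.
example = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these hypothetical counts, 40 of 50 positives are detected and 45 of 50 negatives are rejected.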
- Motor imagery analysis with deep learning for potential application in motor impairment rehabilitation (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2022) Lomelín Ibarra, Vicente Alejandro; CANTORAL CEBALLOS, JOSE ANTONIO; 261286; Cantoral Ceballos, José Antonio; emipsanchez; School of Engineering and Sciences; Campus Monterrey; Gutierrez Rodriguez, Andrés Eduardo
Motor imagery is a complex mental task that represents muscular movement without the execution of muscular action, involving cognitive processes of motor planning and sensorimotor proprioception of the body. The mental process signals of motor imagery are found in the cortical areas of sensory and motor processing of the brain. Since the mental task behaves similarly to the motor execution process, it is used to create rehabilitation routines for patients with a form of Motor Skill Impairment. Due to the nature of this mental task, its execution is complicated, and subjects usually require training to perform it adequately. The mental task has also proved to vary among subjects, making it difficult to create a general method to process the signals. EEG signal acquisition provides a non-invasive method to acquire electrical potentials generated by neural activity. The technique provides good temporal resolution but poor spatial resolution, acquiring signals from every area of the brain. This leads to the problem of mixing different signals from different cognitive processes. To compensate for this problem, filtering and feature extraction are required to isolate the desired signals. Due to this problem, the classification of these signals in scenarios such as Brain-Computer Interface systems tends to perform poorly. Deep Learning has proved to improve the classification of data fed into it, identifying patterns corresponding to the signal of interest.
Throughout this thesis project for the Computer Science Master’s Program, different deep learning architectures were designed to classify the execution of Motor Imagery. For this work, a variety of representations of the EEG signal were prepared to serve as input for the models. These representations include image-based spectrograms, 2D and 3D matrix arrangements, and 1D vectors. In addition, the generated samples consider a process of channel selection to limit the information to the region of interest of the motor cortex. Additionally, this work considers an asymmetric hemispheric channel selection to represent the state of the brain during the execution of the mental task at different areas of the motor cortex independently. The best results were observed with a single-channel spectrogram representation of the signal as input for a CNN model, with a reported classification accuracy of 93.3%. Promising results were also obtained with the 1D CNN models, with a classification accuracy of 86.12%. Although their accuracies were not as high, the 2D CNN models with a 2D and 3D matrix as their input also gave promising results, with reported accuracies that outperformed the state-of-the-art. Lastly, the implementation of sequential models to analyze the signal as a time series returned results that outperformed the state-of-the-art with the devised asymmetrical 9- and 5-channel selection.
- A comparative study of deep learning-based image captioning models for violence description (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2019-10-10) González Martínez, Fernando; CONANT PABLOS, SANTIAGO ENRIQUE; 56551; Conant Pablos, Santiago Enrique; emijzarate; Hugo Terashima, Marín; González Mendoza, Miguel; Nimrod González, Franco; Escuela de Ingeniería y Ciencias; Campus Monterrey
The safety and security of people will always hold one of the top positions for governments, countries, states, enterprises, and families. One of the greatest advances in the field of security technologies was the invention of surveillance cameras, which give public and private owners the possibility to observe recorded past events to protect their property and provide undeniable proof of events that occurred when they were not present. It is safe to say that most corporations and some homes have some type of security technology, from the simplest surveillance system to more complicated technologies, such as facial and fingerprint recognition. These types of security systems have a drawback: the volume of data generated by each of them. Surveillance cameras record and store thousands of hours of footage for later access to review any past event. The problem arises when the volume of data generated surpasses the capability of humans to analyze it. Even when humans do analyze it, human error becomes a factor too, as the quantity and nature of the data could overwhelm them and cause them to miss an event that should not be missed. In this work, the events contain violence and suspicious behavior, such as robberies, assaults, street riots, and fights, among others. This presents the need for a system that can recognize such events and generate a brief description for faster interpretation by the humans using the system.
The fields of image captioning and video captioning have been present in computer science for the past decade. Image captioning works by converting an image and words into features using deep learning models, combining them, and creating predictions from what the model believes should be the output for a given state. Given the time for which this task has existed, image captioning has been through many changes in the development of its models. The basic model utilizes convolutional neural networks for image analysis and recurrent neural networks for sentence analysis and generation. The addition of attention further improved the results from these models by teaching models where to focus when analyzing images and sentences. Finally, the Transformer was created, which has dominated the field in most tasks thanks to its ability to perform most of its calculations in parallel, making it faster than past models. The performance improvements can be seen in previous works that are at the top of the leaderboards for image recognition, text generation, and captioning. The purpose of this work is to create and train models to generate descriptions of normal and violent images. The models proposed in this work are Encoder-Decoder, Encoder-Decoder using Attention layers, and Transformers. The dataset used as a base for this work is the Flickr8k dataset. This dataset is a collection of around 8000 images with 5 descriptions each, obtained through human consultation. For this work, we extended the dataset to include violent images and their descriptions. The descriptions were obtained by asking a group of three persons to describe the image shown, mentioning subjects, objects, actions, and places as best they could. The images were retrieved by using Microsoft’s Bing API. The models were then evaluated using BLEU-N, METEOR, CIDEr, and ROUGE-L.
These are machine translation evaluation metrics that are used to compare generated sentences to reference sentences and obtain an objective metric. Results show that the models can generate sentences that describe normal and violent images. However, the Soft-Attention model obtained the best performance over normal and violent images. Given our results, these models can generate descriptions of violent and normal images. The availability of these models could help analyze images found on the web, giving a brief description before opening images containing violent content. The results obtained can be used as a base to further improve these models and the possibility of creating models that can analyze violent videos. This could result in a system that is capable of analyzing images and videos in the background and generating a brief description of the events found in them, potentially leading to better reaction times from security and increased crime prevention.
- Attention YOLACT++: achieving robust and real-time medical instrument segmentation in endoscopic procedures. (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-04) Ángeles Cerón, Juan Carlos; Chang Fernández, Leonardo; 345979; Chang Fernández, Leonardo; emipsanchez; González Mendoza, Miguel; Alí, Sharib; Escuela de Ingeniería y Ciencias; Campus Monterrey; Ochoa Ruiz, Gilberto
Image-based tracking of laparoscopic instruments via instance segmentation plays a fundamental role in computer and robotic-assisted surgeries by aiding surgical navigation and increasing patient safety. Despite its crucial role in minimally invasive surgeries, accurate tracking of surgical instruments is a challenging task for two main reasons: 1) the complex surgical environment, and 2) the lack of model designs with both high accuracy and speed. Previous attempts in the field have prioritized robust performance over real-time speed, rendering them unfeasible for live clinical applications. In this thesis, we propose the use of attention mechanisms to significantly improve the recognition capabilities of YOLACT++, a lightweight single-stage instance segmentation architecture, which we target at medical instrument segmentation. To further improve the performance of the model, we also investigated the use of custom data augmentation and anchor optimization via a differential evolution search algorithm. Furthermore, we investigate the effect of multi-scale feature aggregation strategies in the architecture. We perform ablation studies with Convolutional Block Attention and Criss-cross Attention modules at different stages in the network to determine an optimal configuration.
Our proposed model, CBAM-Full + Aug + Anch, drastically outperforms the previous state-of-the-art in commonly used robustness metrics in medical segmentation, achieving 0.435 MI_DSC and 0.471 MI_NSD while running at 69 fps, which is more than 12 points more robust in both metrics and 14 times faster than the previous best model. To our knowledge, this is the first work that explicitly focuses on both real-time performance and improved robustness.
- Detection of suspicious attitudes on video using neuroevolved shallow and deep neural networks models (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-11) Flores Munguía, Carlos; Terashima Marín, Hugo; puemcuervo/tolmquevedo; Oliva, Diego; Ortiz Bayliss, Jose Carlos; School of Engineering and Sciences; Campus Monterrey
The analysis of surveillance cameras is a critical task usually limited by the people devoted to supervising the video, their knowledge, and their judgment. Security guards protect other people from different events that can compromise their security, like robbery, extortion, fraud, vehicle theft, and more, making them an essential part of this type of protection system. If they are not paying attention, crimes may be overlooked. Nonetheless, different approaches have arisen to automate this task. The methods are mainly based on machine learning and benefit from developing neural networks that extract underlying information from input videos. However, despite how competent those networks have proved to be, developers must face the challenging task of defining the architecture and hyperparameters that allow the network to work adequately and optimize the use of computational resources. Furthermore, selecting the architecture and hyperparameters may significantly impact the neural networks’ performance if it is not carried out adequately. No matter the type of neural network used (shallow, dense, convolutional, 3D convolutional, or recurrent), hyperparameter selection must be performed using the designer’s empirical knowledge and expertise, or with the help of automated approaches like Random Search or Bayesian Optimization. However, such methods suffer from problems like not covering the solution space well, especially when the space is high-dimensional.
They may also require evaluating the models many times, with a diverse set of hyperparameters, to gather more information about the objective function. This work proposes a model that generates, through a genetic algorithm, neural networks for behavior classification within videos. The application of genetic algorithms allows the exploration of the hyperparameter solution space in different directions simultaneously. Two types of neural networks are evolved as part of the thesis work: shallow and deep networks, the latter based on dense layers and 3D convolutions. Each sort of network takes a distinct input data type: the evolution of people’s pose and video sequences, respectively. Shallow neural networks are generated by NeuroEvolution of Augmenting Topologies (NEAT), while CoDeepNEAT generates deep networks. NEAT uses a direct encoding, meaning that each node and connection in the network is directly represented in the chromosome. In contrast, CoDeepNEAT uses indirect encoding, making use of cooperative coevolution of blueprints and modules. This work trains and tests the networks on the Kranok-NV dataset, where they exhibited better results than their competitors on various standard metrics.
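The direct encoding attributed to NEAT above can be pictured with a minimal data-structure sketch: every node and every connection of the network appears explicitly as a gene. The class and field names are illustrative assumptions, not the thesis implementation or any library's API; the innovation number and enabled flag follow the general NEAT scheme.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class NodeGene:
    node_id: int
    kind: str  # "input", "hidden", or "output"


@dataclass
class ConnectionGene:
    innovation: int  # historical marker used to align genes during crossover
    source: int
    target: int
    weight: float
    enabled: bool = True


@dataclass
class Genome:
    """Direct encoding: the chromosome lists every node and connection."""
    nodes: List[NodeGene] = field(default_factory=list)
    connections: List[ConnectionGene] = field(default_factory=list)

    def active_edges(self) -> List[Tuple[int, int, float]]:
        """Edges of the network that this genome actually expresses."""
        return [(c.source, c.target, c.weight) for c in self.connections if c.enabled]


# Hypothetical genome after an add-node mutation split the edge 0 -> 1.
genome = Genome(
    nodes=[NodeGene(0, "input"), NodeGene(1, "output"), NodeGene(2, "hidden")],
    connections=[
        ConnectionGene(1, 0, 1, 0.5, enabled=False),
        ConnectionGene(2, 0, 2, 1.0),
        ConnectionGene(3, 2, 1, 0.5),
    ],
)
```

An indirect encoding such as the blueprint-and-module scheme mentioned for CoDeepNEAT would instead store references to reusable building blocks rather than individual nodes and weights.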
- Histopathological image classification using deep learning (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2020-11) Arredondo Padilla, Braulio; Martínez Ledesma, Juan Emmanuel; emipsanchez; Tamez Peña, José Gerardo; Santos Díaz, Alejandro; Martínez Torteya, Antonio; Escuela de Ingeniería y Ciencias; Campus Monterrey
This thesis presents a study of digital pathology classification using and combining several techniques of machine learning and deep learning. Cancer is one of the most common causes of death around the world. One of the main complications of the disease is the prediction in the final stage. Nowadays there are many different studies to obtain a correct diagnosis on time. Some of these studies are tissue biopsies. These samples are analyzed by a pathologist, who must observe, pixel by pixel, a whole high-dimensional image to give a diagnosis of the disease, including stage and class. This activity takes weeks, even for experts, because usually several samples are extracted from a single patient. To speed up and facilitate this process, several models have been developed for digital pathology classification. With these models, it is easier to discard many patient slides than with the traditional method, so the pathologist’s main activity becomes confirming a diagnosis with the most relevant or complicated sample. The downside of these models is that most of them are based on deep learning, a technique that is well known for its great performance, but also for its high requirements, such as graphics processors and memory resources. Consequently, we performed a complete analysis of several convolutional neural networks used in different ways to compare outcomes and efficiency. In addition, we include techniques such as recurrent neural networks and machine learning. Several models of deep learning and machine learning are presented as alternatives to convolutional neural networks, including 5 computer vision techniques.
The main objective of our project is to provide a practical alternative capable of achieving outcomes similar to deep learning with limited resources. The experiments were successful, yielding a practical alternative to deep learning for the classification of 3 different types of cancer with an area under the curve higher than 90%.
- Prognosis using Deep Learning in CoViD-19 patients (Instituto Tecnológico y de Estudios Superiores de Monterrey) Guadiana Álvarez, José Luis; MORALES MENENDEZ, RUBEN; 30452; Morales Menéndez, Rubén; emipsanchez; Vargas Martínez, Adriana; Ramírez Mendoza, Ricardo Ambrocio; School of Engineering and Sciences; Campus Monterrey; Rojas Flores, Etna Aurora
Prognostics is the study of predicting an event before it happens, to enable efficient critical decision making. Over the past few years, it has gained a lot of research attention in many fields, e.g., manufacturing, economics, and medicine. Particularly in medicine, prognostics are very useful for front-line physicians to predict how a disease may affect a patient and react accordingly to save as many lives as possible. One clear example is the recently discovered Coronavirus Disease 2019 (CoViD-19). Because of its novelty, not nearly enough is known about the virus’ behaviour and Key Performance Indicators (KPIs) to assess a mortality prediction. However, using a lot of complex and expensive medical biomarkers could be impossible for many low-budget hospitals. This motivates the development of a prediction model that not only maximizes performance, but does so using the fewest biomarkers possible. For mortality risk prediction, falsely assuming that a patient has a low mortality risk is far more critical than the opposite. Therefore, avoiding false negative predictions should be prioritized over avoiding false positive ones. This research project proposes a CoViD-19 mortality risk calculator based on a Deep Learning model trained on a data set provided by HM Hospitales from Madrid, Spain. A pre-processing strategy for unbalanced classes and feature selection is proposed. The benefit of using over-sampling and imputation techniques is evaluated. Also, an imputation method based on the K-Nearest Neighbor (KNN) algorithm for biomarker data is proposed and its efficiency is evaluated.
Results are compared against a Random Forest (RF) model while showing the trade-off between feature input space and the number of samples available. Results on the MPCD score show that the proposed DL model outperforms the RF model on every data set, even when an over-sampling technique is used. Finally, the proposed KNN method proves beneficial for data imputation, improving the model’s Recall score from 0.87 to 0.90.
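The general idea of KNN-based imputation can be sketched as below. This is a generic illustration, not the method proposed in the thesis: it assumes missing biomarker values are marked as `None`, measures distance only over features observed in both rows, and fills each gap with the mean of that feature over the `k` nearest rows that have it. All names and the sample values are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def _distance(a: Row, b: Row) -> float:
    """Root-mean-square gap over features observed in both rows; inf if none."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Fill each missing value with the mean over the k nearest donor rows."""
    completed: List[Row] = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that have feature j observed.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            # Leave the gap in place when no comparable donor exists.
            filled[j] = sum(nearest) / len(nearest) if nearest else None
        completed.append(filled)
    return completed


# Hypothetical biomarker table; the second row is missing its second value.
table = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [0.9, 1.8]]
imputed = knn_impute(table, k=2)
```

In practice the features would usually be scaled before computing distances, since biomarkers are measured in different units.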