dc.contributor.advisor	Nolazco Flores, Juan Arturo
dc.contributor.author	Báez Suárez, Abraham
dc.creator	BAEZ SUAREZ, ABRAHAM; 328083	es_MX
dc.date.accessioned	2020-04-17T16:43:34Z
dc.date.available	2020-04-17T16:43:34Z
dc.date.created	2020-04-16
dc.date.issued	2020-04-16
dc.identifier.citation	Báez Suárez, A. (2020). Unsupervised Deep Learning Recurrent Model for Audio Fingerprinting (Doctoral Dissertation). Instituto Tecnológico y de Estudios Superiores de Monterrey (ITESM), Monterrey, México. https://hdl.handle.net/11285/636319	es_MX
dc.identifier.doi	https://dl.acm.org/doi/10.1145/3380828
dc.identifier.uri	https://hdl.handle.net/11285/636319
dc.description.abstract	Audio fingerprinting techniques were developed to index and retrieve audio samples by comparing a compact, content-based signature of the audio instead of the entire audio sample, thereby reducing memory and computational expense. Various techniques have been applied to create audio fingerprints; with the introduction of deep learning, however, new data-driven unsupervised approaches have become available. This doctoral dissertation presents a Sequence-to-Sequence Autoencoder Model for Audio Fingerprinting (SAMAF), which improved hash generation through a novel loss function composed of three terms: Mean Square Error, minimizing the reconstruction error; Hash Loss, minimizing the distance between similar hashes and encouraging clustering; and Bitwise Entropy Loss, minimizing the variation inside the clusters. The performance of the model was assessed with a subset of the VoxCeleb1 dataset, a "speech in-the-wild" dataset. Furthermore, the model was compared against three baselines: Dejavu, a Shazam-like algorithm; Robust Audio Fingerprinting System (RAFS), a Bit Error Rate (BER) methodology robust to time-frequency distortions and coding/decoding transformations; and Panako, a constellation-based algorithm that adds resilience to time-frequency distortions. Extensive empirical evidence showed that our approach outperformed all the baselines in the audio identification task and in other classification tasks related to the attributes of the audio signal, with an economical hash size of either 128 or 256 bits for one second of audio. Additionally, the developed technology was deployed in two 9-1-1 Emergency Operation Centers (EOCs), located in Palm Beach County (PBC) and Greater Harris County (GH), allowing us to evaluate its performance in real time in an industrial environment.	es_MX
dc.format.medium	Text	es_MX
dc.language.iso	eng
dc.publisher	Instituto Tecnológico y de Estudios Superiores de Monterrey
dc.relation	Department of Homeland Security (DHS) D15PC00185	es_MX
dc.relation	Consejo Nacional de Ciencia y Tecnología (CONACYT) 328083	es_MX
dc.relation	North Atlantic Treaty Organization (NATO) G4919	es_MX
dc.relation.isFormatOf	published version	es_MX
dc.relation.isreferencedby	REPOSITORIO NACIONAL CONACYT
dc.rights	openAccess	es_MX
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0	*
dc.subject.classification	ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::TELECOMMUNICATIONS TECHNOLOGY	es_MX
dc.subject.lcsh	Technology	es_MX
dc.title	Unsupervised Deep Learning Recurrent Model for Audio Fingerprinting	es_MX
dc.type	Doctoral Thesis
dc.contributor.department	Escuela de Ingeniería y Ciencias	es_MX
dc.contributor.committeemember	Vargas Rosales, César
dc.contributor.committeemember	Gutiérrez Rodríguez, Andrés Eduardo
dc.contributor.committeemember	Rodríguez Dagnino, Ramón Martín
dc.contributor.committeemember	Loyola González, Octavio
dc.identifier.orcid	https://orcid.org/0000-0001-8729-0781
dc.subject.keyword	Artificial Intelligence	es_MX
dc.subject.keyword	Machine Learning	es_MX
dc.subject.keyword	Deep Learning	es_MX
dc.subject.keyword	Unsupervised Learning	es_MX
dc.subject.keyword	Sequence-to-Sequence Autoencoder	es_MX
dc.subject.keyword	Audio Fingerprinting	es_MX
dc.subject.keyword	Audio Identification	es_MX
dc.subject.keyword	Music Information Retrieval	es_MX
dc.contributor.institution	Campus Monterrey	es_MX
dc.description.degree	Doctorate in Information and Communication Technologies	es_MX
dc.identifier.cvu	328083
dc.audience.educationlevel	Researchers	es_MX
dc.relation.impreso	2020-04-15
dc.identificator	7||33||3325	es_MX

Unsupervised Deep Learning Recurrent Model for Audio Fingerprinting
