{"@context":"http://iiif.io/api/presentation/3/context.json","id":"https://arsc.aviaryplatform.com/iiif/w950g3hp1h/manifest","type":"Manifest","label":{"en":["Content-based Music Retrieval System for Ethnomusicological Sound Archives"]},"logo":"https://d9jk7wjtjpu5g.cloudfront.net/organizations/logo_images/000/000/019/original/ARSC_Full_Logo_RGB_K.jpg?1605438091","metadata":[{"label":{"en":["Agent"]},"value":{"en":["Michael Blass (Presenter)","Jessica Wood (Chair)","Michael Biel (Videographer)","Leah Biel (Videographer)"]}},{"label":{"en":["Date"]},"value":{"en":["2018-05-11 (Created)"]}},{"label":{"en":["Format"]},"value":{"en":["Video","Audio"]}},{"label":{"en":["Description"]},"value":{"en":["\u003cp\u003eDigitalization has been accelerating the growth of ethnomusicological sound archives. As a consequence, studies may become unwieldy since the vast amount of available material is intractable for human listeners. We propose a content-based music retrieval system for ethnomusicological sound archives that allows for data access by rhythm similarity. The system analyses each audio file of a given collection by extracting onsets-synchronous timbral features. From each time series a Hidden Markov Model is trained. These are considered as a rhythm fingerprint that represents the music’s rhythmic structure in terms of timbre. A self-organizing map is utilized to project the high-dimensional fingerprints onto a two-dimensional map in order to make them human comprehensible. This technique preserves the topology of the high-dimensional feature space, which results in similar map positions for similar rhythms. A classification by rhythm similarity is thus achieved. The system, therefore, supports musicologist studies in several ways: the rhythm fingerprinting does not rely on a specific theory of music. Music from any culture can be analyzed and compared in an objective and unbiased manner. 
Retrieval by similarity allows for an explorative approach to the underlying collection which may generate interesting new hypothesis. It also supports studies by accelerating access to relevant data and thus helps to keep research efficient. The system is currently implemented in the Ethnographic Sound Recordings Archive of the University of Hamburg.\u003c/p\u003e"]}},{"label":{"en":["Language"]},"value":{"en":["English"]}},{"label":{"en":["Publisher"]},"value":{"en":["Association for Recorded Sound Collections"]}},{"label":{"en":["Rights Statement"]},"value":{"en":["\u003cp\u003eCopyright Association for Recorded Sound Collections\u003c/p\u003e"]}},{"label":{"en":["Video Editor"]},"value":{"en":["Nathan Georgitis"]}}],"summary":{"en":["\u003cp\u003eDigitalization has been accelerating the growth of ethnomusicological sound archives. As a consequence, studies may become unwieldy since the vast amount of available material is intractable for human listeners. We propose a content-based music retrieval system for ethnomusicological sound archives that allows for data access by rhythm similarity. The system analyses each audio file of a given collection by extracting onsets-synchronous timbral features. From each time series a Hidden Markov Model is trained. These are considered as a rhythm fingerprint that represents the music\u0026rsquo;s rhythmic structure in terms of timbre. A self-organizing map is utilized to project the high-dimensional fingerprints onto a two-dimensional map in order to make them human comprehensible. This technique preserves the topology of the high-dimensional feature space, which results in similar map positions for similar rhythms. A classification by rhythm similarity is thus achieved. The system, therefore, supports musicologist studies in several ways: the rhythm fingerprinting does not rely on a specific theory of music. Music from any culture can be analyzed and compared in an objective and unbiased manner. 
Retrieval by similarity allows for an explorative approach to the underlying collection which may generate interesting new hypothesis. It also supports studies by accelerating access to relevant data and thus helps to keep research efficient. The system is currently implemented in the Ethnographic Sound Recordings Archive of the University of Hamburg.\u003c/p\u003e"]},"requiredStatement":{"label":{"en":["Attribution"]},"value":{"en":["\u003cp\u003eCopyright Association for Recorded Sound Collections\u003c/p\u003e"]}},"provider":[{"id":"https://arsc.aviaryplatform.com/aboutus","type":"Agent","label":{"en":["Association for Recorded Sound Collections"]},"homepage":[{"id":"https://arsc.aviaryplatform.com/","type":"Text","label":{"en":["Association for Recorded Sound Collections"]},"format":"text/html"}],"logo":[{"id":"https://d9jk7wjtjpu5g.cloudfront.net/organizations/logo_images/000/000/019/original/ARSC_Full_Logo_RGB_K.jpg?1605438091","type":"Image"}]}],"thumbnail":[{"id":"https://d9jk7wjtjpu5g.cloudfront.net/collection_resource_files/thumbnails/000/097/544/small/open-uri20200922-6764-1jihs3w_1600816414.jpg?1600802040","type":"Image","format":"image/jpeg"}],"items":[{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544","type":"Canvas","label":{"en":["Media File 1 of 2 - 
open-uri20200922-6764-1jihs3w.mp4"]},"duration":1686.80533,"width":640,"height":360,"thumbnail":[{"id":"https://d9jk7wjtjpu5g.cloudfront.net/collection_resource_files/thumbnails/000/097/544/small/open-uri20200922-6764-1jihs3w_1600816414.jpg?1600802040","type":"Image","format":"image/jpeg"}],"items":[{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/content/1","type":"AnnotationPage","items":[{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/content/1/annotation/1","type":"Annotation","motivation":"painting","body":{"id":"https://aviary-p-arsc.s3.wasabisys.com/collection_resource_files/resource_files/000/097/544/original/open-uri20200922-6764-1jihs3w.mp4?1600801986","type":"Video","format":"video/mp4","duration":1686.80533,"width":640,"height":360},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544","metadata":[]}]}],"annotations":[{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126","type":"AnnotationPage","label":{"en":["AUTO_TRINT_Content-based Music Retrieval System for Ethnomusicological Sound Archives [Transcript]"]},"items":[{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126/annotation/1","type":"Annotation","motivation":"transcribing","body":{"type":"TextualBody","value":"To the first non-plenary session of Ask the Archives one in McCarrell room. I am excited to be chairing a session. We've got four excellent presenters giving three presentations. Each presentation will be 20 minutes, followed by about 10 minutes of questions. So without further ado, let me announce our first panelist, Michael Blass. Michael studied historical musicology, systematic musicology and philosophy in Saarbrücken and Hamburg, Germany. 
He was employed as a lecturer in signal processing at the Institute for Systematic Musicology in Hamburg between 2014 and 2017. Since 2017, he has been working as a developer for music analysis software in the DFG-funded Computational Music and Sound Archiving project at the University of Hamburg. His talk today is entitled Content-based Music Retrieval System for Ethnomusicological Sound Archives. Please join me in welcoming Michael Blass. Yes. Thank you for the nice introduction. Today, I'm going to give you some insights into our project, which is concerned with the analysis of, especially, ethnomusicological archives by computers in a content-based way. That means we are looking at the audio signals and not at the metadata. OK. What is an ethnomusicological or an ethnographic archive? It's just the result of ethnographic fieldwork. That means musicologists go to cultures they're interested in, record the musical culture, and preserve these recordings in an archive. So that means the first goal of ethnographic archives is always conservation or preservation. You see some photos of the Moken from Thailand, who are making music. And yes, preservation is extremely important, since currently there's nobody of the Moken living to continue their culture. And a colleague of mine, who took all these photos, is now there to help to preserve the culture, the music, especially in ethnographic archives.","format":"text/plain"},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544#t=12.25,187.68"},{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126/annotation/2","type":"Annotation","motivation":"transcribing","body":{"type":"TextualBody","value":"A very special piece he did in a minute. And there is a second goal to ethnographic archives, which is exploration. 
That means archives are not just there to conserve some sound files or some cultural things, but you can explore archives and maybe find new scientific hypotheses. There are currently different approaches to audio archives. The first approach, I call it the classical approach, is, for example, implemented in the Archive for the Music of Africa, which is located in Mainz in Germany. They feature about ten thousand sound carriers and an unknown number of single takes, with a digitization program in progress. They have only access by catalog. That means you have to go there, crawl through the catalog, find the things that you might find important, and then you can listen to the audio files. This approach supports exploration, but it's absolutely not feasible for big music archives, for big ethnographic archives. On the other hand, there is... Oh, here is an example of what they have in their archives. OK. This is a recording from 1980, from somebody in Central Africa. The second approach is a digital musicology approach, which is commonly purely feature based. That means you have different archives and you extract some audio features that you might find important. And then you compare only these features. That means you compare some kind of shape of different archives. So the advantage is that you can compare lots of sounds, hundreds and thousands of sounds. But you cannot listen to one single audio file. And that does not support ethnographic work. We need to hear the sounds. 
So in our project, we try to get the best of both worlds.","format":"text/plain"},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544#t=188.22,370.58"},{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126/annotation/3","type":"Annotation","motivation":"transcribing","body":{"type":"TextualBody","value":"On the one hand, we need a catalog-based search engine, which allows for online access, to upload the sounds from the field, wherever that actually is, and to provide access to all other interested people or musicologists. And on the other hand, we need a content-based music information retrieval machine that extracts sound features from the audio files and orders them by some kind of music similarity. That's the idea. We now look at our own archive. It's called the Ethnographic Sound Recordings Archive of the University of Hamburg. You can go to the homepage, I'll give you the address later on, and try it out. This archive currently consists of three different parts. The first part is the so-called Wilhelm Heinitz collection of African music. Heinitz was a German musicologist located in Hamburg. And this collection features about 500 different pieces collected from the north of Africa. Currently, we have 350 pieces digitized. And these are mostly gramophone records recorded between 1960 and Muddy for eight in about 40 regions among 50 ethnic groups of Africa, of West Africa and the North especially. Here's an example. This is the original Amelia. OK. It's a Turkish song recorded in Thessaloniki in the 1920s. The next gem we have in our archive is an excerpt of the Cairo Congress of Arab Music. We have only one hundred three pieces of it. And that was, yes, collected in Cairo, recorded in nineteen thirty two. Yes. And we have only pieces from Algeria, Tunisia, Iraq, Turkey and Syria. These recordings are very, very good. You can hear. 
OK, the third segment of our archive is the Rolf Bader collection, field recordings of Rolf Bader, professor for systematic musicology in Hamburg. He provided one of the 19 pieces, especially from South East Asia and especially Myanmar.","format":"text/plain"},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544#t=371.42,569.05"},{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126/annotation/4","type":"Annotation","motivation":"transcribing","body":{"type":"TextualBody","value":"Sri Lanka and Cambodia. And I have no audio file here. We may hear one later on, but you have to trust me. These are very good recordings. OK. Let's move on. OK. What are the challenges in ethnographic sound archives? There are some specialties. First of all is the metadata. You have to collect metadata somehow, and sometimes this is missing. In our archive, we sometimes have all the metadata missing and sometimes just one field, and in this example it is the year of recording. And yes, for audio recordings we don't have... we have an image. We don't have any metadata. So, and here's a very good point: with a catalog-based archive, you cannot explore recordings that don't have metadata. How would you do this? It's impossible. You have to go into the audio file. Next is plenty of classes and unbalanced data. Currently, a music information retrieval system that organizes music archives is set up as some kind of classification task. For a classification task, you need classes that all the instances belong to. And in order to build a good classifier, you need balanced data. That means every class should have approximately the same number of examples in it. In ethnographic archives, this doesn't work, for example, in the ESRA. 
We have about eight... in the Wilhelm Heinitz collection, sorry, about eight percent from Iraq, and another eight percent is missing data. If you look at the number of tracks per country, you see we have over 14 recordings from Algeria, but only two recordings from Syria. On this basis, you cannot build a reliable classifier. We have a lot of heterogeneous data.","format":"text/plain"},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544#t=570.25,711.05"},{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126/annotation/5","type":"Annotation","motivation":"transcribing","body":{"type":"TextualBody","value":"That means, unlike in an archive of classical music, we don't know what's in there. When we have an archive of classical music, we have some thought about it, some music theory we can apply to the algorithms. That's not possible for ethnographic data. For example, we have choruses and extremely complicated timbral mixtures. And best of all, we have lots of test tones. These are very problematic. Next, as you can hear, there is a lot of noise in it, and that is also a problem for classifiers, because when you have old recordings with a lot of noise and new recordings with absolutely no noise, the classifier may become quite sensitive to the noise and make the classification unreliable. OK. In our Computational Music and Sound Archiving project, we try to provide a rich metadata structure so that musicologists can work with their common workflow, going into a catalog to search for songs. And we provide a visualization of these archives by music similarity. We do this by extracting a low-level timbral feature, aggregating it to a song-level rhythm feature, and visualizing the mutual similarity of this song-level feature on a two-dimensional map. Here's an example of how the rhythm feature extraction works. 
We assume that there is a difference between these two rhythms. First one. OK, this doesn't work. First one's boom. Checks. Checks. Checks. Second one is checks, checkpoints, checks. Tippens. So these two are exactly the same in terms of time, but they are different in terms of timbre, because in the second half of this measure, we just swapped the bass and the snare drum. And this implies a different rhythmical feeling. We try to track this feeling. Yes.","format":"text/plain"},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544#t=711.68,877.6"},{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126/annotation/6","type":"Annotation","motivation":"transcribing","body":{"type":"TextualBody","value":"Let's go on. By extracting a timbre feature for each and every time instance where there is a musically relevant sound, that is onset detection, so we detect the onsets in each recording and extract the timbre feature, which is, in our case, the spectral centroid. The spectral centroid is known to correlate very well with the perception of brightness in music. And many studies since the 1970s showed that the spectral centroid is the best perceptual dimension to track timbral similarity. You mean. Yes. OK. Unlike timbre features like mel-frequency cepstral coefficients, which do not have anything in common with the perception of brightness or something else, even though the MFCC is used in most music information retrieval systems that cover time and timbre. So overall we have a system with, first, a corpus, which is in our case the ESRA, the Ethnographic Sound Recordings Archive. Then we go through a few preprocessing steps. We have a feature extraction. OK. Then we extract from each and every onset the spectral centroid. Right. 
And then we feed this time series of spectral centroids to the so-called hidden Markov model, which aggregates all the single values to a song-level feature. And what it actually does is it computes, in a two-dimensional matrix, how high the probability is to go from one timbre to the next timbre, and that for all timbres that are found in the song. OK, what we then do is we put this two-dimensional matrix into a self-organizing map, which orders these matrices by similarity. And when it orders by similarity of the matrix, we assume that it's the same ordering which is given in perception.","format":"text/plain"},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544#t=878.56,1036.44"},{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126/annotation/7","type":"Annotation","motivation":"transcribing","body":{"type":"TextualBody","value":"This is. Yes, spectral centroid and so on. OK. Markov. And here's an example of what it looks like. You can use the system right now if you want to. What we see on the left side is the so-called distance matrix of the self-organizing map, and each color shows how different the piece on it is to its neighbor. OK. The black triangles are the songs which are currently selected on the right side. So what we now assume is that a song in the lower right is more similar to another song in the lower right than to a song in the upper left, for example. You can. Yes, you can throw from the top down, and on the lower left side you can display some meta information. There is some information, and you can show a lot of songs. Maybe, if you want to, you can analyze over a large period of time the beats per minute of a song, for example. OK. I come to my conclusion. 
Ethnographic sound archives have a strong potential that goes far beyond mere conservation, that is exploration. And currently there is a lot of effort to utilize this potential. The classical methods support the exploration approach, but are not adequate for big archives. And that's why we want the big data methods, but in a way that ethnomusicologists can use. OK. This is another example of one of these maps. Each red dot is a piece in the Wilhelm Heinitz collection. And what is circled by the black line are all the test tones. What is underlined in green, these are trumpet and drum language examples from the Duala. Just for an example. More than one.","format":"text/plain"},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544#t=1037.91,1185.25"},{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126/annotation/8","type":"Annotation","motivation":"transcribing","body":{"type":"TextualBody","value":"OK, I forgot. In the red circle, we have the piece we just heard. And next to it is a song from a very different culture, but with the same ductus. You've got to find. This is how the system works. OK, yes. As a conclusion, the most difficult problem is evaluation. We can't evaluate a self-organizing map because it's exploratory data analysis. There is no evaluation. That is, we rely on experts to test the system and tell us how good it actually is. And if you want to test the system, you can go to our home page and log in, for free, of course, and try the system. And of course, you can listen to all the examples that are in the archive. Thank you for your attention. I'll be happy to take your questions if there are any. Are there any questions for Michael? For the items you didn't have any metadata for. If you find out information about those recordings later on. 
Do you have a systematic way of filling in that information and getting it populated online? Yes, of course. How do you do it? I struggle with managing that, closing that loop. So if I get you right, you mean if I lost it's the song of a computer methods. Is there a way to update the metadata? More traditional metadata, like where it's from, what year. Yes. OK. No, we have an archive, and you can just put the metadata in as you want to. Yeah. Yeah, yeah, yeah. That was it. I just want to get the address. Oh, the URL. This one, I think they want the URL. OK. OK. OK, sorry. Do we have any other questions? I have a question. You mentioned that you were keeping track of noise, and I'm just curious, how do you distinguish what counts as noise and what counts as part of the actual recording? It's plans for the future? No.","format":"text/plain"},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544#t=1187.58,1391.44"},{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126/annotation/9","type":"Annotation","motivation":"transcribing","body":{"type":"TextualBody","value":"Currently, we are at an early stage model of our system. And yes, just to prove that it works. We are going to refine it in the next two years. OK. And no, we don't really keep track of noise. Right. OK. Thank you. Thank you, Joe, for the presentation. So you're focusing on the sound, right? And putting this through the archive, through whatever amazing computer technology this is. Have you discovered other things as you're working on your archive, other things to take into consideration? Like, I was looking at the labels of the records themselves. Are you photographing those and putting them in? And I noticed there was a gramophone record. But is that produced in the United States, or is that in another country, or could they be bootlegs that have this? 
And are you capturing that data, or is it just such an immense amount of data that you're not going to focus on that now? We have a field in our metadata system that captures the publisher, and yes, the publisher, of course. And I tried a classification by publisher, but it doesn't seem to work. But this is somehow clear, because we are capturing rhythm similarity and not publisher similarity. And as you classify an archive which is built around rhythm similarity, there is no chance that along one publisher just a special kind of rhythm appears. So this is why it doesn't work. And has anyone approached you, because you're obviously collecting an amazing amount of material, approached you about looking at your archive to pull out other things that you weren't looking at? No, not yet, because the archive... We have all these recordings at our institute, and they were in the basement for about 60 years, and we are now recovering them and trying to make sense of them.","format":"text/plain"},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544#t=1392.17,1521.5"},{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126/annotation/10","type":"Annotation","motivation":"transcribing","body":{"type":"TextualBody","value":"Thank you. Anyway, one more question. In your example, you said that there were two completely different recordings. How is it then, if you have one recording, say, being made in Syria and another one being made in Far East Asia, how are you not confusing cultures? How are there not cultural conflicts happening there in your archives and your documentation? What this archive should achieve is that we just want the songs next to each other that really sound similar to each other. 
That means, whether the song is from Syria or from another country of the world, we don't capture this cultural information in the computation. The computational methods are just for low-level perception. Yeah, low-level perception, nothing else. And it should just support ethnomusicologists' work, so that you may build up a hypothesis around which cultures in the world are, in a rhythmical sense, near to each other. This is also a puzzle. OK, we have three minutes. We have a short question. Hello. Are you also collecting melody and harmony features? No, not at this point. It's plans for the future, especially for the next few months. We are currently researching computational methods which are adequate for capturing monophonic and polyphonic melody features. We don't have them currently, and we are not inventing all these algorithms. So algorithms exist already and have proved to be working. And so we just have to look at what's out there and try to implement it. Thank you very much for the questions and thank you to Michael, to us. 
Michael.","format":"text/plain"},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544#t=1526.09,1668.91"}]},{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126","type":"AnnotationPage","label":{"en":["English [Transcript]"]},"items":[{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/97544/transcript/19126/annotation/11","type":"Annotation","motivation":"subtitling","body":{"type":"TextualBody","value":"https://d9jk7wjtjpu5g.cloudfront.net/file_transcripts/associated_files/000/019/126/original/open-uri20200924-1389-1xnts13?1600960427","format":"text/vtt","language":"en"},"target":"https://d9jk7wjtjpu5g.cloudfront.net/file_transcripts/associated_files/000/019/126/original/open-uri20200924-1389-1xnts13?1600960427"}]}]},{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/256037","type":"Canvas","label":{"en":["Media File 2 of 2 - ARSC_conf_2018_Blass_audio.mp3"]},"duration":1691.95944,"width":640,"height":360,"thumbnail":[{"id":"https://d9jk7wjtjpu5g.cloudfront.net/public/images/audio-default.png","type":"Image","format":"image/png"}],"items":[{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/256037/content/1","type":"AnnotationPage","items":[{"id":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/256037/content/2/annotation/1","type":"Annotation","motivation":"painting","body":{"id":"https://aviary-p-arsc.s3.wasabisys.com/collection_resource_files/resource_files/000/256/037/original/ARSC_conf_2018_Blass_audio.mp3?1730826350","type":"Audio","format":"audio/mpeg","duration":1691.95944,"width":640,"height":360},"target":"https://arsc.aviaryplatform.com/collections/1143/collection_resources/29708/file/256037","metadata":[]}]}],"annotations":[]}]}