{"_id":"58fbcd436b29580f00d8ff5b","version":{"_id":"5511fc8d0c1a08190077f90f","__v":11,"project":"5511fc8c0c1a08190077f90c","createdAt":"2015-03-25T00:08:45.273Z","releaseDate":"2015-03-25T00:08:45.273Z","categories":["5511fc8d0c1a08190077f910","5511fd52c1b13537009f5d31","568ecb0cbeb2700d004717ee","568ecb149ebef90d0087271a","568ecb1cbdb9260d00149d42","56a6a012b3ffe00d00156f1e","56a6bfe37ef6620d00e2f25f","58fbccb5809fc30f00f2dc03","58fbcd136b29580f00d8ff3a","5942ec4d50b8a900373ce9ff","59481476d305c20019295d8c"],"is_deprecated":false,"is_hidden":false,"is_beta":false,"is_stable":true,"codename":"","version_clean":"1.0.0","version":"1.0"},"project":"5511fc8c0c1a08190077f90c","user":"58fbcc0bd8c0ba0f00cf52d6","__v":0,"category":{"_id":"58fbcd136b29580f00d8ff3a","__v":0,"project":"5511fc8c0c1a08190077f90c","version":"5511fc8d0c1a08190077f90f","sync":{"url":"","isSync":false},"reference":false,"createdAt":"2017-04-22T21:37:23.604Z","from_sync":false,"order":1,"slug":"audio-retrieval-and-analysis","title":"Audio Retrieval and Analysis"},"parentDoc":null,"updates":[],"next":{"pages":[],"description":""},"createdAt":"2017-04-22T21:38:11.241Z","link_external":false,"link_url":"","githubsync":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":0,"body":"If you are interested in providing us with feedback for the signal processing and Machine Learning principles behind our data analysis, take a look at the main Data Science thread in the [AKER Forum](https://community.akerkits.com/t/main-thread-current-work-status/326).\n[block:embed]\n{\n  \"html\": false,\n  \"url\": \"https://community.akerkits.com/t/main-thread-current-work-status/326\",\n  \"title\": \"Main Thread: Current Work Status\",\n  \"favicon\": \"https://community.akerkits.com/uploads/default/67/7a3ff00589c2d6bd.png\",\n  \"image\": \"http://community.akerkits.com/uploads/default/optimized/1X/409904e239680c27fa0b6237aa1a42a9430b94cd_1_690x487.png\",\n  \"iframe\": false,\n  \"width\": \"100%\",\n  \"height\": \"600\"\n}\n[/block]\n\n[block:api-header]\n{\n  \"title\": \"Introduction and System Overview\"\n}\n[/block]\nThe main goal of our data science workflow is to develop a processing system that can accurately determine, in real time, the state of a beehive out of a possible pool of states.\n\nAs research suggests ([see this Paper](https://drive.google.com/file/d/0BzmpC52tSYTKNGRNRW9VWE1URHM/view)) measuring the energy contained in certain ranges of the audio signal spectrum allows us to differentiate between those states successfully. 
This has proven to be the case in our preliminary research:\n\n[block:embed]\n{\n  \"html\": \"<iframe class=\\\"embedly-embed\\\" src=\\\"//cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2F73RXNL-iYA4%3Ffeature%3Doembed&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D73RXNL-iYA4&image=https%3A%2F%2Fi.ytimg.com%2Fvi%2F73RXNL-iYA4%2Fhqdefault.jpg&key=02466f963b9b4bb8845a05b53d3235d7&type=text%2Fhtml&schema=youtube\\\" width=\\\"854\\\" height=\\\"480\\\" scrolling=\\\"no\\\" frameborder=\\\"0\\\" allowfullscreen></iframe>\",\n  \"url\": \"https://www.youtube.com/watch?v=73RXNL-iYA4\",\n  \"title\": \"OSBH - Audio Analysis Workflow\",\n  \"favicon\": \"https://s.ytimg.com/yts/img/favicon-vflz7uhzw.ico\",\n  \"image\": \"https://i.ytimg.com/vi/73RXNL-iYA4/hqdefault.jpg\"\n}\n[/block]\nHere is a list of the states that we aim to recognize:\n\n* Active\n* Dormant\n* Pre-Swarm\n* Swarm\n* Sick (Varroa)\n* Sick (Wax Moth)\n* Sick (Nosema)\n* Theft\n* Collapsed\n* Empty\n\nDue to the high number of states and their similarity, the required level of detail in the analysis of the different audio spectra might be too high for a classifier based on a set of fixed rules. Therefore, we have decided to apply Machine Learning techniques in order to work with variable rules that are able to improve as our database and community grows bigger.\n\nThe overall process can be visualized through this graph:\n\n\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/baea0a8-Machine_Learning.png\",\n        \"Machine Learning.png\",\n        862,\n        609,\n        \"#0e0c10\"\n      ]\n    }\n  ]\n}\n[/block]\nAs we can see, our audio recording database contains both audio and labels, which specify variables such as location and time of the recording and, most importantly, the actual state of the beehive during the recording. This will be used in order to make the system \"learn\" the rules that associate different behaviour patterns in the beehive with the differences in the recording.\n\nThese labels are gathered through our user community, who will work with us to pay close attention to their beehives and provide valuable information that allows us to establish an audio/beehive health mapping.\n\nAfter that, the audio in the database is processed to compute the energy in different frequency bands, as we noted earlier. This process is named \"feature extraction\" and will produce a feature vector as an output.\n\nEach of these feature vectors has an associated label, corresponding to its originating audio. All the generated feature vectors and their corresponding labels are run through a training algorithm, which will deduce the classification rules that minimize a certain error metric.\n\nThis set of rules are known as a Classifier Model, and can be tested extensively using our existing database.\n\nPerforming this workflow iteratively and automatically while our database grows will bring better accuracy, performance and functionality to new and existing users of the system, allowing it to improve on its own.\n[block:api-header]\n{\n  \"title\": \"Signal Conditioning\"\n}\n[/block]\nSince the beehive audio signals have a significant degree of variability (different beehives, microphone locations, environmental acoustics, background noise...) we have included a series of signal processing steps that will allow them to have more similar characteristics when facing the feature extractor. 
These are:\n\n* Resampling\n* Time Windowing\n* Normalization\n\n## Resampling\nDue to the insignificant high frequency content in beehive activity recordings, we can afford to resample the signals to just 6300 Hz, discarding all the spectral components above 3150 Hz (Nyquist theorem). This drop in frequency content can be observed in the following graph:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/dac8835-92d13781272fb295530355e65860183e5de0a3aa.jpg\",\n        \"92d13781272fb295530355e65860183e5de0a3aa.jpg\",\n        1281,\n        907,\n        \"#e99327\"\n      ],\n      \"caption\": \"Energy of frequency content\"\n    }\n  ]\n}\n[/block]\nDecibels are in a logarithmic scale so the drop would translate to 40-100x larger using a linear scale.\n\nThe application of this technique to the signal as a very first step brings two main advantages:\n\n* Much lower computational cost. Since every sample has to be processed in some way during the feature extraction processing, cutting the number of samples is the most effective way of reducing computational costs and enabling the system to be implemented in small and affordable computing solutions, such as our [Alpha Sensor Kit](doc:alpha-sensor-kit).\n\n* Much lower storage cost. This very low sampling rate allows the usage of a very low bitrate when storing our database, which in turn makes the database considerably smaller, reducing storage costs.\n\nOur [Alpha Sensor Kit](doc:alpha-sensor-kit) directly samples at 6300 Hz. When we use recordings made with different hardware, we resample the signals so the sampling rate matches 6300 Hz as well.\n\n## Time Windowing\n\nThe time windowing technique splits the signal in intervals or windows of a fixed length. The feature extraction process is then applied to the current time window only, which minimizes the effect of transients and the need to constantly produce estimates with every new input to the system:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/56094ab-f973491ce16cfad2beabb5d1047fbeb284360d7d.gif\",\n        \"f973491ce16cfad2beabb5d1047fbeb284360d7d.gif\",\n        595,\n        581,\n        \"#dbefe9\"\n      ],\n      \"caption\": \"Time Windowing\"\n    }\n  ]\n}\n[/block]\nAs of now, we are using the following parameters for our time windows:\n\n* Type: Rectangular (constant coeficients for every sample)\n* Length: 2 seconds\n* Overlap: 0 seconds\n\n## Normalization\n\nWe have shifted the normalization process to the Feature Extraction phase, so we encourage you to read the next part of the document.\n[block:api-header]\n{\n  \"title\": \"Feature Extraction\"\n}\n[/block]\nAfter the the raw signal is preprocessed, feature extraction can be performed. As we mentioned in the introduction, these features will consist of the energy stored in different frequency bands. In this post we will go over the methods used for computing that energy and deciding which frequency bands will be finally used for classification.\n\n## Computing the energy\n\nFor this stage, we are currently using the following method:\n\nBand-pass filter bank: One method for computing the energy in different frequency bands results from simply filtering out all of the signal spectrum except the band we are interested in (band-pass filtering) and computing the energy of the resulting signal. Doing so for every band of interest would result in the feature vector we are looking for. 
Here we can observe the frequency response of the filter bank:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/64da3bc-fbefea61a1e661314e55633e5d50faf886b0e501.png\",\n        \"fbefea61a1e661314e55633e5d50faf886b0e501.png\",\n        901,\n        692,\n        \"#827b73\"\n      ],\n      \"caption\": \"Frequency Response\"\n    }\n  ]\n}\n[/block]\n## Normalization\n\nSince we are recording audio from environments with lots of variation, a system that uses fixed energy levels in different frequency bands as features for a classifier would be extremely sensitive to temporary variations of said levels.\n\nTo deal with this problem, we have included a normalization stage that kicks in after the energy of the frequency bands is computed. Since we are dealing with energy, the proper way to normalize from a physical and mathematical standpoint is using energy as well.\n\nOur normalization process consists, therefore, in dividing each energy in the feature vector by the total energy of the signal. From the [Alpha Sensor Kit](doc:alpha-sensor-kit) firmware:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"//Update feature extractor with new sample\\n\\nvoid FeatureExtractor::update (float value)\\n{\\n\\tfloat energy_local;\\n\\tfor (int i=0; i<filters.size(); i++)\\n\\t{\\n\\t\\tenergy_local=filters[i].filter(value);\\n\\t\\tenergy[i]+=pow(energy_local,2);\\n\\t}\\n  \\t\\n  totalEnergy+=pow(value,2);\\n\\tsampleCount=sampleCount+1;\\n  \\n\\t//If sample count has reached window length, normalize, set flag as ready and reset sample count\\n\\tif(sampleCount>=windowSamples)\\n\\t{\\n\\t\\tfor (int i=0; i<filters.size(); i++)\\n\\t\\t{\\n\\t\\t\\tenergy[i]=energy[i]/totalEnergy;\\n\\t\\t}\\n\\t\\tready=true;\\n\\t\\tsampleCount=0;\\n\\t}\\n  \\n}\",\n      \"language\": \"cplusplus\"\n    }\n  ]\n}\n[/block]\n## Feature Selection\n\nIn a Machine Learning system, the process by which the features entering the classifier are determined is named \"feature selection.\" This process can be thought of as trying to discern which features, from all that are available, provide the most information on the state of the system.\n\nThere are a variety of algorithms that allow to do this. The algorithm we picked is named \"fixed-set Linear Forward Selection\", and we based our particular procedure on this paper: [Paper 2](http://www.cs.waikato.ac.nz/ml/publications/2009/guetlein_et_al.pdf).\n\nThe algorithm, in essence, will start by running a classification test using only one feature for every feature there is, and start grouping together those that produce better results, and testing those groups independently. 
Once adding any more features produces no significant improvement in accuracy, the algorithm stops and returns our selected feature set.\n\nIn our case, our feature vector will consist on the energy of 50 logarithmically spaced frequency points, ranging from 20 Hz to 3.1 KHz.\n\nAs of now and using our current database, these frequency bands seem to be the most determining: 48Hz, 120Hz, 546 Hz, 620 Hz, 750 Hz, 800 Hz and 850 Hz.\n\nWhen performing the actual training and classification, we limit our filter bank, feature extractor and model to consider these frequency bands only, saving a very considerable amount of computing time while maximizing accuracy.\n\n## Classification Algorithm\n\nGiven the relatively reduced range of beehive states currently present in our database, we have decided to use a decision tree classifier during our data collection phase.\n\nThere is a good general overview of how decision tree learning works on [Wikipedia](https://en.wikipedia.org/wiki/Decision_tree_learning).\n\nA few additional reasons for this decision are:\n\n* Very simple implementation\n* We can switch models just by passing a different String to the classifier (see [repository](https://github.com/opensourcebeehives/MachineLearning-Local))\n* Data is - as of writing - linearly separable by class:\n\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/4c72dfd-a.PNG\",\n        \"a.PNG\",\n        1214,\n        984,\n        \"#f1f1f0\"\n      ],\n      \"caption\": \"Data separability by class\"\n    }\n  ]\n}\n[/block]\nOur tree is generated from a randomized and balanced dataset, using all of our current significant data sets for different beehive states. These states currently consist of:\n\n* Active (Blue)\n* Swarm (Green)\n* Pre-Swarm (Red)\n\nCurrently, our F-Measure over 10-fold cross validation tests is over 97%.\n\nAs time goes on and our database size and complexity grow, we will switch to more advanced classification techniques that will be implemented in the [Alpha Sensor Kit](doc:alpha-sensor-kit).\n\n## Majority Voting\n\nAs mentioned earlier, our system produces estimates every 2 seconds. The [Alpha Sensor Kit](doc:alpha-sensor-kit) is configured, however, to produce 10 second recordings. This means we can produce about 5 estimates per recording. We use these additional estimates as a method for obtaining consensus and maximizing accuracy. The way we do this is simply taking the estimate that has the highest occurrence within the 5 estimate set:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"/** Performs majority voting on stored beehive states\\n*/\\nint majorityVoting(vector<int> states)\\n{\\n\\t//Count all states\\n\\tstd::array<int,11> count={0,0,0,0,0,0,0,0,0,0,0};\\n\\tfor( int i = 0; i < states.size(); i++ )\\n\\t{\\t\\n\\t\\tcount[states[i]]++;\\n\\t}\\n\\n\\t//Determine most repeated state\\n\\tint indexWinner = 1;\\n\\tfor( int i = 1; i < count.size(); i++ )\\n\\t{\\n\\t\\tif( count[i] > count[indexWinner] )\\n\\t\\t{\\n\\t\\t\\tindexWinner = i;\\n\\t\\t}\\n\\t}\\n\\treturn indexWinner;\\n}\",\n      \"language\": \"cplusplus\"\n    }\n  ]\n}\n[/block]","excerpt":"","slug":"theory-behind-audio-analysis","type":"basic","title":"Theory Behind BuzzBox Audio Analysis"}

# Theory Behind BuzzBox Audio Analysis


If you are interested in providing feedback on the signal processing and machine learning principles behind our data analysis, take a look at the main Data Science thread in the [AKER Forum](https://community.akerkits.com/t/main-thread-current-work-status/326).
[block:embed]
{
  "html": false,
  "url": "https://community.akerkits.com/t/main-thread-current-work-status/326",
  "title": "Main Thread: Current Work Status",
  "favicon": "https://community.akerkits.com/uploads/default/67/7a3ff00589c2d6bd.png",
  "image": "http://community.akerkits.com/uploads/default/optimized/1X/409904e239680c27fa0b6237aa1a42a9430b94cd_1_690x487.png",
  "iframe": false,
  "width": "100%",
  "height": "600"
}
[/block]

[block:api-header]
{
  "title": "Introduction and System Overview"
}
[/block]
The main goal of our data science workflow is to develop a processing system that can accurately determine, in real time, the state of a beehive out of a pool of possible states.

As research suggests ([see this paper](https://drive.google.com/file/d/0BzmpC52tSYTKNGRNRW9VWE1URHM/view)), measuring the energy contained in certain ranges of the audio spectrum allows us to differentiate between those states successfully. This has proven to be the case in our preliminary research:

[block:embed]
{
  "html": "<iframe class=\"embedly-embed\" src=\"//cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2F73RXNL-iYA4%3Ffeature%3Doembed&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D73RXNL-iYA4&image=https%3A%2F%2Fi.ytimg.com%2Fvi%2F73RXNL-iYA4%2Fhqdefault.jpg&key=02466f963b9b4bb8845a05b53d3235d7&type=text%2Fhtml&schema=youtube\" width=\"854\" height=\"480\" scrolling=\"no\" frameborder=\"0\" allowfullscreen></iframe>",
  "url": "https://www.youtube.com/watch?v=73RXNL-iYA4",
  "title": "OSBH - Audio Analysis Workflow",
  "favicon": "https://s.ytimg.com/yts/img/favicon-vflz7uhzw.ico",
  "image": "https://i.ytimg.com/vi/73RXNL-iYA4/hqdefault.jpg"
}
[/block]
Here is a list of the states that we aim to recognize:

* Active
* Dormant
* Pre-Swarm
* Swarm
* Sick (Varroa)
* Sick (Wax Moth)
* Sick (Nosema)
* Theft
* Collapsed
* Empty

Due to the high number of states and their similarity, the level of detail required to tell the different audio spectra apart is likely too high for a classifier based on a fixed set of rules. We have therefore decided to apply machine learning techniques, so that the classification rules can adapt and improve as our database and community grow.

The overall process can be visualized in the following diagram:

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/baea0a8-Machine_Learning.png",
        "Machine Learning.png",
        862,
        609,
        "#0e0c10"
      ]
    }
  ]
}
[/block]
As the diagram shows, our audio recording database contains both audio and labels. The labels specify variables such as the location and time of the recording and, most importantly, the actual state of the beehive during the recording. These labels are what allow the system to "learn" the rules that associate different behaviour patterns in the beehive with differences in the recordings.

The labels are gathered through our user community, who work with us to pay close attention to their beehives and provide the information that lets us establish a mapping between audio and beehive health.

The audio in the database is then processed to compute the energy in different frequency bands, as noted earlier. This process is called "feature extraction" and produces a feature vector as its output.
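To make the pairing of audio features and labels concrete, here is a minimal, purely illustrative sketch of how one labeled training example might be represented. The type and field names are assumptions chosen for clarity, not the actual schema used by our database or firmware.

```cpp
// Purely illustrative: one labeled training example, as described above.
// Names and types are assumptions for clarity, not the actual OSBH schema.
#include <string>
#include <vector>

struct LabeledRecording
{
    std::vector<float> features;   // energy per frequency band (the feature vector)
    std::string        state;      // beehive state label, e.g. "Active" or "Pre-Swarm"
    std::string        location;   // where the recording was made
    std::string        timestamp;  // when the recording was made
};
```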
Each of these feature vectors has an associated label, corresponding to its originating audio. All the generated feature vectors and their labels are run through a training algorithm, which deduces the classification rules that minimize a certain error metric.

This set of rules is known as a classifier model, and it can be tested extensively against our existing database.

Performing this workflow iteratively and automatically as our database grows will bring better accuracy, performance and functionality to new and existing users of the system, allowing it to improve on its own.
[block:api-header]
{
  "title": "Signal Conditioning"
}
[/block]
Since beehive audio signals have a significant degree of variability (different beehives, microphone locations, environmental acoustics, background noise, and so on), we have included a series of signal processing steps that give the signals more uniform characteristics before they reach the feature extractor. These are:

* Resampling
* Time Windowing
* Normalization

## Resampling

Because beehive activity recordings contain very little high-frequency content, we can afford to resample the signals down to just 6300 Hz, discarding all spectral components above 3150 Hz (Nyquist theorem). This drop in frequency content can be observed in the following graph:
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/dac8835-92d13781272fb295530355e65860183e5de0a3aa.jpg",
        "92d13781272fb295530355e65860183e5de0a3aa.jpg",
        1281,
        907,
        "#e99327"
      ],
      "caption": "Energy of frequency content"
    }
  ]
}
[/block]
Decibels are on a logarithmic scale, so on a linear scale the drop shown corresponds to a factor of roughly 40-100x.

Applying this technique as the very first step brings two main advantages:

* Much lower computational cost. Since every sample has to be processed in some way during feature extraction, cutting the number of samples is the most effective way of reducing computational cost, which lets the system run on small and affordable computing platforms such as our [Alpha Sensor Kit](doc:alpha-sensor-kit).

* Much lower storage cost. The low sampling rate allows a very low bitrate when storing our database, which in turn makes the database considerably smaller, reducing storage costs.

Our [Alpha Sensor Kit](doc:alpha-sensor-kit) samples directly at 6300 Hz. When we use recordings made with different hardware, we resample the signals so their sampling rate matches 6300 Hz as well.

## Time Windowing

Time windowing splits the signal into intervals, or windows, of a fixed length. The feature extraction process is then applied to the current time window only, which minimizes the effect of transients and removes the need to produce a new estimate with every single input sample:
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/56094ab-f973491ce16cfad2beabb5d1047fbeb284360d7d.gif",
        "f973491ce16cfad2beabb5d1047fbeb284360d7d.gif",
        595,
        581,
        "#dbefe9"
      ],
      "caption": "Time Windowing"
    }
  ]
}
[/block]
As of now, we are using the following parameters for our time windows (see the sketch after this list):

* Type: Rectangular (constant coefficients for every sample)
* Length: 2 seconds
* Overlap: 0 seconds
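As a rough illustration of this windowing scheme, the sketch below splits a buffer of samples into non-overlapping 2-second rectangular windows at 6300 Hz and hands each one to a processing callback. This is only a sketch under those assumptions, not the firmware implementation.

```cpp
// Illustrative sketch only, not the firmware implementation: split a recording
// into non-overlapping, rectangular 2-second windows at a 6300 Hz sampling rate
// (12600 samples per window). A trailing partial window is simply discarded here.
#include <cstddef>
#include <functional>
#include <vector>

void forEachWindow(const std::vector<float>& samples,
                   const std::function<void(const float*, std::size_t)>& process)
{
    const std::size_t kSampleRate    = 6300;            // Hz
    const std::size_t kWindowSamples = 2 * kSampleRate; // 2-second window, no overlap

    for (std::size_t start = 0; start + kWindowSamples <= samples.size(); start += kWindowSamples)
    {
        // Rectangular window: the samples are passed through unweighted.
        process(&samples[start], kWindowSamples);
    }
}
```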
## Normalization

We have shifted the normalization step into the feature extraction phase, so we cover it in the next section of this document.
[block:api-header]
{
  "title": "Feature Extraction"
}
[/block]
After the raw signal is preprocessed, feature extraction can be performed. As mentioned in the introduction, these features consist of the energy contained in different frequency bands. In this section we go over the methods used to compute that energy and to decide which frequency bands are ultimately used for classification.

## Computing the Energy

For this stage, we are currently using a band-pass filter bank: we filter out all of the signal spectrum except the band we are interested in (band-pass filtering) and compute the energy of the resulting signal. Doing this for every band of interest yields the feature vector we are looking for. Here we can observe the frequency response of the filter bank:
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/64da3bc-fbefea61a1e661314e55633e5d50faf886b0e501.png",
        "fbefea61a1e661314e55633e5d50faf886b0e501.png",
        901,
        692,
        "#827b73"
      ],
      "caption": "Frequency Response"
    }
  ]
}
[/block]
## Normalization

Since we record audio in environments with a lot of variation, a system that used fixed energy levels in different frequency bands as classifier features would be extremely sensitive to temporary variations of those levels.

To deal with this problem, we include a normalization stage that kicks in after the energy of each frequency band has been computed. Since we are dealing with energy, the proper quantity to normalize by, from a physical and mathematical standpoint, is also an energy.

Our normalization therefore consists of dividing each band energy in the feature vector by the total energy of the signal. From the [Alpha Sensor Kit](doc:alpha-sensor-kit) firmware:

```cpp
//Update feature extractor with new sample
void FeatureExtractor::update(float value)
{
	float energy_local;
	for (int i = 0; i < filters.size(); i++)
	{
		energy_local = filters[i].filter(value);
		energy[i] += pow(energy_local, 2);
	}

	totalEnergy += pow(value, 2);
	sampleCount = sampleCount + 1;

	//If sample count has reached window length, normalize, set flag as ready and reset sample count
	if (sampleCount >= windowSamples)
	{
		for (int i = 0; i < filters.size(); i++)
		{
			energy[i] = energy[i] / totalEnergy;
		}
		ready = true;
		sampleCount = 0;
	}
}
```

## Feature Selection

In a machine learning system, the process by which the features entering the classifier are determined is called "feature selection." It can be thought of as discerning which of all the available features provide the most information about the state of the system.

There are a variety of algorithms for doing this. The one we picked is fixed-set Linear Forward Selection, and we based our particular procedure on [this paper](http://www.cs.waikato.ac.nz/ml/publications/2009/guetlein_et_al.pdf).

In essence, the algorithm starts by running a classification test on each feature individually, then groups the best-performing features together and tests those groups independently. Once adding any further feature produces no significant improvement in accuracy, the algorithm stops and returns the selected feature set.

In our case, the full feature vector consists of the energy at 50 logarithmically spaced frequency points, ranging from 20 Hz to 3.1 kHz. A simplified sketch of the greedy selection idea is shown below.
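The following sketch illustrates a plain greedy forward-selection loop. It is not the exact fixed-set Linear Forward Selection procedure from the paper referenced above; the `evaluateSubset` callback is a placeholder for whatever cross-validation score (accuracy, F-measure, etc.) is used to compare feature subsets.

```cpp
// Simplified greedy forward selection, for illustration only. This is not the
// exact fixed-set Linear Forward Selection procedure from the referenced paper.
// evaluateSubset is a caller-supplied placeholder returning a cross-validation
// score (higher is better) for a given subset of feature indices.
#include <cstddef>
#include <functional>
#include <vector>

std::vector<std::size_t> forwardSelect(
    std::size_t numFeatures,
    double minImprovement,
    const std::function<double(const std::vector<std::size_t>&)>& evaluateSubset)
{
    std::vector<std::size_t> selected;
    std::vector<bool> used(numFeatures, false);
    double bestScore = 0.0;

    while (true)
    {
        std::size_t bestCandidate = numFeatures; // sentinel: no improving feature found
        double bestCandidateScore = bestScore;

        // Try adding each unused feature to the current subset and score the result.
        for (std::size_t f = 0; f < numFeatures; f++)
        {
            if (used[f]) continue;
            selected.push_back(f);
            double score = evaluateSubset(selected);
            selected.pop_back();
            if (score > bestCandidateScore)
            {
                bestCandidateScore = score;
                bestCandidate = f;
            }
        }

        // Stop once no remaining feature improves the score by a significant margin.
        if (bestCandidate == numFeatures || bestCandidateScore - bestScore < minImprovement)
            break;

        selected.push_back(bestCandidate);
        used[bestCandidate] = true;
        bestScore = bestCandidateScore;
    }
    return selected;
}
```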
As of now, and using our current database, these frequency bands appear to be the most informative: 48 Hz, 120 Hz, 546 Hz, 620 Hz, 750 Hz, 800 Hz and 850 Hz.

When performing the actual training and classification, we limit our filter bank, feature extractor and model to these frequency bands only, saving a very considerable amount of computing time while maximizing accuracy.

## Classification Algorithm

Given the relatively limited range of beehive states currently present in our database, we have decided to use a decision tree classifier during our data collection phase.

There is a good general overview of how decision tree learning works on [Wikipedia](https://en.wikipedia.org/wiki/Decision_tree_learning).

A few additional reasons for this decision are:

* Very simple implementation
* We can switch models just by passing a different String to the classifier (see the [repository](https://github.com/opensourcebeehives/MachineLearning-Local))
* The data is, as of writing, linearly separable by class:

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/4c72dfd-a.PNG",
        "a.PNG",
        1214,
        984,
        "#f1f1f0"
      ],
      "caption": "Data separability by class"
    }
  ]
}
[/block]
Our tree is generated from a randomized and balanced dataset, using all of the significant data sets we currently have for the different beehive states. These states currently consist of:

* Active (Blue)
* Swarm (Green)
* Pre-Swarm (Red)

Currently, our F-measure over 10-fold cross-validation tests is above 97%.

As our database grows in size and complexity, we will switch to more advanced classification techniques, which will also be implemented in the [Alpha Sensor Kit](doc:alpha-sensor-kit).

## Majority Voting

As mentioned earlier, our system produces an estimate every 2 seconds. The [Alpha Sensor Kit](doc:alpha-sensor-kit) is configured, however, to produce 10-second recordings, which means we obtain about 5 estimates per recording. We use these additional estimates to obtain consensus and maximize accuracy: we simply take the estimate that occurs most often within the 5-estimate set:

```cpp
/** Performs majority voting on stored beehive states
*/
int majorityVoting(vector<int> states)
{
	//Count all states
	std::array<int,11> count = {0,0,0,0,0,0,0,0,0,0,0};
	for( int i = 0; i < states.size(); i++ )
	{
		count[states[i]]++;
	}

	//Determine most repeated state
	int indexWinner = 1;
	for( int i = 1; i < count.size(); i++ )
	{
		if( count[i] > count[indexWinner] )
		{
			indexWinner = i;
		}
	}
	return indexWinner;
}
```
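For context, here is a small, purely illustrative usage example of the voting step. It assumes the `majorityVoting` function above is compiled into the same program (with `<vector>` and `<array>` included and `std::vector` in scope); the state indices used are hypothetical and depend on the classifier model's label mapping.

```cpp
// Illustrative usage of the majority-voting step above. The state indices used
// here (e.g. 1 = Active, 3 = Swarm) are hypothetical; the real mapping is
// defined by the classifier model.
#include <array>
#include <iostream>
#include <vector>

using std::vector;

int majorityVoting(vector<int> states); // definition shown above

int main()
{
    // Five 2-second estimates obtained from one 10-second recording.
    vector<int> estimates = {1, 3, 3, 1, 3};

    std::cout << "Consensus state: " << majorityVoting(estimates) << std::endl;
    return 0;
}
```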