% Copyright (C) 2014 H. Kuehne
%
% This program is free software: you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation, either version 3 of the License, or
% (at your option) any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
% GNU General Public License for more details.
%
% You should have received a copy of the GNU General Public License
% along with this program. If not, see <http://www.gnu.org/licenses/>.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

This package contains a set of helper functions to support temporal sequence
parsing and recognition with HTK in Matlab.

The code is free for any personal or academic use under the GNU GPL, without
any warranty. Please observe the licenses of third-party packages.

The functions have only been tested under Windows so far, but should also run
under Linux.

Any comments and recommendations are always welcome. Please contact me at
kuehne@ira.uka.de

--------------------------------------
Part I : Installation
--------------------------------------

To run the demo you need to:

Step 1) Install HTK: http://htk.eng.cam.ac.uk/
        (please make sure it works by running some examples)

Step 2) Download and extract the training and test data 'hist_h3d_c30.rar' (TODO: link)

Step 3) Download and extract the corresponding segmentations 'segmentation.rar' (TODO: link)

--------------------------------------
Part II : Set environment variables
--------------------------------------

For the following steps, it is highly recommended to use full path names, as
the script switches between different directories. Relative path names also
work, but only if you know exactly what you are doing.

All of the following scripts are located in the breakfast_demo folder.

Step 1) The path to the HTK binaries is hardcoded in the function get_htk_path.m.
Please adapt this function to your local configuration.

Step 2) Adjust the script demo_breakfast.m.

Step 2.1) Adjust the folders of the demo_bundle and the htk wrapper (line 5 + 7).

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %% Path and variables ....
    % demo_bundle:
    addpath(genpath('C:\tmp\demo_bundle'));
    % htk wrapper:
    addpath(genpath('C:\tmp\matlab_htk'));

Step 2.2) Set the folder where the demo_breakfast.m script is located (root
folder) and the folder with the training and test data (any frame
representation) (line 12 + 14).

    % root folder with this script
    path_root = 'C:\tmp\demo_bundle\demo_breakfast';
    % folder with the input data
    path_input = 'C:\tmp\hist_h3d_c30\';

Step 2.3) Set the output folder (line 19).

    % folder to write temporary files and output:
    path_output = 'C:\tmp\htk\';

In this folder, two subfolders, 'generated' and 'output', will be created:

- 'generated' : contains the htkhmm-files and is used for all temporary files
  needed for and/or generated by HTK. The htkhmm-files are stored in case the
  training is interrupted, and are loaded by default when it is resumed.
- 'output' contains:
  - the HTK hmm file, containing hmm descriptions for each unit:
    e.g. breakfast_-2sts_1mix_s1.hmms (naming pattern: name_sts_mix_s.hmms)
  - the reference file, containing the original (loaded) annotations:
    e.g. demo_breakfast_ref_s1.ref.mlf (naming pattern: name_s.ref.mlf)
  - the recognition file, containing the recognized units:
    e.g. demo_breakfast_output_s1.reco.mlf (naming pattern: name_s.reco.mlf)
  - the dictionary list: e.g. breakfastlist.txt
  - the test file list: e.g. breakfasttest.list

You can modify the naming of the output by adapting the related variables of
the config struct in get_breakfast_demo_config.m, or by overwriting them.

Step 2.4) Set the directory with the segmentations (line 47).

    % folder with segmentation files (xml-style)
    config.features_segmentation = [path_seg, '\segmentation'];

Step 2.5) Set the dictionary and grammar file (line 51 + 55).
If you unpacked the bundle, they should be located in the same folder as
demo_breakfast.m.

    % dictionary file
    config.dict_file = ['', path_root, '\breakfast.dict'];
    % grammar file
    config.grammar_file = ['', path_root, '\breakfast.grammar'];

... and you should be good to go! If you want to change anything, just have a
look at the config struct.

--------------------------------------
Part III : Run the demo
--------------------------------------

Run the demo by simply running the script:

    > demo_breakfast

which calls the function (line 59)

    run_htk(config);

If all paths are correct, you should see the list of loaded files and the
output of the HTK training and recognition. The overall runtime for one
training and test run was ~3.2 h on a 3.30 GHz Intel i5.

The output is the overall sequence recognition accuracy, the confusion matrix,
and the test and predicted labels of each sequence. The recognized units are
listed in the output file demo_breakfast_output_s1.reco.mlf.

For the evaluation on sequence and unit level you can run:

    % evaluation of sequences
    [accuracy_seq, confmat_seq, test_label_seq, predicted_label_seq] = get_results_seq(config)

    % evaluation of units
    [accuracy_seq, acc_unit_parsing, acc_unit_rec, acc_units_perFrames, res_all] = get_results_units(config, vis_on)

Input for both is the config and, for the unit evaluation, a visualization
flag that shows the test and recognized sequences on unit level.
Output of the unit evaluation is as follows:

- acc_activity        - the accuracy of sequence labels
- acc_sequence_all    - the accuracy of unit parsing
- acc_units_perFrames - the accuracy of frame-based unit evaluation
                        (segmentation accuracy)
- res_all             - contains the following evaluation of test and
                        reference data:

    res_all.test_label_units                - the test units for each sequence
    res_all.predicted_label_units           - the predicted units for each sequence
    res_all.accuracy_units_dtw              - the unit accuracy based on unit error rate
                                              (1 - unit_error_rate) after alignment by DTW
    res_all.test_label_sequence_dtw         - the test units for each sequence after alignment by DTW
    res_all.predicted_label_sequence_dtw    - the predicted units for each sequence after alignment by DTW
    res_all.accuracy_sequence_dtw           - the accuracy of units per sequence after alignment by DTW
    res_all.test_label_units_perFrames      - the test frames (in unit labels) for each sequence
    res_all.predicted_label_units_perFrames - the recognized frames (in unit labels) for each sequence
    res_all.accuracy_units_perFrames        - the per-frame accuracy of each sequence
    res_all.accuracy_action                 - accuracy for sequence recognition
    res_all.test_label_action               - test sequence labels
    res_all.predicted_label_action          - recognized sequence labels

For an analysis of the HTK output, please consult the HTK book.

--------------------------------------
Part IV : Optimization
--------------------------------------

If you want to run HTK on new data, it might be necessary to adapt various
settings to improve the overall performance. Most settings are accessible via
the config struct. You can either override the values or adapt them in the
get_breakfast_demo_config.m file.

- Number of states:
    config.defnumstates = [ -1 = median length of frames,
                            -2 = median linear divided by 10,
                            > 0 = fixed number of states ]
- Number of Gaussians:
    config.numberOfMixtures = [ 1 ... n ]
- Normalization:
    config.normalization =
      'none'      = no normalization
      'full'      = scales all values in the sequence to [0 .. 1]
      'frame'     = scales all values of a frame to [0 .. 1]
      'std'       = standard score (or Z-score) over all values in the
                    sequence, mean = 0 and standard deviation = 1
      'std_frame' = standard score (or Z-score) for each frame,
                    mean = 0 and standard deviation = 1
- Read the HTK book. You can find a lot of useful hints on how to interpret
  the results and adapt the system to different data corpora and domains.

--------------------------------------
Part V : File input and output description
--------------------------------------

If you want to run HTK with different input data or on a new dataset, you will
have to provide the input data and, if you want to try a different dataset,
the segmentation information for each video, a related grammar and a
dictionary.

1) Input files

The input files are plain ASCII txt-files. Each line contains the input vector
of one frame. The first line is zeros, and the first entry of each line is the
frame number.

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    1 . . .

2) Segmentation files

The segmentation information for each video is given in xml-style files. The
additional header tags are not used and can be omitted if necessary. The child
node 'MotionLabel' comprises the relevant information:

- 'name' is the name of the unit and has to be identical to the unit names
  listed in the dictionary and grammar.
- 'startPoint' is the first frame of the unit.
- 'endPoint' is the last frame of the unit. If no end point is set, it is
  assumed that the unit ends at the last frame before the beginning of the
  next unit.

3) Dictionary and grammar

If you want to change or write your own dictionary and grammar, please follow
the instructions in the HTK book and change the source links in the config.
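
Example: writing an input file

The following is only a minimal sketch of how a feature matrix could be written
in the input file format described under "1) Input files". It is not part of
the package. The file name, the variable names and the random example data are
hypothetical, and it assumes that the leading line of zeros has the same number
of columns as the data lines.

```matlab
% Hypothetical example data: 5 frames with 4 feature values each.
features = rand(5, 4);
out_file = 'example_input.txt';   % hypothetical file name

[num_frames, dim] = size(features);

fid = fopen(out_file, 'w');
assert(fid ~= -1, 'Could not open %s for writing.', out_file);

% First line: all zeros (assumption: frame number 0 plus one zero per feature).
fprintf(fid, '%d', 0);
fprintf(fid, ' %d', zeros(1, dim));
fprintf(fid, '\n');

% One line per frame: the frame number first, then the feature vector.
for f = 1:num_frames
    fprintf(fid, '%d', f);
    fprintf(fid, ' %.6f', features(f, :));
    fprintf(fid, '\n');
end

fclose(fid);
```

The resulting file has num_frames + 1 lines. Whether the loader of this package
expects a particular number format for the feature values is not stated here,
so adjust the '%.6f' format if needed.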