HMDB: A Large Video Database for Human Motion Recognition
The effort was initiated at KTH: the KTH Dataset contains six types of actions and 100 clips per action category. It was followed by the Weizmann Dataset collected at the Weizmann Institute, which contains ten action categories and nine clips per category.
These two datasets were recorded in controlled and simplified settings. The first realistic action dataset, collected from movies and annotated from movie scripts, was then created at INRIA: the Hollywood Human Actions Set contains 8 types of actions, with between 60 and 140 clips per action class. Its extended version, the Hollywood2 Human Actions Set, offers a total of 3669 videos distributed over twelve classes of human actions under ten types of scenarios.
The UCF group has also been collecting action datasets, mostly from YouTube: UCF Sports, featuring 9 types of sports and a total of 182 clips; UCF YouTube, containing 11 action classes; and UCF50, containing 50 action classes. We will show in the paper that videos from YouTube can be strongly biased by low-level features, meaning that low-level features (i.e., color and gist) are more discriminative than mid-level features (i.e., motion and shape).
| Dataset | Year | # Actions | # Clips per Action |
|---|---|---|---|
| KTH | 2004 | 6 | 100 |
| Weizmann | 2005 | 9 | 9 |
| IXMAS | 2006 | 11 | 33 |
| Hollywood | 2008 | 8 | 30-140 |
| UCF Sports | 2009 | 9 | 14-35 |
| Hollywood2 | 2009 | 12 | 61-278 |
| UCF YouTube | 2009 | 11 | 100 |
| MSR | 2009 | 3 | 14-25 |
| Olympic | 2010 | 16 | 50 |
| UCF50 | 2010 | 50 | min. 100 |
| HMDB51 | 2011 | 51 | min. 101 |