Master shot reference and thresholds



Hi All,

M a s t e r   s h o t   r e f e r e n c e:

After some reflection and discussion we've replaced the initial
master shot list with a new one - the seg-60 segmentation -
provided by Georges Quenot. Our thanks to him and to Jiamen Ye at
DCU for the quick turnaround. This change was to address the 
overabundance of very short shots in the first version.

In making this change, we're assuming that this year's data is similar
in segmentation to last year's, and we are staying with the CLIPS-IMAG
segmentation because it is available and whatever weaknesses it might
have are not going to be debilitating. We've compared the histograms
(appended) and chosen the seg-60 segmentation for these reasons:

- None differs much beyond the 40 sec bucket

- All have more in the 10-20 sec range than expected
  compared with last year's manual segmentation (10%)

- Seg-60 has a bulge in the 2-3 sec bucket which was 
  the mode of last year's distribution.

- Seg-30 has too many in the 1-2 sec bucket

- Seg-90 shifts the lower-end bulge too high and lumps
  more into the 10-20 sec bucket.
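
As a rough illustration of how shot lengths can be tallied into the
buckets used in the appended histograms, here is a minimal Python
sketch. It is not the script that produced the figures. The function
names, the half-open bucket boundaries (lo <= length < hi), and the
choice to take percentages over all shots passed in are assumptions
made for the example.

```python
from collections import Counter

# Bucket edges in seconds: 1-second buckets from 1 to 10,
# 10-second buckets from 10 to 100, then a single 100-200 bucket.
EDGES = list(range(1, 11)) + list(range(20, 101, 10)) + [200]


def bucket_label(duration):
    """Return the 'lo-hi' label for a duration in seconds.

    Assumes half-open buckets (lo <= duration < hi). Returns None
    for durations outside the 1-200 second range.
    """
    for lo, hi in zip(EDGES, EDGES[1:]):
        if lo <= duration < hi:
            return f"{lo}-{hi}"
    return None


def histogram(durations):
    """Map each non-empty bucket label to (count, percent of all shots)."""
    total = len(durations)
    if total == 0:
        return {}
    counts = Counter()
    for d in durations:
        label = bucket_label(d)
        if label is not None:
            counts[label] += 1
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}
```

Running histogram() on the shot lengths from each candidate
segmentation would give count and percentage columns that can be
compared bucket by bucket.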

Part of the discussion at TREC is always about evaluating the
evaluation. That will be the time for further discussion of
the master shot reference, whether it was on balance useful,
how various characteristics of the particular one used may have
affected the search and feature extraction evaluation, whether
investing more effort in it earlier could be useful, whether
an arbitrary segmentation might have worked as well, etc.


M i n i m u m   s h o t   l e n g t h s:

We've decided not to impose any (minimum) threshold for
feature extraction or search, but rather to have systems build 
those considerations into their rankings. For example, rank short 
shots low or don't include them when it comes to the monologue 
feature, etc. It's analogous to not including a specific size 
requirement for the face feature. If your system can find it and 
the assessor agrees, your system wins.
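
One way a system might fold shot length into its ranking, instead of
relying on a hard cutoff, is sketched below in Python. This is only an
illustration. The rerank function, the min_useful_sec parameter, the
linear down-weighting, and the shot tuples are all hypothetical, and
each group will want its own rule per feature or topic.

```python
def rerank(shots, min_useful_sec=2.0):
    """Return shot ids ordered by a length-adjusted score, best first.

    shots: list of (shot_id, score, duration_sec) tuples (hypothetical
    format). Shots shorter than min_useful_sec have their score scaled
    down in proportion to their length; longer shots are unchanged.
    """
    def adjusted(shot):
        _, score, duration = shot
        weight = min(1.0, duration / min_useful_sec)
        return score * weight

    ranked = sorted(shots, key=adjusted, reverse=True)
    return [shot_id for shot_id, _, _ in ranked]
```

With min_useful_sec=2.0, a half-second shot keeps only a quarter of
its raw score, so it stays in the result list but drops toward the
bottom.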

We'll work on the limitation we had with the Real player but
reserve the right to set a minimum across-the-board threshold on 
what we evaluate manually.


Thanks for your patience,
Paul

----------------------------------------------------------------

Seg-1

1-2:    3774  10.66%
2-3:    2999   8.47%
3-4:    2516   7.11%
4-5:    2226   6.29%
5-6:    1908   5.39%
6-7:    1693   4.78%
7-8:    1401   3.96%
8-9:    1168   3.30%
9-10:    994   2.81%
10-20:  5178  14.63%
20-30:  1553   4.39%
30-40:   611   1.73%
40-50:   271   0.77%
50-60:   146   0.41%
60-70:    61   0.17%
70-80:    43   0.12%
80-90:    18   0.05%
90-100:   10   0.03%
100-200:  13   0.04%

Seg-15

1-2:    3906  12.77%
2-3:    3035   9.92%
3-4:    2540   8.30%
4-5:    2225   7.27%
5-6:    1924   6.29%
6-7:    1709   5.59%
7-8:    1392   4.55%
8-9:    1171   3.83%
9-10:    999   3.26%
10-20:  5184  16.94%
20-30:  1557   5.09%
30-40:   611   2.00%
40-50:   271   0.89%
50-60:   145   0.47%
60-70:    62   0.20%
70-80:    43   0.14%
80-90:    18   0.06%
90-100:   10   0.03%
100-200:  13   0.04%

Seg-30

1-2:    4137  14.95%
2-3:    3154  11.40%
3-4:    2612   9.44%
4-5:    2219   8.02%
5-6:    1961   7.09%
6-7:    1734   6.27%
7-8:    1406   5.08%
8-9:    1177   4.25%
9-10:   1021   3.69%
10-20:  5207  18.82%
20-30:  1570   5.67%
30-40:   615   2.22%
40-50:   270   0.98%
50-60:   144   0.52%
60-70:    64   0.23%
70-80:    43   0.16%
80-90:    18   0.07%
90-100:   10   0.04%
100-200:  13   0.05%

Seg-60

1-2:     189   0.78%
2-3:    3223  13.28%
3-4:    2796  11.52%
4-5:    2389   9.85%
5-6:    2039   8.40%
6-7:    1788   7.37%
7-8:    1458   6.01%
8-9:    1226   5.05%
9-10:   1045   4.31%
10-20:  5329  21.96%
20-30:  1589   6.55%
30-40:   622   2.56%
40-50:   272   1.12%
50-60:   150   0.62%
60-70:    63   0.26%
70-80:    41   0.17%
80-90:    20   0.08%
90-100:   10   0.04%
100-200:  13   0.05%

Seg-90

1-2:       0   0.00%
2-3:     115   0.53%
3-4:    2617  12.11%
4-5:    2382  11.02%
5-6:    2193  10.14%
6-7:    1950   9.02%
7-8:    1581   7.31%
8-9:    1295   5.99%
9-10:   1079   4.99%
10-20:  5561  25.72%
20-30:  1632   7.55%
30-40:   638   2.95%
40-50:   276   1.28%
50-60:   147   0.68%
60-70:    67   0.31%
70-80:    43   0.20%
80-90:    18   0.08%
90-100:   11   0.05%
100-200:  13   0.06%
-- 
Paul Over - Retrieval Group
	    Information Access Division
	    Information Technology Laboratory
	    National Institute of Standards and Technology
	    Bldg. 225  Rm. A211  (Mailstop 8940)
	    Gaithersburg, MD  20899-8940   USA
	    Voice: 301 975-6784    Fax: 301 975-5287


