In the video track guidelines it is said that the known-item searches will be automatically compared to reference derived from known-items identified in the topics. The measure will be average precision - the mean of the precision obtained after each known item is retrieved, using zero as the precision for known items not retrieved.
However, I don't think we've had a discussion on how to automatically compare search results to known items. When does a returned fragment match the known item? Do we have to have the exact same start and stop times, a certain amount of overlap, or something else? To start this discussion, we made an evaluation script to score known-item searches (it's in perl, see attachment).
In deciding whether a fragment is relevant or not, we only look at its start time. We think, in a realistic setting, someone using a search system on a video archive will not stop watching a fragment just because the system thinks it's no longer relevant. Therefore, we disregard the stop time of the returned fragments.
Now, we think a fragment is relevant if its starting point lies between the start and stop times of one of the known items. In addition, we allow fragments to start just before the start of the known item.
-------------------|----------------------------------|----------
KI-start
KI-stop
<--deltaT-->
|_______________________________________________|
<--------- relevant starting
points ---------->
Feel free to experiment with the script and with other matching functions. Comments on our matching conditions and other suggestions are of course welcome.
Thijs Westerveld
#!/Utils/bin/perl
#
# Name: trecvideval
#
# Author: Thijs Westerveld
# Date: 25/07/2001
#
# To evaluate results on Known Item topics for TREC's video track
#
# usage: cat <results.xml> | trecvideval <topics.xml>
#
# trecvideval takes one or more trecvid topics and processes a set of results.
#
# Both topics and resulst should be in the official video track xml formats:
# http://www-nlpir.nist.gov/projects/t01v/topics/videoTopic.dtd
# http://www-nlpir.nist.gov/projects/t01v/topics/videoTopics.dtd
# http://www-nlpir.nist.gov/projects/t01v/videoSearchResults.dtd
#
# The output of the script is a filtered version of <results.xml>, in which for each topic only the
# relevant results are shown (i.e. the results that match an known item).
#
# In addition for each topic, the average precision is reported (the mean of the precision obtained after each known item is retrieved, using zero as the precision for known items not retrieved).
# Finally, at the end, these precision values are averaged over all known-item topics.
#
# A fragment is regarded relevant if its start point lies between the start and stop times of one of the known items
# In addition, we allow fragments to start just before the known item.
#
#
# ------------------------------|----------------------------------|--------------------------------------
# KI-start KI-stop
#
# <--deltaT-->
# |_______________________________________________|
# <--------- relevant starting points ---------->
#
# The tolerance at the starting point can be varied by changing the value of deltaT.
#
#
# Note: Stop times of fragments are disregarded, because we think someone using a video search system will not
# stop watching a retrieved fragment just because the system thinks it's no longer relevant.
# A user will stop watching either when he/she has found a what was searched for or
# when he/she gets bored and thinks there's nothing there.
#
die "usage: cat <results> | trecvideval <topics.xml>\n" unless ($#ARGV == 0);
$deltaT = 1; # tolerance at starting point (in seconds)
%knownItems=();
$numKItopics=0;
$sumavgprec=0;
open(TOPICS, "$ARGV[0]")|| die "Cannot open $ARGV[0]\n";
# Read known items from topics descriptions
while(<TOPICS>){
if (/videoTopic num/){
($topic) = /num=\"(\d+)\"/;
}
if(/knownItem.*src/){
$knownItems{$topic}.="$_";
}
}
while(<STDIN>){
if(/VideoSearchResult tNum/){
print;
# This is a new Topic, so get KnownItems
@ki=();
($topic)=/tNum=\"(\d+)\"/;
if(exists $knownItems{$topic}){
# If there are known items for this topic, then read them into a list
$i=0;
foreach $item (split (/\n/, $knownItems{$topic})){
$hour=$min=$sec=$milsec=0;
# known item source
($ki[$i]{src}) = ($item =~ m/src=\"([^\"]+)(.mpe?g)?\"/i);
$ki[$i]{src} =~ s/\..*$//;
# known item start time
($start)=($item =~ m/start=\"([^\"]+)\"/);
($houropt,$hour,$minopt,$min,$secopt,$sec,$milsecopt,$milsec) = ($start =~ m/((\d+)h)?((\d+)m)?((\d+)s)?((\d+)ms)?/);
$start=3600*$hour+60*$min+$sec+0.001*$milsec;
$ki[$i]{start} = $start;
# known item stop time
($stop)=($item =~ m/stop=\"([^\"]+)\"/);
($houropt,$hour,$minopt,$min,$secopt,$sec,$milsecopt,$milsec) = ($stop =~ m/((\d+)h)?((\d+)m)?((\d+)s)?((\d+)ms)?/);
$stop=3600*$hour+60*$min+$sec+0.001*$milsec;
$ki[$i]{stop} = $stop;
# next item
$i++;
}
print "<!-- $i known-Items for this Topic -->\n";
# Initialise number of found Known Items and score
$numfound=0;
$score=0;
}
else {
# No known Items for this topic, so clear knownitem list
@ki=();
print "<!-- No known-Items for this Topic -->\n";
}
}
elsif(/item seqNum/){
# This is a result
if (@ki !=() ){
# If we there are known items,
# then extract result info and compare to known items
$result=$_;
$hour=$min=$sec=$milsec=0;
#result rank
($rank) = ($result =~ m/seqNum=\"([^\"]+)\"/);
# result source
($src) = ($result =~ m/src=\"([^\"]+)(.mpe?g)?\"/i);
$src =~ s/\..*$//;
# result start time
($start)=($result =~ m/start=\"([^\"]+)\"/);
($houropt,$hour,$minopt,$min,$secopt,$sec,$milsecopt,$milsec) = ($start =~ m/((\d+)h)?((\d+)m)?((\d+)s)?((\d+)ms)?/);
$start=3600*$hour+60*$min+$sec+0.001*$milsec;
# result stop time
($stop)=($result =~ m/stop=\"([^\"]+)\"/);
($houropt,$hour,$minopt,$min,$secopt,$sec,$milsecopt,$milsec) = ($stop =~ m/((\d+)h)?((\d+)m)?((\d+)s)?((\d+)ms)?/);
$stop=3600*$hour+60*$min+$sec+0.001*$milsec;
# compare to each known item
foreach $i (0...$#ki){
#
# Below is the matching function: compare sources check differences between start (and stop) times
#
if(uc($ki[$i]{src}) eq uc($src) && ($start >= ($ki[$i]{start}-$deltaT)) && ($start < ($ki[$i]{stop}))){
# MATCH
$numfound++;
# Add precision to total score
$score+=$numfound/$rank;
# make sure we don't find the same known item twice
$ki[$i]{src}="thisitemisalreadyfound";
print;
# break;
}
}
}
}
elsif (/\<\/VideoSearchResult\>/){
# end of topic
if (@ki !=() ){
# IF there are know items print average precision for this topic
# avg prec. is mean of precisions after each known item is retrieved (using 0 as prec. for K-I not retrieved
print "<!-- average precision = ",$score/($#ki+1)," -->\n";
#add average prec. to sum over all topics:
$sumavgprec+=$score/($#ki+1);
# ...and increase number of topics for which Known Items exist
$numKItopic++;
}
print;
}
else{
print;
}
}
print "<!-- MEAN AVG. PREC. over $numKItopic Known-Item topics:", $sumavgprec/$numKItopic," -->\n";