Face size
- Subject: Face size
- From: Huiping Li <huiping@umiacs.umd.edu>
- Date: Wed, 29 May 2002 13:09:11 -0400
Hi there,
I saw on the website that a face must be at least a quarter of the
frame width and height to count as detected for the evaluation. This is a
really big face. I browsed some video clips and did not find a single clip
containing such a big face. Maybe we can change it to something like 30x30
pixels?
- Huiping
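To make the two candidate thresholds concrete, here is a minimal sketch comparing them. The 352x240 frame size and both function names are assumptions for illustration only; the thread does not state the actual frame dimensions.

```python
# Assumed frame size, for illustration only; not stated in the thread.
FRAME_WIDTH, FRAME_HEIGHT = 352, 240


def meets_quarter_frame_rule(face_w, face_h,
                             frame_w=FRAME_WIDTH, frame_h=FRAME_HEIGHT):
    """Rule as described on the website: the face spans at least a
    quarter of the frame width and a quarter of the frame height."""
    return face_w >= frame_w / 4 and face_h >= frame_h / 4


def meets_fixed_size_rule(face_w, face_h, min_side=30):
    """Proposed alternative: the face is at least min_side pixels
    in both dimensions (30x30 in the suggestion above)."""
    return face_w >= min_side and face_h >= min_side


# A hypothetical 40x40 pixel face in the assumed 352x240 frame.
face = (40, 40)
print(meets_quarter_frame_rule(*face))  # False: 40 < 88 (a quarter of 352)
print(meets_fixed_size_rule(*face))     # True: 40 >= 30
```

Under these assumed dimensions, the quarter-frame rule needs a face of at least 88x60 pixels, which is roughly three times the width of the 30x30 proposal.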
-----Original Message-----
From: Ching-yung Lin [mailto:chingyung@us.ibm.com]
Sent: Wednesday, May 29, 2002 12:57 PM
To: Multiple recipients of list
Subject: Re: Video TREC mailing list
Hi,
We ran into a similar problem. I am working on visual feature analysis, which
needs to consider the relationship between frames. I saw that there are many
short shots (< 4 frames), mostly transitions or oversegmentations
caused by noise. Those short shots may cause problems for us. Therefore, I
think eliminating short shots (< 0.5 second) is a good idea.
Regards,
Ching-Yung
=============================================
Ching-Yung Lin, Ph.D. Email: cylin@watson.ibm.com
Research Staff Member
IBM T. J. Watson Research Center
19 Skyline Dr., Hawthorne, NY 10532
Phone: (914)784-7822; Fax: (914)784-7455
Giridharan Iyengar/Watson/IBM@IBMUS
Sent by: trecvid@nist.gov
05/29/2002 12:36 PM
Please respond to trecvid
To: Multiple recipients of list <trecvid@nist.gov>
cc:
Subject: Re: Video TREC mailing list
Paul:
I agree with Roman. This is very much in line with the approach that we are
taking with respect to the audio features. I would like to add that a
monologue should also have such a minimum duration, and that 15 video frames
is too short for that. I would suggest that for monologues, the minimum
duration should be 2 seconds (60 frames).
-giri
Giridharan Iyengar
Research Staff Member
Human Language Technologies
IBM TJ Watson Research Center
Yorktown Heights, NY 10598
914 945 2474
Paul Over <over@nist.gov>
Sent by: trecvid@nist.gov
05/29/2002 12:04 PM
Please respond to trecvid
To: Multiple recipients of list <trecvid@nist.gov>
cc:
Subject: Video TREC mailing list
Posting for the group at DCU.....
I am currently working on audio feature extraction for video
TREC, and encountered an issue which is potentially relevant
to others. The common shot boundary files provide a
description of the duration of each shot, which can be as
short as 1 frame. This is roughly 33 ms in time.
I am working on speech and music (instrumental sound)
extraction. To distinguish between music and speech, humans
require several hundred milliseconds of sound. However, the
current shot boundaries are often far shorter than that,
which creates a problem in these applications.
My proposal to solve this problem is a minimum
shot duration of 15 frames (i.e. 0.5 sec. in time), below
which shots are not considered relevant for audio
feature extraction. In this way the common shot boundary
numbering is preserved, but audio features will not be
extracted from shots with fewer than 15 frames, as they are
irrelevant from an audio perspective.
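
A minimal sketch of how such a filter might look. The tuple layout (shot id, first frame, last frame, both inclusive), the sample values, and the function names are assumptions for illustration; the actual format of the common shot boundary files is not shown here. The 30 frames per second figure follows from equating 15 frames with 0.5 sec.

```python
FRAME_RATE = 30.0        # assumed: 15 frames are treated as 0.5 s
MIN_AUDIO_FRAMES = 15    # proposed minimum shot length for audio features


def shot_length(first_frame, last_frame):
    """Number of frames in a shot, counting both end frames."""
    return last_frame - first_frame + 1


def audio_relevant_shots(shots, min_frames=MIN_AUDIO_FRAMES):
    """Keep only shots long enough for audio feature extraction.

    Shot ids are left untouched, so the common numbering is preserved;
    short shots are skipped rather than renumbered.
    """
    return [s for s in shots if shot_length(s[1], s[2]) >= min_frames]


# Hypothetical shots: (shot id, first frame, last frame).
shots = [("shot1", 0, 0), ("shot2", 1, 15), ("shot3", 16, 200)]

for shot_id, first, last in audio_relevant_shots(shots):
    seconds = shot_length(first, last) / FRAME_RATE
    print(shot_id, f"{seconds:.2f} s")
```

Here shot1 (a single frame) is skipped, while shot2 (exactly 15 frames) and shot3 are kept under their original ids. A different threshold for another feature can be passed as `min_frames`.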
Roman Jarina
jarinar@eeng.dcu.ie
Centre for Digital Video Processing
Dublin City University
-------------------
PS: I wonder if there should be a minimum shot duration
for the other features too?
- Paul