Image annotation builds an ML model that automatically annotates images using a predefined tag set. It is a multi-label classification problem. I started working on image annotation and content understanding in 2006, and also took part in benchmark evaluations on image retrieval (CLEF & TREC-VIDEO). Refer to https://aisengtech.com/project#image-tag.
Music summarization extracts a short clip from a music recording to represent its content, typically to encourage consumers to buy the recording. A simple approach is to use the beginning of the audio, but that may not capture the most engaging part of the music. I developed an algorithm for music structure analysis and repeated pattern identification. The repeated pattern or segment is likely to reflect the most engaging content in the recording, so it is used as the music summary. Refer to https://aisengtech.com/project#music-summary.
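To make the repeated-pattern idea concrete, here is a minimal sketch. It is not the algorithm from the project above: it assumes the audio has already been converted into a sequence of discrete frame labels (for example quantized chroma or chord symbols), and it only counts exact repeats of a fixed-length window. The function name `most_repeated_segment` and its parameters are made up for illustration.

```python
from collections import defaultdict


def most_repeated_segment(frames, seg_len):
    """Return (start_index, count) for the window of seg_len frames
    that recurs most often in the sequence (exact matches only)."""
    if seg_len <= 0 or seg_len > len(frames):
        raise ValueError("seg_len must be between 1 and len(frames)")

    # Map each window of frame labels to the positions where it starts.
    positions = defaultdict(list)
    for start in range(len(frames) - seg_len + 1):
        key = tuple(frames[start:start + seg_len])
        positions[key].append(start)

    # Prefer the window with the most occurrences; on a tie, the one
    # whose first occurrence is earliest in the recording.
    best = max(positions.values(), key=lambda p: (len(p), -p[0]))
    return best[0], len(best)


# Hypothetical frame labels: the "ABCD" pattern plays three times.
frames = list("ABCDXYZABCDQQABCD")
start, count = most_repeated_segment(frames, seg_len=4)
```

A real system would compare continuous features with a similarity measure and tolerate small variations between repeats. The exact-match count here only shows why a recurring segment is a natural candidate for the summary.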
An objective function is the mathematical formulation used to estimate classifier parameters. The classical objective function is derived from maximizing the log-likelihood of the training samples under the proposed classifier, and the classifier parameters are estimated by optimizing it. But log-likelihood is not directly related to the performance metric: training maximizes likelihood, while the preferred evaluation metric may be F1, accuracy or ranking quality. Because of this gap between the training and evaluation criteria, a classifier trained on log-likelihood is not optimal for F1, classification error or ranking. This is the motivation for our work on MFoM based classifier learning. The work is described at https://aisengtech.com/project#mfom. Since MFoM, the research community has published many papers on learning classifiers for a specified metric, of which learning-to-rank is the best known, and learning-to-rank is now a core module of modern search engines.
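The gap between training and evaluation criteria can be illustrated with a small sketch. F1 is built from hard counts (true positives, false positives, false negatives), so it cannot be optimized directly by gradient methods. One common way around this, in the spirit of metric-oriented learning, is to replace each hard decision with a sigmoid of the classifier score so that the metric becomes smooth. The code below is only an illustration under that assumption and is not the exact MFoM formulation. The names `hard_f1`, `soft_f1` and the sharpness parameter `alpha` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hard_f1(scores, labels):
    """F1 from hard decisions: predict positive when score > 0."""
    tp = sum(1 for s, y in zip(scores, labels) if s > 0 and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > 0 and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= 0 and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def soft_f1(scores, labels, alpha=1.0):
    """Smooth F1: each hard decision is replaced by sigmoid(alpha * score),
    so the counts (and the metric) are differentiable in the scores."""
    soft = [sigmoid(alpha * s) for s in scores]
    tp = sum(p for p, y in zip(soft, labels) if y == 1)
    fp = sum(p for p, y in zip(soft, labels) if y == 0)
    fn = sum(1.0 - p for p, y in zip(soft, labels) if y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical classifier scores and binary labels.
scores = [2.0, -1.0, 0.5, -3.0]
labels = [1, 1, 0, 0]
exact = hard_f1(scores, labels)
smooth = soft_f1(scores, labels, alpha=1.0)
sharp = soft_f1(scores, labels, alpha=50.0)
```

As `alpha` grows, the soft counts approach the hard counts and `soft_f1` approaches `hard_f1`. With a moderate `alpha` the function stays smooth enough to serve as a training objective that tracks the evaluation metric.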
Around 2013, music search became a popular application in the internet industry as mobile phone coverage grew. Its goal is to let users search for music or songs using a clip recorded on a mobile phone, anywhere and anytime. The challenges are extracting an audio fingerprint that is both robust (to diverse noise and distortion, e.g. town hall, road, audio editing, pitch shifting) and compact, and responding quickly enough to support real-time search. To address these, I developed an audio landmark binary feature as the fingerprint and an inverted document index framework for audio search (C++). To learn more, please refer to https://aisengtech.com/project#speech-recognition
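The inverted-index lookup can be sketched in a few lines. The original system is written in C++ and its fingerprint format is not described here, so the code below is only a hypothetical illustration in Python. It assumes each fingerprint has already been reduced to an integer hash paired with a time position, and it ranks candidates by how many hashes agree on the same time offset, which is a common scheme for landmark-style fingerprints. The names `build_index` and `search` are made up for this example.

```python
from collections import Counter, defaultdict


def build_index(tracks):
    """Build an inverted index: hash -> list of (track_id, time).

    tracks maps track_id -> list of (hash, time) fingerprints."""
    index = defaultdict(list)
    for track_id, prints in tracks.items():
        for h, t in prints:
            index[h].append((track_id, t))
    return index


def search(index, query):
    """Return (track_id, offset, votes) for the best match, or None.

    Each query hash votes for (track, track_time - query_time). A true
    match produces many votes at one consistent offset."""
    votes = Counter()
    for h, tq in query:
        for track_id, t in index.get(h, ()):
            votes[(track_id, t - tq)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count


# Hypothetical fingerprints: integers stand in for binary landmark hashes.
tracks = {
    "song_a": [(11, 0), (22, 1), (33, 2), (44, 3)],
    "song_b": [(22, 0), (55, 1), (33, 5)],
}
index = build_index(tracks)
# A clip recorded one time step into song_a.
result = search(index, [(22, 0), (33, 1), (44, 2)])
```

Because the lookup touches only the posting lists of hashes present in the query, the search cost depends on the clip length and not on the size of the music catalogue, which is what makes a fast response feasible.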