Cookie – Tracking user behavior & recommendation

A cookie is a small piece of data that a website stores in the browser to track user behavior while surfing the internet: reading news and articles, watching videos, and listening to podcasts and audio programs. From the data collected through cookies, we can understand who clicked which content, where and when, and how long they dwelled on it. When you search on Google, the Google cookie assigns you a unique identifier (UUID) and traces you; the same happens on Baidu and Bing. But the UUID differs across Google, Baidu, and Bing, and across browsers, because a cookie is tied to the site that sets it and to the browser that stores it. When you log in from different browsers using the same email account, however, these UUIDs can be linked and identified as a single user.
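The linking step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `(cookie_id, email)` login log format and the function names are hypothetical, and real identity-resolution systems are considerably more involved.

```python
import hashlib
import uuid
from collections import defaultdict


def new_cookie_id() -> str:
    """Generate a random identifier, as a site might when it first sets a cookie."""
    return str(uuid.uuid4())


def hash_email(email: str) -> str:
    """Normalize and hash the login email so the raw address is not stored."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def link_cookies(login_events):
    """Group cookie IDs that logged in with the same email.

    login_events: iterable of (cookie_id, email) pairs (hypothetical log format).
    Returns a dict mapping hashed email -> set of cookie IDs.
    """
    users = defaultdict(set)
    for cookie_id, email in login_events:
        users[hash_email(email)].add(cookie_id)
    return dict(users)


# Two browsers used by the same person, plus one unrelated visitor.
chrome_id = new_cookie_id()
firefox_id = new_cookie_id()
other_id = new_cookie_id()

events = [
    (chrome_id, "Alice@example.com"),
    (firefox_id, "alice@example.com "),  # same account, different browser
    (other_id, "bob@example.com"),
]
linked = link_cookies(events)
```

Here `linked` holds two users: the two cookie IDs that logged in as Alice end up in the same set, while the third stays on its own.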

Different cookies are used to track different user behaviors. For example, a cookie that tracks news reading is different from one that tracks watching TV programs or listening to radio channels. Media companies often use third-party cookie services to support news, audio program, and video program recommendation. Many DSPs (demand-side platforms), DMPs (data management platforms), and SSPs (supply-side platforms) provide such technology services, e.g. cxense, lotame, …

Media companies often require a customized recommendation system. Third-party services provide cookies and a widget toolkit to satisfy such requirements. For news recommendation, the customer can configure news category, keywords, named entities, term weighting, time period, and blacklist & whitelist through the widget settings. These functions satisfy basic business requirements for news recommendation. However, this is a traditional information retrieval application and cannot do the personalized news recommendation that is widely applied at Google, Facebook, or Microsoft Bing search. An in-house data science team can exploit internal audience data to understand user interests and build machine learning models for personalized recommendation. In practice, most companies have no such capability.
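To make the widget settings concrete, here is a minimal sketch of a rule-based recommender driven by such a configuration. It does not reflect any vendor's actual API: `WidgetConfig`, its fields, and the article dictionaries are hypothetical, and treating the whitelist as "always pin to the top" is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class WidgetConfig:
    """Hypothetical widget settings: category, weighted keywords, period, lists."""
    categories: set
    keyword_weights: dict
    max_age: timedelta
    blacklist: set = field(default_factory=set)
    whitelist: set = field(default_factory=set)


def score(article, config, now):
    """Return a relevance score, or None when the article is filtered out."""
    if article["id"] in config.blacklist:
        return None
    if article["id"] in config.whitelist:
        return float("inf")  # assumption: whitelisted items are always pinned
    if article["category"] not in config.categories:
        return None
    if now - article["published"] > config.max_age:
        return None
    words = article["title"].lower().split()
    return sum(config.keyword_weights.get(w, 0.0) for w in words)


def recommend(articles, config, now, k=3):
    """Return the IDs of the top-k articles that pass the filters."""
    scored = [(score(a, config, now), a["id"]) for a in articles]
    kept = [(s, i) for s, i in scored if s is not None]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in kept[:k]]


now = datetime(2024, 1, 10)
recent = datetime(2024, 1, 9)
articles = [
    {"id": "a1", "category": "sports", "title": "Football final tonight", "published": recent},
    {"id": "a2", "category": "sports", "title": "Tennis open results", "published": recent},
    {"id": "a3", "category": "politics", "title": "Election football analogy", "published": recent},
    {"id": "a4", "category": "sports", "title": "Football transfer rumours", "published": datetime(2023, 12, 1)},
    {"id": "a5", "category": "sports", "title": "Football scandal", "published": recent},
    {"id": "a6", "category": "business", "title": "Editor pick", "published": datetime(2023, 11, 1)},
]
config = WidgetConfig(
    categories={"sports"},
    keyword_weights={"football": 2.0, "final": 1.0, "tennis": 0.5},
    max_age=timedelta(days=7),
    blacklist={"a5"},
    whitelist={"a6"},
)
result = recommend(articles, config, now)
```

Every visitor gets the same `result` for a given configuration, which is the limitation noted above: the rules filter and rank content, but nothing in them depends on the individual user.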

For audio / podcast and video program recommendation, the task is still treated as a text information retrieval problem most of the time. These programs have text metadata such as captions, short program descriptions, editor or reporter names, and director and actor names. Using this available metadata, recommendation can fulfill most business requirements. Audio and video/image processing and content understanding are not widely used, not only because of limited manpower but also because processing audio and images is hungry for computing resources. In terms of ROI (return on investment), they may not be a good investment.
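As an illustration of treating program recommendation as text retrieval, the sketch below ranks programs by TF-IDF cosine similarity over their metadata text. The program IDs and descriptions are made up, and TF-IDF is just one common choice, not necessarily what any particular service uses.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = {pid: text.lower().split() for pid, text in docs.items()}
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))  # document frequency: count each term once per doc
    n = len(docs)
    vectors = {}
    for pid, tokens in tokenized.items():
        tf = Counter(tokens)
        # Smoothed inverse document frequency, so no term gets zero weight.
        vectors[pid] = {
            t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_programs(pid, vectors, k=2):
    """Return the k programs whose metadata is most similar to program pid."""
    scores = [(cosine(vectors[pid], v), other)
              for other, v in vectors.items() if other != pid]
    scores.sort(reverse=True)
    return [other for _, other in scores[:k]]


# Hypothetical metadata: description plus host names, flattened to text.
programs = {
    "p1": "cooking show chef anna tan street food",
    "p2": "travel documentary street food markets chef anna tan",
    "p3": "evening news bulletin politics economy",
}
vectors = tfidf_vectors(programs)
top = similar_programs("p1", vectors, k=1)
```

The cooking show and the travel documentary share a host and several terms, so they are recommended for each other, while the news bulletin has no overlap. No audio or video signal is processed at any point, which keeps the computing cost low.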

Media company as publisher platform – 1

Media companies, such as SPH and MediaCorp in Singapore, CCTV in China, and the Washington Post, are publisher platforms. They create high-quality content (e.g. audio, video, news) to engage users. How do they earn money to support their business? One way is subscription fees. But for a national broadcaster such as MediaCorp, most content, including broadcast channels such as Yes 933, Channel News Asia (CNA), Channel 5, Channel 8, Suria, and Vasantham, is free. They earn most of their money from advertisers. In general, a media company is a publishing platform that bridges users (audience, content consumers) and advertisers (who market their products to consumers).

Publisher platform:

  • Platform [Create] high-quality content [Engage] ==> audience <== [Consume] product [Create] Advertiser

In media companies, reporters, editors, and media creators produce creative, original, high-quality content (news, audio and video). Although its volume is relatively small compared with the UGC (user-generated content) of internet companies such as Google and Facebook, its quality is high and trustworthy, which is important for brand products and companies.

Business is the core of a media company, so they prefer to use third-party services to exploit their content and serve audiences and advertisers. However, they also have a strong intention to build in-house data science technology to fully mine their gold data and serve their customers. With increasingly strict government policy on user privacy and data security, it is impossible to fully expose in-house data to third parties. In some business scenarios, a customized solution is preferred, and outsourced or third-party services are not satisfactory.