In modern society, many of us have multiple devices,such as desktop pc,mobile phone,pad,et at and are surfing the internet using different devices. Some may surf using different browsers such as Google, Bing, Baidu, Firefox, ……. Cookies in different devices and browsers are unique. For example, I read news paper in CNA using chrome and have assigned a cookie ID to me. Next time, I read news in CNA using Firefox using same device, and is assigned another cookie id. Then from cookie id, I have 2 different identities, i.e. different person. Now the question is, can we identify the the two cookie ids referring to one person?
Simple way is to ask users to create an account and login every time they consume contents in your platform. Thus, the user account can link different cookie ids among different devices or browsers.
But if users do not want login, have a way to do it? Technically, it can. This is called device graph, i.e. based on available cookie data, link different cookie ids to one unique identify, referring to one person.
Cookie records all the behaviors in the surfing, e.g. dwelling time, visiting url, click, ip address, device types (e.g. iphone, samsung S20, huawei P40, PC, …), browser type (e.g. chrome, firefox, safari, bing, baidu). It looks like
- Cookie-id = 1234, IP:0.0.0., visiting url: wordpress.com at 21:00:00 20210908, browser: firefox, device: pc, …
For each cookie-id, we can aggregate behaviors in the history with different window, e.g. last 30 days, last 60 days, last 90 days, and extract a high-dimensional feature to characterize the cookie-id.
Then it needs to group similar cookie-ids into a cluster. From the view of pattern recognition and machine learning, it is a supervised cluster problem. If there are some login user available, these login users can give us some golden answers about which cookies must be in the same cluster. Thus, it is becoming a semi-supervised clustering problem.
Clustering can also be viewed from graph theory. We can calculate similarity score between any pair of cookies to measure the probability of the pair cookie that is from the same identity (Only need to keep high probability candidate). Then a cookie graph is built, node being cookie-id, and edge being weight to measure link strength. Now any graph cut algorithm can be exploited to solve the clustering problem. Graph cut can identify sub-graph in which all cookie ids is identified as the same identity.
Device graph is very useful technology in ads targeting, personalized recommendation.