Machine learning | AI, Tech & Life

Tag Archives: Machine learning

How to build a machine learning model to generate trading signals

Posted on September 27, 2022 by sheng gao

Recently I have been working on how to build a machine learning model that predicts trading signals and supports a winning strategy. Technical traders read the price trend curve and rely on many technical-analysis signals, such as Bollinger Bands and MACD, to drive their strategies. But these signals are post hoc analyses of price history and carry no prediction power. I take the historical price sequence of the target stock together with those of other selected securities and extract statistics of price movement over different look-back windows. I then collect a set of feature samples and label each sample as buy, sell, or hold, which makes this a 3-class classification problem.

In general, the overall processing flow is as follows:

Download the price history of the selected target stock and of the other selected stocks used to enrich the target stock's price
Feature engineering, e.g. extract statistics over an n-day look-back window. I do not use the price directly as a feature because it depends on the actual price level, is sensitive to price scale, and is difficult to transfer to other tasks
Label each sample as buy, hold, or sell using the selected criterion
Develop the machine learning model
- Split the data into train, development, and evaluation sets in chronological order
- Train the model and optimize it on the development set
- Predict buy, sell, and hold signals on the evaluation set. Save the predictions to a file for later analysis
Based on the predicted signals, use backtrader, https://www.backtrader.com/, to backtest the performance of the machine-learning-based strategy
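
The feature, labeling, and splitting steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code behind the results here: it uses a single price series, the function names are made up, and the labeling rule (forward return over a 5-day horizon with a 2% threshold) is an assumed criterion, since the actual one is not described in this post.

```python
from statistics import mean, pstdev

def lookback_features(prices, window):
    """Scale-free statistics of daily returns over an n-day look-back window."""
    samples = []
    for day in range(window, len(prices)):
        # Daily returns for the `window` days ending at `day`.
        rets = [prices[i] / prices[i - 1] - 1.0
                for i in range(day - window + 1, day + 1)]
        samples.append({
            "day": day,
            "mean_return": mean(rets),
            "volatility": pstdev(rets),
            "momentum": prices[day] / prices[day - window] - 1.0,
        })
    return samples

def label_sample(prices, day, horizon=5, threshold=0.02):
    """Assumed criterion: label by the forward return over `horizon` days."""
    if day + horizon >= len(prices):
        return None  # not enough future data to label this day
    forward = prices[day + horizon] / prices[day] - 1.0
    if forward > threshold:
        return "buy"
    if forward < -threshold:
        return "sell"
    return "hold"

def build_dataset(prices, window=5, horizon=5, threshold=0.02):
    """Pair each feature sample with its label, dropping unlabeled days."""
    dataset = []
    for feats in lookback_features(prices, window):
        label = label_sample(prices, feats["day"], horizon, threshold)
        if label is not None:
            dataset.append((feats, label))
    return dataset

def chronological_split(dataset, train_frac=0.6, dev_frac=0.2):
    """Split along time without shuffling, so no future data leaks into training."""
    n_train = int(len(dataset) * train_frac)
    n_dev = int(len(dataset) * dev_frac)
    return (dataset[:n_train],
            dataset[n_train:n_train + n_dev],
            dataset[n_train + n_dev:])
```

Because the features are returns and ratios rather than raw prices, the same code applies unchanged to securities trading at very different price levels.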

Some thoughts:

Using the prices of extra securities improves prediction power
Even a small increase in prediction accuracy, e.g. 1%, increases the gain and improves the Sharpe ratio
Feature engineering is very important
Next steps:
- The strategy needs further improvement.
- Post-process the predicted signals to increase stability.
- Add capital management.
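
For readers unfamiliar with the Sharpe ratio mentioned above, here is a minimal sketch of the usual annualized definition computed from daily strategy returns. The function name, the zero risk-free rate, and the 252 trading days per year are my assumptions for illustration; this is not how the backtest numbers in this post were produced.

```python
from math import sqrt
from statistics import mean, stdev

def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    spread = stdev(excess)  # sample standard deviation, needs >= 2 returns
    if spread == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return mean(excess) / spread * sqrt(periods_per_year)
```

A strategy with the same average return but a smoother equity curve gets a higher value, which is why it is a common companion to raw gain when comparing strategies.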

I have no finance or trading experience. This is just a personal project to investigate whether my ML and systems experience carries over to this domain. Discussion and comments are welcome if you are interested.

Backtest performance on Apple

Backtest on TQQQ

Signal suggesting the operation for the next day

Your browsing behavior exposes your gender, age & ethnicity

Posted on September 14, 2021 by sheng gao

Gender, age, and ethnicity make up the basic profile of a user and are valuable information for recommendation and precise ad targeting. Unfortunately, these profile features are often not available for users surfing the web. One reason is that users are becoming more concerned about their privacy. Even when users fill them in, the values are often fake or noisy. For media and publishing companies, selling ads on their platforms is a core business. To engage advertisers, they need to put the right ads in the right place so that the right audience is targeted and ad performance, e.g. CTR (click-through rate), improves.

For example, it is not good to place ads for women's makeup products in a news article about a local flood, or to recommend Chinese food to an Indian or Malay user.

To achieve this goal, the platform must understand the profile and preferences of the audience visiting its sites. Based on cookies, data scientists can extract a set of features that profile the audience along various dimensions; the result is a high-dimensional binary vector stored in a database. The collected features describe the user's activities on the platform from different dimensions.

User basic profile
- Gender: male or female
- Age group: in ad marketing, the age segment or group is more useful than the actual age. The age groups may look like <18, 18-24, 25-34, 35-44, 45-54, 55-64, and 65 and older.
- Ethnicity: in Singapore, there are 4 main ethnic groups, i.e. Chinese, Malay, Indian, and other.
News channels or sites visited in the past. Visits are often counted over various time windows, e.g. the last 30 days, 60 days, 90 days, 180 days, ...
Radio channels listened to in the past
Video channels watched in the past
Topics read in the past. The topics look like internal news, local news, crime, food & kitchen, sports, electronics, ... The topics are predefined in a content taxonomy. Refer to IAB, https://www.iab.com/guidelines/content-taxonomy/, for the complete taxonomy definition. The IAB taxonomy is often customized for a particular platform by adding or removing categories.
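
Two of the features listed above, the age group and the windowed visit counts, are easy to sketch. The helper names and the input format (a list of visit dates per user) are assumptions for illustration, not the schema of any real platform.

```python
from datetime import date, timedelta

# Upper bound (exclusive) and label for each age group listed above.
AGE_GROUPS = [(18, "<18"), (25, "18-24"), (35, "25-34"),
              (45, "35-44"), (55, "45-54"), (65, "55-64")]

def age_group(age):
    """Map an actual age to its marketing age group."""
    for upper, label in AGE_GROUPS:
        if age < upper:
            return label
    return "65+"

def windowed_counts(visit_dates, today, windows=(30, 60, 90, 180)):
    """Count visits that fall inside each trailing window of `w` days."""
    return {
        w: sum(1 for d in visit_dates if 0 <= (today - d).days < w)
        for w in windows
    }

today = date(2021, 9, 14)
visits = [today - timedelta(days=n) for n in (1, 45, 100, 200)]
print(age_group(29), windowed_counts(visits, today))
```

The same counting function can be applied per news site, radio channel, video channel, or topic to fill in the remaining feature dimensions.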

The statistics above are counts and frequencies. The next step is to analyze the distribution of each feature dimension and set thresholds to binarize the features. After this process, each user is characterized by a high-dimensional binary vector, and the audience can be analyzed and reported by various combined segments. For example, it can answer how many users read a topic such as sports, broken down by gender, age, or ethnicity. This kind of insight helps the business make decisions.
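
The thresholding and segment reporting described above could look roughly like the sketch below. The threshold values, feature names, and the user record layout are all hypothetical; in practice the thresholds come from studying each feature's distribution.

```python
from collections import Counter

def binarize(counts, thresholds):
    """Turn raw counts into 0/1 features: 1 if the count reaches the threshold."""
    return {name: int(counts.get(name, 0) >= limit)
            for name, limit in thresholds.items()}

def segment_distribution(users, feature, attribute):
    """Among users with `feature` set to 1, count them by a profile attribute."""
    dist = Counter()
    for user in users:
        if user["features"].get(feature) == 1:
            dist[user["profile"].get(attribute, "unknown")] += 1
    return dict(dist)

# Hypothetical thresholds: at least 3 sports reads, at least 5 food reads.
thresholds = {"sports": 3, "food": 5}
users = [
    {"profile": {"gender": "male"},
     "features": binarize({"sports": 7, "food": 1}, thresholds)},
    {"profile": {"gender": "female"},
     "features": binarize({"sports": 4, "food": 9}, thresholds)},
    {"profile": {},
     "features": binarize({"sports": 3}, thresholds)},
]
print(segment_distribution(users, "sports", "gender"))
```

Users with a missing attribute are reported as "unknown" here, which makes the size of the gap that the prediction models need to fill directly visible in the report.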

Within a company, the user profile database, together with reporting metrics and a UI, exists as a data product.

As discussed at the beginning, gender, age, and ethnicity are often missing, so machine learning models need to be built to predict them. In terms of pattern recognition and machine learning, gender prediction is a binary classification problem, while age group and ethnicity prediction are multi-class classification problems.

Before building classifiers, training samples with ground truth are needed, i.e. users whose gender, age, and ethnicity we know with certainty. These golden users are often costly to collect. Using these golden users, with their browsing behaviors discussed above as classification features, ML models are trained to predict the attributes of all users whose age, gender, and ethnicity are unknown. In data science, the most important steps are data collection, data cleaning, and feature preparation. Model selection is relatively less important; most traditional ML models can handle these prediction tasks.
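
As one example of such a traditional model, here is a minimal Bernoulli naive Bayes written in plain Python and trained on a toy set of golden users. The choice of naive Bayes, the two-feature vectors, and the function names are my assumptions for illustration; any standard classifier could take its place.

```python
from math import log

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit class priors and Laplace-smoothed P(feature = 1 | class)."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_features)]
        model[c] = (log(prior), probs)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for binary vector x."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(log(p) if xj else log(1.0 - p)
                               for xj, p in zip(x, probs))
    return max(model, key=score)

# Toy golden users: binary behavior vectors with known gender.
X_gold = [[1, 0], [1, 0], [0, 1], [0, 1]]
y_gold = ["male", "male", "female", "female"]
model = train_bernoulli_nb(X_gold, y_gold)
print(predict(model, [1, 0]))  # a user with unknown gender
```

The same two functions work for age group and ethnicity, since nothing in them assumes there are only two classes.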

Learn a metric oriented classifier

Posted on August 29, 2021 by sheng gao

The objective function is the mathematical formulation used to estimate classifier parameters. The classical objective function is derived from the log-likelihood of the training samples under the proposed classifier, and the parameters are estimated by maximizing it. But log-likelihood is not directly related to the performance metric: the classifier is trained on likelihood, while the preferred evaluation metric may be F1, accuracy, or ranking. Because of this gap between the training and evaluation criteria, a classifier trained on log-likelihood is not optimal for F1, classification error, or ranking. This is the motivation for our work on MFoM-based classifier learning. The work is updated at https://aisengtech.com/project#mfom. Beyond MFoM, the research community has published many papers on learning classifiers for a specified metric, of which learning to rank is the most famous; it is now a core module of modern search engines.
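
The gap between the two criteria is easy to see on a toy example. The sketch below is not MFoM; it only compares two hypothetical sets of predicted probabilities on the same six labels, using a fixed decision threshold of 0.5 (an assumption), to show that the set with the higher log-likelihood can have the lower F1.

```python
from math import log

def log_likelihood(y_true, probs):
    """Sum of log P(observed label) under the predicted probabilities."""
    return sum(log(p) if y == 1 else log(1.0 - p)
               for y, p in zip(y_true, probs))

def f1_score(y_true, probs, threshold=0.5):
    """F1 of the positive class after thresholding the probabilities."""
    pred = [int(p >= threshold) for p in probs]
    tp = sum(1 for y, h in zip(y_true, pred) if y == 1 and h == 1)
    fp = sum(1 for y, h in zip(y_true, pred) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, pred) if y == 1 and h == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y = [1, 1, 0, 0, 0, 0]
# Model A is very confident on negatives but just misses both positives.
model_a = [0.45, 0.45, 0.05, 0.05, 0.05, 0.05]
# Model B is less confident overall but puts every sample on the right side.
model_b = [0.55, 0.55, 0.45, 0.45, 0.45, 0.45]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, log_likelihood(y, probs), f1_score(y, probs))
```

Model A wins on log-likelihood yet finds no positives at all, while model B separates the classes perfectly at the threshold. A likelihood-trained classifier would prefer A; a metric-oriented objective is meant to prefer B.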
