AI, Tech & Life

Fix error when install ta-lib:

Posted on May 7, 2024 by sheng gao Leave a comment

ta-lib is a popular open-source to calculatte technical analysis indicators in finace anaylysis. But when you install

pip installl ta-lib

The following error occurs

ERROR: Could not build wheels for ta-lib, which is required to install pyproject.toml-based projects

The reason is that ta-lib python wrapper depending on the ta-lib c++ library. You need manually install. So how to install it?

Download ta-lib c/c++ source code ta-lib-0.4.0-src.tar.gz (Do not download from github c/c++ source code, which misses configure. Then following the steps to install. You will be successfull (Test in Apple Mac M1 and Ubuntu).

unzip
cd ta-lib
./configure --prefix=/usr/local
make
sudo make install
pip install ta-lib

Some initial comparison: ta-lib-python vs ta ta

ta is easy to install. But it supports only 43 technical indicators. As comparison, ta-lib supports 200 indicators.

fintech

Competitive intelligence in e-commerce: all about product pricing and assortment

Posted on May 5, 2024 by sheng gao Leave a comment

In the current e-commerce market, competitio is brutal, particularly in southeast asia market, where the resident income is very low. Low price is at the top priority in online shopping. As more and more players join the battle, as the online platform operator, it is critical to monitor the product price comparing with the competitor’s platform in realtime, and make sure product price is lower than their competitors.

As the customers, price hunting is the normal behavior, which they compare price in different platform for the same or similar products and search the best deal. However, the method is not suitable for the platform, because the platform has very large scale product base, millions / billions of product. Everyday millions of new products are listing. Even the business only operate a few popular items. Thousands products are normal. The other hurdle is language. The operators cannot understand every language. When considering realtime monitoring, it is impossible for mannual.

So the e-commerce company develop the algorithm driven same product match system to address the above issues. The system includes many modules in order to determine which product pairs are same and which products cannot find the matched items in the competitor.

The basic problem in the competitive intelligence in the e-commerce is to identify whether a pair of products is same product or not based on the product title, main image, attributes, and description. It is a binary classification problem, but in practice there are not annotated samples available to train the classifier at the cold-start stage. In addition, the product volume is very large, e.g. millions or billions of products in the platform. Thus it is impossible to score all possibile pairs.

One possible solution is to leverage low-cost KNN to extract a set of high probability candidate pairs and then use fine-grained model to re-rank. The overall algorithm processing flow is similar to the search: build indexing, coarse recall, fusion, and re-ranking. A possible framework of the competitive intelligence system is depicted in the following figure.

Data source

Design the guildline of product match or not

In order to develop same product match system, the first step is to define what is the meaning of same product. The definition is balanced among business requirement, algorithm feasibility and explainability. The guideline book will be iterated and updated based on the feedback and bad case analysis in the product development untill it is finalized. The guidline book will include many rules with examples to define in which condition the product pair is same and in which condition it is not. The examples should cover different cases such as covering as many level-1 categories as possible. The guideline is the base of matching performance evaluation and the human annotation on the pair of product.

Prepare product base pool

In the product match, two produc base pool are needed, which are business related in most times. For example, if we want to monitor product prices between our own platform and our competitors, one product pool is extracted from our own platform, while another is from the competitor platform (buy data or crawl). If we want to know how the price distribution of the same products in our own platform, both product pools are from our own platform (but product selection logic maybe different based on business.) .

Algorithm flow of product match

In developing product match system, the various features are extracted to represent the product content:

Title: product title carries most informative information about the product, e.g. product model, color, size, ……
Main image: normally there are multiple images available for an item. If possible, try to use as many as possible to improve matching performance.
Attribute: the attributes have multiple categories and values such as cloth size (S, M, XL …), color (black, green …). The attributes and values cotain many noises, i.e. it is not formal defined even in a platform, let alone in the different platforms. Thus de-noise is a must step to normalize them

Text content representation

In order to measure the text similarity score, it needs to represent the text as a spare or dense vector.

TF-IDF: classical text representation using term frequency and inverted document frequency, popular methods such as BM25. In product match, it achives high precision with low recall (semantc gap)

Dense vector: there are a lot of candidates available such as fasttext, bert, xrobert, ….

Image content representation

There are many pretrained deep learning models available such as ResNet, ViT, …..

Multimodality content representation

The text-image multimodality models are CLIP, ….

Coarse recall

In the stage, the object is to use low-cost KNN to generate possible product candidates for any given product input. The vector based indexing is first built on one product pool while quering the index using the product in the other pool. There are many vector based indexing algorithms such as FAISS.

For bag-of-word based feature (TF-IDF), elasticsearch is a good choice.

After the corase recall, multiple ranking lists are oputput together with similarity score.

Fusion and re-ranking

In order to generate the final match/not-match decision on the product pair, we need to combine multiple ranking list into one. If there are not labelled training data available (human annotated match/not-match product pair), heuristic method is often used for fusion. If training data availabe, the binary classifier can be trained to combine the multiple list into one.

Output

The final output of the product match system maybe includes:

Same product pair: in the case, the price comparison result can be tagged, e.g. same price, price lower than competitor, price higher than the competitor.
Produc which cannot find any same products: these outputs are also informative. For example, from these products, we can understand which products sold in the competitor but not sold in our platform. It will remind business operators to find supplier for these products.

Some tips in implementation

Cache: use cache to record which pairs are already computed and reduce duplicated computation.
- Cache update: check if the product title or main image are changed. If yes, need trigger to re-calculate
Fine-tune in-house models: text, image, and multimodality can significantlly improve F1 or AUC. This is because the e-commerce product title and image is quite from the training data in the pretrained models in the style. The pre-trained models cannot provide sufficient discrimination capability in calculating similarity score.
Discriminative training (e.g. MoCo based) on DNN model can further improve performance metric in product match.

The product match output can support many downstream business applcations such as product pricing to improve price competition, assortment operation, traffic boosting, ……

Last step: colse loop to iteration

In the period of data usage in the business operators, there will be many feedbacks on the algorithm matching quality. Let us design a feedback module to allow the users to thumb up or down on the results, and allow them correct the wrong cases. These feedback as the human annonated product pairs can be further used to fine-tune the algorithm matching module, and dinamically measures the system performance.

Copyright:

Please request permission before copy the content.

Role-play AI agent

Posted on April 19, 2024 by sheng gao Leave a comment

AI agent is amazing, promising, and useful in many business scenaria. With the increasing capability of large language model such as ChatGPT-4 and opensource models such as Llama3, it beccomes easy to develop LLM based agent to help the human to do tedious work. For example, it can replace coacher with AI agent to train the junior salesman, while the latter can play with the AI agent to improve their pitch skill. What you do is to select suitable opensource tools and models and to write promts, i.e. prompt engineering, for each application scenario. The following image is the screenshot of the role-agent demo what I build.

AI agent is salesman

Human user is a buyer

Ai agent is buyer

Human user is salesman

What you need to build the demo:

Streamlit: UI design
Ollama: manage open-source LLM.
LangChain: build LLM based chat pipeline
Prompt engineering: according to the task and role, write prompt.

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31