Fine-tuning NER vs. LLM prompting for asset ticker extraction in finance UGC

Posted on January 26, 2025


With the growing multi-task capability of LLMs, it has become easy to handle a specific ML task by prompting an LLM to focus on it: sentiment classification, named entity recognition, document analysis, OCR, and so on. Before LLMs, completing these tasks required collecting domain data and training a dedicated ML model. But quickly building a product by calling an LLM API such as OpenAI, Azure, or Google Gemini does not mean the performance meets requirements in practice. Prompt engineering alone cannot handle failure cases or consistently improve performance, let alone eliminate hallucination.

In this post, I study how well an LLM extracts asset tickers (stock tickers, crypto tickers, forex codes) from user messages in an AI-based fintech product. I compare a prompt-engineered commercial LLM API against a task-specific NER model, GLiNER, which has demonstrated strong performance on many tasks, as well as a version of GLiNER fine-tuned on our internal data.

We collected a few tens of thousands of user messages together with the LLM-extracted tickers, which we treat as ground-truth data when fine-tuning the GLiNER model.
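The post does not show the production prompt or API call, so here is a minimal hypothetical sketch of the LLM side of the pipeline: a prompt template asking for tickers and a defensive parser for the model's JSON reply. The prompt text, the JSON-list output format, and the function names are assumptions for illustration.

```python
import json

# Hypothetical prompt template -- the actual production prompt is not shown in the post.
PROMPT = (
    "Extract all asset tickers (stock, crypto, forex) that appear in the "
    "message below. Reply with a JSON list of ticker strings only.\n"
    "Message: {message}"
)

def build_prompt(message: str) -> str:
    """Fill the extraction prompt with one user message."""
    return PROMPT.format(message=message)

def parse_tickers(llm_reply: str) -> list[str]:
    """Parse the LLM reply; fall back to an empty list on malformed output."""
    try:
        tickers = json.loads(llm_reply)
    except json.JSONDecodeError:
        return []
    return [t.upper() for t in tickers if isinstance(t, str)]
```

For example, `parse_tickers('["btc", "AAPL"]')` returns `["BTC", "AAPL"]`, while a non-JSON reply yields `[]` instead of crashing the service.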

How does the LLM perform on finance UGC?

  • Analysis of messages containing tickers

We randomly selected 50 samples in which the LLM found at least one ticker, then manually checked them and annotated the true tickers. Compared against the human labels, the LLM reaches an F1 of only 62%, with recall of only about 52%.
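The F1 and recall figures above come from comparing predicted ticker sets against human-annotated gold sets per message. A micro-averaged computation over such pairs can be sketched as follows (the numbers in the usage note are toy values, not the post's data):

```python
def micro_prf(pairs):
    """Micro-averaged precision, recall, F1 over (predicted, gold) ticker sets."""
    tp = fp = fn = 0
    for pred, gold in pairs:
        pred, gold = set(pred), set(gold)
        tp += len(pred & gold)   # correctly extracted tickers
        fp += len(pred - gold)   # extracted but wrong
        fn += len(gold - pred)   # present in message but missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For instance, `micro_prf([({"AAPL"}, {"AAPL", "TSLA"}), ({"BTC"}, {"BTC"})])` gives precision 1.0, recall ≈0.667, F1 0.8.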

Even in this small random set, we found a few hallucination cases, i.e. the LLM outputs tickers that do not literally appear in the user message. For example, for "provide an in-the-money options trade for one of the companies (UnitedHealth Group Inc, AbbVie Inc) with an expiration date of xxx", the LLM outputs tickers such as "UNH, ABBV".

  • Analysis of messages not containing tickers

We randomly selected 50 samples in which the LLM found no tickers, then manually checked and annotated them. We found that 14 of these messages actually contain tickers that the LLM missed.

Task-specific NER: GLiNER

From our internal user messages and the LLM-generated tickers, we randomly sampled about 10K balanced samples (messages with and without tickers) to study performance. We also sampled a few tens of thousands of messages with tickers to fine-tune GLiNER. On this training data and test set, we compare asset-ticker F1 among:

  • Pretrained model (small & large): the small model works better than the large model on our data set
  • Fine-tuned pretrained model with all layers trainable (small)
  • Fine-tuned pretrained model with frozen layers (small)
  • Fine-tuned pretrained model with frozen layers (small) plus a valid-ticker filter
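GLiNER is a PyTorch model, and the post does not specify exactly which layers were frozen, so here is a sketch of the general freezing pattern on a toy module: disable gradients everywhere, then re-enable them only on the output head. The toy architecture is an assumption; GLiNER's real layers differ.

```python
import torch.nn as nn

# Toy stand-in for an NER model: embedding + "encoder" + output head.
# GLiNER's real architecture differs; this only illustrates the freezing pattern.
model = nn.Sequential(
    nn.Embedding(1000, 64),  # embedding layer, used as a feature extractor
    nn.Linear(64, 64),       # "encoder" layer
    nn.Linear(64, 3),        # output head: the part we keep trainable
)

# Freeze everything, then unfreeze only the output head.
for p in model.parameters():
    p.requires_grad_(False)
for p in model[-1].parameters():
    p.requires_grad_(True)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
# Only the head's parameters remain trainable: ['2.weight', '2.bias']
```

Passing only the trainable parameters to the optimizer (e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`) then updates just the head during fine-tuning.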

Here is an F1 summary.

| GLiNER model                                              | F1  |
| --------------------------------------------------------- | --- |
| Pretrained small                                          | 43% |
| Pretrained large                                          | 18% |
| Fine-tuned small (full layers)                            | 79% |
| Fine-tuned small with frozen layers                       | 85% |
| Fine-tuned small with frozen layers + valid-ticker filter | 91% |

Conclusion

  • Fine-tuning a pretrained model on domain data always beats the pretrained model alone, which is normal and common practice
  • When the training data is not large, training only selected layers works better (normally the layers at or closest to the output; the embedding layer acts as a feature extractor, so we normally freeze it). This is also common practice when fine-tuning.
  • Error analysis shows many wrongly extracted tickers that are not actually valid given domain knowledge, so using domain knowledge as a filter improves precision
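The valid-ticker filter in the last conclusion can be as simple as a whitelist lookup over model predictions. The ticker set below is a hypothetical stand-in; in practice it would be built from exchange listings and other domain sources, which the post does not detail.

```python
# Hypothetical whitelist -- in production this would be loaded from
# exchange listings / symbol databases, not hard-coded.
VALID_TICKERS = {"AAPL", "TSLA", "UNH", "ABBV", "BTC", "ETH", "EURUSD"}

def filter_valid(predicted: list[str]) -> list[str]:
    """Drop predicted tickers that are not known valid symbols.

    Removing invalid predictions reduces false positives, which
    improves precision (and hence F1) without touching recall
    on valid tickers.
    """
    return [t for t in predicted if t.upper() in VALID_TICKERS]
```

For example, `filter_valid(["AAPL", "FOO", "btc"])` keeps `["AAPL", "btc"]` and discards the invalid symbol `FOO`.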

How GLiNER works on messages where the LLM found no tickers

On the 50 samples above where the LLM returned no tickers, GLiNER achieves an F1 of 59% with recall above 90%. Thus GLiNER increases recall as expected.

More importantly, the small GLiNER model can be deployed in a CPU-only service. This improves response latency (calling a commercial LLM API, e.g. OpenAI, may take a few seconds), eliminates hallucination, keeps user data private, and delivers better performance.