A CAPTCHA is a type of challenge-response test used in computing to determine whether the user is human, in order to deter bot attacks and spam. In a typical website captcha test, a captcha image is displayed to the user, who recognizes the characters and types them into the response box. Solving it automatically is essentially an OCR problem. Here is a captcha example.

From another point of view, solving a captcha is an image-to-text understanding task. With the latest progress in multi-modality (image-text) transformer models, it is possible to use a pretrained multi-modality model, such as BLIP or another image-text model, to solve the problem. Compared with traditional machine learning models such as XGBoost, logistic regression, or SVM, there is no need to collect training samples and fit a model first: many pretrained image-to-text models are already available on Huggingface. The following is a sample code:
```python
from transformers import pipeline


class CaptchaI2T(object):
    def __init__(self):
        # TrOCR (printed) wrapped in the Huggingface image-to-text pipeline.
        self.pipe = pipeline("image-to-text", model="microsoft/trocr-large-printed")

    def __call__(self, im_path):
        """
        Args:
            im_path: path to the captcha image, or a PIL image
        """
        try:
            pred = self.pipe(im_path)[0]["generated_text"]
        except Exception:
            # Return None if the image cannot be read or inference fails.
            pred = None
        return pred
```
Of course, the pretrained model is not perfect. If you want higher accuracy, you will still need to collect training data and fine-tune the model on captcha images.
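One common way to get such training data without manual labeling is to synthesize captcha-like images yourself, since you then know the ground-truth text for free. Below is a minimal sketch using PIL; the character set, image size, offsets, and noise lines are illustrative choices, not a standard recipe, and a real generator would also use distorted fonts.

```python
import random
import string
from PIL import Image, ImageDraw


def make_captcha_sample(length=5, size=(160, 60)):
    """Generate one synthetic captcha image together with its ground-truth label."""
    charset = string.ascii_uppercase + string.digits
    label = "".join(random.choices(charset, k=length))
    im = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(im)
    # Draw each character with a small random offset to mimic captcha distortion.
    for i, ch in enumerate(label):
        x = 10 + i * 28 + random.randint(-3, 3)
        y = 15 + random.randint(-5, 5)
        draw.text((x, y), ch, fill="black")
    # Add a few noise lines over the text.
    for _ in range(3):
        draw.line(
            [(random.randint(0, size[0]), random.randint(0, size[1])),
             (random.randint(0, size[0]), random.randint(0, size[1]))],
            fill="gray",
        )
    return im, label
```

Pairs of (image, label) produced this way can be fed directly into a fine-tuning loop for the OCR model.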