LEKCut (เล็ก คัด) is a Thai tokenization library that ports deep learning models to ONNX.
```
pip install lekcut
```

```python
from lekcut import word_tokenize

word_tokenize("ทดสอบการตัดคำ")
# output: ['ทดสอบ', 'การ', 'ตัด', 'คำ']
```

## API
```python
word_tokenize(text: str, model: str="deepcut", path: str="default", providers: List[str]=None) -> List[str]
```

Parameters:

- `text`: Text to tokenize
- `model`: Model to use (default: `"deepcut"`)
- `path`: Path to a custom model file (default: `"default"`)
- `providers`: List of ONNX Runtime execution providers (default: `None`, which uses the default CPU provider)
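For example, a minimal sketch of calling the API with non-default arguments; the file name `my_model.onnx` below is a placeholder, not a file shipped with LEKCut:

```python
from lekcut import word_tokenize

# Default model ("deepcut") on the default CPU provider
print(word_tokenize("ทดสอบการตัดคำ"))

# Hypothetical: load a custom ONNX model from disk via the path parameter
# ("my_model.onnx" is a placeholder file name)
print(word_tokenize("ทดสอบการตัดคำ", model="deepcut", path="my_model.onnx"))
```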
LEKCut supports GPU acceleration through ONNX Runtime execution providers. To use GPU acceleration:
- Install ONNX Runtime with GPU support:

  ```
  pip install onnxruntime-gpu
  ```

- Use the `providers` parameter to specify GPU execution:

  ```python
  from lekcut import word_tokenize

  # Use CUDA GPU
  result = word_tokenize("ทดสอบการตัดคำ", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

  # Use TensorRT (if available)
  result = word_tokenize("ทดสอบการตัดคำ", providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'])
  ```
Available Execution Providers:

- `CPUExecutionProvider` - Default CPU execution
- `CUDAExecutionProvider` - NVIDIA CUDA GPU acceleration
- `TensorrtExecutionProvider` - NVIDIA TensorRT optimization
- `DmlExecutionProvider` - DirectML for Windows GPU
- And more (see the ONNX Runtime documentation)
Note: The providers are tried in order, and the first available one is used. Always include `CPUExecutionProvider` as a fallback.
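Since provider availability depends on how ONNX Runtime was installed, it can help to query it at runtime before building the list. A minimal sketch, using only the standard `onnxruntime.get_available_providers()` call plus `word_tokenize`:

```python
import onnxruntime as ort
from lekcut import word_tokenize

# Ask ONNX Runtime which execution providers this installation supports.
available = ort.get_available_providers()
print(available)  # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']

# Prefer CUDA when present, keeping the CPU provider as a fallback.
preferred = ['CUDAExecutionProvider', 'CPUExecutionProvider']
providers = [p for p in preferred if p in available]
result = word_tokenize("ทดสอบการตัดคำ", providers=providers)
```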
- `deepcut` - We ported the deepcut model from tensorflow.keras to ONNX. The model and code come from Deepcut's GitHub. The model is here.
If you have trained a custom model with deepcut, or with another model that LEKCut supports, you can load it by passing its path to `word_tokenize` after porting your model to ONNX.
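As an illustration, one common route for porting a trained tf.keras model to ONNX is tf2onnx. This is a sketch under that assumption, not LEKCut's own conversion script, and both file names are placeholders:

```python
# Sketch: convert a trained tf.keras model to ONNX with tf2onnx.
# Both file names below are hypothetical placeholders.
import tensorflow as tf
import tf2onnx

model = tf.keras.models.load_model("my_deepcut_model.h5")
model_proto, _ = tf2onnx.convert.from_keras(model, output_path="my_deepcut_model.onnx")
```

The resulting file could then be loaded with `word_tokenize(text, path="my_deepcut_model.onnx")`.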
- How to train a custom model on your dataset with deepcut - Notebook (you need to update `deepcut/train.py` before training the model)
See `notebooks/`.