The Best GPUs for Deep Learning in 2023 — An In-depth Analysis
BERT Transformers – How Do They Work? | Exxact Blog
Excellent document about BERT transformer models and their parameters:
- L = number of layers (transformer blocks).
- H = hidden size, i.e. the dimensionality of the vector representing each token.
- A = number of self-attention heads.
- Total parameter count (see the sketch below).
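A minimal sketch of how L, H, and A relate to the total parameter count, using the standard BERT architecture and the default vocabulary/position sizes from the original BERT release; the breakdown below is an approximation, not taken from the linked article.

```python
# Rough BERT parameter count from L and H. Note that A (number of heads)
# does not change the count: each head has size H / A, so the Q/K/V/output
# projections stay H x H regardless of how H is split across heads.
def bert_param_count(L=12, H=768, vocab=30522, max_pos=512, type_vocab=2):
    embeddings = (vocab + max_pos + type_vocab) * H + 2 * H   # token/position/type tables + LayerNorm
    attention  = 4 * (H * H + H)                              # Q, K, V, output projections (weights + biases)
    ffn        = (H * 4 * H + 4 * H) + (4 * H * H + H)        # two dense layers with a 4H intermediate size
    layer      = attention + ffn + 2 * (2 * H)                # plus two LayerNorms per layer
    pooler     = H * H + H
    return embeddings + L * layer + pooler

# BERT-base (L=12, H=768, A=12):
print(f"{bert_param_count() / 1e6:.1f}M parameters")  # ~109.5M, matching the reported ~110M
```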
6 Ways to Run LLMs Locally (also how to use HuggingFace)
Covers various methods for running LLMs locally; Hugging Face is only one of them.
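A minimal sketch of the Hugging Face route mentioned above; the model choice is illustrative (any causal LM from the Hub works the same way), and a small model keeps it CPU-friendly.

```python
# Local inference with the transformers pipeline API.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # illustrative small model
print(generator("Running LLMs locally is", max_new_tokens=30)[0]["generated_text"])
```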
deepseek-ai (DeepSeek)
They have a 1.3B version! That may be the best size to start with for Newspeak; training should even work on Hugging Face.
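A minimal sketch of loading the 1.3B model for a first experiment. The model id below is an assumption based on the deepseek-ai Hugging Face org (check the org page for the exact name), and the plain-text prompt is simplified: the instruct variant normally expects a chat template.

```python
# Load and sample from the 1.3B DeepSeek Coder model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-1.3b-instruct"  # assumed id; verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("# Write a quicksort in Python\n", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```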
deepseek-ai/deepseek-coder-6.7b-instruct · Hugging Face
Another possible model. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.
StarCoder: A State-of-the-Art LLM for Code
The article includes a comparison with other code LLMs.
Fine-tune a pretrained model
Shows how to fine-tune BERT on the Yelp reviews dataset.
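A hedged sketch of that recipe: fine-tuning BERT on the Yelp reviews dataset with the Trainer API. The subsampling and hyperparameters here are illustrative choices to keep a first run fast, not the tutorial's exact settings.

```python
# Fine-tune bert-base-cased on yelp_review_full (5 star-rating classes).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("yelp_review_full")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)
small_train = tokenized["train"].shuffle(seed=42).select(range(1000))  # subsample for speed
small_eval = tokenized["test"].shuffle(seed=42).select(range(1000))

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
args = TrainingArguments(output_dir="yelp_bert", num_train_epochs=1)
Trainer(model=model, args=args, train_dataset=small_train, eval_dataset=small_eval).train()
```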
How to train a new language model from scratch using Transformers and Tokenizers
Describes how to train a new language model from scratch, using Esperanto as the example language.
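A sketch of the first step from that walkthrough: training a byte-level BPE tokenizer on raw text before training the model itself. The corpus path is a hypothetical placeholder; the vocab size and special tokens follow the RoBERTa-style setup the post uses.

```python
# Train a byte-level BPE tokenizer on an Esperanto corpus.
import os
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["oscar_eo.txt"],  # hypothetical path to a plain-text Esperanto corpus
    vocab_size=52_000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

os.makedirs("EsperBERTo", exist_ok=True)
tokenizer.save_model("EsperBERTo")  # writes vocab.json and merges.txt
```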