Train and use my model
Optimum is an extension of Transformers that provides a set of performance optimization tools to train and run models on targeted hardware with maximum efficiency. It is also the repository of the small, mini, and tiny models.
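A minimal sketch of what running a model through Optimum looks like, assuming the ONNX Runtime backend (the checkpoint name below is only an example, not from the linked page):

# Sketch: export a Transformers checkpoint to ONNX via Optimum and run it.
# Requires: pip install optimum[onnxruntime]; the checkpoint name is an example.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

inputs = tokenizer("Optimum runs this through ONNX Runtime", return_tensors="pt")
print(model(**inputs).logits)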
Repository of all BERT models, including the small ones. Start with one of these for testing.
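For quick smoke tests, a tiny checkpoint loads in seconds; a sketch, assuming the prajjwal1/bert-tiny checkpoint (an example name, not taken from the linked repository):

# Sketch: load a tiny BERT for cheap testing; "prajjwal1/bert-tiny" is
# an example checkpoint, chosen here only for illustration.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny")
model = AutoModel.from_pretrained("prajjwal1/bert-tiny")

inputs = tokenizer("hello world", return_tensors="pt")
print(model(**inputs).last_hidden_state.shape)  # hidden size is 128 for bert-tiny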
Comparison of the efficiency of all LLM models on Hugging Face
GGML quantized models. They let you leverage the CPU and system RAM instead of having to rely on a GPU's VRAM. This could save you a fortune, especially if you go for some used AMD Epyc platforms. This could be more viable for the larger models, especially the 30B/65B-parameter ones, which would still press or exceed the VRAM on the P40.
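A rough sketch of CPU-only inference via llama-cpp-python (the model path is a placeholder, and newer llama.cpp builds expect GGUF files rather than the older GGML format):

# Sketch: run a quantized model on CPU + system RAM with llama-cpp-python.
# The model path is a placeholder; n_threads should match the CPU core count.
from llama_cpp import Llama

llm = Llama(model_path="./models/model-q4_0.gguf", n_ctx=2048, n_threads=8)
out = llm("Write a hello-world program in Newspeak:", max_tokens=128)
print(out["choices"][0]["text"])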
High-level overview of how to train a model
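For reference, a minimal fine-tuning loop with the Transformers Trainer looks roughly like the sketch below; the checkpoint ("gpt2") and the corpus file are placeholders, not anything from the linked article:

# Sketch: minimal causal-LM fine-tuning with the Transformers Trainer.
# "gpt2" and "corpus.txt" are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_id)

ds = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
            batched=True, remove_columns=["text"])

args = TrainingArguments(output_dir="out", per_device_train_batch_size=2,
                         num_train_epochs=1)
Trainer(model=model, args=args, train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
        ).train()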
Research community developing various code models, small and big. The models may not be instruction-tuned.
They have the 1.3B version! This may be the best to start with for Newspeak. It should be trainable even on Hugging Face.
Another possible model. For coding capabilities, DeepSeek Coder achieves state-of-the-art performance among open-source code models across multiple programming languages and benchmarks.
With the optimizers of bitsandbytes (like 8-bit AdamW), you would need 2 bytes per parameter of optimizer state, or 14 GB of GPU memory for a 7B-parameter model.
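The arithmetic: 7B parameters x 2 bytes of optimizer state = 14 GB, on top of the weights themselves. Swapping the optimizer in is a one-liner; a sketch, with "gpt2" as a stand-in model:

# Sketch: use bitsandbytes' 8-bit AdamW instead of torch.optim.AdamW
# to shrink optimizer state. "gpt2" is a stand-in model.
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-5)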
Another potential model to use for Newspeak, but it is NOT open source. Advantage: 2.5B parameters, so it should be usable on small GPUs.
Comparison of LLM models for coding
Open source with lots of information. Uses multiple underlying models. Not sure how I would train for it.
The Mixtral model is new and seems to be good. Click on “Demo” to test it.
The article has a comparison with other code-LLM models.
Review of LLMs specialized for code generation
List of LLM models on Wikipedia
Chat models. Not open source, but instruction-tuned and relatively small (3B). The 3B instruct version may be the best to try on Newspeak.
Model which generates code for Python, JavaScript, Go, Shell, Perl, Swift, Ruby, and PHP
AI code tools: a good summary, though it does not say which pre-trained models the tools use. One is Gemini (Bard) -> AlphaCode 2.
Best summary of Natural Language Processing and its terms: model (a language model, e.g. BertModel; defines the encoder and decoder and their properties), transformer (a specific neural network based on the attention paper), encoder (a series of transformer layers over the input), decoder (a series of transformer layers over the output). BERT does NOT use a decoder. TensorFlow and PyTorch are possible backends for the Transformers library. Summary: BERT is a highly complex and advanced language model that helps people automate language understanding.
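To make the encoder-only point concrete, a small sketch with the standard bert-base-uncased checkpoint:

# Sketch: BertModel is an encoder-only stack; there is no decoder in it.
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

print(model.config.is_decoder)         # False: BERT does not use a decoder
print(model.config.num_hidden_layers)  # 12 encoder layers in bert-base

enc = tokenizer("BERT encodes; it does not generate", return_tensors="pt")
print(model(**enc).last_hidden_state.shape)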
BigCode is an open scientific collaboration working on the responsible development and use of large language models for code
High-level only; talks about training for a language.
Describes how to train a new language model (for Esperanto) from scratch.
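The first step in that walkthrough is training a byte-level BPE tokenizer on the new language's corpus; a sketch along those lines, with the corpus file and output directory as placeholders for a Newspeak corpus:

# Sketch: train a byte-level BPE tokenizer from scratch, as in the
# "train a new language model" walkthrough. Paths are placeholders.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(files=["newspeak_corpus.txt"], vocab_size=52_000,
                min_frequency=2,
                special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"])
tokenizer.save_model("newspeak_tokenizer")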
Look for models that could be used in Newspeak