Search Results
Open LLM Leaderboard - a Hugging Face Space by HuggingFaceH4
Comparison of the benchmark performance of LLM models on Hugging Face.
6 Ways to Run LLMs Locally (also how to use HuggingFace)
Various methods to run LLM models locally; Hugging Face is only one of them.
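A minimal local-inference sketch using the Hugging Face transformers pipeline, assuming a small causal-LM checkpoint from the Hub (the model name below is just an example, not a recommendation from the article):

```python
# Minimal local inference with the Hugging Face transformers library.
# The model name is only an example; any causal-LM checkpoint from the Hub
# works, subject to local RAM/VRAM limits.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/deepseek-coder-1.3b-base",  # example small code model
    device_map="auto",  # place layers on a GPU if one is available
)

print(generator("def fibonacci(n):", max_new_tokens=64)[0]["generated_text"])
```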
Training Bert on Yelp - Copy of training.ipynb - Colaboratory
Interesting cheap GPU option: Instinct Mi50 : LocalLLaMA
AMD sells these accelerators, which are compute-oriented GPUs similar to video cards.
Ditching CUDA for AMD ROCm for more accessible LLM training and inference. | by Rafael Manzano Masapanta | Medium
Train LLM on AMD APU. In this scenario, we’ll use an APU because most laptops with a Ryzen CPU include an iGPU; specifically, this post should work with iGPUs based on the “GCN 5.0” architecture, or “Vega” for friends. We’ll use an AMD Ryzen 2200G in this post, an entry-level processor equipped with 4C/4T and an integrated GPU.
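A quick sanity check that a ROCm build of PyTorch actually sees the AMD GPU/iGPU. The HSA_OVERRIDE_GFX_VERSION hint is a commonly used workaround for unsupported Vega iGPUs, not something taken from the article; verify the exact value there.

```python
# Check that a ROCm build of PyTorch can see the AMD GPU/iGPU.
# On ROCm builds, the torch.cuda API is backed by HIP, so the same calls work.
# For Vega iGPUs you may need to export HSA_OVERRIDE_GFX_VERSION (e.g. 9.0.0)
# before launching Python -- this is an assumption to check against the article.
import torch

print("torch version:", torch.__version__)
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    x = torch.randn(1024, 1024, device="cuda")
    print("matmul ok:", (x @ x).shape)
```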
Most cost effective GPU for local LLMs? : LocalLLaMA
GGML quantized models. They would let you leverage CPU and system RAM, instead of having to rely on a GPU's VRAM. This could save you a fortune, especially if you go for some used AMD Epyc platforms. This could be more viable for the larger models, especially the 30B/65B-parameter models, which would still press against or exceed the VRAM on the P40.
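A sketch of that CPU/RAM-plus-partial-offload setup, assuming llama-cpp-python and a GGUF file (GGUF is the successor of the GGML format); the model path and layer count are placeholders:

```python
# Running a GGUF-quantized model mostly on CPU/system RAM with llama-cpp-python.
# The model path and n_gpu_layers value are placeholders; offload as many layers
# as fit in your GPU's VRAM (0 = pure CPU).
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-13b.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,        # context window
    n_gpu_layers=20,   # layers offloaded to the GPU; the rest stay in system RAM
)

out = llm("Explain GGUF quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```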
Optimizing LLMs for Speed and Memory
7 Steps to Mastering Large Language Models (LLMs) - KDnuggets
A Step-by-Step Guide to Training Your Own Large Language Models (LLMs). | by Sanjay Singh | GoPenAI
GenAI Stack Exchange
7 steps to master large language models (LLMs) | Data Science Dojo
LLM for a new language : MachineLearning
High-level overview of how to train a model.
Up-to-date list of LLM models
OSCAR dataset
The OSCAR project (Open Super-large Crawled Aggregated coRpus) is an Open Source project aiming to provide web-based multilingual resources and datasets for Machine Learning (ML) and Artificial Intelligence (AI) applications.
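A hedged sketch of streaming a slice of OSCAR with the Hugging Face datasets library; the dataset and config names are assumptions (newer OSCAR releases live under the oscar-corpus org and may be gated on the Hub):

```python
# Streaming a small slice of the OSCAR corpus with the `datasets` library.
# Dataset/config names are assumptions; newer releases (e.g. OSCAR-2301)
# may require accepting terms on the Hub.
from datasets import load_dataset

oscar = load_dataset(
    "oscar",
    "unshuffled_deduplicated_en",
    split="train",
    streaming=True,           # avoid downloading the full corpus
    trust_remote_code=True,   # needed for script-based datasets in recent versions
)

for i, record in enumerate(oscar):
    print(record["text"][:80])
    if i >= 2:
        break
```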
Newspeak-test-dataset
The dataset is just a zip of files.
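A small sketch of turning the unpacked zip into a Hugging Face dataset; the directory name and the .ns file extension are assumptions about how the archive is laid out after extraction:

```python
# Turn an unpacked zip of Newspeak source files into a Hugging Face dataset.
# The directory path and file extension are assumptions about the archive layout.
from pathlib import Path
from datasets import Dataset

files = list(Path("newspeak-test-dataset").rglob("*.ns"))  # assumed extension
records = [
    {"path": str(p), "text": p.read_text(encoding="utf-8", errors="ignore")}
    for p in files
]

ds = Dataset.from_list(records)
print(ds)
```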
Introduction to Constructing Your Dataset | Machine Learning | Google for Developers
Are there any tiny (1-3b) models finetuned for coding available in GGUF format? : LocalLLaMA
bigcode (BigCode)
Research community developing various code models, small and large. The models may not be instruction-tuned.
WizardLM (WizardLM)
deepseek-ai (DeepSeek)
They have a 1.3B version! This may be the best model to start with for Newspeak. Training should work even on Hugging Face.
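A minimal fine-tuning sketch for the 1.3B model, assuming the Hugging Face Trainer API; the three sample strings are placeholders standing in for the real Newspeak dataset, and the hyperparameters are illustrative only:

```python
# Toy fine-tuning sketch for deepseek-ai/deepseek-coder-1.3b-base with the
# Hugging Face Trainer. The sample strings stand in for the real Newspeak
# dataset; hyperparameters are illustrative, not tuned.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama-style tokenizers lack a pad token

raw = Dataset.from_dict({"text": [
    "class Counter = ( | count = 0. | )",   # placeholder Newspeak-ish snippets
    "class Point = ( | x y | )",
    "public printString = ( ^'a Point' )",
]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM labels

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="newspeak-deepseek-1.3b",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        logging_steps=1,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```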
DeepSeek
deepseek-ai/deepseek-coder-6.7b-instruct · Hugging Face
Another possible model. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.
LLaMA 7B GPU Memory Requirement - Transformers - Hugging Face Forums
With the optimizers of bitsandbytes (like 8 bit AdamW), you would need 2 bytes per parameter, or 14 GB of GPU memory.
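The quoted number works out as below; this covers only the optimizer states, with weights and gradients added for context (activations ignored):

```python
# Back-of-the-envelope GPU memory estimate for the forum's numbers.
# 8-bit AdamW keeps two optimizer states per parameter at 1 byte each,
# hence the quoted 2 bytes/param = 14 GB for a 7B model.
params = 7e9

opt_8bit   = params * 2 / 1e9   # 2 optimizer states x 1 byte each
weights_16 = params * 2 / 1e9   # fp16/bf16 weights, 2 bytes each
grads_16   = params * 2 / 1e9   # fp16/bf16 gradients, 2 bytes each

print(f"8-bit AdamW states: {opt_8bit:.0f} GB")
print(f"fp16 weights:       {weights_16:.0f} GB")
print(f"fp16 gradients:     {grads_16:.0f} GB")
print(f"rough total:        {opt_8bit + weights_16 + grads_16:.0f} GB")
```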
stabilityai/stable-code-3b · Hugging Face
Another potential model to use for Newspeak, but it is NOT open source. Advantage: 2.5B params, so it should be usable on small GPUs.
Large Language Models for Domain-Specific Language Generation: How to Train Your Dragon | by Andreas Mülder | Medium
Training a model like Llama with 2.7 billion parameters outperformed a larger model like Vicuna with 13 billion parameters. Especially when considering resource consumption, this might be a good alternative to using a 7B foundation model instead of a full-blown ChatGPT. The best price-to-performance base model for our use case turned out to be Mistral 7b. The model is compact enough to fit into an affordable GPU with 24GB VRAM and outperforms the other models with 7B parameters.
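A hedged sketch of loading Mistral 7B in 4-bit with bitsandbytes so it fits comfortably under 24 GB of VRAM; the quantization settings are common defaults rather than recommendations from the article, and the checkpoint may require accepting terms on the Hub:

```python
# Load Mistral 7B in 4-bit with bitsandbytes so it fits well under 24 GB VRAM.
# Quantization settings are typical defaults, not tuned recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

prompt = "Write a Smalltalk method that sums a collection:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```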
Can Ai Code Results - a Hugging Face Space by mike-ravkine
Comparison of LLM models for coding
Discord
Openchat Chatbot UI
Online UI for OpenChat. This seems really good, open source, etc. It uses the Llama 2 and Mistral models, according to https://github.com/imoneoi/openchat
openchat/openchat-3.5-0106 · Hugging Face
Open source with lots of information. Uses multiple underlying models. Not sure how I would train for it.
Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face
The Mixtral model is new and seems to be good. Click on “Demo” to test it.
StarCoder: A State-of-the-Art LLM for Code
Article has comparison with other code-LLM models
huybery/Awesome-Code-LLM: An awesome and curated list of best code-LLM for research.
Hannibal046/Awesome-LLM: Awesome-LLM: a curated list of Large Language Model
Large language models and the rise of the AI code generators | InfoWorld
Review of LLM specialized for code generation
Large language model - Wikipedia
List of LLM models on Wikipedia
List of datasets for machine-learning research - Wikipedia
stabilityai (Stability AI) - Stable Diffusion running on Hugging Face
Chat and language models. Not open source, but instruction-tuned and relatively small (3B). The 3B instruct model may be the best to try on Newspeak.
ChatGPT
Le Chat Mistral
Chat UI for Mistral. Does well on Python and Smalltalk.
Mistral AI - Wikipedia
fast.ai – fast.ai—Making neural nets uncool again
OpenAI Codex - Wikipedia
Model which generates code for Python, JavaScript, Go, Shell, Perl, Swift, Ruby, and PHP.
HuggingChat
codellama (Code Llama) - Hugging Face model for generating programs. Maybe it can be used for Newspeak?
Gemini
Gemini chat from Google. Can generate Python and other code.
Introducing Gemini: Google’s most capable AI model yet
Advanced coding: Our first version of Gemini can understand, explain and generate high-quality code in the world’s most popular programming languages, like Python, Java, C++, and Go. Using a specialized version of Gemini, we created a more advanced code generation system, AlphaCode 2.
AI Code Tools: The Ultimate Guide in 2024
AI code tools: a good summary. It does not say which pre-trained models the tools use. One is Gemini (Bard) -> AlphaCode 2.
Getting Started w/BERT.ipynb - Colaboratory
Jupyter notebook to test BERT.
Introduction - Hugging Face NLP Course
Natural Language Processing - full course.
BERT 101 - State Of The Art NLP Model Explained
Best summary of Natural Language Processing and its terms: model (a language model, e.g. BertModel, which defines the encoder and decoder and their properties), transformer (a specific neural-network architecture from the attention paper), encoder (a stack of transformer layers over the input), decoder (a stack of transformer layers producing the output). BERT does NOT use a decoder. TensorFlow and PyTorch are possible backends for the Transformers library. Summary: BERT is a highly complex and advanced language model that helps people automate language understanding.
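The terms from that note, shown in code with the Transformers library: BertModel is the encoder-only model, a stack of transformer layers with no decoder.

```python
# BertModel is the encoder-only model: a stack of transformer layers, no decoder.
from transformers import AutoTokenizer, BertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Newspeak is a programming language.", return_tensors="pt")
outputs = model(**inputs)

print(model.config.num_hidden_layers)    # 12 transformer (encoder) layers in bert-base
print(outputs.last_hidden_state.shape)   # [batch, tokens, hidden_size]
```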