The Best GPUs for Deep Learning in 2023 — An In-depth Analysis
BERT Transformers – How Do They Work? | Exxact Blog
[https://www.exxactcorp.com/blog/Deep-Learning/how-do-bert-transformers-work] - - public:mzimmerm
Excellent document about BERT transformer models and their parameters: L = number of layers; H = hidden size (the dimensionality of each token's vector); A = number of self-attention heads. Together these determine the total parameter count; see the sketch below.
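A minimal sketch (assuming the Hugging Face `transformers` library) of how the L/H/A naming in a checkpoint like google/bert_uncased_L-4_H-256_A-4 maps onto a BertConfig, and how the total parameter count can be read off the instantiated model:

```python
from transformers import BertConfig, BertModel

# google/bert_uncased_L-4_H-256_A-4 corresponds to:
config = BertConfig(
    num_hidden_layers=4,     # L: number of transformer layers
    hidden_size=256,         # H: dimensionality of each token's vector
    num_attention_heads=4,   # A: number of self-attention heads
    intermediate_size=1024,  # feed-forward width, conventionally 4 * H
)
model = BertModel(config)
print(sum(p.numel() for p in model.parameters()))  # total parameter count
```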
google/bert_uncased_L-4_H-256_A-4 · Hugging Face
[https://huggingface.co/google/bert_uncased_L-4_H-256_A-4] - - public:mzimmerm
Part of Google's collection of smaller BERT models on Hugging Face. This small model is a good starting point for testing.
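A minimal usage sketch, assuming `transformers` and PyTorch are installed, loading this small model for a quick test:

```python
from transformers import AutoModel, AutoTokenizer

name = "google/bert_uncased_L-4_H-256_A-4"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("BERT is small but useful for testing.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 256]); H = 256
```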
Generative pre-trained transformer - Wikipedia
How to train a new language model from scratch using Transformers and Tokenizers
[https://huggingface.co/blog/how-to-train] - - public:mzimmerm
Describes how to train a new language model (on Esperanto) from scratch.
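A short sketch of the first step the post walks through: training a byte-level BPE tokenizer on raw Esperanto text (assumes the `tokenizers` library; the corpus path and output directory here are illustrative, following the post's EsperBERTo example):

```python
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["oscar.eo.txt"],  # illustrative path to an Esperanto text corpus
    vocab_size=52_000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)
tokenizer.save_model("EsperBERTo")  # writes vocab.json and merges.txt
```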