Cross-Attention in Transformer Architecture
[https://vaclavkosar.com/ml/cross-attention-in-transformer-architecture] - - public:isaac
Cross-attention merges two embedding sequences regardless of modality, e.g., image latents with text embeddings in the Stable Diffusion U-Net, analogous to encoder-decoder attention.
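A minimal numpy sketch of the idea (toy dimensions and random weights, not the actual Stable Diffusion code): queries come from one sequence, keys and values from the other, so the first sequence attends over the second.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, Wq, Wk, Wv):
    """Queries from one sequence (e.g. image latents),
    keys/values from another (e.g. text-encoder output)."""
    Q = queries @ Wq                      # (n_q, d)
    K = context @ Wk                      # (n_ctx, d)
    V = context @ Wv                      # (n_ctx, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = softmax(scores, axis=-1)    # each query attends over the context
    return weights @ V                    # (n_q, d)

# toy example: 4 "image" tokens attend over 3 "text" tokens
rng = np.random.default_rng(0)
d_img, d_txt, d = 8, 6, 4
img = rng.normal(size=(4, d_img))
txt = rng.normal(size=(3, d_txt))
out = cross_attention(img, txt,
                      rng.normal(size=(d_img, d)),
                      rng.normal(size=(d_txt, d)),
                      rng.normal(size=(d_txt, d)))
print(out.shape)  # (4, 4)
```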
Optimum
[https://huggingface.co/docs/optimum/index] - - public:mzimmerm
Optimum is an extension of Transformers that provides performance-optimization tools for training and running models on targeted hardware with maximum efficiency. It also hosts small, mini, and tiny model variants.
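A minimal usage sketch, assuming `optimum[onnxruntime]` is installed; the model id is only an example checkpoint, not something prescribed by the docs page.

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to ONNX for ONNX Runtime inference
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum makes deployment on targeted hardware easier."))
```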
google-research/bert: TensorFlow code and pre-trained models for BERT
[https://github.com/google-research/bert]
BERT Transformers – How Do They Work? | Exxact Blog
[https://www.exxactcorp.com/blog/Deep-Learning/how-do-bert-transformers-work] - - public:mzimmerm
Excellent document about BERT transformer models and their parameters: L = number of layers; H = hidden size, i.e., the dimensionality of the vector representing each token; A = number of self-attention heads; plus the total parameter count.
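A back-of-the-envelope Python sketch of how L and H determine the total parameter count (vocabulary and position sizes are those of the original BERT configurations; A does not change the count, since head dimension = H / A).

```python
def bert_param_count(L=12, H=768, A=12, vocab=30522, max_pos=512, types=2, ffn_mult=4):
    """Approximate encoder parameter count; A is listed only for reference."""
    embed = (vocab + max_pos + types) * H + 2 * H            # embeddings + LayerNorm
    attn = 4 * (H * H + H)                                    # Q, K, V, O projections
    ffn = H * (ffn_mult * H) + ffn_mult * H + (ffn_mult * H) * H + H
    per_layer = attn + ffn + 2 * (2 * H)                      # + two LayerNorms
    pooler = H * H + H
    return embed + L * per_layer + pooler

print(f"{bert_param_count():,}")                       # ~109.5M for BERT-base (L=12, H=768)
print(f"{bert_param_count(L=24, H=1024, A=16):,}")     # ~335M for BERT-large
```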
Solving Transformer by Hand: A Step-by-Step Math Example | by Fareed Khan | Level Up Coding
[https://levelup.gitconnected.com/understanding-transformers-from-start-to-end-a-step-by-step-math-example-16d4e64e6eb1] - - public:mzimmerm
Works through the transformer's computations by hand, step by step.
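A tiny scaled dot-product self-attention example in numpy, in the same by-hand spirit (toy numbers and weights chosen only for illustration, not taken from the article).

```python
import numpy as np

X = np.array([[1.0, 0.0],     # token 1 embedding
              [0.0, 1.0],     # token 2
              [1.0, 1.0]])    # token 3
Wq = np.array([[1.0, 0.0], [0.0, 1.0]])
Wk = np.array([[0.0, 1.0], [1.0, 0.0]])
Wv = np.array([[1.0, 1.0], [0.0, 1.0]])

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(Q.shape[-1])                                 # scaled dot products
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)   # row-wise softmax
out = weights @ V                                                       # one output row per token

print(weights.round(3))
print(out.round(3))
```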