What are the three types of Transformers?

Study for the Hugging Face Agent Certification.

Multiple Choice

What are the three types of Transformers?

Explanation:
Transformer models fall into three architectural families. They differ in which parts of the original Transformer they use and in how attention flows through them. An encoder-only model (e.g., BERT) reads the whole input sequence at once with bidirectional self-attention and turns it into contextual hidden representations, which suits understanding tasks such as sentence classification or named-entity recognition. A decoder-only model (e.g., GPT) generates output token by token. It uses masked (causal) self-attention, so each new token is conditioned only on the prompt and the tokens produced before it, which suits text generation. An encoder-decoder model, often called a sequence-to-sequence (Seq2Seq) model (e.g., T5 or BART), uses an encoder to map the input to representations and a decoder to produce the output. Cross-attention lets the decoder focus on the relevant parts of the input at each generation step, which suits translation and summarization.

That is why the correct answer is Encoders, Decoders, and Seq2Seq (Encoder-Decoder). The other options list architectures that are not Transformer configurations at all, such as RNNs, CNNs, and GANs, or entirely different model families such as SVMs.
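One way to see the difference between the three families is to look at which positions each token is allowed to attend to. The Python sketch below is a toy illustration under simplifying assumptions: the function names are hypothetical, padding masks are ignored, and real libraries such as Hugging Face Transformers build these masks internally rather than exposing them like this.

```python
# Toy illustration (not a real model): which positions each query token may
# attend to in the three Transformer configurations. True = attention allowed.
# Function names are hypothetical; padding masks are ignored for simplicity.

def encoder_self_attention_mask(src_len):
    # Encoder-only (e.g., BERT): bidirectional, every token sees every token.
    return [[True] * src_len for _ in range(src_len)]

def decoder_self_attention_mask(tgt_len):
    # Decoder-only (e.g., GPT): causal, token i sees positions 0..i only.
    # The decoder of an encoder-decoder model uses this same causal mask
    # for its own self-attention.
    return [[j <= i for j in range(tgt_len)] for i in range(tgt_len)]

def cross_attention_mask(tgt_len, src_len):
    # Encoder-decoder (e.g., T5, BART): in cross-attention, each decoder
    # position may look at every encoder output position.
    return [[True] * src_len for _ in range(tgt_len)]

def show(mask):
    # Print "x" where attention is allowed and "." where it is blocked.
    for row in mask:
        print(" ".join("x" if allowed else "." for allowed in row))

if __name__ == "__main__":
    print("Encoder self-attention (4 input tokens):")
    show(encoder_self_attention_mask(4))
    print("Decoder causal self-attention (4 output tokens):")
    show(decoder_self_attention_mask(4))
    print("Decoder-to-encoder cross-attention (3 output x 4 input):")
    show(cross_attention_mask(3, 4))
```

The lower-triangular pattern of the causal mask is what makes decoder-only generation proceed token by token, while the full encoder mask lets every input token use context from both sides.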
