Jul 21, 2023
Also, some studies suggest that while pre-trained LLMs such as BERT and GPT can produce transferable text representations, they are not ideal for tasks such as retrieval and text matching, where a single-vector embedding of each text is preferred (which is what we want in this case). See: https://arxiv.org/pdf/2212.03533.pdf
I would also check the MTEB leaderboard to get a sense of the leading open-source embedding models out there, since they are trained specifically to produce this kind of embedding.
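To make the "single vector per text" point concrete, here is a rough sketch of how retrieval over such embeddings typically works: each text is one vector, and documents are ranked by cosine similarity to the query. The vectors and document IDs below are made up for illustration; in practice they would come from whichever embedding model you pick, and the function names here are my own, not from any library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional embeddings (real models use hundreds of dimensions).
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.6, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

ranking = rank(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

The ranking step stays the same whichever model you use; only the source of the vectors changes, which is why the choice of embedding model matters so much here.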