Vectors from llm

Name: Vectors from llm
Creator: Maksim Pokrovskiy
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Education and Learning Technologies

Citation Author(s): Maksim Pokrovskiy
Submitted by: Maksim Pokrovskiy
Last updated: Thu, 10/17/2024 - 14:53
DOI: 10.21227/t44r-9011

Categories:

Education and Learning Technologies

Keywords:

nearest neighbour search

Dataset

text vector embeddings

Abstract

This dataset was built by parsing the literature site https://avidreaders.ru to collect about 10,000,000 sentences from Russian books, then computing a sentence vector embedding for each of them with the Mistral open API.
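
As a rough illustration of the embedding step, the sketch below batches sentences and builds a request for the Mistral embeddings REST endpoint. The model name `mistral-embed`, the batch size, and the helper names `batches`, `build_request`, and `embed` are assumptions made for this example; the dataset description does not say which model or client code was actually used.

```python
import json
import urllib.request

# Assumptions: endpoint and model name are not stated in the dataset description.
MISTRAL_URL = "https://api.mistral.ai/v1/embeddings"
MODEL = "mistral-embed"


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_request(sentences, api_key):
    """Build (but do not send) a POST request asking for sentence embeddings."""
    payload = json.dumps({"model": MODEL, "input": list(sentences)}).encode("utf-8")
    return urllib.request.Request(
        MISTRAL_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embed(sentences, api_key):
    """Send one batch and return its embeddings in input order (needs network access)."""
    with urllib.request.urlopen(build_request(sentences, api_key)) as response:
        body = json.load(response)
    ordered = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]
```

With a valid API key, something like `for chunk in batches(sentences, 64): vectors.extend(embed(chunk, api_key))` would process a corpus batch by batch; the batch size of 64 is an arbitrary illustrative value.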

The embeddings were then reduced from 1024 to 256 dimensions using the PCA implementation from the Python scikit-learn library.
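
The reduction step could look roughly like the following sketch, which applies scikit-learn's `PCA` to a synthetic random matrix standing in for the real 1024-dimensional embeddings. The sample count, the random data, and the helper name `reduce_dimensions` are assumptions for illustration only; the exact PCA settings used for the dataset are not stated.

```python
import numpy as np
from sklearn.decomposition import PCA


def reduce_dimensions(embeddings, n_components=256):
    """Fit PCA on the embeddings and project them onto `n_components` axes."""
    pca = PCA(n_components=n_components, random_state=0)
    reduced = pca.fit_transform(embeddings)
    return reduced, pca


# Synthetic stand-in for real sentence embeddings (assumption: 1000 samples).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 1024)).astype(np.float32)

reduced, pca = reduce_dimensions(embeddings)
print(reduced.shape)                         # (1000, 256)
print(pca.explained_variance_ratio_.sum())   # share of variance that was kept
```

For a corpus of about 10,000,000 vectors, fitting on a sample or using scikit-learn's `IncrementalPCA` may be more practical than a single in-memory fit; which approach was used here is not documented.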

Word embeddings are a way of representing words as vectors in a multi-dimensional space, where the distance and direction between vectors reflect the similarity and relationships among the corresponding words.
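
To make the idea of "distance reflects similarity" concrete, here is a minimal sketch of cosine similarity and a brute-force nearest neighbour search over a handful of toy 2-dimensional vectors. The vectors and the function names `cosine_similarity` and `nearest_neighbours` are hypothetical; the same logic would apply to the 256-dimensional vectors in this dataset, although at that scale an approximate index would likely be preferable to a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(query, vectors, k=3):
    """Return indices of the k vectors most similar to `query` (linear scan)."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return [index for _, index in scored[:k]]


# Toy data: vectors 0 and 1 point almost the same way as the query.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.05]
print(nearest_neighbours(query, vectors, k=2))  # [0, 1]
```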

Mistral AI is a French company specializing in artificial intelligence (AI) products. Founded in April 2023 by former employees of Meta Platforms and Google DeepMind, the company has quickly risen to prominence in the AI sector.