
Standard Dataset

Vectors from llm

Citation Author(s):
Maksim Pokrovskiy
Submitted by:
Maksim Pokrovskiy
Last updated:
DOI:
10.21227/t44r-9011

Abstract

I parsed the literature site https://avidreaders.ru, collecting about 10,000,000 sentences from Russian books, and generated sentence vector embeddings for them using the Mistral open API.
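The embedding step can be sketched as below. This is a hypothetical standard-library example, not the script used to build the dataset. It assumes Mistral's embeddings endpoint at `https://api.mistral.ai/v1/embeddings` and the `mistral-embed` model, which returns 1024-dimensional vectors (matching the original size mentioned in this abstract). The batch size of 64 and the helper names are arbitrary illustrative choices. Check the current Mistral API documentation before relying on these names.

```python
import json
import urllib.request

# Assumed endpoint and model name (not stated in the original dataset description)
MISTRAL_EMBED_URL = "https://api.mistral.ai/v1/embeddings"
MODEL = "mistral-embed"


def build_embedding_request(sentences, api_key):
    """Build (but do not send) a POST request asking for embeddings of `sentences`."""
    body = json.dumps({"model": MODEL, "input": sentences}).encode("utf-8")
    return urllib.request.Request(
        MISTRAL_EMBED_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(sentences, api_key, batch_size=64):
    """Send sentences in batches and collect one embedding vector per sentence."""
    vectors = []
    for batch in batched(sentences, batch_size):
        request = build_embedding_request(batch, api_key)
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        # Each item in "data" is assumed to carry its vector under "embedding"
        vectors.extend(item["embedding"] for item in payload["data"])
    return vectors
```

With about 10,000,000 sentences, batching keeps the number of HTTP requests manageable; a real run would also need retry and rate-limit handling, which this sketch omits.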

The embeddings were reduced from 1024 to 256 dimensions using the PCA implementation in the Python scikit-learn library.
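The dimensionality reduction step can be sketched as follows. This is a minimal NumPy version of what scikit-learn's `PCA(n_components=256)` computes (center the data, then project it onto the leading singular vectors), not the exact script used for this dataset. The random input is a hypothetical stand-in for real 1024-dimensional embeddings, and the result agrees with scikit-learn only up to the sign of each component.

```python
import numpy as np


def pca_reduce(embeddings: np.ndarray, n_components: int = 256) -> np.ndarray:
    """Project embeddings onto their top `n_components` principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Economy SVD: rows of vt are principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Hypothetical stand-in for real 1024-dimensional Mistral embeddings
fake_embeddings = rng.standard_normal((500, 1024)).astype(np.float32)
reduced = pca_reduce(fake_embeddings, 256)
print(reduced.shape)  # (500, 256)
```

For about 10,000,000 vectors, a full in-memory SVD is impractical; fitting on a random sample or using scikit-learn's `IncrementalPCA` are common alternatives.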

Word embeddings represent words as vectors in a multi-dimensional space, where the distance and direction between vectors reflect the similarity and relationships among the corresponding words. Sentence embeddings, such as those in this dataset, apply the same idea to whole sentences.
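To illustrate how the direction of vectors encodes similarity, the sketch below compares toy vectors with cosine similarity, a common measure for embeddings. The three-dimensional vectors and their labels are hypothetical and much smaller than the 256-dimensional vectors in this dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors with made-up values, not taken from the dataset
cat = [0.9, 0.1, 0.3]
kitten = [0.85, 0.15, 0.35]
car = [0.1, 0.9, 0.2]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```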

Mistral AI is a French company specializing in artificial intelligence (AI) products. Founded in April 2023 by former employees of Meta Platforms and Google DeepMind,[1] the company has quickly risen to prominence in the AI sector.

Instructions:

All information is in "ReadMe.txt".