Vectors from LLM

Citation Author(s):
Maksim Pokrovskiy
Submitted by:
Maksim Pokrovskiy
Last updated:
Thu, 10/17/2024 - 10:53
DOI:
10.21227/t44r-9011
License:
0

Abstract 

This dataset was built by parsing the literature site https://avidreaders.ru to collect about 10,000,000 sentences from Russian books, then computing a sentence embedding vector for each sentence with the Mistral open API.

The embeddings were reduced from 1024 to 256 dimensions using the PCA implementation in the Python scikit-learn library.
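A reduction of this kind can be sketched as follows. This is a minimal illustration, not the author's actual script: the random input array, the sample count of 512, and the fixed random seed are assumptions made so the example runs on its own. Only the 1024 and 256 dimensions and the use of scikit-learn PCA come from the description above.

```python
import numpy as np
from sklearn.decomposition import PCA

# Assumption: random vectors stand in for the real 1024-dimensional
# sentence embeddings, which are not available here.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(512, 1024)).astype(np.float32)

# Fit PCA on the embeddings and project them onto the first 256 components.
# PCA needs at least as many samples as components, hence 512 rows.
pca = PCA(n_components=256, random_state=0)
reduced = pca.fit_transform(embeddings)

print(reduced.shape)
# Fraction of the original variance kept by the 256 components.
print(float(pca.explained_variance_ratio_.sum()))
```

On real embeddings, the explained variance ratio indicates how much information the 256-dimensional vectors retain. New sentences would have to be projected with the same fitted `pca` object (via `pca.transform`) to be comparable with the stored vectors.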

Word embeddings are a way of representing words as vectors in a multi-dimensional space, where the distance and direction between vectors reflect the similarity and relationships among the corresponding words.
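One common way to measure similarity between embedding vectors is cosine similarity, which compares direction and ignores length. The sketch below uses tiny hand-made vectors as an assumption for illustration; the dataset description does not say which similarity measure is intended.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not taken from the dataset.
v_base = [1.0, 0.0]
v_same_direction = [2.0, 0.0]   # same direction, different length
v_orthogonal = [0.0, 3.0]       # perpendicular to v_base

sim_same = cosine_similarity(v_base, v_same_direction)
sim_orth = cosine_similarity(v_base, v_orthogonal)
print(sim_same, sim_orth)
```

Vectors pointing the same way score close to 1 and unrelated (perpendicular) vectors score close to 0, regardless of their magnitudes.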

Mistral AI is a French company specializing in artificial intelligence (AI) products. Founded in April 2023 by former employees of Meta Platforms and Google DeepMind,[1] the company has quickly risen to prominence in the AI sector.

Instructions: 

All information is in "ReadMe.txt".

Comments

Update

Submitted by Maksim Pokrovskiy on Mon, 10/14/2024 - 02:25


Documentation

Attachment          Size
ReadMe.txt          389 bytes