Vectors from LLM

Citation Author(s):
Maksim Pokrovskiy
Submitted by:
Maksim Pokrovskiy
Last updated:
Mon, 10/14/2024 - 02:31
DOI:
10.21227/t44r-9011
License:
0

Abstract 

I parsed a literature website, collecting about 10,000,000 sentences from Russian books, and computed sentence vector embeddings for them using the Mistral API.
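The abstract does not say how the API was called. The sketch below shows one plausible way to batch sentences and request embeddings. It assumes the public endpoint `https://api.mistral.ai/v1/embeddings`, the model name `mistral-embed` and an environment variable `MISTRAL_API_KEY`. The helper names and the batch size of 64 are hypothetical and are not taken from the author's code.

```python
import json
import os
import urllib.request
from typing import Iterator, List

# Assumption: Mistral's public embeddings endpoint and model name.
# Verify both against the current Mistral API documentation.
MISTRAL_EMBED_URL = "https://api.mistral.ai/v1/embeddings"
MODEL = "mistral-embed"


def batched(sentences: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size sentences."""
    for start in range(0, len(sentences), batch_size):
        yield sentences[start:start + batch_size]


def parse_embeddings(response: dict) -> List[List[float]]:
    """Extract vectors from a response body, ordered by each item's 'index'."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed_batch(batch: List[str], api_key: str) -> List[List[float]]:
    """POST one batch of sentences and return one vector per sentence."""
    payload = json.dumps({"model": MODEL, "input": batch}).encode("utf-8")
    request = urllib.request.Request(
        MISTRAL_EMBED_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return parse_embeddings(json.load(resp))


def embed_all(sentences: List[str], batch_size: int = 64) -> List[List[float]]:
    """Embed every sentence, one HTTP request per batch (hypothetical batch size)."""
    api_key = os.environ["MISTRAL_API_KEY"]  # assumed environment variable name
    vectors: List[List[float]] = []
    for batch in batched(sentences, batch_size):
        vectors.extend(embed_batch(batch, api_key))
    return vectors
```

Batching matters at this scale. Ten million sentences sent one per request would mean ten million HTTP calls, so grouping them cuts the request count by the batch size. A real run would also need retries and rate-limit handling, which are omitted here. The `mistral-embed` model returns 1024-dimensional vectors, which is consistent with the 1024 dimensions mentioned in the abstract, but the abstract does not name the model used.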

The embeddings were reduced from 1024 to 256 dimensions using PCA from the Python scikit-learn library.
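The sketch below illustrates what the PCA reduction step computes: center the data, take its SVD, and project onto the top 256 principal axes. It uses NumPy directly so that each step is visible, and its output matches scikit-learn's `PCA(n_components=256).fit_transform(X)` up to the sign of each component. The random matrix is a stand-in for real 1024-dimensional embeddings. The function name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(embeddings: np.ndarray, n_components: int = 256) -> np.ndarray:
    """Project rows of `embeddings` onto their top principal components.

    Matches scikit-learn's PCA(n_components=...).fit_transform up to the
    sign of each component: center the data, then use the SVD.
    """
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1024))  # stand-in for real 1024-dim embeddings
X_256 = pca_reduce(X, 256)
print(X_256.shape)  # (1000, 256)
```

At the full scale of about 10,000,000 × 1024 float32 values (roughly 41 GB), decomposing the whole matrix at once is impractical. Two common alternatives in scikit-learn are fitting `PCA` on a random sample and calling `transform` on the rest in chunks, or using `IncrementalPCA`.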

Word embeddings represent words as vectors in a multi-dimensional space, where the distance and direction between vectors reflect the similarity and relationships among the corresponding words. Sentence embeddings, like the ones in this dataset, apply the same idea to whole sentences.
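The dataset description does not state which similarity measure is intended. Cosine similarity is a common choice for comparing embedding vectors, so the sketch below uses it. It also shows a brute-force nearest-neighbour lookup. The toy 2-dimensional vectors stand in for the dataset's 256-dimensional ones, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: Sequence[float], vectors: Sequence[Sequence[float]]) -> int:
    """Index of the vector most similar to `query` by cosine similarity."""
    return max(range(len(vectors)), key=lambda i: cosine_similarity(query, vectors[i]))


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
print(nearest([1.0, 0.1], [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]))  # 1
```

A linear scan like this is fine for illustration. Searching millions of vectors efficiently would normally rely on vectorized matrix products or an approximate nearest-neighbour index.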

Mistral AI is a French company specializing in artificial intelligence (AI) products. Founded in April 2023 by former employees of Meta Platforms and Google DeepMind,[1] the company has quickly risen to prominence in the AI sector.

Instructions: 

All information is in "ReadMe.txt".

Comments

Update

Submitted by Maksim Pokrovskiy on Mon, 10/14/2024 - 02:25


Documentation

Attachment    Size
ReadMe.txt    389 bytes