A simple data loader for the KillrVideo `videos` table, written in C# (.NET 9.0). The loader uses version 3.22.0 of the CassandraCSharpDriver to write to Astra DB, and generates vector embeddings with the ibm-granite/granite-embedding-30m-english model from HuggingFace.
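Prerequisites:

- .NET 9 SDK or later (the SDK, not just the runtime, is needed for `dotnet build` and `dotnet run`).
- A DataStax Astra DB Serverless database (you can get a free account).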
- The videos.csv file from the killrvideo/killrvideo-data repository.
- A HuggingFace API key.
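The loader presumably sends text to Hugging Face with this API key to get each embedding. The sketch below is not the loader's actual code; it only shows how such a request could be built (without sending it) using the standard `System.Net.Http` types, assuming the Hugging Face Inference API feature-extraction pipeline. The `EmbeddingRequestSketch` class is hypothetical, and the endpoint URL is an assumption that should be checked against the current Hugging Face documentation.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Hypothetical helper: builds (but does not send) an embedding request.
public static class EmbeddingRequestSketch
{
    // ASSUMPTION: endpoint format for the Inference API feature-extraction pipeline.
    public const string AssumedEndpoint =
        "https://router.huggingface.co/hf-inference/models/ibm-granite/granite-embedding-30m-english/pipeline/feature-extraction";

    public static HttpRequestMessage Build(string endpoint, string apiKey, string text)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint);

        // The API key is sent as a bearer token.
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);

        // JSON body of the form {"inputs":"<text>"}.
        string json = JsonSerializer.Serialize(new { inputs = text });
        request.Content = new StringContent(json, Encoding.UTF8, "application/json");
        return request;
    }
}
```

To send it, something like `using var client = new HttpClient(); using var response = await client.SendAsync(request);` would return the embedding as JSON in the response body.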
```bash
# clone
git clone git@github.com:KillrVideo/kv-dataloader-csharp.git
cd kv-dataloader-csharp

# build and install dependencies
dotnet build
```

Database schema:
- Assumes that your Astra DB database has a keyspace named `killrvideo` (create it if it does not exist).
- Create the `videos` table from the CQL file in the killrvideo/killrvideo-data repository: https://github.com/KillrVideo/killrvideo-data/blob/master/schema-astra.cql

For reference, the table and the index it requires on its vector column are:
```sql
CREATE TABLE killrvideo.videos (
    videoid uuid PRIMARY KEY,
    added_date timestamp,
    category text,
    content_features vector<float, 384>,
    content_rating text,
    description text,
    language text,
    location text,
    location_type int,
    name text,
    preview_image_location text,
    tags set<text>,
    userid uuid,
    views int,
    youtube_id text);
```
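Because `content_features` is declared as `vector<float, 384>`, every embedding written to it must have exactly 384 components (granite-embedding-30m-english produces 384-dimensional embeddings). The sketch below is a hypothetical pre-insert check, not code from this repository; it assumes the embedding arrives as a single flat JSON array of numbers, which may not match the exact response shape of the embedding service.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: verifies that an embedding fits the content_features column.
public static class EmbeddingValidator
{
    // Must match the N in content_features vector<float, 384>.
    public const int Dimensions = 384;

    // Parses a flat JSON array such as "[0.12, -0.5, ...]" into a float[]
    // and rejects it if its length differs from the column definition.
    public static float[] ParseAndValidate(string json)
    {
        float[] embedding = JsonSerializer.Deserialize<float[]>(json)
            ?? throw new ArgumentException("Embedding JSON was null.");

        if (embedding.Length != Dimensions)
        {
            throw new ArgumentException(
                $"Expected {Dimensions} dimensions but got {embedding.Length}.");
        }
        return embedding;
    }
}
```

Failing fast here gives a clearer error than letting the database reject a mismatched vector during the insert.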
```sql
CREATE CUSTOM INDEX videos_content_features_idx ON killrvideo.videos (content_features) USING 'StorageAttachedIndex' WITH OPTIONS = {'similarity_function': 'COSINE'};
```

Environment variables (via export):
| Variable | Description |
|---|---|
| `ASTRA_DB_SECURE_BUNDLE_LOCATION` | Path to the secure connect bundle, downloaded from the Astra UI once you have created your database |
| `ASTRA_DB_APPLICATION_TOKEN` | Application token created in the Astra UI |
| `ASTRA_DB_KEYSPACE` | `killrvideo` |
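A loader like this one needs all three variables before it can connect. The sketch below shows one way to read them and fail fast with a clear message when one is missing; the `AstraSettings` record and `AstraSettingsLoader` class are hypothetical names, not taken from this repository.

```csharp
using System;

// Hypothetical settings record holding the three variables from the table above.
public sealed record AstraSettings(string SecureBundlePath, string ApplicationToken, string Keyspace);

public static class AstraSettingsLoader
{
    // Reads a required variable and throws if it is unset, empty, or whitespace.
    static string Require(string name)
    {
        string? value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException($"Environment variable {name} is not set.");
        return value;
    }

    public static AstraSettings Load() => new(
        Require("ASTRA_DB_SECURE_BUNDLE_LOCATION"),
        Require("ASTRA_DB_APPLICATION_TOKEN"),
        Require("ASTRA_DB_KEYSPACE"));
}
```

With the DataStax C# driver, such values are typically passed to `Cluster.Builder().WithCloudSecureConnectionBundle(...)` and `WithCredentials("token", <application token>)`; how this loader wires them up may differ.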
```bash
dotnet build
dotnet run
```

Or simply (`dotnet run` builds the project first if needed):

```bash
dotnet run
```