
Bifrost: A multi-threaded copying program

Bifrost is a multi-threaded data copying tool built on the Tokio runtime, designed to maximize I/O throughput. It provides several schedulers that group file transfers and distribute them across threads. Each scheduler uses a different strategy for ordering and batching the workload. The goal is to minimize overhead and improve performance in high-concurrency scenarios.

Note: The program was renamed from heimdall to bifrost.


Meet the schedulers 📋⚙️:

CFS: Completely Fair Scheduler (implemented)

This is the default scheduler. It distributes files across threads by grouping them into workgroups, then assigning one workgroup per thread. Groups are formed by finding subsets of files whose combined size approaches a target value k, defined as the average file size across all inputs. This ensures each workgroup carries a roughly equal total load. Files that cannot be cleanly grouped — either because they are too large or have missing metadata — are treated as outliers and distributed evenly across the existing groups to maintain balance.
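The README does not say exactly how CFS searches for subsets, so the following is only a minimal sketch under stated assumptions. It uses a simple greedy pass over files sorted by size (a stand-in for whatever subset search Bifrost actually performs), treats files larger than k or without a size as outliers, and hands each outlier to the group with the smallest total so far, which is one way to read "distributed evenly". The function name `cfs_groups` is hypothetical, and it works on file sizes and indices rather than real paths.

```rust
/// Hypothetical sketch of CFS-style grouping. `sizes[i]` is the size of file i,
/// or `None` when its metadata could not be read. Returns groups of file indices.
fn cfs_groups(sizes: &[Option<u64>]) -> Vec<Vec<usize>> {
    let known: Vec<u64> = sizes.iter().filter_map(|s| *s).collect();
    if known.is_empty() {
        // No usable metadata at all: fall back to a single group.
        return vec![(0..sizes.len()).collect()];
    }
    // Target k: the average size of the files whose size is known.
    let k = known.iter().sum::<u64>() / known.len() as u64;
    let groupable = |i: usize| matches!(sizes[i], Some(s) if s <= k);

    // Groupable files, smallest first; everything else is an outlier.
    let mut regular: Vec<usize> = (0..sizes.len()).filter(|&i| groupable(i)).collect();
    regular.sort_by_key(|&i| sizes[i]);
    let outliers: Vec<usize> = (0..sizes.len()).filter(|&i| !groupable(i)).collect();

    // Greedy fill (an assumption): close a group once the next file would overshoot k.
    let mut groups: Vec<Vec<usize>> = Vec::new();
    let mut loads: Vec<u64> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut load = 0u64;
    for i in regular {
        let s = sizes[i].unwrap_or(0);
        if !current.is_empty() && load + s > k {
            groups.push(std::mem::take(&mut current));
            loads.push(load);
            load = 0;
        }
        current.push(i);
        load += s;
    }
    if !current.is_empty() || groups.is_empty() {
        groups.push(current);
        loads.push(load);
    }

    // Outliers: each one goes to the group with the smallest total so far.
    for i in outliers {
        let g = (0..loads.len()).min_by_key(|&j| loads[j]).unwrap();
        loads[g] += sizes[i].unwrap_or(0);
        groups[g].push(i);
    }
    groups
}

fn main() {
    let sizes: [Option<u64>; 5] = [Some(1), Some(2), Some(3), Some(10), None];
    println!("{:?}", cfs_groups(&sizes)); // → [[0, 1, 3], [2, 4]]
}
```

In this example k is 4 (the average of 1, 2, 3 and 10), so the 10-byte file and the file with missing metadata are outliers and get spread over the two groups built from the small files.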

OS: Ordering Scheduler (implemented)

This scheduler clusters files by size using the K-Means algorithm and lets users specify the number of clusters (K) to tune load balancing. It is recommended for directories with many files. It balances workloads across cores, reducing idle time and improving parallel processing efficiency. Clustering adds a small upfront overhead, but it reduces scheduling conflicts and improves throughput, especially for data-intensive tasks. For best results, set K to a multiple of the number of available cores (K = number of cores × n), so that the workload is distributed evenly and scales with the core count. By default, K is set to the number of CPU cores.
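To illustrate the clustering step, here is a minimal sketch of one-dimensional K-Means (Lloyd's algorithm) on file sizes. The names `kmeans_1d` and `default_k` are hypothetical, and the deterministic initialization (centroids spread evenly across the sorted sizes) is an assumption; Bifrost's own initialization, and how it maps clusters to threads, may differ. `default_k` uses `std::thread::available_parallelism`, which reports the parallelism available to the process and can be lower than the physical core count.

```rust
use std::thread;

/// Hypothetical sketch of 1-D K-Means (Lloyd's algorithm) over file sizes.
/// Returns the cluster index assigned to each input size.
fn kmeans_1d(sizes: &[f64], k: usize, max_iter: usize) -> Vec<usize> {
    let n = sizes.len();
    if n == 0 || k == 0 {
        return vec![0; n];
    }
    let k = k.min(n);

    // Deterministic initialization (an assumption): centroids at evenly
    // spaced positions in the sorted list of sizes.
    let mut sorted = sizes.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mut centroids: Vec<f64> = (0..k)
        .map(|j| if k == 1 { sorted[n / 2] } else { sorted[j * (n - 1) / (k - 1)] })
        .collect();

    // usize::MAX marks "unassigned" so the first pass always counts as a change.
    let mut assign = vec![usize::MAX; n];
    for _ in 0..max_iter.max(1) {
        // Assignment step: each size joins its nearest centroid.
        let mut changed = false;
        for (i, &x) in sizes.iter().enumerate() {
            let nearest = (0..k)
                .min_by(|&a, &b| (x - centroids[a]).abs().total_cmp(&(x - centroids[b]).abs()))
                .unwrap();
            if nearest != assign[i] {
                assign[i] = nearest;
                changed = true;
            }
        }
        if !changed {
            break;
        }
        // Update step: move each centroid to the mean of its members.
        let mut sums = vec![0.0f64; k];
        let mut counts = vec![0usize; k];
        for (i, &x) in sizes.iter().enumerate() {
            sums[assign[i]] += x;
            counts[assign[i]] += 1;
        }
        for j in 0..k {
            if counts[j] > 0 {
                centroids[j] = sums[j] / counts[j] as f64;
            }
        }
    }
    assign
}

/// Default K: the number of available CPU cores, as reported by std.
fn default_k() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() {
    let sizes = [1.0, 2.0, 3.0, 100.0, 101.0, 102.0];
    println!("{:?}", kmeans_1d(&sizes, 2, 100)); // → [0, 0, 0, 1, 1, 1]
    println!("default K = {}", default_k());
}
```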

TS: Temporal Scheduler (implemented)

This scheduler optimizes group allocation using statistical analysis and is recommended for directories with a very large number of files. It samples file sizes from the input set using reservoir sampling and fits a linear regression model (a line of best fit), which is then used to predict how long each file will take to copy. This makes scheduling semi-autonomous: users control the number of groups by setting the number of samples, while group sizes vary dynamically. Once the time estimates are computed, the scheduler assigns files to groups with the LPT (Longest Processing Time) algorithm. The goal is to minimize the makespan (the time until the last group finishes), ensuring efficient and balanced workload distribution.
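The three steps described above (reservoir sampling, a least-squares line of best fit, then LPT) can be sketched roughly as follows. Everything here is an assumption for illustration: the names `XorShift64`, `reservoir_sample`, `fit_line` and `lpt` are hypothetical, the tiny xorshift generator stands in for a proper RNG such as the `rand` crate, and the sketch assumes each sample pairs a file size with an observed copy time, which the README does not specify. The number of groups is passed in directly rather than derived from the sample count.

```rust
/// Tiny xorshift64 PRNG: a stand-in for a real RNG such as the `rand` crate.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Integer in 0..=max (slight modulo bias is acceptable for a sketch).
    fn up_to(&mut self, max: u64) -> u64 {
        self.next_u64() % (max + 1)
    }
}

/// Reservoir sampling (Algorithm R): keeps a uniform sample of `s` items.
fn reservoir_sample<T: Clone>(items: &[T], s: usize, rng: &mut XorShift64) -> Vec<T> {
    let mut reservoir: Vec<T> = items.iter().take(s).cloned().collect();
    for (i, item) in items.iter().enumerate().skip(s) {
        // Item i replaces a random slot with probability s / (i + 1).
        let j = rng.up_to(i as u64) as usize;
        if j < s {
            reservoir[j] = item.clone();
        }
    }
    reservoir
}

/// Ordinary least squares fit of y = a + b * x; returns (a, b). `points` must be non-empty.
fn fit_line(points: &[(f64, f64)]) -> (f64, f64) {
    let n = points.len() as f64;
    let mean_x = points.iter().map(|p| p.0).sum::<f64>() / n;
    let mean_y = points.iter().map(|p| p.1).sum::<f64>() / n;
    let sxy: f64 = points.iter().map(|p| (p.0 - mean_x) * (p.1 - mean_y)).sum();
    let sxx: f64 = points.iter().map(|p| (p.0 - mean_x).powi(2)).sum();
    let b = if sxx == 0.0 { 0.0 } else { sxy / sxx };
    (mean_y - b * mean_x, b)
}

/// LPT: take jobs longest-first and give each to the least-loaded group.
/// Returns (group index of each job, makespan). `groups` must be at least 1.
fn lpt(times: &[f64], groups: usize) -> (Vec<usize>, f64) {
    let mut order: Vec<usize> = (0..times.len()).collect();
    order.sort_by(|&a, &b| times[b].total_cmp(&times[a]));
    let mut loads = vec![0.0f64; groups];
    let mut assign = vec![0usize; times.len()];
    for i in order {
        let g = (0..groups).min_by(|&a, &b| loads[a].total_cmp(&loads[b])).unwrap();
        loads[g] += times[i];
        assign[i] = g;
    }
    // Makespan: finishing time of the most loaded group.
    let makespan = loads.iter().copied().fold(0.0, f64::max);
    (assign, makespan)
}

fn main() {
    // Hypothetical observations: (file size in MB, measured copy time in s).
    let observations: Vec<(f64, f64)> =
        (1..=100).map(|mb| (mb as f64, 0.5 + 0.01 * mb as f64)).collect();
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let sample = reservoir_sample(&observations, 10, &mut rng);
    let (a, b) = fit_line(&sample);

    // Predict copy times for the files to schedule, then balance them over 2 groups.
    let file_sizes = [800.0, 120.0, 450.0, 60.0, 300.0, 90.0];
    let predicted: Vec<f64> = file_sizes.iter().map(|s| a + b * s).collect();
    let (assign, makespan) = lpt(&predicted, 2);
    println!("groups = {assign:?}, predicted makespan = {makespan:.2} s");
}
```

LPT is a greedy heuristic: it does not always find the minimum makespan, but Graham's classic bound guarantees a result within a factor of 4/3 of the optimum, and it only needs a sort plus one pass over the jobs.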


Notes:

  • Be careful when using Bifrost with a network destination. I haven't tested it yet, but most network destinations are not optimized for parallel copying, and Bifrost can significantly saturate your network stack, potentially causing issues.
  • All Bifrost testing has been done on SSDs, both with and without DRAM cache. In both cases the program performs well. I don't know how the program will perform on HDDs, due to their speed limitations and caching behavior.

Planned features:

  • TS scheduler: based on the LPT algorithm, with time predictions made using linear regression.
  • TS scheduler automatic disk probing: in the future, the TS scheduler will automatically discover disk read and write speeds.
  • Archival Mode: when a flag is specified, creation of the Destination directory is skipped and only files in Source that are not yet in Destination are copied over. This is essentially a set difference: Files To Be Copied = Source - Destination. This mode will require the Destination to exist and its top-level name to match that of Source.
  • File Chunking: large files that can bottleneck the copying process will be split into smaller chunks and mixed into the schedule; the chunks will then be reassembled into the original file at the destination.
  • Multi-Threaded directory analysis: the ability to spawn multiple walkers that analyze the directory in parallel.
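Archival Mode and File Chunking are planned rather than implemented, so the sketch below is purely hypothetical. `files_to_copy` computes Source - Destination on paths relative to the top-level directory (it compares names only, not contents or timestamps), and `chunk_ranges` splits a file length into (offset, length) ranges. Both function names are made up for illustration; how Bifrost will decide what counts as a new file, or pick a chunk size, is not specified.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical archival-mode filter: Files To Be Copied = Source - Destination.
/// Paths are relative to each top-level directory; only names are compared,
/// not contents or timestamps.
fn files_to_copy(source: &[PathBuf], destination: &[PathBuf]) -> Vec<PathBuf> {
    let existing: HashSet<&Path> = destination.iter().map(|p| p.as_path()).collect();
    source
        .iter()
        .filter(|p| !existing.contains(p.as_path()))
        .cloned()
        .collect()
}

/// Hypothetical chunk plan: (offset, length) ranges of at most `chunk_size`
/// bytes covering a file of `len` bytes, to be reassembled at the destination.
fn chunk_ranges(len: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut ranges = Vec::new();
    let mut offset = 0;
    while offset < len {
        let n = chunk_size.min(len - offset);
        ranges.push((offset, n));
        offset += n;
    }
    ranges
}

fn main() {
    let source = vec![
        PathBuf::from("photos/a.jpg"),
        PathBuf::from("photos/b.jpg"),
        PathBuf::from("notes.txt"),
    ];
    let destination = vec![PathBuf::from("photos/a.jpg")];
    println!("{:?}", files_to_copy(&source, &destination));
    println!("{:?}", chunk_ranges(10, 4)); // → [(0, 4), (4, 4), (8, 2)]
}
```

On Unix, each chunk could be written back at its offset with `std::os::unix::fs::FileExt::write_at`, which would let chunks arrive and be reassembled in any order.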

Maybe in the future...

  • Bifrost SCP server: copying over the network via SSH.