dev #76

Merged
jakeISAC merged 78 commits from dev into main 2026-06-23 22:31:46 +02:00

Merging dev as stable to main.

- Implement `process_file_split` with support for `Smart` and `Basic` split strategies.
- Introduce `FileReconstructionData` struct for handling file chunk metadata.
- Add initial stub for `basic_split` function.
- Update `mod.rs` to include the new `file_chunking` module.
- Remove obsolete TODO comment from `path_preprocessor.rs`.
- Extend CLI arguments to include `--preserve-mode`.
- Update `copy_multithreaded` to handle permissions and ownership.
- Implement basic permission and ownership handling logic in `dir_preprocessor.rs`.
- Add macOS build target and corresponding linker configuration in Makefile and `.cargo/config.toml`.
- Change FileData getters to return references to avoid cloning
- Remove the file_chunking module and related code
- Add a permissions module with skeleton set_dir_permissions and
  set_dir_ownership
- Update dir_preprocessor to use borrowed PathBuf keys and depth-based
  path graph
- Improve archive_mode logging to display path strings when available
- Implement `set_dir_permissions` and `set_dir_ownership` with depth-based ordering.
- Extend `--preserve-mode` support for sequential permission/ownership updates.
- Add `BTreeMap` for optimized metadata storage.
- Update tests and refactor related modules for improved consistency.
- Introduced `--preserve-mode` support for both sequential and multithreaded file copying.
- Added Unix-specific `chown` implementation in `copy_sequential` and `copy_multithreaded`.
- Integrated `jemallocator` for Linux builds and added debug allocator logging.
- Updated Makefile to include test-specific build targets.
- Enhanced platform-specific configurations in `.cargo/config.toml` for Linux and macOS.
- Refactored module imports with conditional compilation for Unix targets.
- Fixed typos and improved comments for better clarity and maintainability.
- Replace `HashMap` with `BTreeMap` for ordered path graph storage.
- Add early exit for empty relative paths in directory analysis.
- Fix edge case for empty file paths to prevent incorrect handling.
- Improve comments for better readability and debugging insights.
- Enhance comment clarity for K-Means++ centroid initialization algorithm.
- Add reference to external implementation for better context.
- Adjust comments in archive_mode to clarify lazy evaluations and new file insertion logic.
- Introduce `multithreaded_dir_analysis` with Rayon thread pool for concurrent processing.
- Add `generate_file_data` macro to simplify and standardize file metadata generation.
- Optimize single-file and directory handling logic in `analysis`.
- Update test to reflect new directory analysis behavior.
- Move debug timing logic closer to `multithreaded_dir_analysis`.
- Remove redundant timing logic in `root_contents` processing.
- Extend `analysis` to support mode selection and provide analysis duration.
- Add `analysis_mode` and `multi_threaded_analysis` options to CLI for flexible analysis operations.
- Update tests and logging to accommodate new analysis behavior.
- Add elapsed time display for analysis in standard output.
- Add `num_walkers` parameter to `analysis` for improved flexibility in multithreaded directory analysis.
- Update CLI to include `number_of_walkers` option.
- Modify tests and associated logic to support the new parameter.
- Replace sequential directory processing with parallelized chunk processing using Rayon.
- Add `process` helper function for better modularity and reuse.
- Improve edge case handling for empty directories with detailed logging.
- Minor formatting improvements in `analysis` function signature and related logic.
- Update `destination` argument logic to depend on either `completions` or `analysis_mode`.
- Minor whitespace cleanup.
- Reorganize CLI arguments for improved clarity and functionality.
- Refactor source and destination validation to accommodate `analysis_mode`.
- Enhance analysis output with walker count and formatted duration display.
- Reintegrate scheduler validation and destination preprocessing after analysis logic.
- Replace static walker count with `LazyCell` for dynamic CPU-based initialization.
- Add ARM64 Windows release build target.
- Update `build-all` target to include ARM64 Windows.
- Extend hash calculation in `Makefile` for ARM64 Linux builds.
- Replace `cargo build` with `cargo xwin build` for `build-release-windows-arm64`.
- Include hash calculation for ARM64 Windows release binaries.
- Add `human_readable_number` macro for formatted output.
- Refactor source and destination validation logic.
- Improve analysis output with resolved paths, human-readable counts, and walker info.
- Simplify scheduler and destination preprocessing.
- Remove temporary fixes and clean up unused code.
- Trim and standardize scheduler value parsing.
- Update dependencies and version to `1.2.17`.
Simplified the permission-setting function by removing an unnecessary
creation of a BTreeMap. Instead, we simply reverse an iterator over the
already sorted extracted_paths (shallow to deep becomes deep to shallow).
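
A minimal sketch of the ordering idea, assuming the paths are plain `PathBuf` values sorted by component count. The helper names `sort_shallow_to_deep` and `deep_to_shallow` are hypothetical and are not taken from the codebase:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: sorts paths by depth (number of components),
/// shallowest first. The sort is stable, so equal depths keep their order.
fn sort_shallow_to_deep(paths: &mut [PathBuf]) {
    paths.sort_by_key(|p| p.components().count());
}

/// Hypothetical helper: walks an already sorted (shallow to deep) slice
/// in reverse, yielding deep to shallow without building a second map.
fn deep_to_shallow(sorted_paths: &[PathBuf]) -> Vec<&Path> {
    sorted_paths.iter().rev().map(PathBuf::as_path).collect()
}
```

Visiting the deepest directories first means a restrictive mode applied to a parent cannot block later updates to its children.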

Added some extra comments and docs.
Schedulers were redesigned with Rayon in mind a while ago, and keeping
the function signature async did not make much sense.

Made some additional optimizations: removed the `human_readable_number`
macro, since I have to use indicatif anyway.
- Deleted `disk_speed_probe.rs` and its reference in `mod.rs`.
- Refined k-means and CFS scheduler docs for clarity, added detailed error and result descriptions.
- Minor adjustments to file operation and directory analysis comments for consistency.
- added some hints in the form of `#[cold]` and `#[inline]` to help the
  compiler optimize the code
- formatted some code and docs to make them better aligned
- added some new comments and docs
Reviewed-on: #72
Reviewed-on: #73
Instead of creating clusters with owned data points, I now create
clusters with references to points created before the main loop. The data
is declared for the whole duration of clustering, i.e. its lifetime is
long enough to just keep references to these points.
During the centroid reinitialization procedure the point was still being
cloned when the distance to the centroids was being calculated.
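
The borrowing approach can be sketched as follows. This is an illustration under assumptions, not the scheduler's actual code: the `DataPoint` field, the `Cluster` struct, and the `assign` function are all hypothetical, and the distance is a simple one-dimensional absolute difference.

```rust
/// Hypothetical data point; the real type may carry more fields.
#[derive(Debug, Clone, PartialEq)]
struct DataPoint {
    size: f64,
}

/// A cluster that borrows its members instead of owning copies.
struct Cluster<'a> {
    centroid: f64,
    members: Vec<&'a DataPoint>,
}

/// Assigns every point to its nearest centroid without cloning any point.
/// `points` must outlive the returned clusters, which the lifetime `'a` enforces.
fn assign<'a>(points: &'a [DataPoint], centroids: &[f64]) -> Vec<Cluster<'a>> {
    let mut clusters: Vec<Cluster<'a>> = centroids
        .iter()
        .map(|&c| Cluster { centroid: c, members: Vec::new() })
        .collect();

    for point in points {
        // Find the index of the closest centroid; only references are compared.
        let nearest = clusters
            .iter()
            .enumerate()
            .min_by(|(_, a), (_, b)| {
                let da = (a.centroid - point.size).abs();
                let db = (b.centroid - point.size).abs();
                da.total_cmp(&db)
            })
            .map(|(i, _)| i);

        if let Some(i) = nearest {
            clusters[i].members.push(point);
        }
    }
    clusters
}
```

Because the points are created once before the loop, each iteration only moves references around, so no per-iteration allocation of point data is needed.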
- rewrote the target calculation function for CFS. In order to
  accommodate directories where all file sizes have a uniform
  distribution, the target is now chosen between an equal share of bytes
  per group and the RMS.
- max_group_size now defaults to chunk_size, but the user can
  override it.
- main and class updated to accommodate the new input parameters
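
The two candidate targets can be sketched as below. The function names are hypothetical, and the selection rule is an assumption: the notes above do not say how the choice between the two values is made, so this sketch simply takes the larger one.

```rust
/// Equal share of the total bytes per group. Returns 0.0 for zero groups.
fn equal_share(sizes: &[u64], groups: usize) -> f64 {
    if groups == 0 {
        return 0.0;
    }
    let total: u64 = sizes.iter().sum();
    total as f64 / groups as f64
}

/// Root mean square of the file sizes. Returns 0.0 for an empty slice.
fn rms(sizes: &[u64]) -> f64 {
    if sizes.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = sizes.iter().map(|&s| (s as f64).powi(2)).sum();
    (sum_sq / sizes.len() as f64).sqrt()
}

/// ASSUMPTION: picks the larger of the two candidates. The real
/// scheduler may use a different rule to choose between them.
fn target(sizes: &[u64], groups: usize) -> f64 {
    equal_share(sizes, groups).max(rms(sizes))
}
```

For uniformly sized files the RMS equals the single file size, so with fewer groups than files the equal share dominates in this sketch.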

Clean up:
 - added new comments
 - modified k_means to rely fully on Vec<&DataPoint> to avoid
   unnecessary cloning and allocations

 And of course general clean-up of code and dependencies.
After initial testing it was causing more issues than benefits.
Added initial code for the linear regression scheduler
- fixed some typos in comments and code
- removed an unnecessary Atomic counter from the progress-bar logic in
  favour of the built-in .inc() function
- added a new function in class.rs which returns the scheduler's String
  representation
Reviewed-on: #74
- modified the return type of the `from_string` function to Result<Self>
  from Option<Self>; this approach removed the unnecessary `.validate()`
  function
- Added initial code (parameters and constructors) for the TS scheduler
- added a function to calculate the plane representing relative file
  sizes and the time needed to copy them
- found a nice website on how to find the line of best fit
- Add EDF to the codebook
- Introduce sampling_iteration_limit in SchedulerConfig
- Thread limit through to TemporalScheduler and its config
- Update linear_regression to use a sampling limit
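
For context, a least-squares line of best fit over (relative file size, copy time) samples could look like the sketch below. The function name, the tuple layout, and the reading of the sampling limit as a cap on the number of samples used are all assumptions, not details taken from the scheduler.

```rust
/// Fits y = slope * x + intercept by ordinary least squares.
/// ASSUMPTION: `sampling_limit` caps how many leading samples are used.
/// Returns None when fewer than two samples remain or all x values are equal.
fn line_of_best_fit(samples: &[(f64, f64)], sampling_limit: usize) -> Option<(f64, f64)> {
    let used = &samples[..samples.len().min(sampling_limit)];
    if used.len() < 2 {
        return None;
    }
    let n = used.len() as f64;
    let mean_x = used.iter().map(|&(x, _)| x).sum::<f64>() / n;
    let mean_y = used.iter().map(|&(_, y)| y).sum::<f64>() / n;

    // Covariance and variance terms (unnormalised; the factor cancels).
    let sxy: f64 = used.iter().map(|&(x, y)| (x - mean_x) * (y - mean_y)).sum();
    let sxx: f64 = used.iter().map(|&(x, _)| (x - mean_x).powi(2)).sum();
    if sxx == 0.0 {
        return None; // vertical line, slope undefined
    }

    let slope = sxy / sxx;
    Some((slope, mean_y - slope * mean_x))
}
```

Capping the sample count keeps the fit cheap on large directories, at the cost of ignoring later samples.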
allocator features.
Additionally:
- added necessary arguments for TS to main
- moved scheduling timing from the schedule() function to run() for more
  accurate and persistent results
- general performance optimizations
with all major features implemented, with exception of file chunking.
Reviewed-on: #75
Reference
jakeISAC/bifrost!76