Skip to content

Yusheng-Hu/Position-Pure-Algorithm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

293 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

License: MIT Language: C++ Complexity: O(n) DOI Permutation Generation ORCID GitHub stars


Position Pure Algorithm

Official implementation and interactive visualizations of the Position Pure (PP) linear-time ranking/unranking algorithm.

πŸš€ Evolution of Efficiency: Beyond Heap's Algorithm

The Position-Pure Algorithm and Circle-Permutation-Algorithm represent a paradigm shift in combinatorial generation by optimizing the fundamental cost of state transitions.

  • Heap's Algorithm (The Classical Baseline): Long considered the gold standard due to its swap-based approach. While swaps were once thought to be the minimum-cost operation, they involve multiple memory accesses and conditional logic that limit peak performance.
  • Ives' Iterative Algorithm: Improved upon Heap's by utilizing a single-shift and single-assignment pattern, reducing the overhead of element movement.
  • Position-Pure Algorithm (Our Work): Achieves a new frontier by eliminating conditional branching and implementing a highly refined shift-and-assign strategy.
    • Scalability: Optimized for large $N$.
    • Concurrency: Stateless design allows for massive parallel execution, a feat traditional recursive algorithms cannot achieve.
  • Circle-Permutation-Algorithm : Specifically engineered for circular symmetry. While it introduces controlled spatial overhead, it shatters traditional performance barriers by realizing an O((n-1)!) complexity framework.

Key Insights

  • Significant Speedup: permPure_full consistently outperforms Heap's Algorithm by a factor of approximately 7x.
  • Algorithmic Efficiency: The performance gap highlights the superior memory access patterns and lower computational overhead inherent in the PositionPure algorithm.
  • Scalability: As the permutation space grows factorially, the performance gap remains stable, demonstrating excellent algorithmic efficiency for large-scale generation.

πŸš€ Live Demonstrations

πŸš€ Full Permutation generation: Iterative Permutation Generation

The PositionPure algorithm is a high-performance, iterative approach to generating all permutations of a set. By utilizing an iterative state machine rather than traditional recursion, it significantly reduces function call overhead and optimizes CPU branch prediction efficiency.

The algorithm utilizes an iterative state machine to eliminate recursion overhead and leverages hardware affinity binding via the Windows API to minimize context-switching noise, with performance verified to nanosecond-level accuracy through high-precision benchmarking.

Last Run: Thu Feb 19 09:58:45 2026 UTC / Thu Feb 19 17:58:45 2026 (UTC+8) Environment: AMD EPYC 7763 64-Core Processor (GitHub Actions Runner)

N Heap (s) PP (s) Speedup
9 0.005729 0.000998 5.74x
10 0.058123 0.006297 9.23x
11 0.645687 0.064628 9.99x
12 7.834273 0.720868 10.86x
13 103.662456 8.662150 11.96x

Last Run: Fri Feb 06 07:20:50 2026 UTC / Fri Feb 06 15:20:50 2026 (UTC+8) Environment: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz (GitHub Actions Runner)

N Heap (s) PP (s) Speedup
9 0.006039 0.000871 6.93x
10 0.059160 0.007174 8.24x
11 0.657323 0.075407 8.71x
12 7.809986 1.004821 7.77x
13 103.446021 11.693761 8.84x
14 1468.660328 162.986940 9.01x

πŸš€ Comparison with Python's Built-in itertools

At the request of Reddit users and other community members, a performance comparison with Python’s built-in itertools was conducted. Results are based on the implementation in Position-Pure-Algorithm/python/pp_iter.py.

  • Runtime Environment: The tests were performed using PyPy3. Since the standard itertools library is C-optimized, using PyPy3 helps bridge the low-level language gap, allowing for a more authentic comparison of algorithmic efficiency.
  • Results: Benchmark tests show that this algorithm improves performance by at least 1.4x, with potential gains reaching over 2x compared to the standard library.
  • Future Plans: For applications requiring even higher performance, a C-compiled version should be considered.

🐍 Position Pure Iterator Performance (PyPy)

Last Run: Wed Feb 11 15:24:27 2026 UTC / Wed Feb 11 23:24:27 2026 (UTC+8) Environment: AMD EPYC 7763 64-Core Processor (GitHub Actions Runner)

N Total Permutations Itertools (s) Position Pure (s) Speed-up
10 3,628,800 0.1078s 0.0676s 1.60x
11 39,916,800 1.2125s 0.5686s 2.13x
12 479,001,600 15.0159s 6.5625s 2.29x
13 6,227,020,800 200.7682s 84.1281s 2.39x
14 87,178,291,200 2884.4946s 1158.2538s 2.49x

Last Run: Fri Feb 06 05:40:44 2026 UTC / Fri Feb 06 13:40:44 2026 (UTC+8) Environment: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz (GitHub Actions Runner)

N Total Permutations Itertools (s) Position Pure (s) Speed-up
10 3,628,800 0.1513s 0.1334s 1.13x
11 39,916,800 2.0003s 1.0969s 1.82x

⚑ Dual-Core Parallel Performance (GitHub Actions)

Last Run: Sat Feb 21 13:38:29 UTC 2026 / Sat Feb 21 21:38:29 2026 (UTC+8) Environment: AMD EPYC 7763 64-Core Processor (GitHub Actions Runner)

N Core ID Execution Time Throughput (G/s) Total Count
10 Core 0 0.0029s 0.63 -
10 Core 1 0.0029s 0.63 3628800
11 Core 0 0.0310s 0.64 -
11 Core 1 0.0308s 0.65 39916800
12 Core 0 0.3597s 0.67 -
12 Core 1 0.3585s 0.67 479001600
13 Core 0 4.5065s 0.69 -
13 Core 1 4.5037s 0.69 6227020800
14 Core 0 65.2698s 0.67 -
14 Core 1 64.4870s 0.68 87178291200

πŸš€ Liner rank unrank: Position Pure (PP) vs. Myrvold-Ruskey (MR)

Last Run: Sat Feb 21 13:35:00 2026 UTC / Sat Feb 21 21:35:00 2026 (UTC+8) Environment: AMD EPYC 7763 64-Core Processor (GitHub Actions Runner)

N Dist MR (ns/it) PP (ns/it) Speedup
1000 Random 1057.0 633.4 1.67x
1000 Sorted 676.9 637.7 1.06x
1000 Reverse 873.1 728.2 1.20x
100000 Random 114563.3 94561.8 1.21x
100000 Sorted 67846.6 62763.3 1.08x
100000 Reverse 80115.5 66770.6 1.20x
1000000 Random 1568522.4 1247111.3 1.26x
1000000 Sorted 699835.1 626860.0 1.12x
1000000 Reverse 802942.6 669304.3 1.20x

Last Run: Sat Feb 21 13:32:53 2026 UTC / Sat Feb 21 21:32:53 2026 (UTC+8) Environment: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz (GitHub Actions Runner)

N Dist MR (ns/it) PP (ns/it) Speedup
1000 Random 697.2 578.0 1.21x
1000 Sorted 651.1 476.6 1.37x
1000 Reverse 666.3 574.2 1.16x
100000 Random 109696.3 104611.9 1.05x
100000 Sorted 63265.3 46876.3 1.35x
100000 Reverse 69900.0 79963.0 0.87x
1000000 Random 2331791.2 2143208.0 1.09x
1000000 Sorted 712214.9 470972.9 1.51x
1000000 Reverse 853533.5 870776.9 0.98x

πŸ’» Source Code

The core C++ implementations of the algorithms can be found here:

  • map_perm_algorithms.cpp: Includes Position_unrank, Position_rank, PositionPure_unrank, and PositionPure_rank.
  • permPure_full.cpp: Serves as a high-performance reference implementation for generating all permutations of a set using the PositionPure iterative algorithm.

πŸ›  Compilation Commands

The following commands compile both algorithms using the same aggressive optimization flags to ensure a fair performance comparison:

# Compile options
g++ -O3 -std=c++11 -march=native -ffast-math -fomit-frame-pointer

πŸ“– Citation

If you use this algorithm or implementation in your research, please cite it as follows:

APA Style

HU, Y. (2026). Position Method: A Linear-Time Generation Algorithm for Permutations (Version v1.0.0). GitHub Repository. DOI: 10.5281/zenodo.18170157

BibTeX

@software{hu_yusheng_2026_18170157,
  author       = {Hu, Yusheng},
  title        = {Position Method: A Linear-Time Generation Algorithm for Permutations},
  year         = 2026,
  publisher    = {Zenodo},
  version      = {v1.0.0},
  doi          = {10.5281/zenodo.18170157},
  url          = {[https://doi.org/10.5281/zenodo.18170157](https://doi.org/10.5281/zenodo.18170157)}
}

πŸ’‘ Future Ideas & Extensions

1. Zero-Memory "Lazy" Permutation

Since we can determine any value at position k in O(N) time without generating the full array, we can implement a Lazy-Loaded Permutation Object.

  • Use Case: Accessing elements of a massive permutation (N > 10^6) that exceeds RAM capacity.
  • Concept: Overload operator[] to call PP_get_value_at_position(k) on the fly.

2. SIMD-Accelerated Partial Search

The backward-tracing logic if (C[i] == current_target_pos) is highly vectorizable.

  • Idea: Use AVX-512 to scan 16 elements of C simultaneously.
  • Goal: Achieve near O(1) perceived latency for individual element lookups.

3. Distributed Constrained Search

Leverage the independence of the get_value function to split a single permutation validation task across multiple CPU cores or network nodes without shared state.

πŸ’‘ Why Position Pure?

The Position Pure (PP) algorithm provides a more intuitive $O(n)$ implementation compared to the classic Myrvold-Ruskey (MR) method:

  • Simpler Indexing: Replaces the traditional swap-based unranking with a direct mapping logic.
  • Educational Value: The provided visualizations make the complex mapping process easy to understand.

πŸ“„ Academic Status

  • Preprint: need long time.
  • Status: $O(n)$ complexity achieved, improving upon the classic Myrvold-Ruskey logic.

πŸ“’ Important Update: From "Position Pro" to "Position Pure"

Please note: Starting from version v1.1.1, this algorithm has been officially renamed from Position Pro to Position Pure (PP).

Why the name change?

  1. Trademark Consideration: We found that "Position Pro" is a registered trademark in several industrial and commercial sectors. To ensure the algorithm's academic independence and avoid legal confusion, we have transitioned to a unique name.
  2. Technical Essence (Why "Pure"?): The name "Pure" more accurately describes the algorithm's technical implementation. In computer science, "pure" often implies a clean, branchless execution flow.
  3. Core Feature - Branchless Logic: Unlike the base version, Position Pure (PP) achieves its linear-time performance through direct element-wise shifting and replacement. It eliminates conditional branching (no if statements in the core loop), leading to a "purer" and faster execution path on modern CPUs.

About

Official implementation and interactive visualization of the Position Pure (PP) linear-time unranking algorithm.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •