Priority queue container #23

Merged
merged 13 commits into from
May 7, 2025
9 changes: 9 additions & 0 deletions .ci/atime/tests.R
@@ -0,0 +1,9 @@
+test.list <- atime::atime_test_list(
+  binseg_normal_best=atime::atime_test(
+    setup={
+      max.segs <- as.integer(N/2)
+      data_vec <- 1:N
+    },
+    expr=binsegRcpp::binseg_normal(data_vec, max.segs)
+  )
+)
32 changes: 32 additions & 0 deletions .github/workflows/performance-tests.yml
@@ -0,0 +1,32 @@
+name: atime performance tests
+
+on:
+  pull_request:
+    types:
+      - opened
+      - reopened
+      - synchronize
+    paths:
+      - 'R/**'
+      - 'src/**'
+      - '.ci/atime/**'
+
+jobs:
+  comment:
+    runs-on: ubuntu-latest
+    container: ghcr.io/iterative/cml:0-dvc2-base1
+    env:
+      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
+      repo_token: ${{ secrets.GITHUB_TOKEN }}
+    steps:
+      - uses: actions/checkout@v3
+      - uses: r-lib/actions/setup-r@v2
+        with:
+          r-version: release
+          http-user-agent: release
+          use-public-rspm: true
+      - uses: r-lib/actions/setup-r-dependencies@v2
+        with:
+          extra-packages: any::rcmdcheck
+          needs: check
+      - uses: Anirban166/[email protected]
32 changes: 26 additions & 6 deletions .github/workflows/test-coverage.yaml
@@ -1,4 +1,4 @@
-# Workflow derived from https://github.com/r-lib/actions/tree/master/examples
+# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
 # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
 on:
   push:
@@ -15,16 +15,36 @@ jobs:
       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4

-      - uses: r-lib/actions/setup-r@v1
+      - uses: r-lib/actions/setup-r@v2
         with:
           use-public-rspm: true

-      - uses: r-lib/actions/setup-r-dependencies@v1
+      - uses: r-lib/actions/setup-r-dependencies@v2
         with:
-          extra-packages: covr
+          extra-packages: any::covr
+          needs: coverage

       - name: Test coverage
-        run: covr::codecov()
+        run: |
+          covr::codecov(
+            quiet = FALSE,
+            clean = FALSE,
+            install_path = file.path(Sys.getenv("RUNNER_TEMP"), "package")
+          )
         shell: Rscript {0}
+
+      - name: Show testthat output
+        if: always()
+        run: |
+          ## --------------------------------------------------------------------
+          find ${{ runner.temp }}/package -name 'testthat.Rout*' -exec cat '{}' \; || true
+        shell: bash
+
+      - name: Upload test results
+        if: failure()
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-test-failures
+          path: ${{ runner.temp }}/package
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
 Package: binsegRcpp
 Type: Package
 Title: Efficient Implementation of Binary Segmentation
-Version: 2025.4.29
+Version: 2025.5.6
 Authors@R: person(given = c("Toby", "Dylan"),
     family = "Hocking",
     role = c("aut", "cre"),
4 changes: 4 additions & 0 deletions NEWS
@@ -1,3 +1,7 @@
+Changes in version 2025.5.6
+
+- add new container: STL priority_queue (heap), almost no difference with multiset.
+
 Changes in version 2025.4.29

 - update Rcpp code in interface.cpp to avoid access warnings: use .begin() instead of &vec[0] to get pointer to first element of vector.
6 changes: 3 additions & 3 deletions R/binseg.R
@@ -25,10 +25,10 @@ binseg <- structure(function # Binary segmentation
   min.segment.length=NULL,
 ### Positive integer, minimum number of data points per
 ### segment. Default NULL means to use min given distribution.str.
-  container.str="multiset"
+  container.str="priority_queue"
 ### C++ container to use for storing breakpoints/cost. Most users
-### should leave this at the default "multiset" for efficiency but you
-### could use "list" if you want to study the time complexity of a
+### should leave this at the default "priority_queue" for efficiency, but you
+### could use "list" if you want to study the time complexity of an asymptotically
 ### slower implementation of binary segmentation.
 ){
   ##alias<< binsegRcpp
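
Note on the "list" option documented above: it is asymptotically slower because each lookup of the best candidate is a linear scan (the diff to src/binseg.cpp uses std::min_element for that container), while a heap or balanced tree does not need to visit every element. A minimal C++ sketch of that scan, assuming plain double costs in place of Segment objects; pop_min_from_list is a hypothetical name, not a function in binsegRcpp:

```cpp
#include <algorithm>
#include <list>

// Hypothetical helper: remove and return the smallest cost in the list.
// Precondition: costs is not empty.
double pop_min_from_list(std::list<double>& costs) {
  // std::min_element visits every element, so each call is O(n).
  std::list<double>::iterator it =
      std::min_element(costs.begin(), costs.end());
  double best = *it;   // copy the value before erasing invalidates 'it'
  costs.erase(it);     // O(1) for std::list once the iterator is known
  return best;
}
```

Calling this once per split gives quadratic total work when the number of candidates grows with the number of splits, which is the behavior the documentation suggests studying with container.str="list".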
6 changes: 3 additions & 3 deletions man/binseg.Rd
@@ -15,7 +15,7 @@ log-linear time, using coef method.}
     weight.vec = rep(1,
         length(data.vec)),
     min.segment.length = NULL,
-    container.str = "multiset")}
+    container.str = "priority_queue")}
 \arguments{
   \item{distribution.str}{String indicating distribution/loss function, use
 \code{\link{get_distribution_info}} to see possible values.}
@@ -32,8 +32,8 @@ default=1:length(\code{data.vec}).}
   \item{min.segment.length}{Positive integer, minimum number of data points per
 segment. Default NULL means to use min given \code{distribution.str}.}
   \item{container.str}{C++ container to use for storing breakpoints/cost. Most users
-should leave this at the default "multiset" for efficiency but you
-could use "list" if you want to study the time complexity of a
+should leave this at the default "priority_queue" for efficiency, but you
+could use "list" if you want to study the time complexity of an asymptotically
 slower implementation of binary segmentation.}
 }
 \details{Each iteration involves first computing and storing the
77 changes: 40 additions & 37 deletions src/binseg.cpp
@@ -384,18 +384,9 @@ template <typename T>
 class MyContainer : public Container {
 public:
   T segment_container;
-  typename T::iterator best;
   int get_size(void){
     return segment_container.size();
   }
-  void remove_best(void){
-    segment_container.erase(best);
-  }
-  virtual typename T::iterator get_best_it(void) = 0;
-  const Segment* set_best(void){
-    best = get_best_it();
-    return &(*best);
-  }
 };

 typedef std::multiset<Segment> segment_set_type;
@@ -410,14 +401,17 @@ container_umap_type* get_container_umap(void){
   return &container_umap;
 }

-#define CMAKER(CONTAINER, INSERT, BEST) \
-  class CONCAT(CONTAINER,Wrapper) : public MyContainer< std::CONTAINER<Segment> > { \
+#define CMAKER(CONTAINER, STRUCT, INSERT, SET_IT, GET_SEG, ERASE) \
+  class CONCAT(CONTAINER,Wrapper) : public MyContainer< STRUCT > { \
   public: \
     void insert(Segment& new_seg){ \
       segment_container.INSERT(new_seg); \
     } \
-    std::CONTAINER<Segment>::iterator get_best_it(void){ \
-      return BEST; \
+    Segment get_best(void){ \
+      SET_IT; \
+      Segment seg = GET_SEG; \
+      ERASE; \
+      return seg; \
     } \
   }; \
   Container* CONCAT(CONTAINER,construct) (){ \
@@ -429,9 +423,21 @@ container_umap_type* get_container_umap(void){
   static ContainerFactory CONCAT(CONTAINER,_instance) \
     ( #CONTAINER, CONCAT(CONTAINER,construct), CONCAT(CONTAINER,destruct) );

-CMAKER(multiset, insert, segment_container.begin())
+#define CIT(CONTAINER, INSERT, BEST) \
+  CMAKER(CONTAINER, std::CONTAINER<Segment>, INSERT, std::CONTAINER<Segment>::iterator it = BEST, *it, segment_container.erase(it))
+
+CIT(multiset, insert, segment_container.begin())
+
+CIT(list, push_back, std::min_element(segment_container.begin(),segment_container.end()))

-CMAKER(list, push_back, std::min_element(segment_container.begin(),segment_container.end()))
+class PQ_Compare {
+public:
+  bool operator()(Segment a, Segment b){
+    return !(a < b);
+  }
+};
+#define PQ_STRUCT std::priority_queue<Segment,std::vector<Segment>,PQ_Compare>
+CMAKER(priority_queue, PQ_STRUCT, push, , segment_container.top(), segment_container.pop())

 class Candidates {
 public:
@@ -729,35 +735,32 @@ int binseg
   int seg_i = 0;
   while(V.container_ptr->not_empty() && ++seg_i < max_segments){
     // Store loss and model parameters associated with this split.
-    const Segment *seg_ptr = V.container_ptr->set_best();
+    const Segment seg = V.container_ptr->get_best();
     out_arrays.save
       (seg_i,
-       subtrain_loss[seg_i-1] + seg_ptr->best_decrease,
-       validation_loss[seg_i-1] + seg_ptr->validation_decrease,
-       seg_ptr->best_split.this_end,
-       seg_ptr->depth,
-       seg_ptr->best_split.before,
-       seg_ptr->best_split.after,
-       seg_ptr->invalidates_index,
-       seg_ptr->invalidates_after,
-       seg_ptr->best_split.this_end - seg_ptr->first_i + 1,
-       seg_ptr->last_i - seg_ptr->best_split.this_end);
+       subtrain_loss[seg_i-1] + seg.best_decrease,
+       validation_loss[seg_i-1] + seg.validation_decrease,
+       seg.best_split.this_end,
+       seg.depth,
+       seg.best_split.before,
+       seg.best_split.after,
+       seg.invalidates_index,
+       seg.invalidates_after,
+       seg.best_split.this_end - seg.first_i + 1,
+       seg.last_i - seg.best_split.this_end);
     // Finally add new split candidates if necessary.
     V.maybe_add
-      (seg_ptr->first_i, seg_ptr->best_split.this_end,
+      (seg.first_i, seg.best_split.this_end,
        0,//invalidates_after=0 => before_mean invalidated.
-       seg_i, seg_ptr->best_split.before.loss,
-       seg_ptr->before_validation_loss,
-       seg_ptr->depth);
+       seg_i, seg.best_split.before.loss,
+       seg.before_validation_loss,
+       seg.depth);
     V.maybe_add
-      (seg_ptr->best_split.this_end+1, seg_ptr->last_i,
+      (seg.best_split.this_end+1, seg.last_i,
        1,//invalidates_after=1 => after_mean invalidated.
-       seg_i, seg_ptr->best_split.after.loss,
-       seg_ptr->after_validation_loss,
-       seg_ptr->depth);
-    // Erase at end because we need seg_ptr->values during maybe_add
-    // inserts above.
-    V.container_ptr->remove_best();
+       seg_i, seg.best_split.after.loss,
+       seg.after_validation_loss,
+       seg.depth);
   }
   return 0;//SUCCESS.
 }
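
Note on the loop above: it now takes a copy of the best segment and removes it from the container in one step, then inserts the two children. The copy matters for std::priority_queue because the reference returned by top() refers to storage that pop() and push() rearrange. A minimal sketch of the same pop-then-push pattern, under stated assumptions: ToySegment, MinOnTop, maybe_add and toy_binseg are hypothetical names, the split rule (midpoint) and the "decrease" (negative segment length) are invented for illustration, and none of this is the package's loss computation. The comparator here is written as `b < a` rather than `!(a < b)` so that it returns false for equal elements, which the standard library expects of a comparison object.

```cpp
#include <queue>
#include <vector>

// Hypothetical stand-in for Segment: a range of data indices plus the loss
// decrease that its best split would give (more negative is better).
struct ToySegment {
  int first_i;
  int last_i;
  double best_decrease;
};

bool operator<(const ToySegment& a, const ToySegment& b) {
  return a.best_decrease < b.best_decrease;
}

// std::priority_queue keeps the "largest" element under its comparator on
// top, so reversing operator< puts the smallest best_decrease on top.
struct MinOnTop {
  bool operator()(const ToySegment& a, const ToySegment& b) const {
    return b < a;
  }
};

typedef std::priority_queue<ToySegment, std::vector<ToySegment>, MinOnTop>
    ToyQueue;

// Toy rule (an assumption): a segment is a split candidate only if it has at
// least two points, and longer segments get a more negative decrease.
void maybe_add(ToyQueue& candidates, int first_i, int last_i) {
  if (first_i < last_i) {
    candidates.push(
        ToySegment{first_i, last_i, -double(last_i - first_i + 1)});
  }
}

// Returns the split positions (last index of each left child) in the order
// in which the splits were chosen.
std::vector<int> toy_binseg(int n_data, int max_segments) {
  ToyQueue candidates;
  maybe_add(candidates, 0, n_data - 1);
  std::vector<int> split_ends;
  int seg_i = 0;
  while (!candidates.empty() && ++seg_i < max_segments) {
    // Copy first, then pop: the copy stays valid during the pushes below.
    const ToySegment seg = candidates.top();
    candidates.pop();
    int this_end = (seg.first_i + seg.last_i) / 2;  // toy split: midpoint
    split_ends.push_back(this_end);
    maybe_add(candidates, seg.first_i, this_end);
    maybe_add(candidates, this_end + 1, seg.last_i);
  }
  return split_ends;
}
```

For example, toy_binseg(8, 4) chooses three splits, the first of which is at index 3; the order of the remaining two depends on how the heap breaks the tie between the two halves.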
4 changes: 2 additions & 2 deletions src/binseg.h
@@ -3,6 +3,7 @@
 #include <string>
 #include <string.h>
 #include <list>
+#include <queue>
 #include <algorithm>
 #include <set>//multiset
 #include <unordered_map>
@@ -129,8 +130,7 @@ class Container {
 public:
   virtual void insert(Segment&) = 0;
   virtual int get_size(void) = 0;
-  virtual const Segment* set_best(void) = 0;
-  virtual void remove_best(void) = 0;
+  virtual Segment get_best(void) = 0;
   virtual ~Container() {};
   bool not_empty(void){
     return get_size() > 0;
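
Note on the interface change above: the old set_best/remove_best pair relied on holding an iterator between the two calls, and std::priority_queue exposes no iterators, so a single get_best that returns a copy is a protocol every container can implement. A minimal sketch of that idea, assuming double costs instead of Segment objects; ToyContainer, ToyMultiset, ToyHeap and drain are hypothetical names modeled loosely on the Container class in the diff:

```cpp
#include <functional>
#include <queue>
#include <set>
#include <vector>

// Hypothetical interface: get_best returns the smallest cost by value and
// removes it, so callers never hold a pointer into the container.
class ToyContainer {
public:
  virtual void insert(double cost) = 0;
  virtual int get_size() const = 0;
  virtual double get_best() = 0;  // precondition: not_empty()
  virtual ~ToyContainer() {}
  bool not_empty() const { return get_size() > 0; }
};

// Balanced tree: the smallest element is at begin().
class ToyMultiset : public ToyContainer {
  std::multiset<double> data;
public:
  void insert(double cost) override { data.insert(cost); }
  int get_size() const override { return static_cast<int>(data.size()); }
  double get_best() override {
    std::multiset<double>::iterator it = data.begin();
    double best = *it;
    data.erase(it);  // erase by iterator removes exactly one element
    return best;
  }
};

// Binary heap: std::greater turns the default max-heap into a min-heap.
class ToyHeap : public ToyContainer {
  std::priority_queue<double, std::vector<double>, std::greater<double> > data;
public:
  void insert(double cost) override { data.push(cost); }
  int get_size() const override { return static_cast<int>(data.size()); }
  double get_best() override {
    double best = data.top();  // copy before pop
    data.pop();
    return best;
  }
};

// Calling code only sees the base class, as in the binseg loop.
std::vector<double> drain(ToyContainer& container) {
  std::vector<double> out;
  while (container.not_empty()) {
    out.push_back(container.get_best());
  }
  return out;
}
```

Both toy containers return the costs in ascending order when drained, which is why swapping one for the other does not change the result of the calling loop, only its constant factors.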