merge() optimization #363

Merged
merged 8 commits into from
Mar 28, 2018

Conversation

ancapdev
Contributor

Summary:

  • Unchecked construction of TimeArray objects.
  • Direct implementation of outer join.
  • Support for arbitrary types by user defined missing value.
  • Faster index-based assignment than the generic indexing operation (applies to left, right, and outer join).

Mostly focused on optimizing outer join. Benchmark example:

using TimeSeries
using BenchmarkTools

struct SimpleTime <: Dates.TimeType
    value::Int64
end

Base.isless(x::SimpleTime, y::SimpleTime) = x.value < y.value
Base.isequal(x::SimpleTime, y::SimpleTime) = x.value == y.value
Base.:(==)(x::SimpleTime, y::SimpleTime) = x.value == y.value

t1 = [SimpleTime(x) for x in 1:2_000_000]
t2 = [SimpleTime(2 * x) for x in 1:2_000_000]
v1 = rand(Float64, length(t1), 10)
v2 = rand(Float64, length(t2), 10)
ts1 = TimeArray(t1, v1)
ts2 = TimeArray(t2, v2)
@btime merge(ts1, ts2, :outer)

Previous result:

  2.008 s (659 allocations: 1.97 GiB)

New result:

  415.149 ms (518 allocations: 503.57 MiB)

At these scales the operation is mostly load/store bound, so any reduction there (e.g. smaller index types, smaller data types, fewer passes) makes the biggest difference. For my use case, and maybe other people's, Float32 support helps a lot: the above benchmark with Float32 values runs in ~224 ms.
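Most of the win comes from a single-pass, two-pointer merge over the two sorted timestamp vectors, which produces the merged index and the positions each input maps to in one sweep. A minimal self-contained sketch of that idea (the name sorted_outer_merge and the exact return shape are illustrative, not this PR's actual API):

```julia
# Sketch of a single-pass outer-join index merge over two sorted vectors.
# Returns the merged timestamps plus, for each input, the positions its
# rows occupy in the merged result.
function sorted_outer_merge(a::Vector, b::Vector)
    na, nb = length(a), length(b)
    merged = similar(a, 0)           # merged timestamps
    idx_a = Int[]; idx_b = Int[]     # destination rows for a / b
    i = j = 1
    while i <= na && j <= nb
        if a[i] < b[j]
            push!(merged, a[i]); push!(idx_a, length(merged)); i += 1
        elseif b[j] < a[i]
            push!(merged, b[j]); push!(idx_b, length(merged)); j += 1
        else  # equal timestamps: one merged row, referenced by both inputs
            push!(merged, a[i])
            push!(idx_a, length(merged)); push!(idx_b, length(merged))
            i += 1; j += 1
        end
    end
    while i <= na  # drain remaining rows of a
        push!(merged, a[i]); push!(idx_a, length(merged)); i += 1
    end
    while j <= nb  # drain remaining rows of b
        push!(merged, b[j]); push!(idx_b, length(merged)); j += 1
    end
    merged, idx_a, idx_b
end
```

Because both inputs are already sorted, this is O(na + nb) with no intermediate set or sort, which is why it beats the generic indexing path at these sizes.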


codecov-io commented Mar 26, 2018

Codecov Report

Merging #363 into master will increase coverage by 0.61%.
The diff coverage is 98.18%.


@@            Coverage Diff             @@
##           master     #363      +/-   ##
==========================================
+ Coverage   86.19%   86.81%   +0.61%     
==========================================
  Files          10       10              
  Lines         478      508      +30     
==========================================
+ Hits          412      441      +29     
- Misses         66       67       +1
Impacted Files Coverage Δ
src/utilities.jl 100% <100%> (ø) ⬆️
src/combine.jl 98.73% <92.85%> (-1.27%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8dd503e...b5c13a4.

iblislin added this to the 0.12.0 milestone Mar 26, 2018
src/utilities.jl Outdated
idx_b = Vector{IndexType}(length(b))
k = 1
@inbounds while (i <= na) && (j <= nb)
if a[i] < b[j]
Collaborator

The indentation looks off here. Please use spaces.

Contributor Author

I think the original code had some tabs and my editor was set to auto-detect. Should be fixed now.

src/combine.jl Outdated
function merge(ta1::TimeArray{T, N, D}, ta2::TimeArray{T, M, D}, method::Symbol=:inner;
colnames::Vector=[], meta::Any=Void) where {T, N, M, D}
colnames::Vector=[], meta::Any=Void, missingvalue=NaN) where {T, N, M, D}
Collaborator


We already have a padding keyword in other APIs, like lag(..., padding=true).
I think something like padvalues or paddedvalue might be better?

Contributor Author


Renamed to padvalue

@ancapdev
Contributor Author

@iblis17 Are these in a state you're happy to merge now?

end # sorted_unique_merge
For each column in src, insert elements from src[srcidx[i], column] to dst[dstidx[i], column].
"""
function insertbyidx!(dst::AbstractArray, src::AbstractArray, dstidx::Vector, srcidx::Vector)
Collaborator
iblislin commented Mar 28, 2018


I think this function can be replaced by

broadcast_setindex!(dst, broadcast_getindex(src, srcidx), dstidx)

Is it faster in your use case?

Contributor Author


using BenchmarkTools
using TimeSeries
dst = zeros(1_000_000)
src = zeros(1_000_000)
dstidx = [1:length(dst)...]
srcidx = [1:length(dst)...]
@btime broadcast_setindex!(dst, broadcast_getindex(src, srcidx), dstidx)
@btime TimeSeries.insertbyidx!(dst, src, dstidx, srcidx)

9.486 ms (8 allocations: 7.63 MiB)
2.208 ms (0 allocations: 0 bytes)

Contributor Author


I would love for the core language and library features to work optimally, so worth revisiting this in the future.
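Per the docstring quoted above, insertbyidx! is essentially a columnwise index-to-index copy. A minimal sketch of that documented behavior (insertbyidx_sketch! is an illustrative name, not the PR's exact implementation, which also handles other array shapes):

```julia
# Sketch matching the documented behavior of insertbyidx!:
# for each column c, copy src[srcidx[i], c] into dst[dstidx[i], c].
function insertbyidx_sketch!(dst::AbstractMatrix, src::AbstractMatrix,
                             dstidx::Vector{Int}, srcidx::Vector{Int})
    @inbounds for c in 1:size(src, 2), i in 1:length(srcidx)
        dst[dstidx[i], c] = src[srcidx[i], c]
    end
    dst
end
```

Writing the loop directly avoids the temporary that broadcast_getindex allocates, which is where the 0-allocation result in the benchmark above comes from.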

iblislin merged commit 21c7dcc into JuliaStats:master Mar 28, 2018
@iblislin
Collaborator

Thanks for your great contributions! 👍

@ancapdev
Contributor Author

It's fun to help a little bit when so much great work is done by other people before me 😃.

Btw, broadcast_setindex!(dst, broadcast_getindex(src, srcidx), dstidx) doesn't seem to work multidimensionally, only copying the first column. dst[dstidx, :] = src[srcidx, :] works of course, and is about 3-4x slower when copying half the rows from a 10_000_000 x 10 array.

@iblislin
Collaborator

Actually, dst[dstidx, :] = src[srcidx, :] can give a nice result. Also, don't benchmark against global variables (the type of a global variable is unpredictable, so the code isn't optimized).

julia> f = (dst, src, srcidx, dstidx) -> @inbounds(dst[dstidx, :] = @view(src[srcidx, :]))
(::#26) (generic function with 1 method)

julia> @btime f($dst, $src, $srcidx, $dstidx)
  1.700 ms (5 allocations: 192 bytes)

julia> @btime TimeSeries.insertbyidx!($dst, $src, $dstidx, $srcidx)
  1.632 ms (0 allocations: 0 bytes)

julia> @btime broadcast_setindex!($dst, broadcast_getindex($src, $srcidx), $dstidx)
  4.636 ms (4 allocations: 7.63 MiB)

@ancapdev
Contributor Author

Yep, I'm aware of the globals type instability. In this case I figured it wouldn't make much difference, because it only affects dispatch and the functions in the benchmark operate on a fairly large dataset.

You're right though, with @inbounds and @view the performance is about the same, so this is a nicer solution.
