merge() optimization #363

Merged
merged 8 commits into from
Mar 28, 2018

Conversation

ancapdev
Contributor

Summary:

  • Unchecked construction of TimeArray objects.
  • Direct implementation of outer join.
  • Support for arbitrary types by user defined missing value.
  • Faster index-based assignment than the generic indexing operation (applies to left, right, and outer join).

Mostly focused on optimizing outer join. Benchmark example:

using TimeSeries
using BenchmarkTools

struct SimpleTime <: Dates.TimeType
    value::Int64
end

Base.isless(x::SimpleTime, y::SimpleTime) = x.value < y.value
Base.isequal(x::SimpleTime, y::SimpleTime) = x.value == y.value
Base.:(==)(x::SimpleTime, y::SimpleTime) = x.value == y.value

t1 = [SimpleTime(x) for x in 1:2_000_000]
t2 = [SimpleTime(2 * x) for x in 1:2_000_000]
v1 = rand(Float64, length(t1), 10)
v2 = rand(Float64, length(t2), 10)
ts1 = TimeArray(t1, v1)
ts2 = TimeArray(t2, v2)
@btime merge(ts1, ts2, :outer)

Previous result:

  2.008 s (659 allocations: 1.97 GiB)

New result:

  415.149 ms (518 allocations: 503.57 MiB)

At these scales the operation is mostly load/store bound, so any reduction there (e.g. smaller index types, smaller data types, fewer passes) makes the biggest difference. For my use case, and maybe other people's, Float32 support helps a lot: the above benchmark with Float32 values runs in ~224 ms.
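Most of the win comes from a single-pass, two-pointer merge over the two sorted timestamp vectors, which produces the merged index and the positions each input maps to in one sweep. A minimal self-contained sketch of that idea (the name sorted_outer_merge and the exact return shape are illustrative, not this PR's actual API):

```julia
# Sketch of a single-pass outer-join index merge over two sorted vectors.
# Returns the merged timestamps plus, for each input, the positions its
# rows occupy in the merged result.
function sorted_outer_merge(a::Vector, b::Vector)
    na, nb = length(a), length(b)
    merged = similar(a, 0)           # merged timestamps
    idx_a = Int[]; idx_b = Int[]     # destination rows for a / b
    i = j = 1
    while i <= na && j <= nb
        if a[i] < b[j]
            push!(merged, a[i]); push!(idx_a, length(merged)); i += 1
        elseif b[j] < a[i]
            push!(merged, b[j]); push!(idx_b, length(merged)); j += 1
        else  # equal timestamps: one merged row, referenced by both inputs
            push!(merged, a[i])
            push!(idx_a, length(merged)); push!(idx_b, length(merged))
            i += 1; j += 1
        end
    end
    while i <= na  # drain remaining rows of a
        push!(merged, a[i]); push!(idx_a, length(merged)); i += 1
    end
    while j <= nb  # drain remaining rows of b
        push!(merged, b[j]); push!(idx_b, length(merged)); j += 1
    end
    merged, idx_a, idx_b
end
```

Because both inputs are already sorted, this is O(na + nb) with no intermediate set or sort, which is why it beats the generic indexing path at these sizes.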


codecov-io commented Mar 26, 2018

Codecov Report

Merging #363 into master will increase coverage by 0.61%.
The diff coverage is 98.18%.


@@            Coverage Diff             @@
##           master     #363      +/-   ##
==========================================
+ Coverage   86.19%   86.81%   +0.61%     
==========================================
  Files          10       10              
  Lines         478      508      +30     
==========================================
+ Hits          412      441      +29     
- Misses         66       67       +1
Impacted Files Coverage Δ
src/utilities.jl 100% <100%> (ø) ⬆️
src/combine.jl 98.73% <92.85%> (-1.27%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8dd503e...b5c13a4.

iblislin added this to the 0.12.0 milestone Mar 26, 2018
src/utilities.jl Outdated
idx_b = Vector{IndexType}(length(b))
k = 1
@inbounds while (i <= na) && (j <= nb)
if a[i] < b[j]
Collaborator

The indentation looks off here. Please use spaces.

Contributor Author

I think the original code had some tabs and my editor was set to auto-detect. Should be fixed now.

src/combine.jl Outdated
function merge(ta1::TimeArray{T, N, D}, ta2::TimeArray{T, M, D}, method::Symbol=:inner;
colnames::Vector=[], meta::Any=Void) where {T, N, M, D}
colnames::Vector=[], meta::Any=Void, missingvalue=NaN) where {T, N, M, D}
Collaborator


We already have a padding keyword in other APIs, like lag(..., padding=true).
I think something like padvalues or paddedvalue might be better?

Contributor Author


Renamed to padvalue

@ancapdev
Contributor Author

@iblis17 Are these in a state you're happy to merge now?

end # sorted_unique_merge
For each column in src, insert elements from src[srcidx[i], column] to dst[dstidx[i], column].
"""
function insertbyidx!(dst::AbstractArray, src::AbstractArray, dstidx::Vector, srcidx::Vector)
Collaborator
iblislin commented Mar 28, 2018


I think this function can be replaced by

broadcast_setindex!(dst, broadcast_getindex(src, srcidx), dstidx)

Is it faster in your use case?

Contributor Author


using BenchmarkTools
using TimeSeries
dst = zeros(1_000_000)
src = zeros(1_000_000)
dstidx = [1:length(dst)...]
srcidx = [1:length(dst)...]
@btime broadcast_setindex!(dst, broadcast_getindex(src, srcidx), dstidx)
@btime TimeSeries.insertbyidx!(dst, src, dstidx, srcidx)

9.486 ms (8 allocations: 7.63 MiB)
2.208 ms (0 allocations: 0 bytes)

Contributor Author


I would love for the core language and library features to work optimally, so worth revisiting this in the future.
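Per the docstring quoted above, insertbyidx! is essentially a columnwise index-to-index copy. A minimal sketch of that documented behavior (insertbyidx_sketch! is an illustrative name, not the PR's exact implementation, which also handles other array shapes):

```julia
# Sketch matching the documented behavior of insertbyidx!:
# for each column c, copy src[srcidx[i], c] into dst[dstidx[i], c].
function insertbyidx_sketch!(dst::AbstractMatrix, src::AbstractMatrix,
                             dstidx::Vector{Int}, srcidx::Vector{Int})
    @inbounds for c in 1:size(src, 2), i in 1:length(srcidx)
        dst[dstidx[i], c] = src[srcidx[i], c]
    end
    dst
end
```

Writing the loop directly avoids the temporary that broadcast_getindex allocates, which is where the 0-allocation result in the benchmark above comes from.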

iblislin merged commit 21c7dcc into JuliaStats:master Mar 28, 2018
@iblislin
Collaborator

Thanks for your great contributions! 👍

@ancapdev
Contributor Author

It's fun to help a little bit when so much great work is done by other people before me 😃.

Btw, broadcast_setindex!(dst, broadcast_getindex(src, srcidx), dstidx) doesn't seem to work multidimensionally, only copying the first column. dst[dstidx, :] = src[srcidx, :] works of course, and is about 3-4x slower when copying half the rows from a 10_000_000 x 10 array.

@iblislin
Collaborator

Actually, dst[dstidx, :] = src[srcidx, :] can give a nice result. Also, don't benchmark against global variables (the type of a global variable is unpredictable, so the code isn't optimized).

julia> f = (dst, src, srcidx, dstidx) -> @inbounds(dst[dstidx, :] = @view(src[srcidx, :]))
(::#26) (generic function with 1 method)

julia> @btime f($dst, $src, $srcidx, $dstidx)
  1.700 ms (5 allocations: 192 bytes)

julia> @btime TimeSeries.insertbyidx!($dst, $src, $dstidx, $srcidx)
  1.632 ms (0 allocations: 0 bytes)

julia> @btime broadcast_setindex!($dst, broadcast_getindex($src, $srcidx), $dstidx)
  4.636 ms (4 allocations: 7.63 MiB)

@ancapdev
Contributor Author

Yep, I'm aware of the globals type instability. In this case I figured it wouldn't make much difference, because it only affects dispatch and the functions in the benchmark operate on a fairly large dataset.

You're right though, with @inbounds and @view the performance is about the same, so this is a nicer solution.
