[Guidebook - Good first issue] How to implement a new SQL function #48201

Open
zclllyybb opened this issue Feb 22, 2025 · 0 comments

Implementing a new SQL function is usually one of the most suitable ways for newcomers to start contributing to Doris. For future reference, this issue lists the basic process for implementing a new function in Doris.

What to do

To implement a new SQL function, here's what you need to write in your PR:

  1. The function implementation and registration in BE.
  2. The function signature and visitor for the Nereids planner in FE.
  3. The constant folding implementation in FE, if possible, as https://github.com/apache/doris/pull/40744/files did in functions/executable/NumericArithmetic.java.
  4. A function docs PR in https://github.com/apache/doris-website that follows our newest docs specification. See https://github.com/apache/doris-website/pull/1992/files for an example.
  5. Enough regression-test and BE-UT cases. Refer to the files test_template_{X}_arg(s).groovy in https://github.com/apache/doris/pull/47307/files (they may have been updated since, so find the newest version on the master branch).

You can refer to https://github.com/apache/doris/pull/47307/files as a complete example (it is only missing FE constant folding).

btw: You may see some older PRs that modified doris_builtin_functions.py. That is no longer needed.

Key Points

BE Implementations

  1. Use the existing base templates when you implement a date or arithmetic calculation. You can find them by searching for similar existing functions.
  2. Execution speed is critical for Doris, so you must eliminate all unnecessary copies and calculations. Operate directly on the raw input and output data. If the output Column's memory can receive the calculation result, write into it directly instead of adding another variable and copying. Don't call any virtual function inside a loop. If type-dependent behavior is necessary, use templates to generate separate function instantiations so that runtime type checks are eliminated.
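
The performance advice above can be illustrated with a minimal sketch. It is not Doris code: `FlatColumn` and `add_one_into` are hypothetical names that only model a contiguous column buffer, so check the real Column classes and base templates in BE before copying the pattern.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a fixed-width column: a flat buffer of values.
// This is NOT Doris' real column class; it only models contiguous storage.
template <typename T>
struct FlatColumn {
    std::vector<T> data;
};

// The element type T is a template parameter, so the compiler generates one
// specialized loop per type and no type check happens per row.
template <typename T>
void add_one_into(const FlatColumn<T>& input, FlatColumn<T>& output) {
    const std::size_t rows = input.data.size();
    output.data.resize(rows);         // allocate the result storage once

    const T* in = input.data.data();  // raw pointers into the buffers:
    T* out = output.data.data();      // no virtual calls inside the loop

    for (std::size_t i = 0; i < rows; ++i) {
        // Write straight into the output buffer, with no temporary vector
        // and no copy afterwards. Overflow handling is omitted here.
        out[i] = static_cast<T>(in[i] + T(1));
    }
}
```

The point of the sketch is the shape of the loop: one allocation, raw access to both buffers, and a loop body the compiler can inline and vectorize because the type is fixed at compile time.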

FE Signature

Most functions use one of the following interfaces:

  1. AlwaysNullable means the function's return type is always wrapped in Nullable. Use it when the function may produce a null value for not-null input.
  2. AlwaysNotNullable means the function's return type is never wrapped in Nullable. Use it when the function turns every null input into a not-null output.
  3. PropagateNullable means the output column is Nullable when the input columns contain at least one Nullable column, and not Nullable otherwise. It is the right choice when the function calculates a result for not-null values and leaves null values alone.
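
As a rough mental model of the three rules above, the following sketch computes whether the result type is Nullable. The real interfaces are Java interfaces in FE; the enum `NullableRule` and the function `result_is_nullable` are hypothetical and exist only for illustration.

```cpp
#include <vector>

// Hypothetical model of the three nullability rules. These names mirror the
// FE interface names but are not part of any Doris API.
enum class NullableRule { AlwaysNullable, AlwaysNotNullable, PropagateNullable };

// Returns whether the function's return type is wrapped in Nullable, given
// the rule and, for each argument, whether its type is Nullable.
bool result_is_nullable(NullableRule rule, const std::vector<bool>& arg_nullable) {
    switch (rule) {
    case NullableRule::AlwaysNullable:
        return true;   // may produce null even for not-null input
    case NullableRule::AlwaysNotNullable:
        return false;  // every null input becomes a not-null output
    case NullableRule::PropagateNullable:
        for (bool nullable : arg_nullable) {
            if (nullable) {
                return true;  // at least one Nullable argument
            }
        }
        return false;
    }
    return false;  // not reached for valid enum values
}
```

For example, under this model a PropagateNullable function called with one Nullable and one not-null argument has a Nullable result, while the same function called with two not-null arguments does not.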

Testcase

The types and the number of test cases must not be less than those in the corresponding files in https://github.com/apache/doris/pull/47307/files.

The data you use must cover all the boundary values of each data type involved, as well as other sensitive values.
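
One way to think about boundary coverage is sketched below for signed integer types. `boundary_values` is a hypothetical helper, not something in the Doris test framework, and the chosen set of values is an assumption about what counts as "sensitive"; other types (dates, decimals, strings) need their own lists.

```cpp
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical helper: returns the values around the edges of a signed
// integer type plus the values around zero.
template <typename T>
std::vector<T> boundary_values() {
    static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                  "this sketch only covers signed integer types");
    using Limits = std::numeric_limits<T>;
    return {
        Limits::lowest(),                       // minimum value
        static_cast<T>(Limits::lowest() + 1),   // just above the minimum
        T(-1), T(0), T(1),                      // around zero
        static_cast<T>(Limits::max() - 1),      // just below the maximum
        Limits::max(),                          // maximum value
    };
}
```

Feeding such a list into each argument position, together with null values, is a reasonable starting point before adding function-specific edge cases.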

Add BE-UT cases using the check_function_all_arg_comb interface to cover the Const-combinations of the arguments.
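
To show what "Const-combinations" can mean, the sketch below enumerates every way to mark each argument as constant or not. It does not call check_function_all_arg_comb, whose exact signature should be looked up in the BE-UT sources; `const_combinations` is a hypothetical helper, and it is an assumption that the real interface covers exactly this set.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: enumerates every way to mark each of arg_count
// arguments as constant (true) or as a regular full column (false).
// For N arguments this yields 2^N combinations.
std::vector<std::vector<bool>> const_combinations(std::size_t arg_count) {
    std::vector<std::vector<bool>> combos;
    const std::size_t total = std::size_t(1) << arg_count;
    for (std::size_t mask = 0; mask < total; ++mask) {
        std::vector<bool> is_const(arg_count);
        for (std::size_t i = 0; i < arg_count; ++i) {
            is_const[i] = ((mask >> i) & 1) != 0;  // bit i selects argument i
        }
        combos.push_back(is_const);
    }
    return combos;
}
```

For a two-argument function this gives four cases: neither argument constant, only the first, only the second, and both. A function with a dedicated fast path for constant arguments can behave differently in each case, which is why all of them deserve a test.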

You can run the cases using the scripts run-regression-test.sh and run-be-ut.sh. Both scripts contain detailed explanations of their usage.

Other Advice

  1. If you don't know how to use the proper interface of a certain type of object, look at how existing code uses it.
  2. AI-assisted programming is quite mature now, so ask an AI first if you want to understand how some part of the code works.