arch/arm/stm32h7: Add lazy FPU for STM32H7 ARMV7 single core chips #15876

Draft
wants to merge 1 commit into
base: master

linguini1
Contributor

Summary

Introduces LAZYFPU by enabling LSPEN (lazy stacking) and ASPEN (automatic state preservation). When LAZYFPU is enabled, FPU registers are no longer stored in the XCPT registers (HW and SW).

Discussed in #15826
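
For readers unfamiliar with the two bits named in the summary, here is a minimal host-side sketch of the FPCCR update. The bit positions (ASPEN is bit 31, LSPEN is bit 30) come from the ARMv7-M architecture; the helper name `fpccr_enable_lazy` and the macro names are hypothetical and are not taken from this patch. It works on a plain value so it can be compiled off-target; on hardware the value would be read from NVIC_FPCCR and written back.

```c
#include <assert.h>
#include <stdint.h>

/* FPCCR bit positions per the ARMv7-M architecture.
 * The macro names are illustrative, not the ones used in the patch.
 */

#define FPCCR_ASPEN (UINT32_C(1) << 31) /* Automatic state preservation */
#define FPCCR_LSPEN (UINT32_C(1) << 30) /* Lazy state preservation */

static_assert((FPCCR_ASPEN & FPCCR_LSPEN) == 0,
              "ASPEN and LSPEN are separate bits");

/* Hypothetical helper: return the FPCCR value with both bits set,
 * leaving every other bit of the register untouched.
 */

static inline uint32_t fpccr_enable_lazy(uint32_t regval)
{
  return regval | FPCCR_ASPEN | FPCCR_LSPEN;
}
```

Keeping the read-modify-write as a pure function of the register value makes the bit manipulation easy to check on a host before it is wired into the start-up code.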

Impact

This reduces context switching overhead in some contexts where the FPU is used. The feature is implemented for all ARMv7-M chips. However, I have only added `select ARCH_HAVE_LAZYFPU` for the STM32H7 chips that are single ARMv7-M cores, as I do not have an exhaustive list of all ARMv7-M chips to test on.
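
As a rough illustration of where the overhead comes from, the sketch below computes the size of the hardware exception frame on ARMv7-M with and without the FP extension. The word counts are architectural (8 words for the basic frame, 18 more for S0-S15, FPSCR and one reserved word); the function and enum names are hypothetical, and optional stack-alignment padding is ignored. With lazy stacking the extra 72 bytes are still reserved, but the hardware defers writing them until the handler executes an FP instruction.

```c
#include <assert.h>
#include <stdint.h>

enum
{
  BASIC_FRAME_WORDS = 8,  /* R0-R3, R12, LR, PC, xPSR */
  FP_EXT_WORDS      = 18  /* S0-S15, FPSCR, one reserved word */
};

static_assert(BASIC_FRAME_WORDS + FP_EXT_WORDS == 26,
              "extended frame is 26 words");

/* Hypothetical helper: size in bytes of the hardware-stacked exception
 * frame, ignoring the optional 4-byte alignment padding.
 */

static inline uint32_t exception_frame_bytes(int fp_active)
{
  uint32_t words = BASIC_FRAME_WORDS + (fp_active ? FP_EXT_WORDS : 0);
  return words * (uint32_t)sizeof(uint32_t);
}
```

This only covers the registers stacked by hardware; any savings in the software-saved part of the context depend on the NuttX implementation and are not modelled here.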

Testing

Performed on an STM32H743 chip. I have increased the number of FPU loops to 25 and reduced the FPU thread stack size to 2048.

Here are the results of ostest when LAZYFPU is disabled:

nsh> ostest
stdio_test: write fd=1
stdio_test: Standard I/O Check: printf
stdio_test: write fd=2
ostest_main: putenv(Variable1=BadValue3)
ostest_main: setenv(Variable1, GoodValue1, TRUE)
ostest_main: setenv(Variable2, BadValue1, FALSE)
ostest_main: setenv(Variable2, GoodValue2, TRUE)
ostest_main: setenv(Variable3, GoodValue3, FALSE)
ostest_main: setenv(Variable3, BadValue2, FALSE)
show_variable: Variable=Variable1 has value=GoodValue1
show_variable: Variable=Variable2 has value=GoodValue2
show_variable: Variable=Variable3 has value=GoodValue3
ostest_main: Started user_main at PID=7

user_main: Begin argument test
user_main: Started with argc=5
user_main: argv[0]="ostest"
user_main: argv[1]="Arg1"
user_main: argv[2]="Arg2"
user_main: argv[3]="Arg3"
user_main: argv[4]="Arg4"

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        3
mxordblk    69720    69720
uordblks     a3f0     a3f0
fordblks    69b30    69b30

user_main: getopt() test
getopt():  Simple test
getopt():  Invalid argument
getopt():  Missing optional argument
getopt_long():  Simple test
getopt_long():  No short options
getopt_long():  Argument for --option=argument
getopt_long():  Invalid long option
getopt_long():  Mixed long and short options
getopt_long():  Invalid short option
getopt_long():  Missing optional arguments
getopt_long_only():  Mixed long and short options
getopt_long_only():  Single hyphen long options

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        3
mxordblk    69720    69720
uordblks     a3f0     a3f0
fordblks    69b30    69b30

user_main: libc tests

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        3
mxordblk    69720    69720
uordblks     a3f0     a3f0
fordblks    69b30    69b30
show_variable: Variable=Variable1 has value=GoodValue1
show_variable: Variable=Variable2 has value=GoodValue2
show_variable: Variable=Variable3 has value=GoodValue3
show_variable: Variable=Variable1 has no value
show_variable: Variable=Variable2 has value=GoodValue2
show_variable: Variable=Variable3 has value=GoodValue3

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        4
mxordblk    69720    69720
uordblks     a3f0     a3d0
fordblks    69b30    69b50
show_variable: Variable=Variable1 has no value
show_variable: Variable=Variable2 has no value
show_variable: Variable=Variable3 has no value

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         4        3
mxordblk    69720    69720
uordblks     a3d0     a358
fordblks    69b50    69bc8

user_main: setvbuf test
setvbuf_test: Test NO buffering
setvbuf_test: Using NO buffering
setvbuf_test: Test default FULL buffering
setvbuf_test: Using default FULL buffering
setvbuf_test: Test FULL buffering, buffer size 64
setvbuf_test: Using FULL buffering, buffer size 64
setvbuf_test: Test FULL buffering, pre-allocated buffer
setvbuf_test: Using FULL buffering, pre-allocated buffer
setvbuf_test: Test LINE buffering, buffer size 64
setvbuf_test: Using LINE buffering, buffer size 64
setvbuf_test: Test FULL buffering, pre-allocated buffer
setvbuf_test: Using FULL buffering, pre-allocated buffer

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        3
mxordblk    69720    69720
uordblks     a358     a358
fordblks    69bc8    69bc8

user_main: /dev/null test
dev_null: Read 0 bytes from /dev/null
dev_null: Wrote 1024 bytes to /dev/null

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        3
mxordblk    69720    69720
uordblks     a358     a358
fordblks    69bc8    69bc8

user_main: FPU test
Starting task FPU#1
fpu_test: Started task FPU#1 at PID=8
FPU#1: pass 1
Starting task FPU#2
fpu_test: Started task FPU#2 at PID=9
FPU#2: pass 1
FPU#1: pass 2
FPU#2: pass 2
FPU#1: pass 3
FPU#2: pass 3
FPU#1: pass 4
FPU#2: pass 4
FPU#1: pass 5
FPU#2: pass 5
FPU#1: pass 6
FPU#2: pass 6
FPU#1: pass 7
FPU#2: pass 7
FPU#1: pass 8
FPU#2: pass 8
FPU#1: pass 9
FPU#2: pass 9
FPU#1: pass 10
FPU#2: pass 10
FPU#1: pass 11
FPU#2: pass 11
FPU#1: pass 12
FPU#2: pass 12
FPU#1: pass 13
FPU#2: pass 13
FPU#1: pass 14
FPU#2: pass 14
FPU#1: pass 15
FPU#2: pass 15
FPU#1: pass 16
FPU#2: pass 16
FPU#1: pass 17
FPU#2: pass 17
FPU#1: pass 18
FPU#2: pass 18
FPU#1: pass 19
FPU#2: pass 19
FPU#1: pass 20
FPU#2: pass 20
FPU#1: pass 21
FPU#2: pass 21
FPU#1: pass 22
FPU#2: pass 22
FPU#1: pass 23
FPU#2: pass 23
FPU#1: pass 24
FPU#2: pass 24
FPU#1: pass 25
FPU#2: pass 25
FPU#1: Succeeded
FPU#2: Succeeded
fpu_test: Returning

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        5
mxordblk    69720    67f40
uordblks     a358     b368
fordblks    69bc8    68bb8

user_main: task_restart test

Test task_restart()
restart_main: setenv(VarName, VarValue, TRUE)
restart_main: Started restart_main at PID=10
restart_main: Started with argc=4
restart_main: argv[0]="ostest"
restart_main: argv[1]="This is argument 1"
restart_main: argv[2]="Argument 2 here"
restart_main: argv[3]="Lastly, the 3rd argument"
restart_main: Variable=VarName has value=VarValue
restart_main: I am still here
restart_main: I am still here
restart_main: Started restart_main at PID=10
restart_main: Started with argc=4
restart_main: argv[0]="ostest"
restart_main: argv[1]="This is argument 1"
restart_main: argv[2]="Argument 2 here"
restart_main: argv[3]="Lastly, the 3rd argument"
restart_main: Variable=VarName has value=VarValue
restart_main: Started with argc=4
restart_main: argv[0]="ostest"
restart_main: argv[1]="This is argument 1"
restart_main: argv[2]="Argument 2 here"
restart_main: argv[3]="Lastly, the 3rd argument"
restart_main: Variable=VarName has value=VarValue
restart_main: Exiting

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         5        3
mxordblk    67f40    67330
uordblks     b368     c780
fordblks    68bb8    677a0

user_main: waitpid test

Test waitpid()
waitpid_start_child: Started waitpid_main at PID=11
waitpid_main: PID 11 Started
waitpid_start_child: Started waitpid_main at PID=12
waitpid_main: PID 12 Started
waitpid_start_child: Started waitpid_main at PID=13
waitpid_main: PID 13 Started
waitpid_test: Waiting for PID=11 with waitpid()
waitpid_main: PID 11 exitting with result=14
waitpid_test: PID 11 waitpid succeeded with stat_loc=0e00
waitpid_last: Waiting for PID=13 with waitpid()
waitpid_main: PID 12 exitting with result=14
waitpid_main: PID 13 exitting with result=14
waitpid_last: PASS: PID 13 waitpid succeeded with stat_loc=0e00

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        5
mxordblk    67330    62b50
uordblks     c780    10790
fordblks    677a0    63790

user_main: mutex test
Initializing mutex
Starting thread 1
Starting thread 2
                Thread1 Thread2
        Loops   32      32
        Errors  0       0

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         5        3
mxordblk    62b50    68f18
uordblks    10790     ac60
fordblks    63790    692c0

user_main: timed mutex test
mutex_test: Initializing mutex
mutex_test: Starting thread
pthread:  Started
pthread:  Waiting for lock or timeout
mutex_test: Unlocking
pthread:  Got the lock
pthread:  Waiting for lock or timeout
pthread:  Got the timeout.  Terminating
mutex_test: PASSED

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        3
mxordblk    68f18    68f18
uordblks     ac60     ac60
fordblks    692c0    692c0

user_main: cancel test
cancel_test: Test 1a: Normal Cancellation
cancel_test: Starting thread
start_thread: Initializing mutex
start_thread: Initializing cond
start_thread: Starting thread
start_thread: Yielding
sem_waiter: Taking mutex
sem_waiter: Starting wait for condition
cancel_test: Canceling thread
cancel_test: Joining
cancel_test: waiter exited with result=0xffffffff
cancel_test: PASS thread terminated with PTHREAD_CANCELED
cancel_test: Test 2: Asynchronous Cancellation
... Skipped
cancel_test: Test 3: Cancellation of detached thread
cancel_test: Re-starting thread
restart_thread: Destroying cond
restart_thread: Destroying mutex
restart_thread: Re-starting thread
start_thread: Initializing mutex
start_thread: Initializing cond
start_thread: Starting thread
start_thread: Yielding
sem_waiter: Taking mutex
sem_waiter: Starting wait for condition
cancel_test: Canceling thread
cancel_test: Joining
cancel_test: PASS pthread_join failed with status=ESRCH
cancel_test: Test 5: Non-cancelable threads
cancel_test: Re-starting thread (non-cancelable)
restart_thread: Destroying cond
restart_thread: Destroying mutex
restart_thread: Re-starting thread
start_thread: Initializing mutex
start_thread: Initializing cond
start_thread: Starting thread
start_thread: Yielding
sem_waiter: Taking mutex
sem_waiter: Starting wait for condition
sem_waiter: Setting non-cancelable
cancel_test: Canceling thread
cancel_test: Joining
sem_waiter: Releasing mutex
sem_waiter: Setting cancelable
cancel_test: waiter exited with result=0xffffffff
cancel_test: PASS thread terminated with PTHREAD_CANCELED
cancel_test: Test 6: Cancel message queue wait
cancel_test: Starting thread (cancelable)
Skipped
cancel_test: Test 7: Cancel signal wait
cancel_test: Starting thread (cancelable)
Skipped

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        3
mxordblk    68f18    67718
uordblks     ac60     c460
fordblks    692c0    67ac0

user_main: robust test
robust_test: Initializing mutex
robust_test: Starting thread
robust_waiter: Taking mutex
robust_waiter: Exiting with mutex
robust_test: Take the lock again
robust_test: Make the mutex consistent again.
robust_test: Take the lock again
robust_test: Joining
robust_test: waiter exited with result=0
robust_test: Test complete with nerrors=0

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        3
mxordblk    67718    67718
uordblks     c460     c460
fordblks    67ac0    67ac0

user_main: semaphore test
sem_test: Initializing semaphore to 0
sem_test: Starting waiter thread 1
sem_test: Set thread 1 priority to 191
waiter_func: Thread 1 Started
sem_test: Starting waiter thread 2
waiter_func: Thread 1 initial semaphore value = 0
sem_test: Set thread 2 priority to 128
waiter_func: Thread 1 waiting on semaphore
waiter_func: Thread 2 Started
waiter_func: Thread 2 initial semaphore value = -1
waiter_func: Thread 2 waiting on semaphore
sem_test: Starting poster thread 3
sem_test: Set thread 3 priority to 64
poster_func: Thread 3 started
poster_func: Thread 3 semaphore value = -2
poster_func: Thread 3 posting semaphore
waiter_func: Thread 1 awakened
waiter_func: Thread 1 new semaphore value = -1
waiter_func: Thread 1 done
poster_func: Thread 3 new semaphore value = -1
poster_func: Thread 3 semaphore value = -1
poster_func: Thread 3 posting semaphore
waiter_func: Thread 2 awakened
poster_func: Thread 3 new semaphore value = 0
waiter_func: Thread 2 new semaphore value = 0
poster_func: Thread 3 done
waiter_func: Thread 2 done

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        4
mxordblk    67718    67f08
uordblks     c460     b530
fordblks    67ac0    689f0

user_main: timed semaphore test
semtimed_test: Initializing semaphore to 0
semtimed_test: Waiting for two second timeout
semtimed_test: PASS: first test returned timeout
BEFORE: (104 sec, 760000000 nsec)
AFTER:  (106 sec, 770000000 nsec)
semtimed_test: Starting poster thread
semtimed_test: Set thread 1 priority to 191
semtimed_test: Starting poster thread 3
semtimed_test: Set thread 3 priority to 64
semtimed_test: Waiting for two second timeout
poster_func: Waiting for 1 second
poster_func: Posting
semtimed_test: PASS: sem_timedwait succeeded
BEFORE: (106 sec, 770000000 nsec)
AFTER:  (107 sec, 780000000 nsec)

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         4        3
mxordblk    67f08    68f18
uordblks     b530     ac60
fordblks    689f0    692c0

user_main: condition variable test
cond_test: Initializing mutex
cond_test: Initializing cond
cond_test: Starting waiter
cond_test: Set thread 1 priority to 128
waiter_thread: Started
cond_test: Starting signaler
cond_test: Set thread 2 priority to 64
thread_signaler: Started
thread_signaler: Terminating
cond_test: signaler terminated, now cancel the waiter
cond_test:      Waiter  Signaler
cond_test: Loops        32      32
cond_test: Errors       0       0
cond_test:
cond_test: 0 times, waiter did not have to wait for data
cond_test: 0 times, data was already available when the signaler run
cond_test: 0 times, the waiter was in an unexpected state when the signaler ran

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        4
mxordblk    68f18    68710
uordblks     ac60     ac60
fordblks    692c0    692c0

user_main: pthread_exit() test
pthread_exit_test: Started pthread_exit_main at PID=41
pthread_exit_main 41: Starting pthread_exit_thread
pthread_exit_main 41: Sleeping for 5 seconds
pthread_exit_thread 42: Sleeping for 10 second
pthread_exit_thread 42: Still running...
pthread_exit_main 41: Calling pthread_exit()

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         4        3
mxordblk    68710    66f10
uordblks     ac60     cd30
fordblks    692c0    671f0

user_main: pthread_rwlock test
pthread_rwlock: Initializing rwlock
pthread_exit_thread 42: Exiting

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        4
mxordblk    66f10    66f10
uordblks     cd30     ac70
fordblks    671f0    692b0

user_main: pthread_rwlock_cancel test
pthread_rwlock_cancel: Starting test

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         4        3
mxordblk    66f10    69720
uordblks     ac70     a3a0
fordblks    692b0    69b80

user_main: timed wait test
thread_waiter: Initializing mutex
timedwait_test: Initializing cond
timedwait_test: Starting waiter
timedwait_test: Set thread 2 priority to 177
thread_waiter: Taking mutex
timedwait_test: Joining
thread_waiter: Starting 5 second wait for condition
thread_waiter: pthread_cond_timedwait timed out
thread_waiter: Releasing mutex
thread_waiter: Exit with status 0x12345678
timedwait_test: waiter exited with result=0x12345678

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        3
mxordblk    69720    68f18
uordblks     a3a0     ac70
fordblks    69b80    692b0

user_main: message queue test
mqueue_test: Starting receiver
mqueue_test: Set receiver priority to 128
receiver_thread: Starting
mqueue_test: Starting sender
mqueue_test: Set sender thread priority to 64
mqueue_test: Waiting for sender to complete
sender_thread: Starting
receiver_thread: mq_receive succeeded on msg 0
sender_thread: mq_send succeeded on msg 0
receiver_thread: mq_receive succeeded on msg 1
sender_thread: mq_send succeeded on msg 1
receiver_thread: mq_receive succeeded on msg 2
sender_thread: mq_send succeeded on msg 2
receiver_thread: mq_receive succeeded on msg 3
sender_thread: mq_send succeeded on msg 3
receiver_thread: mq_receive succeeded on msg 4
sender_thread: mq_send succeeded on msg 4
receiver_thread: mq_receive succeeded on msg 5
sender_thread: mq_send succeeded on msg 5
receiver_thread: mq_receive succeeded on msg 6
sender_thread: mq_send succeeded on msg 6
receiver_thread: mq_receive succeeded on msg 7
sender_thread: mq_send succeeded on msg 7
receiver_thread: mq_receive succeeded on msg 8
sender_thread: mq_send succeeded on msg 8
receiver_thread: mq_receive succeeded on msg 9
sender_thread: mq_send succeeded on msg 9
sender_thread: returning nerrors=0
mqueue_test: Killing receiver
receiver_thread: mq_receive interrupted!
receiver_thread: returning nerrors=0
mqueue_test: Canceling receiver
mqueue_test: receiver has already terminated

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        2
mxordblk    68f18    67718
uordblks     ac70     c4d0
fordblks    692b0    67a50

user_main: timed message queue test
timedmqueue_test: Starting sender
timedmqueue_test: Waiting for sender to complete
sender_thread: Starting
sender_thread: mq_timedsend succeeded on msg 0
sender_thread: mq_timedsend succeeded on msg 1
sender_thread: mq_timedsend succeeded on msg 2
sender_thread: mq_timedsend succeeded on msg 3
sender_thread: mq_timedsend succeeded on msg 4
sender_thread: mq_timedsend succeeded on msg 5
sender_thread: mq_timedsend succeeded on msg 6
sender_thread: mq_timedsend succeeded on msg 7
sender_thread: mq_timedsend succeeded on msg 8
sender_thread: mq_timedsend 9 timed out as expected
sender_thread: returning nerrors=0
timedmqueue_test: Starting receiver
timedmqueue_test: Waiting for receiver to complete
receiver_thread: Starting
receiver_thread: mq_timedreceive succeed on msg 0
receiver_thread: mq_timedreceive succeed on msg 1
receiver_thread: mq_timedreceive succeed on msg 2
receiver_thread: mq_timedreceive succeed on msg 3
receiver_thread: mq_timedreceive succeed on msg 4
receiver_thread: mq_timedreceive succeed on msg 5
receiver_thread: mq_timedreceive succeed on msg 6
receiver_thread: mq_timedreceive succeed on msg 7
receiver_thread: mq_timedreceive succeed on msg 8
receiver_thread: Receive 9 timed out as expected
receiver_thread: returning nerrors=0
timedmqueue_test: Test complete

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         2        2
mxordblk    67718    67718
uordblks     c4d0     c4d0
fordblks    67a50    67a50

user_main: sigprocmask test
sigprocmask_test: SUCCESS

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         2        2
mxordblk    67718    67718
uordblks     c4d0     c4d0
fordblks    67a50    67a50

user_main: signal handler test
sighand_test: Initializing semaphore to 0
sighand_test: Starting waiter task
sighand_test: Started waiter_main pid=61
waiter_main: Waiter started
waiter_main: Unmasking signal 32
waiter_main: Registering signal handler
waiter_main: oact.sigaction=0 oact.sa_flags=0 oact.sa_mask=0000000000000000
waiter_main: Waiting on semaphore
sighand_test: Signaling pid=61 with signo=32 sigvalue=42
waiter_main: sem_wait() successfully interrupted by signal
waiter_main: done
sighand_test: done

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         2        2
mxordblk    67718    67718
uordblks     c4d0     c500
fordblks    67a50    67a20

user_main: nested signal handler test
signest_test: Starting signal waiter task at priority 101
waiter_main: Waiter started
signest_test: Started waiter_main pid=62
waiter_main: Setting signal mask
signest_test: Starting interfering task at priority 102
waiter_main: Registering signal handler
interfere_main: Waiting on semaphore
waiter_main: Waiting on semaphore
signest_test: Started interfere_main pid=63
signest_test: Simple case:
  Total signalled 1240  Odd=620 Even=620
  Total handled   1240  Odd=620 Even=620
  Total nested    0    Odd=0   Even=0  
signest_test: With task locking
  Total signalled 2480  Odd=1240 Even=1240
  Total handled   2480  Odd=1240 Even=1240
  Total nested    0    Odd=0   Even=0  
signest_test: With intefering thread
  Total signalled 3720  Odd=1860 Even=1860
  Total handled   3720  Odd=1860 Even=1860
  Total nested    0    Odd=0   Even=0  
signest_test: done

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         2        3
mxordblk    67718    65200
uordblks     c500     cc60
fordblks    67a20    672c0

user_main: wdog test
wdog_test start...
wdtest_once 0 ns
wwwdddttteeesssttt___ooonnnccceee   000   nnnsss


wdtest_once 1 ns
wdtest_once 1 ns
wdtest_once 1 ns
wdtest_once 1 ns
wdtest_once 10 ns
wdtest_once 10 ns
wdtest_once 10 ns
wdtest_once 10 ns
wdtest_once 100 ns
wdtest_once 100 ns
wdtest_once 100 ns
wdtest_once 100 ns
wdtest_once 1000 ns
wdtest_once 1000 ns
wdtest_once 1000 ns
wdtest_once 1000 ns
wdtest_once 10000 ns
wdtest_once 10000 ns
wdtest_once 10000 ns
wdtest_once 10000 ns
wdtest_once 100000 ns
wdtest_once 100000 ns
wdtest_once 100000 ns
wdtest_once 100000 ns
wdtest_once 1000000 ns
wdtest_once 1000000 ns
wdtest_once 1000000 ns
wdtest_once 1000000 ns
wd_start with maximum delay, cancel OK, rest 2147483646
wdtest_recursive 1000000us
wd_start with maximum delay, cancel OK, rest 2147483646
wdtest_recursive 1000000us
wd_start with maximum delay, cancel OK, rest 2147483646
wdtest_recursive 1000000us
wd_start with maximum delay, cancel OK, rest 2147483646
wdtest_recursive 1000000us
recursive wdog triggered 6 times, elapsed tick 12
wdtest_recursive 10000000us
recursive wdog triggered 6 times, elapsed tick 12
wdtest_recursive 10000000us
recursive wdog triggered 6 times, elapsed tick 12
wdtest_recursive 10000000us
recursive wdog triggered 6 times, elapsed tick 12
wdtest_recursive 10000000us
recursive wdog triggered 6 times, elapsed tick 12
recursive wdog triggered 6 times, elapsed tick 12
recursive wdog triggered 6 times, elapsed tick 12
recursive wdog triggered 6 times, elapsed tick 12
wdog_test end...

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        3
mxordblk    65200    65200
uordblks     cc60     ced0
fordblks    672c0    67050

user_main: POSIX timer test
timer_test: Initializing semaphore to 0
timer_test: Unmasking signal 32
timer_test: Registering signal handler
timer_test: oact.sigaction=0x805526d oact.sa_flags=0 oact.sa_mask=aaaaaaaaaaaaaaaa
timer_test: Creating timer
timer_test: Starting timer
timer_test: Waiting on semaphore
timer_expiration: Received signal 32
timer_expiration: sival_int=42
timer_expiration: si_code=2 (SI_TIMER)
timer_expiration: ucontext=0
timer_test: sem_wait() successfully interrupted by signal
timer_test: g_nsigreceived=1
timer_test: Waiting on semaphore
timer_expiration: Received signal 32
timer_expiration: sival_int=42
timer_expiration: si_code=2 (SI_TIMER)
timer_expiration: ucontext=0
timer_test: sem_wait() successfully interrupted by signal
timer_test: g_nsigreceived=2
timer_test: Waiting on semaphore
timer_expiration: Received signal 32
timer_expiration: sival_int=42
timer_expiration: si_code=2 (SI_TIMER)
timer_expiration: ucontext=0
timer_test: sem_wait() successfully interrupted by signal
timer_test: g_nsigreceived=3
timer_test: Waiting on semaphore
timer_expiration: Received signal 32
timer_expiration: sival_int=42
timer_expiration: si_code=2 (SI_TIMER)
timer_expiration: ucontext=0
timer_test: sem_wait() successfully interrupted by signal
timer_test: g_nsigreceived=4
timer_test: Waiting on semaphore
timer_expiration: Received signal 32
timer_expiration: sival_int=42
timer_expiration: si_code=2 (SI_TIMER)
timer_expiration: ucontext=0
timer_test: sem_wait() successfully interrupted by signal
timer_test: g_nsigreceived=5
timer_test: Deleting timer
timer_test: done

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        4
mxordblk    65200    65200
uordblks     ced0     ceb0
fordblks    67050    67070

user_main: round-robin scheduler test
rr_test: Set thread priority to 1
rr_test: Set thread policy to SCHED_RR
rr_test: Starting first get_primes_thread
         First get_primes_thread: 75
rr_test: Starting second get_primes_thread
         Second get_primes_thread: 76
rr_test: Waiting for threads to complete -- this should take awhile
         If RR scheduling is working, they should start and complete at
         about the same time
get_primes_thread id=1 started, looking for primes < 10000, doing 10 run(s)
get_primes_thread id=2 started, looking for primes < 10000, doing 10 run(s)
get_primes_thread id=1 finished, found 1230 primes, last one was 9973
get_primes_thread id=2 finished, found 1230 primes, last one was 9973
rr_test: Done

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         4        4
mxordblk    65200    65200
uordblks     ceb0     bd10
fordblks    67070    68210

user_main: barrier test
barrier_test: Initializing barrier
barrier_test: Thread 0 created
barrier_func: Thread 0 started
barrier_test: Thread 1 created
barrier_func: Thread 1 started
barrier_test: Thread 2 created
barrier_func: Thread 2 started
barrier_test: Thread 3 created
barrier_func: Thread 3 started
barrier_test: Thread 4 created
barrier_func: Thread 4 started
barrier_test: Thread 5 created
barrier_func: Thread 5 started
barrier_test: Thread 6 created
barrier_func: Thread 6 started
barrier_test: Thread 7 created
barrier_func: Thread 7 started
barrier_func: Thread 0 calling pthread_barrier_wait()
barrier_func: Thread 1 calling pthread_barrier_wait()
barrier_func: Thread 2 calling pthread_barrier_wait()
barrier_func: Thread 3 calling pthread_barrier_wait()
barrier_func: Thread 4 calling pthread_barrier_wait()
barrier_func: Thread 5 calling pthread_barrier_wait()
barrier_func: Thread 6 calling pthread_barrier_wait()
barrier_func: Thread 7 calling pthread_barrier_wait()
barrier_func: Thread 7, back with status=PTHREAD_BARRIER_SERIAL_THREAD (I AM SPECIAL)
barrier_func: Thread 0, back with status=0 (I am not special)
barrier_func: Thread 1, back with status=0 (I am not special)
barrier_func: Thread 2, back with status=0 (I am not special)
barrier_func: Thread 3, back with status=0 (I am not special)
barrier_func: Thread 4, back with status=0 (I am not special)
barrier_func: Thread 5, back with status=0 (I am not special)
barrier_func: Thread 6, back with status=0 (I am not special)
barrier_func: Thread 7 done
barrier_func: Thread 0 done
barrier_func: Thread 1 done
barrier_test: Thread 0 completed with result=0
barrier_test: Thread 1 completed with result=0
barrier_func: Thread 2 done
barrier_func: Thread 3 done
barrier_test: Thread 2 completed with result=0
barrier_test: Thread 3 completed with result=0
barrier_func: Thread 4 done
barrier_func: Thread 5 done
barrier_test: Thread 4 completed with result=0
barrier_test: Thread 5 completed with result=0
barrier_func: Thread 6 done
barrier_test: Thread 6 completed with result=0
barrier_test: Thread 7 completed with result=0

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         4        6
mxordblk    65200    65200
uordblks     bd10     bd10
fordblks    68210    68210

user_main: scheduler lock test
sched_lock: Starting lowpri_thread at 97
sched_lock: Set lowpri_thread priority to 97
sched_lock: Starting highpri_thread at 98
sched_lock: Set highpri_thread priority to 98
sched_lock: Waiting...
sched_lock: PASSED No pre-emption occurred while scheduler was locked.
sched_lock: Starting lowpri_thread at 97
sched_lock: Set lowpri_thread priority to 97
sched_lock: Starting highpri_thread at 98
sched_lock: Set highpri_thread priority to 98
sched_lock: Waiting...
sched_lock: PASSED No pre-emption occurred while scheduler was locked.
sched_lock: Finished

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         6        6
mxordblk    65200    65200
uordblks     bd10     b440
fordblks    68210    68ae0

user_main: vfork() test
vfork_test: Child 98 ran successfully

Final memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       73f20    73f20
ordblks         3        6
mxordblk    69720    65200
uordblks     a3f0     cb78
fordblks    69b30    673a8
user_main: Exiting
ostest_main: Exiting with status 0
stdio_test: Standard I/O Check: fprintf to stderr

Here are the results of ostest when LAZYFPU is enabled:

tbd, currently crashes

Introduces LAZYFPU by enabling LSPEN and ASPEN for lazy stacking and
automatic state preservation. FPU registers are no longer stored in XCPT
registers (HW and SW) if LAZY FPU is enabled.

Signed-off-by: Matteo Golin <[email protected]>
@github-actions github-actions bot added the "Arch: arm" and "Size: S" labels Feb 20, 2025
@nuttxpr

nuttxpr commented Feb 20, 2025

[Experimental Bot, please feedback here]

This PR does not fully meet the NuttX requirements yet, specifically in the Testing section. While the testing logs before the change are provided, the logs after the change are missing, replaced with "tbd, currently crashes". This indicates the change introduced a critical bug and is not ready to be merged.

Here's a breakdown of what needs to be addressed:

  • Testing: The most crucial aspect is fixing the crash. The PR cannot be considered until the functionality works as intended. Once fixed, provide complete testing logs after the change. Ideally, these logs should demonstrate the expected improvement in context switching overhead. Consider including timings or performance metrics to quantify the impact of the change. The current logs only show memory usage, which is not directly relevant to demonstrating the benefits of LAZYFPU.
  • Impact: While the impact description mentions reduced context switching overhead, it lacks specifics. Quantify the performance improvement. For example, "Reduces context switching overhead by X% in the FPU test case". Also, the limited testing on STM32H7 only is acknowledged. While understandable, the PR should ideally strive for broader testing, or at least outline a plan for how compatibility with other ARMv7-M chips will be verified in the future.
  • Summary: The summary is good. However, specifying the magnitude of the context switching overhead reduction (once quantified through testing) would make it even stronger.

In short, this PR needs further work and testing before it can be merged. The primary focus should be on resolving the crash and then providing comprehensive testing data demonstrating the benefits of the LAZYFPU implementation.

/* Enable lazy stacking (LSPEN) and automatic state preservation (ASPEN).
 */

regval = getreg32(NVIC_FPCCR);
Contributor

Can we use a single arm_fpuconfig and add #ifdef/#else/#endif around the code fragment that differs?
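
A minimal sketch of what that could look like, written as a host-compilable function on the register value rather than the real start-up code. The symbol `CONFIG_EXAMPLE_LAZYFPU` and the function name `fpuconfig_fpccr` are hypothetical stand-ins, and clearing both bits in the non-lazy branch is an assumption about the existing behaviour, not something shown in this diff.

```c
#include <assert.h>
#include <stdint.h>

#define FPCCR_ASPEN (UINT32_C(1) << 31) /* Automatic state preservation */
#define FPCCR_LSPEN (UINT32_C(1) << 30) /* Lazy state preservation */

static_assert((FPCCR_ASPEN | FPCCR_LSPEN) == UINT32_C(0xC0000000),
              "ASPEN and LSPEN occupy bits 31 and 30");

/* Hypothetical stand-in for the real Kconfig symbol. Remove this
 * definition to compile the non-lazy branch instead.
 */

#define CONFIG_EXAMPLE_LAZYFPU 1

/* One function, with only the differing fragment under #ifdef. */

static inline uint32_t fpuconfig_fpccr(uint32_t regval)
{
#ifdef CONFIG_EXAMPLE_LAZYFPU
  /* Lazy FPU: let the hardware defer stacking of FP state. */

  regval |= (FPCCR_ASPEN | FPCCR_LSPEN);
#else
  /* Assumed non-lazy behaviour: both features disabled. */

  regval &= ~(FPCCR_ASPEN | FPCCR_LSPEN);
#endif

  return regval;
}
```

The shared parts of the configuration stay in one place, so a later change to the common path cannot be applied to one variant and forgotten in the other.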

@acassis
Contributor

acassis commented Feb 20, 2025

@linguini1 nice work!!! I was expecting it involved more assembly code, nice to know it is mostly register bits config.

@linguini1
Contributor Author

> @linguini1 nice work!!! I was expecting it involved more assembly code, nice to know it is mostly register bits config.

Oh, it will. I believe I have to modify some of the context switching assembly. This is not complete yet.

@hartmannathan hartmannathan self-requested a review February 20, 2025 19:10
4 participants