
Inference Result #13

Open · xiaoyaolovlife opened this issue Feb 5, 2025 · 9 comments

@xiaoyaolovlife

Great work! However, I encountered an issue. When I test your provided example data with the command

```
python Test.py --test_ts=0.5 --model_path=./PreTrained/EVDI-GoPro-Color.pth --test_path=./Database/GoPro-Color/ --save_path=./Result/EVDI-GoPro-Color/ --color_flag=1
```

frame interpolation and deblurring work normally. But when I run it on the downloaded dataset, the results become strange: for the same scene, the deblurring effect is barely noticeable. I noticed that the resolutions of the two datasets differ. After adjusting the resolution, the deblurring improved, but the image colors became distorted. Could you please help me identify where the problem might be? Thank you!
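A quick way to compare the two resolutions (assuming the npz stores the events as a dict with per-event 'x' and 'y' coordinate arrays; the file name is illustrative):

```python
import numpy as np

# Compare the frame resolution against the spatial extent of the events.
data = np.load('./Database/GoPro-Color/sample.npz', allow_pickle=True)
events = data['events'].item()
print('frame size (H, W):', data['blur1'].shape[-2:])
print('event extent (H, W):', int(events['y'].max()) + 1, int(events['x'].max()) + 1)
```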

[Image: example.png]

[Image: go-pro-test.png]

[Image: go-pro-test-change-resolution.png]

My modified `test_dataset` code:

```python
class test_dataset(Dataset):
    def __init__(self, data_path, num_bins, target_ts):
        '''
        Parameters
        ----------
        data_path : str
            path of target data.
        num_bins : int
            the number of bins in event frame.
        target_ts : float
            target reconstruction timestamp, normalized to [0,1].
        '''
        self.data_path = data_path
        self.data_list = util.get_filename(self.data_path, '.npz')
        self.data_len = len(self.data_list)
        self.num_bins = num_bins
        self.target_ts = target_ts
        self.roi_size = (160, 320)

    def __len__(self):
        return self.data_len

    def __getitem1__(self, ind):  # original __getitem__, renamed to disable it
        '''
        Parameters
        ----------
        ind : data index.

        Returns
        -------
        leftB : left blurry image.
        rightB : right blurry image.
        leftB_inp1 : first event segment for leftB.
        leftB_inp2 : second event segment for leftB.
        leftB_w1 : weight for first event segment (related to leftB).
        leftB_w2 : weight for second event segment (related to leftB).
        rightB_inp1 : first event segment for rightB.
        rightB_inp2 : second event segment for rightB.
        rightB_w1 : weight for first event segment (related to rightB).
        rightB_w2 : weight for second event segment (related to rightB).
        leftB_coef : coefficient for L^i_(i+1), i.e., \omega in paper.
        rightB_coef : coefficient for L^i_(i+1), i.e., 1-\omega in paper.
        save_prefix : prefix of image name.
        '''
        ## load data
        data = np.load(self.data_path + self.data_list[ind], allow_pickle=True)
        events = data['events'].item()
        leftB = data['blur1']
        exp_start_leftB = data['exp_start1']
        exp_end_leftB = data['exp_end1']
        span_leftB = (exp_start_leftB, exp_end_leftB)

        rightB = data['blur2']
        exp_start_rightB = data['exp_start2']
        exp_end_rightB = data['exp_end2']
        span_rightB = (exp_start_rightB, exp_end_rightB)

        img_size = leftB.shape[-2:]
        total_span = (exp_start_leftB, exp_end_rightB)

        ## generate target timestamp
        time_span = exp_end_rightB - exp_start_leftB
        ts = exp_start_leftB + time_span * self.target_ts  # target_ts is in [0,1]

        ## for leftB
        leftB_inp1, leftB_inp2, leftB_w1, leftB_w2 = util.event2frame(events, img_size, ts, span_leftB, total_span, self.num_bins, 0, (0, 0))
        leftB_inp1 = util.fold_time_dim(leftB_inp1)
        leftB_inp2 = util.fold_time_dim(leftB_inp2)

        ## for rightB
        rightB_inp1, rightB_inp2, rightB_w1, rightB_w2 = util.event2frame(events, img_size, ts, span_rightB, total_span, self.num_bins, 0, (0, 0))
        rightB_inp1 = util.fold_time_dim(rightB_inp1)
        rightB_inp2 = util.fold_time_dim(rightB_inp2)

        ## recon fusion weight
        leftB_coef, rightB_coef = adaptive_wei(ts, span_leftB, span_rightB)

        save_prefix = self.data_list[ind][:-4]

        return leftB_inp1, leftB_inp2, leftB, np.array(leftB_w1), np.array(leftB_w2), \
            rightB_inp1, rightB_inp2, rightB, np.array(rightB_w1), np.array(rightB_w2), \
            np.array(leftB_coef), np.array(rightB_coef), save_prefix

    def __getitem__(self, ind):
        ## load data
        data = np.load(self.data_path + self.data_list[ind], allow_pickle=True)
        events = data['events'].item()
        leftB = data['blur1']
        exp_start_leftB = data['exp_start1']
        exp_end_leftB = data['exp_end1']
        span_leftB = (exp_start_leftB, exp_end_leftB)

        rightB = data['blur2']
        exp_start_rightB = data['exp_start2']
        exp_end_rightB = data['exp_end2']
        span_rightB = (exp_start_rightB, exp_end_rightB)

        img_size = leftB.shape[-2:]
        total_span = (exp_start_leftB, exp_end_rightB)

        ## randomly crop a 160x320 ROI (the modification discussed in this thread)
        roiTL = (np.random.randint(0, img_size[0] - self.roi_size[0] + 1),
                 np.random.randint(0, img_size[1] - self.roi_size[1] + 1))  # top-left coordinate
        leftB = leftB[:, roiTL[0]:roiTL[0] + self.roi_size[0], roiTL[1]:roiTL[1] + self.roi_size[1]]
        rightB = rightB[:, roiTL[0]:roiTL[0] + self.roi_size[0], roiTL[1]:roiTL[1] + self.roi_size[1]]
        img_size = leftB.shape[-2:]

        ## generate target timestamp
        time_span = exp_end_rightB - exp_start_leftB
        ts = exp_start_leftB + time_span * self.target_ts  # target_ts is in [0,1]

        ## for leftB
        leftB_inp1, leftB_inp2, leftB_w1, leftB_w2 = util.event2frame(events, img_size, ts, span_leftB, total_span, self.num_bins, 0, (0, 0))
        leftB_inp1 = util.fold_time_dim(leftB_inp1)
        leftB_inp2 = util.fold_time_dim(leftB_inp2)

        ## for rightB
        rightB_inp1, rightB_inp2, rightB_w1, rightB_w2 = util.event2frame(events, img_size, ts, span_rightB, total_span, self.num_bins, 0, (0, 0))
        rightB_inp1 = util.fold_time_dim(rightB_inp1)
        rightB_inp2 = util.fold_time_dim(rightB_inp2)

        ## recon fusion weight
        leftB_coef, rightB_coef = adaptive_wei(ts, span_leftB, span_rightB)

        save_prefix = self.data_list[ind][:-4]

        return leftB_inp1, leftB_inp2, leftB, np.array(leftB_w1), np.array(leftB_w2), \
            rightB_inp1, rightB_inp2, rightB, np.array(rightB_w1), np.array(rightB_w2), \
            np.array(leftB_coef), np.array(rightB_coef), save_prefix
```
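For reference, a minimal usage sketch of this dataset class (the `num_bins` value here is an illustrative assumption, not necessarily what Test.py uses):

```python
from torch.utils.data import DataLoader

# Hypothetical instantiation; match num_bins and paths to Test.py's actual settings.
dataset = test_dataset(data_path='./Database/GoPro-Color/', num_bins=16, target_ts=0.5)
loader = DataLoader(dataset, batch_size=1, shuffle=False)

for batch in loader:
    leftB_inp1, leftB_inp2, leftB = batch[0], batch[1], batch[2]
    # ... feed the tensors to the EVDI model as Test.py does
```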
@XiangZ-0 (Owner) commented Feb 6, 2025

Hi, thank you for your interest in our work!
By "the downloaded dataset", could you let me know which dataset you used for the test (is it Ev-REDS from GEM)? And which method did you use to change the resolution for the "go-pro-test-change-resolution.png" example? If possible, could you share the npz files for the "go-pro-test.png" and "go-pro-test-change-resolution.png" examples so that I can test them on my side? Thanks :)

@xiaoyaolovlife (Author)

Thank you for your reply!
The code I've changed is:

[Image: modified code]

The data I've used is:
Link: https://pan.baidu.com/s/1AxFuIaJHBnekr-2Kr_uZ_g?pwd=5fb3 (extraction code: 5fb3)

@XiangZ-0 (Owner) commented Feb 7, 2025

Thanks a lot for sharing! I think the problem is caused by the resolution change code in the red box. The current code randomly crops the high-resolution image to 160x320, resulting in a different field of view from the events. The correct way is to downscale the high-resolution image to 160x320 with an image downsampling technique such as bicubic interpolation. For example, you can replace the red box part with the following:

```python
leftB = cv2.resize(leftB.transpose(1, 2, 0), (320, 160), interpolation=cv2.INTER_CUBIC).transpose(2, 0, 1)    # downscale leftB
rightB = cv2.resize(rightB.transpose(1, 2, 0), (320, 160), interpolation=cv2.INTER_CUBIC).transpose(2, 0, 1)  # downscale rightB
```

You might also need to add `import cv2` to import the cv2 package here. Then it should work. :)
Hope this helps!
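As a side note, a minimal sketch of how this fix could replace the random crop inside `__getitem__` (the helper name is hypothetical; note that `cv2.resize` works on HWC images and takes a `(width, height)` target size, hence the transposes):

```python
import cv2

def downscale_chw(img, size=(320, 160)):
    # img is CHW; cv2.resize expects HWC input and a (width, height) size.
    return cv2.resize(img.transpose(1, 2, 0), size,
                      interpolation=cv2.INTER_CUBIC).transpose(2, 0, 1)

# Inside __getitem__, instead of the random crop:
# leftB = downscale_chw(leftB)
# rightB = downscale_chw(rightB)
# img_size = leftB.shape[-2:]
```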

@xiaoyaolovlife (Author)

Yeah! It works! Thank you for your help!

@xiaoyaolovlife (Author)

By the way, this means that directly feeding images at their original resolution into the model may not yield good results, possibly because the pre-trained model does not generalize well across resolutions. I'm not sure if I understand this correctly. I've seen your GEM work on resolution generalization, which is excellent, but it seems to handle only deblurring and not frame interpolation. I'm curious whether there are any plans to release an advanced version of this work (EVDI++). I'm really looking forward to it!

@XiangZ-0 (Owner) commented Feb 8, 2025

Glad to hear it works :)
Yes, your understanding is correct. GEM is designed only for deblurring, but it might be possible to combine the key ideas of EVDI and GEM to achieve your goal. We are currently working on EVDI++ and might take this aspect into account. Thanks a lot for the advice!

@xiaoyaolovlife (Author)

Another question I'd like to ask: is it necessary to have both left- and right-view blurry images during inference? I noticed that using two left-view blurry images can also achieve deblurring.

@xiaoyaolovlife (Author)

But the results differ.

Left and right:

[Image]

Only left:

[Image]

@XiangZ-0 (Owner)

Nice observation! Yes, it's not mandatory to use left and right images as inputs. In principle, you can use any two images as inputs because the final result is fused in a hand-crafted manner. But we found that using both left and right images achieves the overall best performance :)
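For illustration, a minimal sketch of such a hand-crafted fusion, mirroring the ω / (1-ω) coefficients that `adaptive_wei` returns in the dataset code above (the proximity-based weighting rule here is an assumption for illustration, not the repository's actual implementation):

```python
def adaptive_weight_sketch(ts, span_left, span_right):
    # Hypothetical rule: weight each reconstruction by how close the target
    # timestamp ts is to each exposure's midpoint; weights sum to 1.
    mid_left = 0.5 * (span_left[0] + span_left[1])
    mid_right = 0.5 * (span_right[0] + span_right[1])
    d_left, d_right = abs(ts - mid_left), abs(ts - mid_right)
    w_left = d_right / (d_left + d_right + 1e-8)
    return w_left, 1.0 - w_left  # (omega, 1 - omega)

def fuse(recon_left, recon_right, w_left, w_right):
    # Blend two reconstructions of the same latent frame.
    return w_left * recon_left + w_right * recon_right
```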
