ERP Core N400 in MNE-Python: part III

In the last post on adapting ERP CORE’s N400 pipeline to MNE-Python, we did the ICA-related steps to correct blinking and horizontal eye-movement artfacts from the signal. Here we do some final steps in cleaning the data: interpolation of bad channels, epoching and rejection of epochs with possible artifacts. These steps are decipted in the Elist_Bin_Epoch script of ERP CORE resources (script 5).

Contrary to previous posts, I won’t make the whole process in a single participant and show the loop for all participants at the end. That is because the result for some af the steps are saved as intermediate stages of the preprocessing pipeline. I should also tell you up-front that I wasn’t able to find implemented equivalents for some of the procedures ERP CORE uses to define artifacts. That forced me to implement a python version of what they do. Therefore, this post is a little longer the the other ones, so buckle up! On the bright side, at the end we will have the data ready to start looking for those N400 ERPs.


Our first step here is to interpolate bad channels. Bad channels for each participant are listed in the file “Interpolate_Channels_N400.xlsx” provided by ERP CORE. Below, I load the information in that file, make a list of bad channels for each participant and add that list to their['bads'].

Load the information:

import os
os.chdir('D:\\EEGdata\\Erp CORE\\N400')

import mne
# import some methods directly so we can call them by name
from import read_raw
from mne.epochs import read_epochs

import numpy as np
import pandas as pd

%matplotlib qt
%gui qt
# load as DataFrame
interpolate_path = "D:\\EEGdata\\Erp CORE\\N400\\EEG_ERP_Processing\\Interpolate_Channels_N400.xlsx"
chans_interpolate = pd.read_excel(interpolate_path)
# Turn strings to a list of strings for each participant
chans_interpolate = chans_interpolate.loc[:,"Name of Interpolated Channel"].str.split(" ")

Now I’ll deal with some nuisences in the file: there is an asterisk at the beginning of the channel list for each participant, and Fp channels are in upper-case. We remove the asterisk and correct the captalization.

# make function to remove asterisk and correct 
def remove_star(item):
    # if item is a list
    if isinstance(item, list):
        # if list is longer than 1
        if len(item) > 1:
            # remove asterisk
            clean_names = [item[0].lstrip("*")]
            # add channel to a list clean_names
            # look for Fp channels and correct captalization
            for n, elec in enumerate(clean_names):
                if elec =='FP1' or elec =='FP2':
                    clean_names[n] = elec.capitalize() 
            clean_names = [item[0].lstrip("*")]
            if clean_names[0] =='FP1' or clean_names[0] =='FP2':
                clean_names[0] = clean_names[0].capitalize()
        clean_names = np.nan
    return clean_names
# apply function 
chans_interpolate = chans_interpolate.apply(remove_star)
0           [F8]
4          [Fp2]
5     [FC4, PO4]
10          [P7]
16          [F8]
Name: Name of Interpolated Channel, dtype: object

We have already indentified the channels to be interpolated, however if we try to do the interpolation now MNE will throw us an error because we have more EEG channels than mapped on the 3D digitation. The channels HEOG_left, HEOG_right and VEOG_lower are currently identified as EEG channels, but they are actually extra channels we used to make the proper HEOG and VEOG channels. We will drop them, since they already served their purpose.

Let’s do it for all participants and save an intermediate (“interpol”) raw file:

for subj in range(1,41):
    raw_correct_path = "D:\\EEGdata\\Erp CORE\\N400\\mne\\raw_corrected\\N400_ERP_CORE_{}_correct-raw.fif".format(subj)
    raw_correct = read_raw(raw_correct_path, preload = True)
    #only do interpolation if there is at least one channel to interpolate
    if  not chans_interpolate.isna()[subj-1]:['bads'].extend(chans_interpolate[subj-1])
        # drop extra channels
        raw_correct.drop_channels(['HEOG_left', 'HEOG_right', 'VEOG_lower'])
        raw_correct.drop_channels(['HEOG_left', 'HEOG_right', 'VEOG_lower'])
    interpol_path = "D:\\EEGdata\\Erp CORE\\N400\\mne\\raw_interpol\\N400_ERP_CORE_{}_interpol-raw.fif".format(subj), overwrite=True)

Event list for epochs

Now let’s make our event list and event_ids for epoching.

As we did at the back at beginning of preprocessing (in this post), we will get the events from the annotations in the data. Until now, we have two types of annotation descriptions in the data: “BAD_seg” and numbers corresponding to triggers used in the experiment. The ERP CORE file “N400_Event_Code_Scheme.xlsx” has the meaning for each code. For instance, the code 121 refers to a Priming stimuli which is unrelated to the target stimulus and comes from the second list of words; while the code 201 means that the participant gave a correct answer.

We are going to change these codes to a descriptive account of each event. A nice feature of MNE is that we can make hierarchical ids. We will follow the hierarchy shown in the N400_Event_Code_Scheme file: Word Type > Relatedness > Word List.

First, we will create a dictionary with the correspondence between codes and events to properly identify them later. We will be able to use these ids to select epochs afterwards.

# load data
subj = 1
interpol_path = "D:\\EEGdata\\Erp CORE\\N400\\mne\\raw_interpol\\N400_ERP_CORE_{}_interpol-raw.fif".format(subj)
raw_interpol = read_raw(interpol_path)
# make dict with correspondences between codes and hierarchical ids
description_dict = {'111' : 'Prime/Related/L1',
             '112' : 'Prime/Related/L2',
             '121' : 'Prime/Unrelated/L1',
             '122' : 'Prime/Unrelated/L2',
             '211' : 'Target/Related/L1',
             '212' : 'Target/Related/L2',
             '221' : 'Target/Unrelated/L1',
             '222' : 'Target/Unrelated/L2',
             '201' : 'Hit',
             '202' : 'Miss',
             'BAD_seg': 'BAD_seg'      

We will change the description list with some Pandas functionality. It is beyond this tutorial to teach Pandas’ skills, but it is nice for you to know that, although most of the backbone of MNE is in Numpy, it also allows exportation to Pandas’ data structures.

In summary, what we do in the one-liner below is make a copy of the annotations descriptions while transforming it to a Pandas series, then we make the substituitions according to our description dictionary and turn it back to a numpy array.

raw_interpol.annotations.description = pd.Series(raw_interpol.annotations.description).map(description_dict).to_numpy()
# substitute annotations description

Now we are almost ready to make the events numpy array we will need to epoch the data. We will use the events_from_annotations function. One last step before that: create a dictionary with correspondences between hierarchical ids and trigger numbers.

You may be wondering why we would do that when we just made the reverse dictionary on our last step. The thing is that MNE’s event list is a numeric array composed of columns instant, blank column and event code, and the default behavior of events_from_annotations is to assign the event codes as [0,1,2,3…]. However, I want to keep the original codes from ERP CORE, so I need to use the events_id input with a dict showing the correspondences. I should say that for the rest of our MNE operations the numeric codes themselves are actually rather irrelevant, what really matters are the correspondences in event_ids. I am just being picky to keep as close as possible to the original ERP CORE material.

event_ids = {'Prime/Related/L1': 111,
             'Prime/Related/L2': 112,
             'Prime/Unrelated/L1' : 121,
             'Prime/Unrelated/L2' : 122,
             'Target/Related/L1' : 211,
             'Target/Related/L2' : 212,
             'Target/Unrelated/L1' : 221,
             'Target/Unrelated/L2' : 222,
             'Hit' : 201,
             'Miss' : 202}
events, event_ids = mne.events_from_annotations(raw_interpol, event_ids)

So, our events look like this (instant, blank column, event code):

array([[  3058,      0,    202],
       [  4145,      0,    202],
       [  5195,      0,    112],
       [144827,      0,    121],
       [145156,      0,    221],
       [145325,      0,    201]])

Epoching and artfact rejection

The ERP CORE pipeline is very throrough in its epoch dropping due to ocular artifact detection. There are different thresholds for epoch dropping for: (1) EEG channels, (2) the VEOG channel and (3 and 4) for both variations of the HEOG channel (with and without ICA applied). In MNE, we can set epoch rejection thresholds for each channel type by setting the reject input when we create the epochs.

However, as far as I understand from the documentation, MNE uses a simple peak-to-peak algorithm to do its automatic rejection. ERP CORE pipeline uses such an algorithm only for rejection on the scalp EEG channels. For VEOG and uncorrected HEOG channels it uses a windowed peak-to-peak algorithm and for ICA corrected HEOG it uses a step algorithm (both as implemented in the ERP LAB toolbox).

Since ERP CORE gives us the parameters, and the algorithms themselves are quite straightforward, I have implemented them below. I make use of the nifty feature of MNE that allows us to loop through epochs. On each epoch I run the algorithms on the corresponding channels and, if some amplitude is above the threshold, it is marked as bad with a MW_Reject annotation. The value compared to the treshold on each algorithm are the following:

  • Step algorithm: $\bmod (mean(window1)-mean(window2))$
  • Peak-to-peak algorithm: $\bmod (max(window1)-min(window2))$

Below, we (1) load the the information we need for the artfact rejection procedure, (2) create the dictionaries we need for marking the events in the experiment (as shown above); then, in the loop, we (3) add the annotations for epoching, (4) do the epoching with mne.Epochs(), (5) run the artfact rejection algorithm and (6) save the epochs.

#get eog channels numbers
subj = 1
interpol_path = "D:\\EEGdata\\Erp CORE\\N400\\mne\\raw_interpol\\N400_ERP_CORE_{}_interpol-raw.fif".format(subj)
raw_interpol = read_raw(interpol_path)
eog_nums = [np.where(np.array(raw_interpol.ch_names) == 'VEOG')[0].item(0),
            np.where(np.array(raw_interpol.ch_names) == 'HEOG')[0].item(0),
            np.where(np.array(raw_interpol.ch_names) == 'HEOG_ICA')[0].item(0)]

#get veog moving windows parameters from files
veog_path = "D:\\EEGdata\\Erp CORE\\N400\\EEG_ERP_Processing\\AR_Parameters_for_MW_CRAP_N400.xlsx"
veog_mw_params = pd.read_excel(veog_path)
#convert time columns to seconds
veog_mw_params.iloc[:,3:] = veog_mw_params.iloc[:,3:].apply(lambda x: x*1e-3)
#convert threshold to volts
veog_mw_params.Threshold = veog_mw_params.Threshold*1e-4

# the same for non ICA corrected heog moving windows
heog_path = "D:\\EEGdata\\Erp CORE\\N400\\EEG_ERP_Processing\\AR_Parameters_for_MW_Blinks_N400.xlsx"
heog_mw_params = pd.read_excel(heog_path)
heog_mw_params.iloc[:,3:] = heog_mw_params.iloc[:,3:].apply(lambda x: x*1e-3)
heog_mw_params.Threshold = heog_mw_params.Threshold*1e-4

#... and for ICA corrected HEOG moving windows
heog_ica_path = "D:\\EEGdata\\Erp CORE\\N400\\EEG_ERP_Processing\\AR_Parameters_for_SL_HEOG_N400.xlsx"
heog_ica_mw_params = pd.read_excel(heog_ica_path)
heog_ica_mw_params.iloc[:,3:] = heog_ica_mw_params.iloc[:,3:].apply(lambda x: x*1e-3)
heog_ica_mw_params.Threshold = heog_ica_mw_params.Threshold*1e-4

#put moving windows dfs in a list
mw_params = [veog_mw_params, heog_mw_params, heog_ica_mw_params]

# make dict with correspondences between codes and hierarchical ids
description_dict = {'111' : 'Prime/Related/L1',
                    '112' : 'Prime/Related/L2',
                    '121' : 'Prime/Unrelated/L1',
                    '122' : 'Prime/Unrelated/L2',
                    '211' : 'Target/Related/L1',
                    '212' : 'Target/Related/L2',
                    '221' : 'Target/Unrelated/L1',
                    '222' : 'Target/Unrelated/L2',
                    '201' : 'Hit',
                    '202' : 'Miss',
                    'BAD_seg': 'BAD_seg'      

 #create correspondence between description and trigger number
event_ids = {'Prime/Related/L1': 111,
             'Prime/Related/L2': 112,
             'Prime/Unrelated/L1' : 121,
             'Prime/Unrelated/L2' : 122,
             'Target/Related/L1' : 211,
             'Target/Related/L2' : 212,
             'Target/Unrelated/L1' : 221,
             'Target/Unrelated/L2' : 222,
             'Hit' : 201,
             'Miss' : 202}

# loop
for subj in range(1,41):
    interpol_path = "D:\\EEGdata\\Erp CORE\\N400\\mne\\raw_interpol\\N400_ERP_CORE_{}_interpol-raw.fif".format(subj)
    raw_interpol = read_raw(interpol_path)

    #add event descriptions to annotations
    raw_interpol.annotations.description = pd.Series(raw_interpol.annotations.description).map(description_dict).to_numpy()
    #create events and event ids
    events, event_ids = mne.events_from_annotations(raw_interpol, event_ids)
    #change heog channel types
    raw_interpol.set_channel_types({'HEOG_ICA':'emg', 'HEOG':'misc'})
    epochs = mne.Epochs(raw_interpol, 
                        baseline = (-.2,0),
                        tmax = .8,
                        reject = dict(eeg  = 200e-6)
    epos_to_drop = []
    # run through every epoch
    for i in range(len(
        # for each channel of interest
        for eog_chan_num, mw_df in zip(eog_nums, mw_params): #
            # get moving windows parameters
                w_size = mw_df['Window Size'].iloc[subj-1]
                w_start = mw_df['Time Window Minimum'].iloc[subj-1].round(2)
                w_start_last = np.round(mw_df['Time Window Maximum'].iloc[subj-1] - mw_df['Window Size'].iloc[subj-1],2)
                w_stop = mw_df['Time Window Maximum'].iloc[subj-1].round(2)
                w_stop_first = np.round(w_start + w_size,2)
               # create an array with starting and stopping times of every window
                mw_starts = np.arange(w_start,
                mw_stops = np.arange(w_stop_first,
        # only run if channel is not empty, e.g. already marked as bad
            if epochs[i].get_data()[:,eog_chan_num,:].shape[0] != 0:
                #get channel data
                chan_data = epochs[i].get_data()[:,eog_chan_num,:].flatten()
                #make list with data for each window
                mws = [ chan_data[epochs.time_as_index(start).item(0) : epochs.time_as_index(stop).item(0)] for start, stop in zip(mw_starts, mw_stops)]

               # if channel is veog or uncorrected heog do peak-to-peak artfact detection
                if eog_chan_num < 32:
                    mws = pd.DataFrame(mws).T
                    peak2peak = mws.apply(lambda x: np.abs(x.max() - x.min()))
                    if any(peak2peak > mw_df['Threshold'].iloc[subj-1]):
                # if channel is heog_ica do step artfact detection
                    steps = [True for i in range(len(mws)-1) if np.abs(mws[i].mean()-mws[i+1].mean()) > mw_df['Threshold'].iloc[subj-1]] 
                    if steps:
    # if any epoch was set as bad add it to drop list with label MW Reject
    if epos_to_drop: 
    #Save epochs                        
    epochs_path = "D:\\EEGdata\\Erp CORE\\N400\\mne\\epochs\\N400_ERP_CORE_{}-epo.fif".format(subj)
    print('Subject {}'.format(subj)) # line to keep up with analysis as it runs, overwrite=True)

Lastly, let’s check the percentage of dropped epochs for each participant.

percent_dropped = pd.DataFrame(data = {'Subj': np.repeat(np.nan, 40),
                                       'Dropped': np.repeat(np.nan, 40)}) 
for subj in range(1,41):
    epochs_path = "D:\\EEGdata\\Erp CORE\\N400\\mne\\epochs\\N400_ERP_CORE_{}-epo.fif".format(subj)
    epochs = read_epochs(epochs_path)

    percent_dropped['Subj'].iloc[subj-1] = subj
    percent_dropped['Dropped'].iloc[subj-1] = np.round(epochs.drop_log_stats(),2)
count    40.00
mean     13.14
std      18.53
min       3.88
25%       4.97
50%       5.80
75%       9.51
max      85.64
Name: Dropped, dtype: float64

Dropped epochs per participant range from roughly 4% to 85%. Here, we will set consider participants with more than 30% of epochs dropped as participants with too many artfacts to be used. The ones that do not pass this filter are below:

bad_subjs = percent_dropped.loc[percent_dropped.Dropped >30].Subj

We have 5 unsuable participants. This is different from ERP CORE’s original preprocessing results, where they lose a single participant. However, I wasn’t able to exactly reproduce the ICA step in the pipeline, so that is probably where we are falling a little behind. Anyway, 35 participants are more than enough for us to grasp the N400 effects in the following analyses. So, enough preprocesing, let’s get to those ERPs in the next post!

Let’s finish contemplating some hard earned epochs:

epochs.plot(n_epochs = 4)


Dropped 0 epochs: 
Channels marked as bad: []
Mateus Silvestrin
Mateus Silvestrin
PhD candidate

PhD candidate at the Federal University of ABC. Works in the Timing Lab e colaborates with the Developmental, Affective and Social Neuroscience Lab.