Fill missing values in MS Excel data using Python for Excel
On various occasions, we have data with missing values. This kind of data messes up our analysis or whatever else we are working on. Even worse, if the data needs to be exported to a database such as SQL Server or MS Access, it can result in a lot of errors. If a field in the database is defined as a number and the value is blank, the upload will fail. If null values are allowed and the data does get uploaded, that can be even worse, because the blanks will cause errors in calculations at a later stage.
There are many ways to fill in missing data. Since the introduction of Python for Excel, we can use Python for this as well. Filling missing data with Python is more efficient, especially when an advanced method is used that estimates each missing value from the surrounding data.
Suppose we have the data [1,2,N/A,4,5], where `3` is missing. One option is `dataframe.fillna(0)`, which fills the missing values with zeros. This makes sure that the data uploads properly into a database and that later calculations do not fail on blanks. Keep in mind that the zeros are then treated as real values, so they still count toward sums and averages. Our revised data will be [1,2,0,4,5]. You can fill missing values with any value you want using this function; just replace `0` with that value.
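As a minimal sketch of the zero-fill approach, the snippet below builds the sample data in pandas and calls `fillna(0)`. The column name `value` and the variable names are made up for illustration, and the data is typed in directly; in an Excel Python cell it would normally come from a worksheet range instead.

```python
import numpy as np
import pandas as pd

# Sample data from the text; np.nan stands in for the N/A cell.
# The column name "value" is hypothetical.
df = pd.DataFrame({"value": [1, 2, np.nan, 4, 5]})

# fillna returns a new DataFrame with every missing value replaced by 0.
filled = df.fillna(0)

print(filled["value"].tolist())  # [1.0, 2.0, 0.0, 4.0, 5.0]
```

The values print as floats because a numeric column containing a missing value is stored as floating point.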
Another method is `dataframe.bfill()`, which fills each missing value with the next valid value. In that case the revised data will be [1,2,4,4,5]. Or you can use `dataframe.ffill()`, which fills each missing value with the previous valid value. The revised data will then be [1,2,2,4,5]. If the data is sequential, the missing value is likely to be close to the value before or after it, so this approach serves the purpose in some cases.
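The backward and forward fills can be compared side by side with a short sketch like the one below. It assumes the same sample data as in the text, with a hypothetical column name `value`.

```python
import numpy as np
import pandas as pd

# Sample data from the text; np.nan stands in for the N/A cell.
df = pd.DataFrame({"value": [1, 2, np.nan, 4, 5]})

# bfill copies the next valid value backward into the gap.
back_filled = df.bfill()

# ffill carries the previous valid value forward into the gap.
forward_filled = df.ffill()

print(back_filled["value"].tolist())     # [1.0, 2.0, 4.0, 4.0, 5.0]
print(forward_filled["value"].tolist())  # [1.0, 2.0, 2.0, 4.0, 5.0]
```

Note that `bfill()` cannot fill a gap at the very end of the data and `ffill()` cannot fill one at the very start, since there is no neighboring value to copy from.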
You can also use `dataframe.interpolate()`, which by default estimates each missing value linearly from its neighbors. In this case it recovers the missing value exactly, and the revised data will be [1,2,3,4,5]. So with Python you have several methods available for cleansing data and filling in missing values.
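A minimal sketch of the interpolation approach follows, again using the sample data from the text with a hypothetical column name `value`. It relies on the default linear method, which works well here only because the values are evenly spaced.

```python
import numpy as np
import pandas as pd

# Sample data from the text; np.nan stands in for the N/A cell.
df = pd.DataFrame({"value": [1, 2, np.nan, 4, 5]})

# With the default linear method, the gap is filled with the value
# halfway between its neighbors (2 and 4).
interpolated = df.interpolate()

print(interpolated["value"].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

For data that does not follow a straight line, the estimate will be an approximation rather than the true value.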