Importing data into Python
In this post, we will learn:
- How to import data into Python
- How to import time series data
- How to handle different time series formats while importing
A) Importing Normal Data
Suppose you have a data file saved in CSV format on your computer. How do you import it into Python? For this example, I saved a data set on my computer under the name datasheet.csv.
Open Jupyter Notebook and create a new Python 3 notebook. Import pandas with import pandas as pd. Next, get the path of the data file. One easy way is to right-click the CSV file, click Properties, and copy the location shown there into Python.
Add the file name (datasheet.csv in this case) to the end of the path, and change all backslashes (\) to forward slashes (/). Then read the file with pd.read_csv() and store the result under a name of your choice (I used newdata).
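The path clean-up step can be sketched as follows. This is a minimal sketch assuming a Windows path; the folder C:\Users\me\Documents and the helper to_forward_slashes are hypothetical. The read only runs if the file really exists, so the snippet is safe to execute anywhere.

```python
import os
import pandas as pd

def to_forward_slashes(folder: str, filename: str) -> str:
    """Append a file name to a copied Windows folder path and
    convert every backslash to a forward slash (hypothetical helper)."""
    return folder.replace("\\", "/").rstrip("/") + "/" + filename

# Hypothetical folder, as copied from the file's Properties dialog.
# The r prefix makes this a raw string, so Python keeps the backslashes as-is.
folder = r"C:\Users\me\Documents"
path = to_forward_slashes(folder, "datasheet.csv")
print(path)  # C:/Users/me/Documents/datasheet.csv

# Read the file only if it actually exists at that location.
if os.path.exists(path):
    newdata = pd.read_csv(path)
```

A raw string such as r"C:\Users\me\Documents\datasheet.csv" passed directly to pd.read_csv() is a common alternative to replacing the slashes by hand.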
To confirm that the data has been imported, use newdata.head() to display the first five rows. The first column (before the Age column) is the index column (0, 1, 2, 3…), which pandas creates by default.
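Here is a self-contained sketch of the import and the head() check. Since datasheet.csv is not available, it uses a small made-up stand-in read from a string via io.StringIO; only the Age column name comes from the post, and the Height column and all values are hypothetical.

```python
import io
import pandas as pd

# Made-up stand-in for datasheet.csv (only the Age column name is from the post).
csv_text = """Age,Height
25,170
32,165
47,180
51,158
38,172
29,169
"""

# io.StringIO lets read_csv treat the string like a file on disk.
newdata = pd.read_csv(io.StringIO(csv_text))

print(newdata.head())  # the first five rows (head() defaults to n=5)
print(newdata.index)   # the default integer index: 0, 1, 2, ...
```

With a real file you would pass the path instead of io.StringIO(csv_text); everything else stays the same.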
B) Importing Time Series Data
1) When the time data is in a single column in mm-dd-yyyy format
If the time series data is stored in a CSV file, how do we import it?
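Use pd.read_csv() with the index_col and parse_dates options.
index_col=0 treats the first column as the index, and parse_dates=True tells pandas to try to parse that index as dates.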
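A minimal sketch of this read, assuming a hypothetical file whose first column holds mm-dd-yyyy dates; the column names Date and Value and the numbers are made up, and the data is read from a string so the example runs without a file.

```python
import io
import pandas as pd

# Hypothetical time series; the first column holds dates in mm-dd-yyyy format.
csv_text = """Date,Value
01-15-2021,10
02-15-2021,12
03-15-2021,9
"""

# index_col=0 uses the first column as the index;
# parse_dates=True asks pandas to parse that index as dates.
ts = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(ts)
```

Because month-first is pandas' default reading of ambiguous dates, no extra option is needed for this format.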
To check whether the dates were read as datetime values, inspect the index of the imported data; its dtype attribute shows the data type. Here, the data type has been correctly read as dtype='datetime64[ns]'.
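One way to run this check, sketched on the same kind of hypothetical mm-dd-yyyy data as above. The pd.api.types.is_datetime64_any_dtype call is an addition, not necessarily what the post used; it returns True for any datetime64 resolution, which makes it a sturdier test than comparing the dtype string.

```python
import io
import pandas as pd

# Short hypothetical mm-dd-yyyy data, read with the date column as index.
csv_text = "Date,Value\n01-15-2021,10\n02-15-2021,12\n"
ts = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# The dtype attribute shows how pandas stored the index.
print(ts.index.dtype)

# A check that holds for any datetime64 resolution.
is_datetime = pd.api.types.is_datetime64_any_dtype(ts.index)
print(is_datetime)  # True
```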
2) When the time data is in a single column in dd-mm-yyyy format
If the dates are in dd-mm-yyyy format, add the dayfirst=True option so that pandas reads the first number as the day rather than the month.
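A minimal sketch of the dayfirst option on hypothetical dd-mm-yyyy data (column names and values are made up). The first row, 05-01-2021, is deliberately ambiguous: with dayfirst=True it becomes 5 January 2021 rather than 1 May 2021.

```python
import io
import pandas as pd

# Hypothetical dd-mm-yyyy data: "05-01-2021" means 5 January 2021.
csv_text = """Date,Value
05-01-2021,10
05-02-2021,12
20-03-2021,9
"""

# dayfirst=True tells the date parser to read the first number as the day.
ts = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True, dayfirst=True)
print(ts)
```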
3) When the time data is split across multiple columns
What if the date is split across separate columns, with the day in one column, the month in another, and the year in a third?
Pass the positions of these columns to parse_dates, and pandas will combine them into a single date column.
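Combining columns through parse_dates was deprecated in pandas 2.2, so on recent versions it may warn or fail. A swapped-in alternative, sketched below and not the method used in the post, reads the pieces as ordinary columns and assembles them with pd.to_datetime, which accepts a DataFrame with year, month and day columns. The column names Day, Month, Year and Value and the data are hypothetical.

```python
import io
import pandas as pd

# Hypothetical data with the date split across day, month and year columns.
csv_text = """Day,Month,Year,Value
15,1,2021,10
15,2,2021,12
15,3,2021,9
"""

df = pd.read_csv(io.StringIO(csv_text))

# pd.to_datetime can assemble dates from a DataFrame whose columns are
# named year, month and day, so lower-case the column names first.
parts = df[["Year", "Month", "Day"]].rename(columns=str.lower)
df["Date"] = pd.to_datetime(parts)

# Use the combined date as the index and drop the original pieces.
ts = df.set_index("Date").drop(columns=["Day", "Month", "Year"])
print(ts)
```

This keeps the parsing step explicit, so it behaves the same across pandas versions.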
Summary
We have now learned how to import different types of data into Python, including time series with dates in several formats.
Do you have any questions? I will try to answer them to the best of my ability.