Pandas Library in Python
Python is a general-purpose language that can be applied flexibly to many kinds of work: data processing, building Machine Learning models, deploying those models, creating web and mobile applications, and much more. Python can cover this much ground because it is supported by a large ecosystem of libraries.

Although Python has many libraries, only a handful are commonly used to solve Data Science problems. One of them is Pandas. Pandas is open source, which means anyone can use it free of charge. It provides data structures and tools for manipulating data, and it is built on top of the NumPy library.
Let's take a closer look.
1. History of Pandas
Pandas was first developed in 2008 by Wes McKinney while he was working at AQR Capital Management. He convinced AQR to let him release Pandas as open source. In 2012 another AQR employee, Chang She, joined as a main contributor to the library. Pandas has continued to evolve in response to user needs, and many versions have been released since. At the time of writing, the latest version of Pandas is 1.4.1.
2. Advantages of Pandas
Pandas is still widely used today. It can be considered a foundational library and is likely to remain part of data processing work. This popularity comes from the advantages Pandas offers, including:
- Fast and efficient data manipulation and analysis.
- Can load data from a variety of file formats.
- Easy handling of missing data (represented as NaN) in both floating-point and non-floating-point data.
- Easy resizing of data: columns can be inserted into and removed from DataFrames and higher-dimensional objects.
- Can be used to join and merge datasets.
- Supports reshaping and pivoting of datasets.
- Provides time series functionality.
- Powerful group-by functionality for performing split-apply-combine operations on datasets.
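A few of these advantages can be sketched in a short example. This is only an illustration under stated assumptions: the `sales` and `regions` tables, their column names, and their values are invented here and do not come from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data; names and values are invented for illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "units": [10.0, np.nan, 7.0, 5.0],
})

# Missing data: count the NaN values, then fill them with 0.
missing_count = int(sales["units"].isna().sum())
sales["units"] = sales["units"].fillna(0)

# Resizing: insert a new column, then remove it again.
sales["double_units"] = sales["units"] * 2
sales = sales.drop(columns=["double_units"])

# Joining: merge with a second (also hypothetical) table on the shared key.
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})
merged = sales.merge(regions, on="store", how="left")

# Split-apply-combine: split by region, sum the units, combine the results.
totals = merged.groupby("region")["units"].sum()
print(totals)
```

Filling missing values with 0 is just one possible choice; depending on the data, dropping the rows with `dropna()` or filling with a mean may be more appropriate.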
3. The Relationship Between Pandas and Data Science
Pandas is one of the libraries used to complete Data Science work. But why is a library whose main job is data manipulation so important in Data Science? The reason is that Pandas is used together with other libraries that are closely tied to Data Science. In addition, Pandas itself is built on top of the NumPy library, so many NumPy structures are used or replicated in Pandas.
Data prepared with Pandas is often used as input for plotting functions in Matplotlib, statistical analysis in SciPy, and Machine Learning algorithms in Scikit-learn. While Pandas can be run from a variety of text editors, Jupyter Notebook is often more convenient because it can execute code in individual cells instead of running the entire file. Jupyter also provides an easy way to display Pandas DataFrames and plots.
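The link to NumPy can be sketched as follows. The column names and values are made up for illustration, and only Pandas and NumPy are used so that the example does not depend on the other libraries being installed.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements; names and values are invented for illustration.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0],
    "weight_kg": [50.0, 60.0, 70.0],
})

# to_numpy() returns the values of a column as a NumPy array.
heights = df["height_cm"].to_numpy()
print(type(heights))

# NumPy functions also accept Pandas objects directly.
mean_height = np.mean(df["height_cm"])
print(mean_height)

# Several columns become a 2-D array (rows x columns), a common input
# shape for numerical and Machine Learning libraries.
X = df[["height_cm", "weight_kg"]].to_numpy()
print(X.shape)
```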
4. Start Using Pandas
The first step is to make sure that Pandas is installed in your Python environment. If it is not, you can install it with pip. On Windows, type cmd in the search box to open the Command Prompt. If pip is not on your PATH, use the cd command to move to the folder where pip is installed. Then type the command:
pip install pandas
Once Pandas is installed, import the library before using it:
import pandas as pd
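A quick way to confirm that the import worked is to print the installed version. This is a small sketch; the version string you see depends on your own installation.

```python
import pandas as pd

# __version__ holds the version of the installed Pandas package as a string.
installed_version = pd.__version__
print(installed_version)
```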
Pandas provides two main data structures for manipulating data:
- Series: a labeled one-dimensional array that can hold any type of data (integer, string, float, Python object, etc.).
- DataFrame: a two-dimensional tabular data structure that can change size and can hold heterogeneous data, with labeled axes (rows and columns).
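The two structures can be sketched like this. The labels, names, and values below are invented for illustration.

```python
import pandas as pd

# Series: a one-dimensional array with an index of labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])  # access a value by its label

# DataFrame: a two-dimensional table whose columns may have different types.
df = pd.DataFrame({"name": ["Ana", "Budi"], "age": [28, 35]})

# Size-mutable: a new column can be added after creation.
df["city"] = ["Jakarta", "Bandung"]
print(df.shape)

# Selecting a single column of a DataFrame gives back a Series.
ages = df["age"]
print(type(ages))
```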