banner



How To Find Oak Trees On Google Earth

Panda
You tin can download the Jupyter notebook of this tutorial hither.


In this blog postal service, I will show you lot how to select subsets of information in Pandas using [ ], .loc, .iloc, .at, and .iat.  I volition exist using the vino quality dataset hosted on the UCI website. This information record 11 chemic properties (such as the concentrations of carbohydrate, citric acid, booze, pH, etc.) of thousands of red and white wines from northern Portugal, equally well as the quality of the wines, recorded on a scale from 1 to 10. We will merely await at the information for red wine.

First, I import the Pandas library, and read the dataset into a DataFrame.

import_pandas_1

Here are the get-go five rows of the DataFrame:

wine_df.head()

Pandas dataframe head

I rename the columns to brand it easier for me telephone call the column names for future operations.

wine_df.columns = ['fixed_acidity', 'volatile_acidity', 'citric_acid', 'residual_sugar', 'chlorides', 'free_sulfur_dioxide', 'total_sulfur_dioxide','density','pH','sulphates', 'booze', 'quality' ]

Unlike ways to select columns


Selecting a single column

To select the first column 'fixed_acidity', you can laissez passer the column name as a string to the indexing operator.

You tin perform the same chore using the dot operator.

Selecting multiple columns

To select multiple columns, y'all tin can pass a list of column names to the indexing operator.

wine_four = wine_df[['fixed_acidity', 'volatile_acidity','citric_acid', 'residual_sugar']]

Alternatively, you lot can assign all your columns to a list variable and laissez passer that variable to the indexing operator.

cols = ['fixed_acidity', 'volatile_acidity','citric_acid', 'residual_sugar']
wine_list_four = wine_four[cols]

Selecting columns using "select_dtypes" and "filter" methods

To select columns usingselect_dtypes method, you should first notice out the number of columns for each data types.

selecting columns using dtypes

In this instance, there are xi columns that are float and one cavalcade that is an integer. To select but the float columns,  usewine_df.select_dtypes(include = ['float']). The select_dtypes method takes in a listing of datatypes in its include parameter. The list values can be a string or a Python object.

Yous can also use the filter method to select columns based on the cavalcade names or index labels.

filter_method for selecting columns

In the above instance, the filter method returns columns that contain the exact string 'acid'. The similar parameter takes a string as an input and returns columns that has the string.

You can utilise regular expressions with the regex parameter in the filter method.

regular_exp_filter

Hither, I first rename the ph and quality columns. Then, I pass the regex parameter to the filter method to observe all the columns that has a number.

Irresolute the order of your columns


I would similar to modify the order of my columns.

Changing the order of columns

wine_df.columns shows all the column names. I organize the names of my columns into three list variables, and concatenate all these variables to get the terminal column lodge.

reordering columns in pandas

I use the Set module to check if new_cols contains all the columns from the original.

Then, I laissez passer the new_cols variable to the indexing operator and shop the resulting DataFrame in a variable "wine_df_2". Now, the wine_df_2DataFrame has the columns in the gild that I wanted.

pass the names of the columns

Selecting rows using .iloc and loc


Now, permit'due south see how to utilise .iloc and loc for selecting rows from our DataFrame. To illustrate this concept better, I remove all the duplicate rows from the "density" column and modify the index ofwine_dfDataFrame to 'density'.

selecting rows

To select the 3rd row in wine_dfDataFrame, I pass number ii to the .iloc indexer.

selecting rows using iloc

To practise the same thing, I utilise the .loc indexer.

selecting rows using loc

To select rows with different index positions, I pass a list to the .iloc indexer.

I laissez passer a listing of density values to the .iloc indexer to reproduce the above DataFrame.

loc to reproduce the dataframe

Y'all can use slicing to select multiple rows . This is similar to slicing a list in Python.

The above operation selects rows 2, 3 and 4.

Yous tin can perform the same matter using loc.

list slicing using loc

Hither,  I am selecting the rows betwixt  the indexes 0.9970and 0.9959.

Selecting rows and columns simultaneously

You have to pass parameters for both row and column inside the .iloc and loc indexers to select rows and columns simultaneously. The rows and column values may be scalar values, lists, piece objects or boolean.

Select all the rows, and 4th, 5th and seventh column:

To replicate the to a higher place DataFrame, pass the column names as a list to the .loc indexer:

columns and rows using loc

Selecting disjointed rows and columns

To select a particular number of rows and columns, you tin can do the following using .iloc.

disjointed rows using iloc

To select a item number of rows and columns, you can do the following using .loc.

selecting particular rows using loc

To select a single value from the DataFrame, yous tin do the following.

selecting a single scalar value

You tin employ slicing to select a particular column.

slicing to selecting rows and columns

To select rows and columns simultaneously, you need to understand the use of comma in the foursquare brackets. The parameters to the left of the comma always selects rows based on the row alphabetize, and parameters to the right of the comma ever selects columns based on the column index.

If you lot desire to select a fix of rows and all the columns, you don't demand to utilize a colon post-obit a comma.

no need to use comma

iloc - for selecting all columns and selected number of rows

Selecting rows and columns using "get_loc" and "index" methods


rows and columns using get_loc

In the in a higher place example, I utilise theget_loc method to find the integer position of the column 'volatile_acidity' and assign it to the variable col_start. Again, I use theget_locmethod to observe the integer position of the column that is 2 integer values more than  'volatile_acidity' column, and assign it to the variable called col_end.I then utilise the ilocmethod to select the kickoff 4 rows, and col_start and col_endcolumns.  If you pass an index label to theget_locmethod, information technology returns its integer location.

You lot tin perform a very similar functioning using .loc.  The following shows how to select the rows from iii to vii, along with columns "volatile_acidity" to "chlorides".

index and getloc

Subselection using .iat and at

Indexers, .iat and .at, are much more faster than .iloc and .loc for selecting a single chemical element from a DataFrame.

density val

subselection using iat and at

subselection using iat and at part 2

I volition be writing more than tutorials on manipulating information using Pandas. Stay Tuned!

References:

  • Pandas Cookbook
  • Python for Data Analysis

Related:

  • Pandas Crook Sheet: Information Science and Information Wrangling in Python
  • Pandas DataFrame Indexing
  • Become a Pro at Pandas, Python's Information Manipulation Library

Source: https://www.kdnuggets.com/2019/06/select-rows-columns-pandas.html

Posted by: dennytheept.blogspot.com

0 Response to "How To Find Oak Trees On Google Earth"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel