python. the dataset is a big excel file with 1972 rows and 8 columns. i need...

Question

Python.

The dataset is a big Excel file with 1972 rows and 8 columns.

I need the lines of code that do the following:

- Rearrange the columns in this order: 'Category', 'currency', 'endDay', 'sellerRating', 'Duration', 'OpenPrice', 'ClosePrice', 'Competitive?'

- Create dummies for the 'Category', 'currency', and 'endDay' columns

- Normalize all the numeric columns ('Competitive?' is not numeric; it is categorical). The final dataset should contain all the dummies, the normalized numeric values, and the 'Competitive?' column (33 columns in total).
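A minimal sketch of all three steps as one function, assuming the column names from the question's `df.columns` output. Unlike the answer below, it calls `pd.get_dummies(..., columns=...)`, which replaces each categorical column with its dummy columns, and it overwrites the numeric columns in place, so only the dummies, the normalized numerics and 'Competitive?' remain. The question does not say which normalization to use; min-max scaling to [0, 1] is an assumption here, and `prepare`, `CATEGORICAL`, `NUMERIC`, `TARGET` and `ORDER` are illustrative names, not part of pandas.

```python
import pandas as pd

# Column groups as named in the question (assumed to match the Excel headers).
CATEGORICAL = ["Category", "currency", "endDay"]
NUMERIC = ["sellerRating", "Duration", "OpenPrice", "ClosePrice"]
TARGET = "Competitive?"
ORDER = ["Category", "currency", "endDay", "sellerRating",
         "Duration", "OpenPrice", "ClosePrice", TARGET]


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Reorder, dummy-encode and min-max normalize the auction data."""
    # Step 1: rearrange; indexing with a list raises KeyError on a typo.
    out = df[ORDER].copy()
    # Step 2: replace each categorical column by one 0/1 column per level.
    out = pd.get_dummies(out, columns=CATEGORICAL, dtype=int)
    # Step 3: min-max scale each numeric column to [0, 1]; the target is untouched.
    for col in NUMERIC:
        lo, hi = out[col].min(), out[col].max()
        span = hi - lo
        # A constant column (span 0) becomes 0.0 instead of NaN from 0/0.
        out[col] = (out[col] - lo) / span if span != 0 else 0.0
    return out


if __name__ == "__main__":
    # The five identical sample rows shown in the question.
    cols = ["Category", "currency", "sellerRating", "Duration",
            "endDay", "ClosePrice", "OpenPrice", "Competitive?"]
    row = ["Music/Movie/Game", "US", 3249, 5, "Mon", 0.01, 0.01, 0]
    sample = pd.DataFrame([row] * 5, columns=cols)
    print(prepare(sample).shape)  # (5, 8): 3 dummies + 4 numeric + target
```

`dtype=int` makes the dummies 0/1 integers rather than booleans, which recent pandas versions produce by default. With the real data, the frame would come from `pd.read_excel` instead of the sample rows.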

    In [5]: df.shape
    Out[5]: (1972, 8)

    In [6]: df.columns
    Out[6]: Index(['Category', 'currency', 'sellerRating', 'Duration', 'endDay', 'ClosePrice', 'OpenPrice', 'Competitive?'], dtype='object')

    In [7]: df.dtypes
    Out[7]:
    Category         object
    currency         object
    sellerRating      int64
    Duration          int64
    endDay           object
    ClosePrice      float64
    OpenPrice       float64
    Competitive?      int64
    dtype: object

    Out[4]:
               Category currency  sellerRating  Duration endDay  ClosePrice  OpenPrice  Competitive?
    0  Music/Movie/Game       US          3249         5    Mon        0.01       0.01             0
    1  Music/Movie/Game       US          3249         5    Mon        0.01       0.01             0
    2  Music/Movie/Game       US          3249         5    Mon        0.01       0.01             0
    3  Music/Movie/Game       US          3249         5    Mon        0.01       0.01             0
    4  Music/Movie/Game       US          3249         5    Mon        0.01       0.01             0
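The `df.dtypes` output shows 'Competitive?' stored as int64, so picking "all numeric columns" by dtype would also pick up the target that the question says is categorical. A small sketch that selects by dtype and then excludes the target explicitly; `numeric_feature_columns` is a hypothetical helper name.

```python
import pandas as pd


def numeric_feature_columns(df: pd.DataFrame, target: str = "Competitive?") -> list:
    # select_dtypes("number") returns every int and float column, which
    # here includes the 0/1 target, so the target is filtered out.
    numeric = df.select_dtypes(include="number").columns
    return [c for c in numeric if c != target]


if __name__ == "__main__":
    cols = ["Category", "currency", "sellerRating", "Duration",
            "endDay", "ClosePrice", "OpenPrice", "Competitive?"]
    row = ["Music/Movie/Game", "US", 3249, 5, "Mon", 0.01, 0.01, 0]
    print(numeric_feature_columns(pd.DataFrame([row] * 5, columns=cols)))
```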


Answer 1

The solution below is built on the five sample rows shown in the question.

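On the five sample rows, the cells below end with 15 columns: the 8 original columns + 3 dummy columns (each sample categorical column has a single level: Music/Movie/Game, US, Mon) + 4 normalized columns. The question's 33-column target counts only the dummies, the 4 normalized numeric columns and 'Competitive?', which on the full dataset implies 33 - 4 - 1 = 28 dummy columns; to match it, the original categorical columns and the raw numeric columns still have to be dropped.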
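The column count can be checked before running the whole pipeline: with its defaults, `pd.get_dummies` creates one column per distinct non-missing level, so the final width is the number of levels across the categorical columns, plus the numeric columns, plus 'Competitive?'. A sketch with a hypothetical helper `expected_width`:

```python
import pandas as pd


def expected_width(df: pd.DataFrame, categorical: list, numeric: list) -> int:
    # One dummy per distinct level (nunique skips NaN, matching
    # get_dummies' default dummy_na=False), plus numerics, plus the target.
    n_dummies = sum(df[c].nunique() for c in categorical)
    return n_dummies + len(numeric) + 1


if __name__ == "__main__":
    cols = ["Category", "currency", "sellerRating", "Duration",
            "endDay", "ClosePrice", "OpenPrice", "Competitive?"]
    row = ["Music/Movie/Game", "US", 3249, 5, "Mon", 0.01, 0.01, 0]
    sample = pd.DataFrame([row] * 5, columns=cols)
    cats = ["Category", "currency", "endDay"]
    nums = ["sellerRating", "Duration", "OpenPrice", "ClosePrice"]
    print(expected_width(sample, cats, nums))  # 3 + 4 + 1
```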

To include more columns, add their names to the column lists in the cells that reorder, dummy-encode, or normalize the data.
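When adding names to these lists, note that the reorder cell uses `df.reindex(..., axis=1)`, which does not complain about a misspelled name: it silently adds an all-NaN column instead. A hedged alternative, using a hypothetical `reorder` helper, checks the names first and then indexes with `df[order]`, so a typo fails loudly.

```python
import pandas as pd


def reorder(df: pd.DataFrame, order: list) -> pd.DataFrame:
    # Report every unknown name at once instead of creating NaN columns.
    missing = [c for c in order if c not in df.columns]
    if missing:
        raise KeyError(f"columns not in the data: {missing}")
    return df[order]


if __name__ == "__main__":
    df = pd.DataFrame({"OpenPrice": [0.01], "ClosePrice": [0.01]})
    print(list(reorder(df, ["ClosePrice", "OpenPrice"]).columns))
    # reindex accepts the typo and fills the new column with NaN:
    print(df.reindex(["closedPrice", "OpenPrice"], axis=1))
```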

## Python Code of the Cells

#!/usr/bin/env python
# coding: utf-8

# In[1]:

import pandas as pd

# The five sample rows from the question (with the real data, df would be read from the Excel file instead).
data = [['Music/Movie/Game', 'US', 3249, 5, 'Mon', 0.01, 0.01, 0],
        ['Music/Movie/Game', 'US', 3249, 5, 'Mon', 0.01, 0.01, 0],
        ['Music/Movie/Game', 'US', 3249, 5, 'Mon', 0.01, 0.01, 0],
        ['Music/Movie/Game', 'US', 3249, 5, 'Mon', 0.01, 0.01, 0],
        ['Music/Movie/Game', 'US', 3249, 5, 'Mon', 0.01, 0.01, 0]
        ]
df = pd.DataFrame(data)
# Same column names and order as the question's df.columns output.
df.columns = ['Category','currency','sellerRating','Duration','endDay','ClosePrice','OpenPrice','Competitive?']
print(df)

# In[2]:

df.shape

# In[3]:

df.columns

# In[4]:

df.dtypes

# In[5]:

df

# In[6]:

# Rearrange the columns into the requested order.
df = df.reindex(['Category','currency','endDay','sellerRating','Duration','OpenPrice','ClosePrice','Competitive?'], axis=1)
df

# In[7]:

# Append one 0/1 dummy column per level of Category, currency and endDay (the original columns are kept).
df = pd.concat([df, pd.get_dummies(df[['Category','currency','endDay']])], axis=1)
df

# In[8]:

# Normalize each numeric column by dividing it by its maximum (max scaling); 'Competitive?' is left untouched.
df['Normalised_sellerRating'] = df['sellerRating']/df['sellerRating'].max()
df['Normalised_Duration'] = df['Duration']/df['Duration'].max()
df['Normalised_OpenPrice'] = df['OpenPrice']/df['OpenPrice'].max()
df['Normalised_ClosePrice'] = df['ClosePrice']/df['ClosePrice'].max()
df

# In[9]:

df.shape
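The frame above still holds the original categorical columns and the raw numeric columns next to their `Normalised_*` copies. A sketch of trimming it to the layout the question asks for (dummies, normalized numerics, 'Competitive?'); `to_requested_layout` is a hypothetical name, and `numeric` should be the four numeric column names used in the cells above.

```python
import pandas as pd


def to_requested_layout(df: pd.DataFrame, categorical: list, numeric: list) -> pd.DataFrame:
    # Drop the original categorical and raw numeric columns ...
    out = df.drop(columns=categorical + numeric)
    # ... then give the Normalised_* columns back their plain names.
    return out.rename(columns={f"Normalised_{c}": c for c in numeric})


if __name__ == "__main__":
    cats = ["Category", "currency", "endDay"]
    nums = ["sellerRating", "Duration", "OpenPrice", "ClosePrice"]
    cols = cats + nums + ["Competitive?"]
    row = ["Music/Movie/Game", "US", "Mon", 3249, 5, 0.01, 0.01, 0]
    df = pd.DataFrame([row] * 5, columns=cols)
    # Rebuild the answer's 15-column frame, then trim it.
    df = pd.concat([df, pd.get_dummies(df[cats])], axis=1)
    for c in nums:
        df[f"Normalised_{c}"] = df[c] / df[c].max()
    print(df.shape, to_requested_layout(df, cats, nums).shape)
```

On the sample frame this turns the 15 columns into 8 (3 dummies + 4 normalized + 'Competitive?').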
