Question

Python.

The dataset is a large Excel file with 1972 rows and 8 columns.

I need the lines of code that do the following:

- Rearrange the columns into this order: 'Category', 'currency', 'endDay', 'sellerRating', 'Duration', 'OpenPrice', 'ClosePrice', 'Competitive?'

- Create dummies for the 'Category', 'currency', and 'endDay' columns.

- Normalize all the numeric columns ('Competitive?' is categorical, not numeric). The final dataset should contain all the dummies, the normalized numeric values, and the 'Competitive?' column (33 columns).

In [5]: df.shape
Out[5]: (1972, 8)

In [6]: df.columns
Out[6]: Index(['Category', 'currency', 'sellerRating', 'Duration', 'endDay', 'ClosePrice', 'OpenPrice', 'Competitive?'], dtype='object')

In [7]: df.dtypes
Out[7]:
Category         object
currency         object
sellerRating      int64
Duration          int64
endDay           object
ClosePrice      float64
OpenPrice       float64
Competitive?      int64
dtype: object

First five rows (cell 4):

           Category currency  sellerRating  Duration endDay  ClosePrice  OpenPrice  Competitive?
0  Music/Movie/Game       US          3249         5    Mon        0.01       0.01             0
1  Music/Movie/Game       US          3249         5    Mon        0.01       0.01             0
2  Music/Movie/Game       US          3249         5    Mon        0.01       0.01             0
3  Music/Movie/Game       US          3249         5    Mon        0.01       0.01             0
4  Music/Movie/Game       US          3249         5    Mon        0.01       0.01             0
Answer #1

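This solution uses the sample data shown in the question.

On that sample, the result has 15 columns: total_columns(15) = actual_columns(8) + dummy_columns(3) + normalised_columns(4). Each categorical column takes a single value in the sample, so it yields exactly one dummy column; on the full dataset, the number of dummy columns depends on how many distinct values each categorical column has.

To process more columns, add their names to the cell that handles them.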
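
The dummy-column count in the formula above is 3 only because each categorical column in the sample holds a single value. As a minimal sketch of how that count grows with the data, the toy frame below uses made-up values ('Books', 'Toys', 'EUR', 'Tue' are hypothetical and not taken from the question's dataset):

```python
import pandas as pd

# Hypothetical toy data: 3 distinct categories, 2 currencies, 2 end days.
toy = pd.DataFrame({
    "Category": ["Music/Movie/Game", "Books", "Toys", "Books"],
    "currency": ["US", "EUR", "US", "US"],
    "endDay": ["Mon", "Tue", "Mon", "Mon"],
})

# get_dummies creates one indicator column per distinct value.
dummies = pd.get_dummies(toy[["Category", "currency", "endDay"]])

# The number of dummy columns equals the sum of distinct values per column.
expected = sum(toy[col].nunique() for col in toy.columns)
print(dummies.shape[1], expected)
```

Running `df[col].nunique()` on the full dataset for each categorical column gives the same kind of estimate before any dummies are created.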

## Python Code of the Cells

#!/usr/bin/env python
# coding: utf-8

# In[1]:


import pandas as pd

# Five sample rows copied from the question (the full dataset has 1972 rows).
data = [
    ['Music/Movie/Game', 'US', 3249, 5, 'Mon', 0.01, 0.01, 0],
    ['Music/Movie/Game', 'US', 3249, 5, 'Mon', 0.01, 0.01, 0],
    ['Music/Movie/Game', 'US', 3249, 5, 'Mon', 0.01, 0.01, 0],
    ['Music/Movie/Game', 'US', 3249, 5, 'Mon', 0.01, 0.01, 0],
    ['Music/Movie/Game', 'US', 3249, 5, 'Mon', 0.01, 0.01, 0],
]
# Note: this answer names the closing-price column 'closedPrice' throughout.
df = pd.DataFrame(data, columns=['Category','currency','sellerRating','Duration','endDay','closedPrice','OpenPrice','Competitive?'])
print(df)


# In[2]:


df.shape


# In[3]:


df.columns


# In[4]:


df.dtypes


# In[5]:


df


# In[6]:


# Rearrange the columns into the order requested in the question.
df = df.reindex(['Category','currency','endDay','sellerRating','Duration','OpenPrice','closedPrice','Competitive?'], axis = 1)
df


# In[7]:


# Append dummy (indicator) columns for the categorical columns; the original columns are kept.
df = pd.concat([df,pd.get_dummies(df[['Category','currency','endDay']])], axis = 1)
df


# In[8]:


# Normalise each numeric column by dividing by its maximum, so the largest value becomes 1.
df['Normalised_sellerRating'] = df['sellerRating']/df['sellerRating'].max()
df['Normalised_Duration'] = df['Duration']/df['Duration'].max()
df['Normalised_OpenPrice'] = df['OpenPrice']/df['OpenPrice'].max()
df['Normalised_closedPrice'] = df['closedPrice']/df['closedPrice'].max()
df


# In[9]:


df.shape
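
The cells above keep the original columns next to the new ones, whereas the question asks for a final dataset that holds only the dummies, the normalized numeric values, and 'Competitive?'. One possible sketch of that is shown below. It makes several assumptions: the helper name `prepare` is hypothetical, the column names follow the question ('ClosePrice' rather than 'closedPrice'), the sample rows are invented for illustration, and min-max scaling is used in place of the division by the maximum shown above.

```python
import pandas as pd

def prepare(df, cat_cols, num_cols, target):
    """Return dummies, min-max normalized numerics, and the target column."""
    # One indicator column per distinct value of each categorical column.
    dummies = pd.get_dummies(df[cat_cols])

    # Min-max scaling (an assumption; the question only says "normalize").
    nums = df[num_cols].astype(float)
    span = nums.max() - nums.min()
    # A constant column has zero span; dividing by 1 instead leaves it at 0.
    normalized = (nums - nums.min()) / span.replace(0, 1)

    # Dummies first, then normalized numerics, then the untouched target.
    return pd.concat([dummies, normalized, df[[target]]], axis=1)

# Hypothetical sample rows; only the first row's values come from the question.
sample = pd.DataFrame({
    "Category": ["Music/Movie/Game", "Music/Movie/Game", "Books"],
    "currency": ["US", "US", "EUR"],
    "endDay": ["Mon", "Tue", "Mon"],
    "sellerRating": [3249, 100, 50],
    "Duration": [5, 7, 10],
    "OpenPrice": [0.01, 1.0, 2.5],
    "ClosePrice": [0.01, 1.5, 3.0],
    "Competitive?": [0, 1, 1],
})

cat_cols = ["Category", "currency", "endDay"]
num_cols = ["sellerRating", "Duration", "OpenPrice", "ClosePrice"]
result = prepare(sample, cat_cols, num_cols, "Competitive?")
print(result.shape)
```

On the real data, `sample` would be replaced by the frame loaded from the Excel file, and the final column count would be the number of dummy columns plus the four numeric columns plus 'Competitive?'.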
