Question

How can you handle duplicate values in a dataset for a variable in Python?
How can you find a missing value in a huge dataset using Python?

Answer #1
  1. In Python, the pandas package provides functions that help remove duplicate values in a variable (an attribute, or column) of a dataset. The drop_duplicates function can be called on a DataFrame. For example, if the mobile number variable has duplicate values in a customer details dataset, we can remove them with the Python code below:

customer_details_unique = customer_details.drop_duplicates(['mobile_number'])

We can also use duplicated() to identify the duplicate values in a variable. It returns a Boolean Series that marks each repeated row as True.
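As a minimal sketch of both calls together, the snippet below builds a small, made-up customer table (the names and numbers are illustrative assumptions, not real data), flags the repeated mobile numbers with duplicated(), and then drops them with drop_duplicates():

```python
import pandas as pd

# Hypothetical sample data; only the mobile_number column name comes from the answer.
customer_details = pd.DataFrame({
    "name": ["Asha", "Ben", "Asha", "Chen"],
    "mobile_number": ["555-0101", "555-0102", "555-0101", "555-0103"],
})

# True for each row whose mobile_number has already appeared in an earlier row.
is_repeat = customer_details.duplicated(subset=["mobile_number"])

# Keep the first row for each mobile_number and drop the later repeats.
customer_details_unique = customer_details.drop_duplicates(
    subset=["mobile_number"], keep="first"
)

print(is_repeat.tolist())            # [False, False, True, False]
print(len(customer_details_unique))  # 3
```

Passing keep="last" would keep the final occurrence instead, and keep=False would drop every row that has a duplicate.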

  2. Again, we can use the pandas package. It contains predefined functions that make it easy to find out whether a large dataset has missing values. The isnull() function identifies the missing values. For example,

import pandas

data = pandas.DataFrame(customers)

data.isnull()

Using this result, we can filter the missing values. We can either remove them, estimate them with a machine learning algorithm, or fill them with the mean of the variable.
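On a large dataset the full table returned by isnull() is hard to read, so it usually helps to summarise it. The sketch below assumes a small, made-up table (the age and city columns are hypothetical) and shows one way to count missing values per column, select the affected rows, and then either drop them or fill a numeric column with its mean:

```python
import numpy as np
import pandas as pd

# Hypothetical sample records with two missing entries.
data = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Pune", "Lima", None, "Oslo"],
})

# Count missing values per column instead of scanning the full Boolean table.
missing_per_column = data.isnull().sum()

# Select only the rows that contain at least one missing value.
rows_with_missing = data[data.isnull().any(axis=1)]

# Option 1: remove every row that has a missing value.
data_dropped = data.dropna()

# Option 2: fill the numeric column with its mean (computed from non-missing values).
data_filled = data.copy()
data_filled["age"] = data_filled["age"].fillna(data_filled["age"].mean())

print(missing_per_column)
print(rows_with_missing)
```

Filling with the mean only makes sense for numeric variables; a text column such as city would need a different strategy, for example the most frequent value.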

Please comment if you have any queries.
