This community-built FAQ covers the “Splitting by Index” exercise from the lesson “Data Cleaning with Pandas”.

This exercise can be found in the following Codecademy content:

Practical Data Cleaning

Hi there,
I was wondering if there is a smarter (quicker) way to erase the column ‘gender_age’ in this exercise, instead of rewriting all the column headers again.
E.g. is there a .drop_columns() ? I tried it but it didn’t work out.

I was also wondering about that, so I did some searching. It looks like we can use the drop function in a couple of ways to do this. For example:

students = students.drop('gender_age', axis=1)
students = students.drop(columns='gender_age')
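
To check that the two calls above really do the same thing, here is a minimal sketch. The DataFrame contents and the full_name column are made up for illustration and are not the exercise's actual data.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's students table
students = pd.DataFrame({
    "full_name": ["Ann Lee", "Bo Chan"],
    "gender_age": ["F14", "M15"],
})

# axis=1 tells drop to look for the label among the columns
by_axis = students.drop("gender_age", axis=1)

# columns= is the keyword form of the same operation
by_keyword = students.drop(columns="gender_age")

print(by_axis.columns.tolist())
print(by_axis.equals(by_keyword))
```

Note that drop returns a new DataFrame and leaves the original alone, which is why the snippets above assign the result back to students.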

I have just typed “print(students.columns())” for task 1 within this lesson, and had an error message that reads “‘Index’ object is not callable”. Despite the error message the lesson has marked the first task as complete, and allowed me to progress to the next task in the lesson. I have had this error on several lessons, but don’t understand what it means. Please can someone help by explaining the error? thanks.

I would like to know this too.
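
A small sketch that reproduces the message, assuming a made-up students table: columns is an attribute that holds an Index object, not a method, so adding parentheses tries to call the Index itself and Python raises a TypeError.

```python
import pandas as pd

# Hypothetical data, only here to reproduce the error
students = pd.DataFrame({"full_name": ["Ann Lee"], "gender_age": ["F14"]})

# Attribute access, no parentheses: prints the Index of column labels
print(students.columns)

# Adding parentheses tries to call the Index object like a function
try:
    students.columns()
except TypeError as error:
    message = str(error)
    print(message)
```

Dropping the parentheses, as in print(students.columns), avoids the error.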

When trying it without .str like:
students['gender'] = students['gender_age'][0:1]

This returns M14 for the first row and NaN for the rest. It is as if Python interprets the [0:1] as an index into the column. What is it about .str that makes Python look at the contents of each row instead?

The issue is that these items are inside a container of some form (a pd.Series or pd.DataFrame). Actually accessing the values requires the right syntax. In pandas, slicing a Series with [] acts a bit like a less robust .iloc, as described in the docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#slicing-ranges

Using students["gender_age"][0:1] returns a pandas Series object that has been effectively sliced by row (not a single string object and not a Series where every string has been sliced). This behaviour is similar to Python’s normal positional indexing/slicing, e.g. with a list the subscript [3] would give you the fourth element but the subscript [3:4] (slicing) would give you a list object containing only that fourth element.

So your new column "gender" is being populated from a Series with a single element. pandas aligns the assignment on the index, so every other row has no matching value and is filled in as missing. Hopefully that explains the NaNs.
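
Here is a rough sketch of the difference, using a made-up three-row table rather than the exercise's data. The gender_wrong column name is only there for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's column
students = pd.DataFrame({"gender_age": ["M14", "F14", "M15"]})

# Slicing the Series itself selects rows: a one-row Series holding "M14"
row_slice = students["gender_age"][0:1]

# Assignment aligns on the index, so rows 1 and 2 have no match and become NaN
students["gender_wrong"] = row_slice

# .str applies the slice to each string in the column instead
students["gender"] = students["gender_age"].str[0:1]
students["age"] = students["gender_age"].str[1:]

print(students)
```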

As it’s not uncommon to want to access the underlying data, pandas offers a useful tool for working with the contents of a Series (certain types only) using what it calls accessors: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#accessors

The .str accessor is a bit like a vectorised getter: you can work with the actual contents of a Series without using standard looping methods. So it’s not Python itself but an option pandas added in a few special cases; the link provides much more detail.
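
To make the "vectorised getter" idea concrete, here is a small sketch comparing an explicit loop with the .str accessor. The codes Series is invented for the example.

```python
import pandas as pd

# Hypothetical values in the same shape as gender_age
codes = pd.Series(["M14", "F14", "M15"])

# Plain Python: loop over the values and index each string
looped = [value[0] for value in codes]

# pandas: .str applies the same indexing to every element
vectorised = codes.str[0]

# .str also exposes ordinary string methods element-wise
lowered = codes.str.lower()

print(looped)
print(vectorised.tolist())
print(lowered.tolist())
```

Both approaches give the same first characters; the accessor just keeps the result as a Series with the original index.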

Nice. I didn’t know about drop.

I did it in a more complicated way, but used it as an opportunity to refresh my knowledge of list comprehensions.

students = students[[ c for c in students.columns if c != 'gender_age']]

Hi All

This may sound very shallow or naive, but what is the difference between .str and str()?

Thanks in advance
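
A short sketch of the difference, with made-up data: str() is Python's built-in and turns a whole object into one string, while .str is a pandas accessor that works on each element of a Series of strings. The astype(str) call is included as the element-wise way to convert values to strings.

```python
import pandas as pd

ages = pd.Series([14, 15])

# Built-in str(): one Python string, the printed form of the whole Series
whole = str(ages)

# Element-wise conversion to strings uses astype, not str()
each = ages.astype(str)

# .str accessor: string operations applied to every element
codes = pd.Series(["M14", "F15"])
first_letters = codes.str[0]

print(type(whole))
print(each.tolist())
print(first_letters.tolist())
```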