添加链接
link管理
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接

Learn to insert data into SQL databases like a pro!

One of the key roles of a data scientist is to extract patterns and insights from raw data. Since much of the world’s government and corporate data is organized in relational databases, it makes sense that data scientists need to know how to work with these database structures. Writing SQL queries to insert, extract, and filter data in databases is a key skill for anyone interested in data analytics or data science.

SQL (Structured Query Language) is based on E. F. Codd’s Relational model and algebra to manage the relational databases. It’s a database query language used to create, insert, query, and manipulate the relational database and used by a large number of applications.

Although it has been around for decades, learning SQL is still a critical skill for modern data scientists , and really anyone who works with data at all, because SQL is used in all kinds of relational database software, including MySQL, SQL Server, Oracle, and PostgreSQL.

In this tutorial, we’ll learn about SQL insertion operations in detail. Here is the list of topics that we will learn in this tutorial:

  • SQL Insertion
  • Inserting records into a database
  • Inserting Pandas DataFrames into a database using the insert command
  • Inserting Pandas DataFrames into a database using the to_sql() command
  • Reading records from a database
  • Updating records in a database
  • Want to reach a higher level of SQL skill? Sign up for free and check out Dataquest’s SQL courses for thorough, interactive lessons on all the SQL skills you’ll need for data science.

    SQL Insertion

    SQL Insertion is an essential operation for data workers to understand. Inserting missing data or adding new data is a major part of the data cleaning process on most data science projects.

    Insertion is also how most data gets into databases in the first place, so it’s important anytime you’re collecting data, too. When your company gets new data on a customer, for example, chances are than a SQL insert will be how that data gets into your existing customer database.

    In fact, whether or not you’re aware of it, data is flowing into databases using SQL inserts all the time! When you fill out a marketing survey, complete a transaction, file a government form online, or do any of thousands of other things, your data is likely being inserted into a database somewhere using SQL.

    Let’s dive into how we can actually use SQL to insert data into a database. We can insert data row by row, or add multiple rows at a time.

    Inserting records into a database

    In SQL, we use the INSERT command to add records/rows into table data. This command will not modify the actual structure of the table we’re inserting to, it just adds data.

    Let’s imagine we have a data table like the one below, which is being used to store some information about a company’s employees.

    Now, let’s imagine we have new employees we need to put into the system.

    This employee table could be created using the CREATE TABLE command, so we could use that command to create an entirely new table. But it would be very inefficient to create a completely new table every time we want to add data! Instead, let’s use the INSERT command to add the new data into our existing table.

    Here’s the basic syntax for using INSERT in SQL:

    We start with the command INSERT INTO followed by the name of table into which we’d like to insert data. After the table name, we list the columns of new data we're inserting column by column, inside parentheses. Then, on the next line, we used the command VALUES along with the values we want to insert (in sequence inside parentheses.

    So for our employee table, if we were adding a new employee named Kabir, our INSERT command might look like this:

    Inserting Records Into a Database From Python

    Since we’re often working with our data in Python when doing data science, let’s insert data from Python into a MySQL database. This is a common task that has a variety of applications in data science.

    We can send and receive data to a MySQL database by establishing a connection between Python and MySQL. There are various ways to establish this connection; here, we will use pymysql for database connectivity.

    Here are the broad steps we’ll need to work through to get pymysql connected, insert our data, and then extract the data from MySQL:

    Let’s walk through this process step by step.

    Step 1: Import the pymysql module.

    # Import pymysql module
    import pymysql

    Step 2: Create connection a to the MySQL database

    Create a connection using pymysql ‘s connect() function with the parameters host, user, database name, and password.

    (The parameters below are for demonstration purposes only; you’ll need to fill in the specific access details required for the MySQL database you’re accessing.)

    # Connect to the database
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='12345',
                                 db='employee')

    Step 3: Create a cursor using the cursor() function.

    This will allow us to execute the SQL query once we’ve written it.

    cursor = connection.cursor()

    Step 4: Execute the required SQL query

    Commit the changes using the commit() function, and check the inserted records. Note that we can create a variable called sql , assign our query’s syntax to it, and then pass sql and the specific data we want to insert as arguments to cursor.execute() .

    Then, we’ll commit these changes using commit() .

    # Create a new record
    sql = "INSERT INTO employee (EmployeeID, Ename, DeptID, Salary, Dname, Dlocation) VALUES (%s, %s, %s, %s, %s, %s)"
    # Execute the query
    cursor.execute(sql, (1008,'Kabir',2,5000,'IT','New Delhi'))
    # the connection is not autocommited by default. So we must commit to save our changes.
    connection.commit()

    Let’s do a quick check to see if the record we wanted to insert has actually been inserted.

    We can do this by querying the database for the entire contents of employee , and then fetching and printing those results.

    # Create a new query that selects the entire contents of employee
    sql = "SELECT * FROM employee"
    cursor.execute(sql)
    # Fetch all the records and use a for loop to print them one line at a time
    result = cursor.fetchall()
    for i in result:
        print(i)
    (1001, 'John', 2, 4000, 'IT', 'New Delhi') 
    (1002, 'Anna', 1, 3500, 'HR', 'Mumbai') 
    (1003, 'James', 1, 2500, 'HR', 'Mumbai') 
    (1004, 'David', 2, 5000, 'IT', 'New Delhi') 
    (1005, 'Mark', 2, 3000, 'IT', 'New Delhi') 
    (1006, 'Steve', 3, 4500, 'Finance', 'Mumbai') 
    (1007, 'Alice', 3, 3500, 'Finance', 'Mumbai') 
    (1008, 'Kabir', 2, 5000, 'IT', 'New Delhi')

    It worked! Above, we can see the new record has been inserted and is now the final row in our MySQL database.

    Step 5: Close the database connection

    Now that we’re done, we should close the database connection using close() method.

    # Close the connection
    connection.close()

    Of course, it would be better to write this code in a way that could better handle exceptions and errors. We can do this using try to contain the body of our code and except to print errors if any arise. Then, we can use finally to close the connection once we’re finished, regardless of whether try succeeded or failed.

    Here’s what that looks like all together:

    import pymysql
        # Connect to the database
        connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='12345',
                                 db='employee')
        cursor=connection.cursor()
        # Create a new record
        sql = "INSERT INTO employee (EmployeeID, Ename, DeptID, Salary, Dname, Dlocation) VALUES (%s, %s, %s, %s, %s, %s)"
        cursor.execute(sql, (1009,'Morgan',1,4000,'HR','Mumbai'))
        # connection is not autocommit by default. So we must commit to save our changes.
        connection.commit()
        # Execute query
        sql = "SELECT * FROM employee"
        cursor.execute(sql)
        # Fetch all the records
        result = cursor.fetchall()
        for i in result:
            print(i)
    except Error as e:
        print(e)
    finally:
        # close the database connection using close() method.
        connection.close()
    ((1001, 'John', 2, 4000, 'IT', 'New Delhi'), (1002, 'Anna', 1, 3500, 'HR', 'Mumbai'), (1003, 'James', 1, 2500, 'HR', 'Mumbai'), (1004, 'David', 2, 5000, 'IT', 'New Delhi'), (1005, 'Mark', 2, 3000, 'IT', 'New Delhi'), (1006, 'Steve', 3, 4500, 'Finance', 'Mumbai'), (1007, 'Alice', 3, 3500, 'Finance', 'Mumbai'), (1008, 'Kabir', 2, 5000, 'IT', 'New Delhi'), (1009, 'Morgan', 1, 4000, 'HR', 'Mumbai'), (1009, 'Morgan', 1, 4000, 'HR', 'Mumbai'))

    Inserting Pandas DataFrames Into Databases Using INSERT

    When working with data in Python, we’re often using pandas , and we’ve often got our data stored as a pandas DataFrame. Thankfully, we don’t need to do any conversions if we want to use SQL with our DataFrames; we can directly insert a pandas DataFrame into a MySQL database using INSERT .

    Once again, we’ll take it step-by-step.

    Step 1: Create DataFrame using a dictionary

    We could also import data from a CSV or create a DataFrame in any number of other ways, but for the purposes of this example, we’re just going to create a small DataFrame that saves the titles and prices of some data science texbooks.

    # Import pandas
    import pandas as pd
    # Create dataframe
    data = pd.DataFrame({
        'book_id':[12345, 12346, 12347],
        'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
        'price':[29, 23, 27]
    

    Step 2: Create a table in our MySQL database

    Before inserting data into MySQL, we’re going to to create a book table in MySQL to hold our data. If such a table already existed, we could skip this step.

    We’ll use a CREATE TABLE statement to create our table, follow that with our table name (in this case, book_details), and then list each column and its corresponding datatype.

    Step 3: Create a connection to the database

    Once we’ve created that table, we can once again create a connection to the database from Python using pymysql.

    import pymysql
    # Connect to the database
    connection = pymysql.connect(host='localhost',
                             user='root',
                             password='12345',
                             db='book')
    # create cursor
    cursor=connection.cursor()

    Step 4: Create a column list and insert rows

    Next, we’ll create a column list and insert our dataframe rows one by one into the database by iterating through each row and using INSERT INTO to insert that row’s values into the database.

    (It is also possible to insert the entire DataFrame at once, and we’ll look at a way of doing that in the next section, but first let’s look at how to do it row-by-row).

    # creating column list for insertion
    cols = ",".join([str(i) for i in data.columns.tolist()])
    # Insert DataFrame recrds one by one.
    for i,row in data.iterrows():
        sql = "INSERT INTO book_details (" +cols + ") VALUES (" + "%s,"*(len(row)-1) + "%s)"
        cursor.execute(sql, tuple(row))
        # the connection is not autocommitted by default, so we must commit to save our changes
        connection.commit()

    Step 5: Query the database to check our work

    Again, let’s query the database to make sure that our inserted data has been saved correctly.

    # Execute query
    sql = "SELECT * FROM book_details"
    cursor.execute(sql)
    # Fetch all the records
    result = cursor.fetchall()
    for i in result:
        print(i)
    (12345, 'Python Programming', 29) 
    (12346, 'Learn MySQL', 23) 
    (12347, 'Data Science Cookbook', 27)

    Once we’re satisfied that everything looks right, we can close the connection.

    connection.close()

    Inserting Pandas DataFrames into a Database Using the to_sql() Function

    Now let’s try to do the same thing — insert a pandas DataFrame into a MySQL database — using a different technique. This time, we’ll use the module sqlalchemy to create our connection and the to_sql() function to insert our data.

    This approach accomplishes the same end result in a more direct way, and allows us to add a whole dataframe to a MySQL database all at once.

    # Import modules
    import pandas as pd
    # Create dataframe
    data=pd.DataFrame({
        'book_id':[12345,12346,12347],
        'title':['Python Programming','Learn MySQL','Data Science Cookbook'],
        'price':[29,23,27]
    

    Import the module sqlalchemy and create an engine with the parameters user, password, and database name. This is how we connect and log in to the MySQL database.

    # import the module
    from sqlalchemy import create_engine
    # create sqlalchemy engine
    engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                           .format(user="root",
                                   pw="12345",
                                   db="employee"))

    Once we’re connected, we can export the whole DataFrame to MySQL using the to_sql() function with the parameters table name, engine name, if_exists, and chunksize.

    We’ll take a closer look at what each of these parameters refers to in a moment, but first, take a look at how much simpler it is to insert a pandas DataFrame into a MySQL database using this method. We can do it with just a single line of code:

    # Insert whole DataFrame into MySQL
    data.to_sql('book_details', con = engine, if_exists = 'append', chunksize = 1000)

    Now let’s take a closer look at what each of these parameters is doing in our code.

  • book_details is the name of table into which we want to insert our DataFrame.
  • con = engine provides the connection details (recall that we created engine using our authentication details in the previous step).
  • if_exists = 'append' checks whether the table we specified already exists or not, and then appends the new data (if it does exist) or creates a new table (if it doesn’t).
  • chunksize writes records in batches of a given size at a time. By default, all rows will be written at once.
  • Reading Records from a Database

    Once we’ve used SQL inserts to get our data into the database, we’ll want to be able to read it back! So far in this tutorial, we’ve checked our SQL inserts by simply printing the entire database, but obviously this is not a viable option with larger databases where you’d be printing thousands of rows (or more). So let’s take a more in-depth look at how we can read back the records we’ve created or inserted into our SQL database.

    We can read records from a SQL database using the SELECT command. We can select specific columns, or use * to select everything from a given table. We can also select to return only records that meet a particular condition using the WHERE command.

    Here’s how the syntax for these commands looks:

    We start with a SELECT clause, followed by list of columns, or * if we want to select all columns.Then we’ll use a FROM clause to name the table we’d like to look at. WHERE can be used to filter the records and followed by a filter condition, and we can also use ORDER BY to sort the records. (The WHERE and ORDER BY clauses are optional).

    With larger databases, WHERE is useful for returning only the data we want to see. So if, for example, we’ve just inserted some new data about a particular department, we could use WHERE to specify the department ID in our query, and it would return only the records with a department ID that matches the one we specified.

    Compare, for example, the results of these two queries using our employee table from earlier. In the first, we’re returning all the rows. In the second, we’re getting back only the rows we’ve asked for. This may not make a big difference when our table has seven rows, but when you’re working with seven thousand rows, or even seven million, using WHERE to return only the results you want is very important!

    If we want to do this from within Python, we can use the same script we used earlier in this tutorial to query these records. The only difference is that we’ll tell pymysql to execute the SELECT command rather than the INSERT command we used earlier.

    # Import module
    import pymysql
    # create connection
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='12345',
                                 db='employee')
    # Create cursor
    my_cursor = connection.cursor()
    # Execute Query
    my_cursor.execute("SELECT * from employee")
    # Fetch the records
    result = my_cursor.fetchall()
    for i in result:
        print(i)
    # Close the connection
    connection.close()
    (1001, 'John', 2, 4000, 'IT', 'New Delhi') 
    (1002, 'Anna', 1, 3500, 'HR', 'Mumbai') 
    (1003, 'James', 1, 2500, 'HR', 'Mumbai') 
    (1004, 'David', 2, 5000, 'IT', 'New Delhi') 
    (1005, 'Mark', 2, 3000, 'IT', 'New Delhi') 
    (1006, 'Steve', 3, 4500, 'Finance', 'Mumbai') 
    (1007, 'Alice', 3, 3500, 'Finance', 'Mumbai') 
    (1008, 'Kabir', 2, 5000, 'IT', 'New Delhi') 
    (1009, 'Morgan', 1, 4000, 'HR', 'Mumbai') 
    (1009, 'Morgan', 1, 4000, 'HR', 'Mumbai')

    Above, we’ve selected and printed the entire database, but if we wanted to use WHERE to make a more careful, limited selection, the approach is the same:

    my_cursor.execute("SELECT * FROM employee WHERE DeptID=2")

    Updating Records in the Database

    Often, we’ll need to modify the records in the table after creating them.

    For example, imagine that an employee in our employee table got a promotion. We’d want to update their salary data. The INSERT INTO command won’t help us here, because we don’t want to add an entirely new row.

    To modify existing records in the table, we need to use the UPDATE command. UPDATE is used to change the contents of existing records. We can specify specific columns and values to change using SET, and we can also make conditional changes with WHERE to apply those changes only to rows that meet that condition.

    Now, let’s update the records from our employee table and display the results. In this case, let’s say David got the promotion — we’ll write a query using UPDATE that sets Salary to 6000 only in columns where the employee ID is 1004 (David’s ID).

    Be careful — without the WHERE clause, this query would update every record in the table, so don’t forget that!

    After executing the above query, the updated table would look like this:

    Conclusion

    In this tutorial, we’ve taken a look at SQL inserts and how to insert data into MySQL databases from Python. We also learned to insert Pandas DataFrames into SQL databases using two different methods, including the highly efficient to_sql() method.

    Of course, this is just the tip of the iceberg when it comes to SQL queries. If you really want to become a master of SQL, sign up for free and dive into one of Dataquest’s interactive SQL courses to get interactive instruction and hands-on experience writing all the queries you’ll need to do productive, professional data science work.

    Also check out some of our other free SQL-related resources:

  • Do you need a SQL certification?
  • SQL interview questions to prep for job interviews
  • Our SQL cheat sheet
  •