I’ve recently started my journey learning about data science and deep learning techniques. The great thing for me is these disciplines tie my formal education and work experience together nicely: Theoretical Mathematics + Computer Science + Full Stack development. Enough about me, lets get to it.
One of my projects needs a database refactoring into a time-series analytics database. First, I’m moving the existing database to the cloud(Amazon RDS) and changing the schema a bit. I’m taking csv dumps and loading them into a stock PostgreSQL database. From there I’m going to explore different databases and different schemas.
I initially started writing bash scripts to perform the loads, calling postgres \copy command. This approach works but found the ability of bash to query the database lacking. Mainly, when querying with bash there didn’t seem to be a way to map the rows into key-value pairs. Then I realized no self-respecting data scientist would ever use bash and started using python(another thing I’m learning).
My next hurdle was calling the postgres \copy command using psycopg2(postgres python library). The docs and blogs didn’t seem to answer some fundamental questions(or my mastery of skimming docs is lacking):
Does psycopg2.copy_from use the COPY(requires access to the database server) or \copy(doesn’t)? I’m assuming it uses \copy since it works from AWS instances connecting to a Postgres RDS instance
How do you handle csv files with the first line as he header? (see code below)? Basically, think of reading a file one line at a time the same way you use a
cursor
when iterating over rows of a database result set.
Loading CSV files with column information using psycopg2.copy_from
in_file = open(in_file_nm, 'r')
# FIRST LINE IN FILE ARE COLUMN NAMES
# TURN FIRST LINE INTO AN ARRAY
columns = in_file.readline().strip('\n').split(',')
# LOAD COPY STARTING AT SECOND LINE IN FILE
cur.copy_from(in_file, 'your_table_name', sep=',', columns=columns)
Hope this helps
Privacy & Cookies: This site uses cookies. By continuing to use this website, you agree to their use.
To find out more, including how to control cookies, see here:
Cookie Policy