Analysis of PubChem Chemical Database

Required Availability
Spring 2018 | Summer 2018 | Fall 2018 | Spring 2019
Course Credit?
No
Paid Position?
No
Description

PubChem is one of the largest chemical structure databases in the world, containing about 100 million unique chemical compound records that represent nearly one terabyte of data. The database is open and available for download and reuse. We will extract key data from the records and write custom analysis scripts to characterize the database (e.g., the number of unique structural geometries, bonding characteristics, and atom types). Students will learn how to work with large datasets, regular expression pattern matching, and data visualization. Students will be actively engaged in all aspects of the project, including design, script development, data analysis, and data management. Students will have the opportunity to be co-authors on any scholarship produced, including presentations and peer-reviewed journal articles.
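
As a rough illustration of the kind of regular expression pattern matching described above, the following minimal sketch tallies atom types from molecular formula strings. It is written in Python purely for illustration (the project itself will use MATLAB and R). The helper name count_atoms, the pattern, and the three sample formulas (aspirin, ethanol, sodium chloride) are assumptions made for this example and are not taken from the project's actual scripts or from a PubChem download. The pattern handles only simple flat formulas, with no charges, parentheses, or isotope labels.

```python
import re
from collections import Counter

# Hypothetical pattern: an element symbol (one uppercase letter, optionally
# followed by one lowercase letter) and an optional atom count.
ELEMENT_PATTERN = re.compile(r"([A-Z][a-z]?)(\d*)")


def count_atoms(formula):
    """Return a Counter mapping element symbols to atom counts.

    Assumes a simple flat formula such as "C9H8O4".
    """
    counts = Counter()
    for symbol, digits in ELEMENT_PATTERN.findall(formula):
        # A missing count means a single atom of that element.
        counts[symbol] += int(digits) if digits else 1
    return counts


# Sample input for illustration only: aspirin, ethanol, sodium chloride.
formulas = ["C9H8O4", "C2H6O", "NaCl"]

# Aggregate atom counts across all sample formulas.
totals = Counter()
for formula in formulas:
    totals += count_atoms(formula)

for symbol, count in sorted(totals.items()):
    print(symbol, count)
```

The same idea, matching a pattern against each record and accumulating counts, would presumably scale to the full database by streaming records from disk instead of holding them in a list.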

Special Directions

Simply contact me to talk about your interest!

Special Skills

No prior experience is required; Prof. Scalfani will provide foundational knowledge to students. We will use a combination of MATLAB and R. Students must have motivation, interest, and a commitment to research.


Contact Phone #
205-348-5806
Contact Email
vfscalfani@ua.edu
Research Website
http://orcid.org/0000-0002-7363-531X
