Analysis of PubChem Chemical Database
Course Credit?
No
Paid Position?
No
Faculty
Vincent Scalfani
Description
PubChem is one of the largest chemical structure databases in the world, containing about 100 million unique chemical compound records that represent nearly one terabyte of data. The database is open and available for download and reuse. We will extract key data from the records and write custom analysis scripts to characterize the database (e.g., the number of unique structural geometries, bonding characteristics, atom types, and more). Students will learn how to work with big datasets, regular expression pattern matching, and data visualization. Students will be actively engaged in all aspects of the project, including design, development of scripting code, analysis of data, and data management. Students will have the opportunity to be co-authors on any scholarship produced, including both presentations and peer-reviewed journal articles.
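As a rough illustration of the kind of regular expression analysis described above, the sketch below counts atom types in a handful of SMILES strings. It is written in Python purely for illustration (the project itself will use MATLAB and R), and it is not the project's actual method. The sample strings, the names `SAMPLE_SMILES`, `ATOM_PATTERN`, `BRACKET_SYMBOL`, and `count_atom_types`, and the simplified patterns are all assumptions made for this example; implicit hydrogens are not counted.

```python
import re
from collections import Counter

# Hypothetical sample input: a few SMILES strings standing in for fields
# extracted from compound records (ethanol, benzene, sodium chloride,
# chloroform). Real records would be read from downloaded files.
SAMPLE_SMILES = ["CCO", "c1ccccc1", "[Na+].[Cl-]", "ClC(Cl)Cl"]

# Simplified tokenizer (an assumption, not a full SMILES parser):
# either a bracket atom such as [Na+], or a bare organic-subset atom.
# Two-letter symbols (Cl, Br) are tried before single letters.
ATOM_PATTERN = re.compile(r"\[([^\]]+)\]|(Cl|Br|[BCNOPSFI]|[bcnops])")

# Inside a bracket atom: optional isotope digits, then the element symbol.
BRACKET_SYMBOL = re.compile(r"\d*([A-Z][a-z]?|se|as|[bcnops])")


def count_atom_types(smiles):
    """Count explicitly written atoms in one SMILES string, by element."""
    counts = Counter()
    for bracket, bare in ATOM_PATTERN.findall(smiles):
        if bracket:
            match = BRACKET_SYMBOL.match(bracket)
            if match is None:
                continue  # skip bracket contents this sketch cannot read
            symbol = match.group(1)
        else:
            symbol = bare
        # Aromatic atoms are lowercase in SMILES; fold them into the element.
        counts[symbol.capitalize()] += 1
    return counts


# Aggregate the per-string counts over the whole sample.
total_counts = sum((count_atom_types(s) for s in SAMPLE_SMILES), Counter())
print(total_counts)
```

Applied to the full database, the same idea would run record by record and accumulate counts, which is why streaming through the files rather than loading everything into memory matters at this scale.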
Special Directions
Simply contact me to talk about your interest!
Special Skills
There is no required experience; Prof. Scalfani will provide foundational knowledge to students. We will use a combination of MATLAB and R. Students must have motivation, interest, and commitment to research.