Big Data

In Development

Learn to work with large data files and analyze them programmatically. The Social Security Administration publishes a file of baby names for every year since 1880. Each is a comma-separated values (CSV) file listing the name, sex, and number of babies given that name (roughly 30,000 names per year in recent files). Write a Java program that answers the following questions for a given year, such as the year you were born:

1. Is your name in the top 10? 100? 500? 1,000?
2. What is its exact rank?
3. Do any variant spellings of your name rank higher (e.g., Caitlin, Caitlyn, Kaitlin, Katelynn)?
4. How many people were given your name in that year, and what percentage of the total is that?
5. Has your name's popularity increased or decreased since then?
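
The questions above can be approached with a small amount of Java. The following is a minimal sketch, not the course's own solution. It assumes each line has the form `Name,Sex,Count`; the class name `NameStats`, its helper methods, and the sample rows are hypothetical and made up for illustration. Rank is computed within one sex, and edit distance is used here as one possible way to spot close spellings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NameStats {

    /** One row of a names file: name, sex ("F" or "M"), and count. */
    static class Entry {
        final String name;
        final String sex;
        final int count;

        Entry(String name, String sex, int count) {
            this.name = name;
            this.sex = sex;
            this.count = count;
        }
    }

    /** Parses lines assumed to look like "Name,Sex,Count"; blank lines are skipped. */
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            String[] f = trimmed.split(",");
            entries.add(new Entry(f[0], f[1], Integer.parseInt(f[2])));
        }
        return entries;
    }

    /** Returns the 1-based rank of a name among entries of the same sex, or -1 if absent. */
    static int rankOf(List<Entry> entries, String name, String sex) {
        List<Entry> sameSex = new ArrayList<>();
        for (Entry e : entries) {
            if (e.sex.equals(sex)) sameSex.add(e);
        }
        // Sort by count, largest first; ties keep their original file order.
        sameSex.sort((a, b) -> Integer.compare(b.count, a.count));
        for (int i = 0; i < sameSex.size(); i++) {
            if (sameSex.get(i).name.equalsIgnoreCase(name)) return i + 1;
        }
        return -1;
    }

    /** Returns the name's share (0 to 100) of all counts recorded for that sex. */
    static double percentOf(List<Entry> entries, String name, String sex) {
        long total = 0;
        long mine = 0;
        for (Entry e : entries) {
            if (!e.sex.equals(sex)) continue;
            total += e.count;
            if (e.name.equalsIgnoreCase(name)) mine += e.count;
        }
        return total == 0 ? 0.0 : 100.0 * mine / total;
    }

    /** Levenshtein edit distance: a small value suggests a close spelling. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Made-up sample rows for illustration, not real SSA figures.
        List<Entry> entries = parse(Arrays.asList(
                "Mia,F,500", "Ava,F,300", "Zoe,F,200", "Leo,M,400"));
        int rank = rankOf(entries, "Ava", "F");
        System.out.println("Rank: " + rank);
        System.out.println("In top 10: " + (rank > 0 && rank <= 10));
        System.out.printf("Share: %.1f%%%n", percentOf(entries, "Ava", "F"));
        System.out.println("Caitlin vs. Caitlyn distance: " + editDistance("caitlin", "caitlyn"));
    }
}
```

To run this against a real file, the sample list could be replaced with the lines of one year's file, for example via `java.nio.file.Files.readAllLines`. Calling `rankOf` on the files for two different years and comparing the results is one simple way to judge whether a name's popularity has risen or fallen.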

You will learn to handle CSV files, convert data between CSV and tab-delimited formats, remove duplicates, and count, search, sort, and compare data. You will also learn about XML and JSON, the concepts of big data and data mining, and how to write apps that mash up data obtained from websites.
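
As an illustration of two of these skills, the sketch below converts comma-separated lines to tab-delimited lines and drops exact duplicates along the way. It assumes simple CSV with no quoted fields or embedded commas, which holds for plain `Name,Sex,Count` rows but not for CSV in general. The class name `CsvToTsv`, the method `convert`, and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CsvToTsv {

    /**
     * Converts simple comma-separated lines to tab-delimited lines.
     * Blank lines are skipped and exact duplicate rows are removed,
     * keeping the first occurrence and the original order.
     * Assumes no field contains a comma or a quote character.
     */
    static List<String> convert(List<String> csvLines) {
        // LinkedHashSet rejects duplicates while preserving insertion order.
        Set<String> unique = new LinkedHashSet<>();
        for (String line : csvLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            // The -1 limit keeps trailing empty fields instead of dropping them.
            String[] fields = trimmed.split(",", -1);
            unique.add(String.join("\t", fields));
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Made-up sample rows for illustration; the third row repeats the first.
        List<String> csv = Arrays.asList("Ava,F,300", "Mia,F,500", "Ava,F,300");
        for (String row : convert(csv)) {
            System.out.println(row);
        }
    }
}
```

Real-world CSV often contains quoted fields, so a dedicated CSV parsing library is usually a safer choice than `split` once the data stops being this regular.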