Sunday 22 March 2015

Publicly Available Datasets

This page collects links to public datasets, and to pages that list them, that I have come across.


  1. http://hadoopilluminated.com/hadoop_illuminated/Public_Bigdata_Sets.html
  2. http://www.researchpipeline.com/mediawiki/index.php?title=Main_Page 
  3. http://www.scaleunlimited.com/datasets/public-datasets/
  4. http://kevinchai.net/datasets  (good)
  5. http://snap.stanford.edu/data/ 
  6. http://labrosa.ee.columbia.edu/millionsong/
  7. http://blogs.msdn.com/b/avkashchauhan/archive/2012/04/12/processing-million-songs-dataset-with-pig-scripts-on-apache-hadoop-on-windows-azure.aspx
  8. https://stackoverflow.com/questions/10843892/download-large-data-for-hadoop
  9. http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html
  10. http://archive.ics.uci.edu/ml/datasets.html
  11. http://lemurproject.org/clueweb09/
  12. http://stackoverflow.com/questions/2674421/free-large-datasets-to-experiment-with-hadoop

  13. http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free (good)
  14. http://rishavrohitblog.blogspot.in/2013/02/sample-datasets.html
  15. http://blog.cloudera.com/blog/2013/02/how-to-resample-from-a-large-data-set-in-parallel-with-r-on-hadoop/

  16. http://www.hadoopinrealworld.com/using-million-song-dataset-in-hadoop/
  17. http://www.datawrangling.com/some-datasets-available-on-the-web/ (good)
  18. http://spatialhadoop.cs.umn.edu/datasets.html
  19. https://www.datadr.org/doc/airline.html
  20. https://ibmhadoop.challengepost.com/details/data
  21. http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/ 
