Datasets
Last edited by Ariel Waldman 8 years, 11 months ago
All Sciences
Archaeology
- Open Context: publishes open archaeological data (and images and field notes) from excavations, surveys, museum collections and government offices around the world. A new version of the site with updated APIs is in development and testing. Unlike most repositories, Open Context does not simply serve dataset files. Rather, it is a large integrated database for interactions over the Web, including use as a Linked Data provider.
Artificial Intelligence ('Real-World' Knowledge, Natural-Language Semantics, Reasoning)
- COGBASE: Combines nearly all available open-source real-world knowledge in one place, translated into a format that is easy to use for decision-making. Helps understand text, reason about the causes and consequences of events, and more.
Astronomy, Astrophysics & Space Exploration
Biology + Life Sciences
- Macaulay Library - world's largest archive of animal sounds and videos
- Wildlife: BBC Wildlife Finder. The resources are available as RDF/XML (either add .rdf to the end of the URL or use content negotiation). Details of the ontology are here: http://purl.org/ontology/wo/ and some background is here: http://www.slideshare.net/derivadow/apis-and-apis-a-wildlife-ontology.
- NBN Gateway is used to explore UK biodiversity data. It contains over 50 million species records covering England, Scotland, Wales and Northern Ireland. Data are available via the website, web services or as tab delimited files. Support is provided via the community forum.
- Bio2RDF has lots of biological information as linked data (RDF).
- U.S. biology datasets (via Data.gov)
- Birds: Avian Knowledge Network. Bird monitoring data resources represent arguably the most comprehensive time-series environmental data in existence. These data, gathered by hundreds of independent projects, comprise an estimated 60 million records collected over the past 100 years. Lots of range/distribution data. Download prepackaged data sets or query the database.
- Birds: International Ornithological Congress world bird checklist. Download in Excel, CSV or XML.
- Fish: Fishbase. Checklist of fish available as CSV, Tab-delimited, or Excel.
- Movebank is a free, online database of animal tracking data hosted by the Max Planck Institute for Ornithology.
- IUCN Redlist of Threatened Species: contains assessments for 49,000 species of which spatial data exists for about 25,000 species. ESRI shapefiles.
- Amphibians, Birds, Mammals: Species distribution grids from SEDAC. Data are available for global amphibian distributions, and for birds and mammals in the Americas. .BIL images.
- Birds: RSPB garden birdwatch 2010 results: Results from a survey to count the number of birds in your garden. Localised to the United Kingdom. Downloadable as spreadsheets.
- Open 23AndMe raw genotyping datasets: SNPedia has links to a number of data sets that people have decided to share. We can probably pool some more from amongst ourselves. Is there a larger repository of open data sets somewhere?
- The 1000 Genomes Project: The 1000 Genomes Project is the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. As with other major human genome reference projects, data from the 1000 Genomes Project will be made available quickly to the worldwide scientific community through freely accessible public databases. The goal of the 1000 Genomes Project is to find most genetic variants that have frequencies of at least 1% in the populations studied.
- UCSC Genome Browser: The UCSC Genome Browser contains the current human reference genome and the Neanderthal genome, as well as data from the ENCODE project.
- International Cancer Genome Consortium: comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe
- ClinVar: database of human variation associated with clinical/disease phenotypes
- Exome Variant Server (EVS): exome sequencing database
- Human Microbiome Project: human microbiome datasets
- Human Metabolome Database: freely available electronic database containing detailed information about small molecule metabolites found in the human body. It is intended to be used for applications in metabolomics, clinical chemistry, biomarker discovery and general education.
- BioChemWeb: List of online databases in Biochemistry, Molecular Biology and Cell Biology
- Genome Interpretation/Annotation Tools:
- Trait Association Databases:
- Population/Ancestry Databases:
- Blogs/Communities:
- Miscellaneous:
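The BBC Wildlife Finder entry above mentions two ways of asking for RDF/XML: appending .rdf to the page URL, or content negotiation. The following is a minimal Python sketch of both. It only builds the requests and does not send them. The example.org URL is a hypothetical placeholder, not a real Wildlife Finder address, and the assumption that the server honours the `application/rdf+xml` media type in the Accept header is not confirmed by this page.

```python
from urllib.request import Request

# Standard media type for RDF/XML; assumed to be what the server negotiates on.
RDF_XML = "application/rdf+xml"


def rdf_request_by_suffix(page_url: str) -> Request:
    """Build a request for the RDF/XML document by appending '.rdf' to the page URL."""
    return Request(page_url.rstrip("/") + ".rdf")


def rdf_request_by_conneg(page_url: str) -> Request:
    """Build a request for the same page URL, asking for RDF/XML via the Accept header."""
    return Request(page_url, headers={"Accept": RDF_XML})


# Hypothetical page URL, used for illustration only.
page = "http://example.org/nature/species/Lion"
print(rdf_request_by_suffix(page).full_url)
print(rdf_request_by_conneg(page).get_header("Accept"))
```

Either request could then be passed to `urllib.request.urlopen` against a live endpoint that supports it.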
Chemistry
Computer Science + Web Data
Earth Sciences, Climate & Environment
- Data.nasa.gov - directory of NASA-related datasets
- NASA World Wind
- U.S. Earth science datasets (via Data.gov)
- RealClimate.org List of Data Sources on climate change - RealClimate is a blog run by climate scientists to respond to claims by "deniers". They also have a wiki which lists the names and details of pushers of "climate-related nonsense".
- Mineral Resources Data System (MRDS). MRDS describes metallic and nonmetallic mineral resources throughout the world. Included are deposit name, location, commodity, deposit description, geologic characteristics, production, reserves, resources, and references. It includes the original MRDS and MAS/MILS data.
- OneGeology. Global geological maps.
- AMEEdiscover: database of global emissions standards and methodologies, integrated with AMEE's API.
- World Ozone and Ultraviolet Radiation Data Centre: ten years of Swiss ozonesonde data. (direct FTP link ftp://ftp.tor.ec.gc.ca/pub/woudc/Archive-NewFormat/OzoneSonde_1.0_1/STN156/ECC/ ). Rights notice: "The data contained within the WOUDC Data Archive are free and unrestricted. For Scientific purposes, access to these data is unlimited and provided without charge. By their use you accept that an offer of co-authorship will be made through personal contact with the data providers or owners whenever substantial use is made of their data. In all cases, an acknowledgement must be made to the data providers or owners and to the data centre when these data are used within a publication."
- OpenScience CodeFest https://nceas.github.io/open-science-codefest/ and, for lists, a GitHub ticket describing the Earth Science data sets they are hacking on: https://github.com/NCEAS/open-science-codefest/issues/26
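The WOUDC entry above gives a direct FTP link. As a rough sketch, the directory behind such a link could be listed with Python's standard `ftplib`. The helper names `split_ftp_url` and `list_directory` are made up for illustration, anonymous login is an assumption, and the server may no longer be reachable at this address.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP link copied from the WOUDC entry in this list.
WOUDC_URL = (
    "ftp://ftp.tor.ec.gc.ca/pub/woudc/Archive-NewFormat/"
    "OzoneSonde_1.0_1/STN156/ECC/"
)


def split_ftp_url(url: str) -> tuple[str, str]:
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    return parts.hostname, parts.path


def list_directory(url: str, timeout: float = 30.0) -> list[str]:
    """List file names in the directory an FTP URL points at (needs network access)."""
    host, directory = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()          # anonymous login; an assumption about this server
        ftp.cwd(directory)
        return ftp.nlst()


host, directory = split_ftp_url(WOUDC_URL)
print(host, directory)
```

Calling `list_directory(WOUDC_URL)` would attempt the actual connection. Note the rights notice quoted in the entry before using the data in a publication.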
Glaciology
- Publication: "Open Access Data in Polar and Cryospheric Remote Sensing" http://www.mdpi.com/2072-4292/6/7/6183
- http://www.antarcticglaciers.org/antarctica/antarctic-datasets/ List of glaciology data sets
- National Snow and Ice Data Center (NSIDC) data list: http://nsidc.org/data
- The Arctic Data Gateway. Topics include: Agriculture, Atmosphere, Biological Classification, Biosphere, Climate Indicators, Cryosphere, Human Dimensions, Land Surface, Oceans, Paleoclimate, Solid Earth, Terrestrial Hydrosphere https://www.aoncadis.org/home.htm
- Weather stations on Greenland: http://promice.org/DataDownload.html
- Greenland Albedo in NetCDF format from the DarkSnow project, a crowd-funded Greenland expedition: http://bprc.osu.edu/~jbox/hack_me/
Oceanography
- Shipwrecks within the U.S. maritime boundaries, maintained by NOAA
Geography
Medicine and Health Sciences
Neuroscience
- OpenfMRI.org: OpenfMRI.org is a project dedicated to the free and open sharing of functional magnetic resonance imaging (fMRI) datasets, including raw data.
- 1000 Functional Connectomes Project and International Neuroimaging Data-sharing Initiative (http://fcon_1000.projects.nitrc.org/ <-- for some reason the wiki doesn't like this URL, I recommend you copy and paste it): neuroimaging scans from 1000s of subjects. Includes resting state functional magnetic resonance imaging (fMRI) data, structural MRIs, and diffusion tensor imaging (DTI). While the FCP dataset is mostly from healthy controls, and includes very little phenotypic data, the INDI dataset is well phenotyped and includes data for several patient populations such as ADHD, epilepsy and cocaine addiction.
- ADHD-200 preprocessed data: preprocessed resting state fMRI and structural MRI data from ~ 700 typically developing children and ~ 400 children with ADHD released through INDI. The goal of this project is to release data in a form that is more accessible to those without functional neuroimaging expertise.
- Multi-Modal MRI Reproducibility Resource: scan-rescan imaging sessions from 21 healthy volunteers (no history of neurological disease). Imaging modalities include MPRAGE, FLAIR, DTI, resting state fMRI, B0 and B1 field maps, ASL, VASO, quantitative T1 mapping, quantitative T2 mapping, and magnetization transfer imaging. This is intended to be a resource for statisticians and imaging scientists to be able to quantify the reproducibility of their imaging methods using data available from a generic "1 hour" session at 3T.
- brainmap.org: BrainMap is an online database of published functional neuroimaging (fMRI and PET) experiments with coordinate-based (x,y,z) activation locations in Talairach space. The goal of BrainMap is to provide a vehicle to share methods and results of studies in specific research domains, such as language, memory, attention, emotion, and perception. BrainMap can also be used to perform meta-analyses of similar research studies.
- Open Connectome Project: "Collectively reverse engineering the brain one synapse at a time." Transmission electron microscopy images of mouse visual cortex.
- Allen Brain Atlas: A growing collection of online public resources integrating extensive gene expression and neuroanatomical data, complete with a novel suite of search and viewing tools.
- Human Connectome Project - Comprehensively mapping human brain circuitry in a target number of 1200 healthy adults using cutting-edge methods of noninvasive neuroimaging.
- Human brain diffusion-weighted MRI - Stanford study data
Particle Physics
Publications
Other lists
Comments (1)
Bru said
at 4:09 am on Mar 30, 2010
Anybody knows if CERN is going to release some of the data from the LHC? that would be awesome, and bleeding edge (writing this comment while following the webcast for the first beam http://webcast.cern.ch/lhcfirstphysics/ )