![Page 2: Big Data Infrastructure for Scientific Computing](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815aa4550346895dc833aa/html5/thumbnails/2.jpg)
Big Data Landscape
Large Hadron Collider:
- Uses: Grid
- Volume: ~15 PB per year (~4 PB @ SURFsara)
- Type of data: structured
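To relate the ~15 PB per year figure to network bandwidth, here is a rough back-of-the-envelope sketch. It assumes decimal petabytes (10^15 bytes), a 365-day year, and a perfectly even data flow; none of these are stated on the slide, and the function name is made up for illustration.

```python
# Assumptions (not from the slides): 1 PB = 10**15 bytes, a 365-day year,
# and data arriving at a constant rate all year round.
BYTES_PER_PB = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600


def sustained_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average throughput in MB/s needed to move the given yearly volume."""
    return petabytes_per_year * BYTES_PER_PB / SECONDS_PER_YEAR / 10**6


lhc_rate = sustained_rate_mb_per_s(15)
print(f"{lhc_rate:.0f} MB/s, about {lhc_rate * 8 / 1000:.1f} Gbit/s")
# prints "476 MB/s, about 3.8 Gbit/s"
```

Real traffic is bursty, so peak requirements would likely be well above this average.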
Big Data Landscape
Next Generation Sequencing (GoNL):
- Uses: Grid, Cloud, Cluster
- Volume: ~100 GB to 300 TB
- Type of data: various formats and noise
Big Data Landscape
Information retrieval and NLP:
- Uses: Hadoop, Cloud
- Volume: ~70 TB
- Type of data: text, unstructured
http://bit.ly/173ddfz
Effectiveness of Data
Where having and exploiting data leads to insights:
- Brainscanr
- Healthmap
(Open) Data Sources
• Lots of open data:
- Open data Nederland
- CitySDK
- Community of Amsterdam
- Rijkswaterstaat
- Twitter
- Facebook
- Google
• Different formats:
- Excel files
- JSON
- Web services
• Different quality:
- Noise
- Missing values
- Availability
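The quality issues listed above (noise, missing values) can be sketched in a few lines. This is a minimal illustration, not a recommended pipeline: the records, the field name `water_level_cm`, the plausible range of 0 to 1000, and the choice of median imputation are all assumptions made for the example.

```python
import json
import math
from statistics import median

# Hypothetical readings; station names, field names and values are made up.
RAW = '''[
  {"station": "A", "water_level_cm": 102.5},
  {"station": "B", "water_level_cm": null},
  {"station": "C", "water_level_cm": "n/a"},
  {"station": "D", "water_level_cm": 9999},
  {"station": "E", "water_level_cm": 98.0}
]'''


def parse_level(value, low=0.0, high=1000.0):
    """Return the value as a float, or None if missing, malformed or out of range."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None  # missing value (null) or unparseable text such as "n/a"
    if math.isnan(number) or not (low <= number <= high):
        return None  # treat implausible readings as noise
    return number


def clean(records):
    """Parse every reading and fill the gaps with the median of the valid ones."""
    levels = [parse_level(r.get("water_level_cm")) for r in records]
    valid = [x for x in levels if x is not None]
    fallback = median(valid) if valid else None
    return [
        {
            "station": r["station"],
            "water_level_cm": x if x is not None else fallback,
            "imputed": x is None,  # keep track of which values were filled in
        }
        for r, x in zip(records, levels)
    ]


cleaned = clean(json.loads(RAW))
for row in cleaned:
    print(row)
```

Keeping an `imputed` flag lets later analysis distinguish measured values from filled-in ones.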
Computing Big Data
Capacity:
• CPU cores
• Hard drive space
• Network bandwidth
Solutions:
• Scale up: get faster tools
• Scale out: work with more tools
Complexity:
• Data:
- Noise, missing data
- Formats
- Access
• Distributed computing:
- Failures
- Parallel programming
Solutions:
• Data: deal with it
• Distributed computing:
- Super/Cluster computer
- Grid
- Hadoop
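The "scale out" idea can be sketched as a small map/reduce-style word count: split the input into chunks, count each chunk independently, then merge the partial counts. This is only an illustration of the pattern, not how Hadoop is implemented. The function names are hypothetical, and a thread pool stands in for the separate machines a real cluster would use; in CPython, threads will not speed up this CPU-bound work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines, n_chunks):
    """Split the input into at most n_chunks roughly equal parts."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def word_count(lines, workers=4):
    """Scale out: count chunks independently, then merge the partial results."""
    chunks = split_into_chunks(lines, workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)  # reduce step: sum the per-chunk counts
    return total


lines = ["big data on the grid", "big data on hadoop", "data in the cloud"]
totals = word_count(lines, workers=2)
print(totals.most_common(3))
```

Because each chunk is counted without reference to the others, a failed chunk could in principle be recomputed on its own, which is one way distributed frameworks cope with the failures mentioned above.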
What SURFsara Offers
SURFsara provides:
1. Infrastructure: supercomputer, clusters, grid, cloud, Hadoop
2. Support: development, parallelization, consultancy
3. R&D: piloting new technologies
4. Hosting datasets for common use
What kind of technologies would you consider using in order to deal with technical Big Data challenges?