


Post on 07-Jun-2020


Page 1: €¦ · Web viewTo achieve this we utilize AWS x1.32xlarge memory optimized compute cluster at $13.38/hour. Each assembly attempt is expected to take 336 hours on this platform or

Make DASH a household name in three different $20B industries

We propose to run a scientific project promoting medical freedom that will shock 3 large and rapidly growing industries using the DASH blockchain. This project will leave DASH with a unique and charitable reputation in three fast growing and data intensive markets. This will test the DASH blockchain as a platform that could see a tripling of its transaction volume if others see the value in this approach.

Make history in the Cannabis industry
Make history in the Scientific Publishing market
Make history in the Genomics industry

$71K (160 DASH): 32 DASH/month for 5 months.
*This is an expansion of a very popular and successful proposal (1058:105 Yes:No): https://www.dashcentral.org/p/MedicinalGenomics

How?

Sequence a controversial Cannabis genome and publish it using a novel crypto-incentivized, crypto-recorded peer review process that will cut out the $25B scientific publishing market.

Sequence the genome with the latest Next Generation Sequencing tools to encourage this 20% compound annual growth rate (CAGR) market to take notice of crypto-sponsored and crypto-published genomics.

Publish the genome of a cannabis plant that is the center point of overly broad and controversial cannabis patents to assist in challenging these detrimental IP land grabs on medicinal plants. Publish this on the DASH Blockchain to set a new standard for a $25B/year peer review platform.

If the idea of crypto-incentivized, crypto-recorded peer review catches on, DASH could see a tripling of its transaction volume. Scientific publishing is a $25B market. There are 2.5M scientific publications every year, costing an average of $3k per submission in publication costs, and the submission frequency is doubling every 9 years. This $7.5B review market is ripe for cutting out the expensive middleman, as it is cracking at the seams under ancient Gutenberg-era incentive models. The internet has made the cost of publication near zero, and DASH has already solved the censorship resistance, consensus, trust and permanence problems. The journals are no longer needed, and the market is primed to remove their parasitic, copyright-enabled grasp on scientific data and the dissemination of truth.

Today, reviewers are usually anonymous and unpaid. They have the accuracy of a Twitter troll. Journals copyright the scientific work themselves and charge the researchers or the public to access it later, even if the work is publicly funded. Half of the papers published can’t be reproduced, and the most prestigious journals have the highest retraction rates. The incentives are broken. Why pay the journals for such poor results? This is not how science should function. Researchers should be able to pay reviewers directly. Cut the copyright out and create a market for good and bad review. These reviews and manuscripts can be published on the DASH blockchain. With the addition of a price signal to the cost of peer review, all of the historical misalignments change. Good reviewers will build reputation and demand higher prices. Reviewers simply “rubber stamping” their friends’ work will be discounted under transparent, non-anonymous review. Public and immutable reputation systems change the game theory and align incentives for accuracy.

With this simple change, new tiers of review can evolve. Platinum, gold and bronze review will speciate to afford a market where people can pay to have the work partially or fully reproduced. DASH enables this market as a financial vehicle for the reputation and payment system. It enables the notarization engine for the peer review and DASH drive uniquely delivers a distributed drive system for critical scientific data.

There is a strong argument for scientific data to be on a blockchain. The current scientific databasing model often reflects a tragedy of the commons, where we experience a 17% annual data decay despite large government subsidies to manage these databases. Blockchains create a monetary alignment delivering 0% decay per year. In addition to data decay, the databases lack pricing signals and are therefore cumbersome to use, as their feature releases are not guided by what customers are willing to pay for. Submitting data to NCBI often requires a PhD-day, and the uptime of these databases is frequently threatened by budget ceiling debates. The import friction seen in these socialized databases translates into key data from manuscripts being omitted from the submission process and rusting on unreliable servers in the form of “supplemental data”. Keynesian-backed databases are not a reliable store for the scientific record.

https://www.cell.com/current-biology/fulltext/S0960-9822(13)01400-0
https://peerj.com/blog/post/115284879109/authors-can-now-directly-submit-to-peerj-from-biorxiv/
http://www.cdnsciencepub.com/blog/21st-century-science-overload.aspx

If the $7.5B peer reviewed publications get converted to Crypto-bounties and paid in DASH to 2-3 reviewers per journal article, this alone could triple the transaction volume of DASH.

2.5M publications per year = ~7,000 publications/day * 2-3 reviewers/manuscript to be bountied in DASH. This delivers an estimated $5B-$7B in velocity of DASH. This does not include the notarizations for the review process or the pre-print server market. Pre-print servers are now hosting 1,200 new manuscripts per month and would likely require 100 notarizations per article to include the manuscript, figures, and all communications between the authors and editors. If DASH were to capture the whole market (unlikely with this proposal alone), it would entail hashing 12K-24K transactions per month. It is important to underscore that this proposal is a proof of concept. Getting BioRXIV, F1000 and PLoS to adopt such an approach is more likely to happen if we demonstrate it can be done.
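The publication arithmetic above can be checked with a short sketch. The inputs (2.5M publications per year, $3k per submission, 2-3 reviewers per manuscript) are the proposal's own estimates; the function names are hypothetical and exist only for this illustration.

```python
# Back-of-envelope check of the publication figures quoted in the text.
# Inputs are the proposal's estimates; function names are illustrative only.

PUBLICATIONS_PER_YEAR = 2_500_000
COST_PER_SUBMISSION_USD = 3_000


def publications_per_day(per_year: int) -> float:
    """Average number of publications per calendar day."""
    return per_year / 365


def review_market_usd(per_year: int, cost_per_submission: int) -> int:
    """Total yearly publication spend at the quoted average cost."""
    return per_year * cost_per_submission


def bounties_per_day(per_year: int, reviewers_per_manuscript: int) -> float:
    """Reviewer payouts per day if every manuscript pays each reviewer once."""
    return publications_per_day(per_year) * reviewers_per_manuscript


if __name__ == "__main__":
    print(round(publications_per_day(PUBLICATIONS_PER_YEAR)))
    print(review_market_usd(PUBLICATIONS_PER_YEAR, COST_PER_SUBMISSION_USD))
    for reviewers in (2, 3):
        print(reviewers, round(bounties_per_day(PUBLICATIONS_PER_YEAR, reviewers)))
```

Under these assumptions the daily figure comes out just under 7,000 publications, and the yearly spend matches the $7.5B quoted for the review market.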

The remaining $17.5B in the scientific publishing market is advertisements and subscription fees. Once DASH has the eyeballs for the scientific content and trust delivery platform, both of these markets can be reached with DASH currency and DASH drive.

The DASH investment in alternative media to promote truth in our media is greatly augmented if we are concurrently seen as a beacon of scientific truth.

DASH already has large investments in the Cannabis industry with Alt36, Kannapedia.net and CannMed funding. Sequencing the genome of the most ominous patent issuance will make DASH a household name in the Cannabis Industry.

The genomics market is a quickly emerging personalized medicine market. It changes the medical paradigm from one-size-fits-all drugs and herd medicine to one that is more individualistic. Political structures prevent cannabis genomics from being funded. This market is very data and CPU intensive, and various genomic blockchain companies are getting funded to use ETH (Nebula, Shivom, LunaDNA, Zenome, Encrypgen). This is the right time to get those companies thinking about the more appropriate architecture of DASH for this $20B market.


The pre-proposal project is already capturing headlines at Dash Force News, The Cannabis Connection, The Tatiana Show, and the Future Tech Podcast.

https://www.youtube.com/watch?v=pAGwhueEub8
April 27th - http://kscocom/shows/43015-the-cannabis-connection
https://www.futuretechpodcast.com/podcasts/kevin-mckernan-medicinal-genomics-pioneering-study-cannabis-genetics-dna-based-measurement-microbial-risks/

https://www.dashforcenews.com/dash-gaining-traction-in-cannabis-industry-proposal-seeks-funding-for-genome-research/

Future exposure is planned at The Broad Institute Cegx conference, the Society for In Vitro Biology, and CannMed 2018.

• https://www.gexcsymposia.com
• https://sivb.org/meetings/home/scientific-program-3.html
• Cannmedevents.com

Technical Review

Abstract

DASH has the potential to transform scientific publishing with transparent timestamps, monetization, and censorship-resistant publishing. There is a unique opportunity to showcase this potential by taking a highly visible scientific project and leveraging DASH to fund, review, and host the scientific publication of a controversial genome. The ideal genome is the most popular genome that continues to fail to get public funding despite overwhelming public support. We propose to have the DASH network partially fund, peer review and publish the most comprehensive (25X improvement) Cannabis Genome Map ever created, and to demonstrate why this is an ideal time to make such a move. The economic impact of deciphering the human genome led to a multi-billion dollar human genomics market. We believe the cannabis genome is the next most impactful genome to be completed, and decentralized governance may be required to get it done. A better cannabis reference will assist in more rapid molecular breeding programs and a better understanding of hermaphroditism, autoflowering genetics and genotype-to-chemotype predictive markers. It will accelerate more effective breeding and growth of the industry. Demonstrating crypto-incentivized peer review on the DASH network will showcase DASH as a leader in the pursuit of truth and censorship-resistant publishing. This will be historic.

32 DASH per month for 5 Months. 160 Total

Background

In the past 10 years DNA sequencing technology has decreased in cost nearly 100,000-fold, putting Moore’s law to shame. Despite the remarkable achievements in next generation sequencing, the Cannabis genome has remained mostly unreachable to mankind. In 2011, Medicinal Genomics published the first draft of the Chemdawg Cannabis genome on the Amazon cloud [1]. Shortly after this publication, van Bakel et al. published another draft genome for Purple Kush and compared it to hemp and Chemdawg [2]. These genome references both ended up in over 100,000 pieces (contigs, or contiguous sequences) as opposed to the 20 chromosomes they should assemble into. This nonetheless enabled cannabis fingerprinting, marker-assisted selection, cannabis microbiome studies, and the hunt for novel terpenoid synthase genes [2-7]. Despite the progress enabled with a draft reference, a fully contiguous genome assembly is required to perform epigenetic studies regarding the mechanisms of hermaphroditism, autoflowering genetics, pest resistance and a better understanding of the expression of rare cannabinoids.

The cannabis genome assembly challenge is largely due to the fact that the diploid genome is 65% AT and over 1% polymorphic. For reference, the human genome is 56% AT and has a polymorphism rate 10-fold lower (0.1%). The polymorphism rate is the rate at which the maternal and paternal genomes differ. Each cell in the plant has 2 genomes mixed (one from the mother and one from the father), and the process of isolating DNA can’t tease these apart. As a result, one is left with 2 near-identical genomes mixed together for sequencing. Since one cannot read telomere to telomere (from one end of a chromosome to the other) with current DNA sequencing technologies, DNA molecules are randomly broken up into smaller but overlapping pieces and read redundantly. This is a process known as Shotgun Sequencing (https://en.wikipedia.org/wiki/Shotgun_sequencing). With enough overlapping molecules, the sequence can be read and reassembled computationally. This is where read length becomes critically important: the longer the read, the more unique signature it carries and the more overlap can be found in repeat-rich, entropy-poor genomes like Cannabis.


To further illustrate the computational complexity, imagine a billion piece jigsaw puzzle where 65% of the pieces are Red and Green and where it has been mixed with another puzzle of near identical nature with the expectation of 1 puzzle piece out of 100 being altered. DNA assembly is the job of sorting those two puzzles out into their own respective correct assemblies. For 800 Megabase (Mb) genomes it remains a computationally difficult task in 2018.
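A toy sketch can make the overlap problem concrete. It joins two reads by their longest suffix-to-prefix overlap and lists every overlap length that would also fit. This is only an illustration of the idea; it is not how a production assembler works internally, and the function names and example reads are invented for this sketch.

```python
from typing import List, Optional


def all_overlaps(left: str, right: str, min_len: int = 3) -> List[int]:
    """Every length s (>= min_len) where the last s bases of `left`
    equal the first s bases of `right`."""
    max_len = min(len(left), len(right))
    return [s for s in range(min_len, max_len + 1) if left[-s:] == right[:s]]


def merge_reads(left: str, right: str, min_len: int = 3) -> Optional[str]:
    """Join two reads using their longest overlap, or return None if none exists."""
    sizes = all_overlaps(left, right, min_len)
    if not sizes:
        return None
    return left + right[max(sizes):]


if __name__ == "__main__":
    # A read pair with one unambiguous overlap ("CCTG").
    print(merge_reads("ATTAGACCTG", "CCTGCCGGAA"))
    # An AT-rich repeat: several overlap lengths fit equally well.
    print(all_overlaps("ATATATAT", "ATATATGC", min_len=2))
```

In the second example the repeat admits three different overlap lengths, so the correct join cannot be chosen from these two reads alone. Longer reads that span the whole repeat remove that ambiguity.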

This extreme base imbalance, combined with a highly polymorphic genome, has evaded improvement since 2011. While a few attempts at using low coverage Pacific Biosciences sequence have been submitted to NCBI, those attempts appear to be missing 25% of the genome (only 550Mb where 2011 assemblies point to 750Mb total genome size). These later assemblies do showcase longer contiguity, as expected with the longer read length platforms employed. There have also been a few “science by press release” suggestions of other companies working on this, but no data has been publicly released to verify these claims.

We believe the technical solution to this problem is to use multiple sequencing platforms to error correct and expand the assembly (PacBio, Illumina, MinION), along with higher depth Pacific Biosciences sequencing (50X) and multiple Dovetail libraries to span long chromosomal segments together. Medicinal Genomics owns both MinION and Illumina sequencing platforms and currently outsources the Pacific Biosciences sequence. DNA sequence data from all 3 platforms has already been contributed to the project (see links below). The Dovetail libraries have never been deployed on Cannabis, and we believe these novel libraries combined with the more modern long read platforms will bring the highest resolution cannabis map ever created [8, 9].

Figure 2- Shotgun Sequencing overview taken from: https://en.wikipedia.org/wiki/DNA_sequencing_theory#Sequencing_as_a_covering_problem

The failure to assemble a better cannabis reference has occurred despite the most exciting improvement in read length chemistry in DNA sequencing since its invention by Fred Sanger in 1977. For 40 years after the invention of DNA sequencing, read lengths were limited to 1,000 base pair (bp) reads. In 2013, Pacific Biosciences and Oxford Nanopore began making revolutionary improvements with single molecule DNA sequencing by pushing the read lengths out to 40,000bp and 400,000bp respectively. While these read lengths were impressive, they were largely achieved by using noisy single molecule sequencing techniques (10-20% error rates). In 2018 these long read platforms are beginning to improve their raw read accuracy to a point where complex plant genomes are now within reach. The accurate long read sequencers have fundamentally changed the face of DNA sequencing. It is the most disruptive achievement in genomics history next to the sequencing of the human genome itself.

https://www.youtube.com/watch?v=v8p4ph2MAvI

These longer read platforms do have some limitations. The most significant one relates to the length of DNA one can purify from a cell. Manual pipetting often hydrodynamically shears DNA into 30,000bp fragments. The most careful techniques have achieved 1Mb reads, but to get 10Mb to 100Mb information from DNA, chromatin cross-linking libraries like Dovetail Chicago libraries are required (Figure 2B). A description from Dovetail:

“Dovetail Genomics is the only provider of The Dovetail™ Chicago and Dovetail™ Hi-C Library Preparation with HiRise™ Software. Dovetail™ Chicago and Dovetail™ Hi-C library preparation both use proximity ligation methodology which allows you to generate chromosome level assemblies that are highly contiguous and extremely accurate. With the Dovetail™ Chicago library prep, the resulting proximity-ligation reads are mapped back to the draft assembly, and provide long range sequence data for increasing assembly contiguity and improving overall contig order and orientation accuracy. With the Dovetail Hi-C, we follow a similar protocol, but start with intact chromosomes in situ. This allows us to create assemblies up to full chromosomes in length. The chemistry and the software were developed side by side so the software is optimized for accurate genome assembly (not just a publicly available tool with unknown specificity). In just weeks, you can have a complete reference genome that is highly contiguous, highly accurate, and ready to provide vital insights for your research.”

In the spirit of all parties having some skin in the game, Dovetail Genomics has seen this proposal and offered a 7.2% discount for the project.

I also have published work generating nucleosome positioning maps with Nobel Laureate Dr. Andrew Fire [10].

Applications of these technical improvements to cannabis genomics have been slow to mature, primarily because most labs that can afford such sequencers have federal overhangs that prevent them from working on a federally illegal plant. Likewise, expensive projects that are simply made public don't always attract private businesses to fund them to completion. If start-ups don't see immediate commercialization of such a public resource, the investors usually tire of the academic trophy pursuit and ask the entrepreneurs to focus on more immediate products. Private companies in the cannabis space can also be difficult to fund due to federal banking regulations. As a result, the cannabis genome project has stalled for 7 years during DNA sequencing's most impressive improvements.

To date, Medicinal Genomics has funded over $300,000 worth of public genomic sequencing in Cannabis and we have been productively organizing others to contribute to the public domain. We have a very strong open source ethos and have even funded other commercial competitors in the space to make their data public. The Lynch et al paper is an example of multiple small and potentially competitive startups in the cannabis space contributing data, time and money to making many cannabis genotypes public. I personally funded the open access fee for this manuscript and many genomic sequences used in the study.

Today, the field of cannabis genomics is mostly utilizing the more affordable short read sequencers (Illumina) to perform “resequencing” of genomes to fingerprint the plants for defensive IP purposes. Kannapedia.net was built to leverage this. All of the data on Kannapedia.net would be vastly improved if mapped against a much more contiguous and complete cannabis reference sequence. We experimented with all public cannabis references available in 2017 and the references from 2011 still map the most reads and are what is currently being used in Kannapedia.net.

Kannapedia.net notarization was recently migrated to the DASH blockchain thanks to another DASH proposal, and we hope to complete this migration by having the most complete reference genome be DASH funded and hosted. This DASH notarization migration occurred within 30 days of DASH funding and has been noticed in the cannabis community [11, 12]. DASH is the lead sponsor for CannMed 2018, and this will be a prime conference to reveal progress on the assembly.
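The text does not describe how Kannapedia.net notarizes its data. A common pattern, assumed here, is to compute a cryptographic digest of a file and record only that digest in a transaction, so anyone holding the file can later confirm it is unchanged. The sketch below uses SHA-256 from Python's standard library. It does not call any DASH API, and the function names and sample bytes are hypothetical.

```python
import hashlib
from typing import BinaryIO


def fingerprint(data: bytes) -> str:
    """Hex SHA-256 digest of in-memory data."""
    return hashlib.sha256(data).hexdigest()


def fingerprint_stream(handle: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a large file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_record(data: bytes, recorded_digest: str) -> bool:
    """True if the data still hashes to the digest that was recorded earlier."""
    return fingerprint(data) == recorded_digest


if __name__ == "__main__":
    sample = b">hypothetical_contig_1\nACGTACGTTTAA\n"  # made-up FASTA record
    recorded = fingerprint(sample)       # the value one would record
    print(recorded)
    print(matches_record(sample, recorded))          # unchanged data
    print(matches_record(sample + b"A", recorded))   # altered data
```

Recording a fixed-size digest rather than the data itself keeps the on-chain footprint small regardless of how large the genome files are.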

To address this scientific challenge, we propose using the DASH network to drive the assembly of an improved Cannabis Genome reference in a manner that has multiple other parties with “Skin In The Game”. This project will demonstrate that DASH cryptocurrency can fund critical, popular and medically relevant science where the political system will not. This will showcase cryptocurrency as being responsive to popular scientific demands, in a manner that demonstrates more distributed storage on DASH Drive than what is traditionally seen at NCBI and other tax-run genomic databases. Current databases at NCBI are prone to government shutdown during budget ceiling debates, and there is reasonable apprehension in the cannabis field about using this as an exclusive store of Cannabis data. Databases that have a pricing signal are also more disaster resistant. While we don't object to using these public resources, we believe there is good reason to mirror these data in other places given the current administration's change in opinion on Cannabis. In case the genome sequence materializes before DASH Drive is available, we have placed contingency hosting funds for MEGA.nz in the budget.

The project will also demonstrate that the scientific peer review process can be disintermediated with cryptocurrency. By utilizing crypto-incentivized peer review (RAIDreviews.org), we can disintermediate paywall journals and the tired copyright system. These crypto-bounty-based audits of the project also enhance transparency and accountability with the DASH network. We propose two such audits for this project, each offering a $1,000 crypto bounty to run peer review diagnostics on the state of the assembly.

The project can be incrementally monitored for success or failure and will provide high visibility of the DASH DAO to the scientific community. We believe this is increasingly important as Facebook, Google, YouTube and other entities ban crypto advertisements.

The best advertisement is the work you stand up for. It is cultural marketing. Leading by example speaks louder than ‘ad’ dollars on a surveillance platform. Instead of crypto being blamed for the drug trade, turn the tables and demonstrate that crypto is actually on the right side of history, funding the very science needed to advance the public discussion. Cannabis legalization has been more popular than any of the last 3 US presidents. When people see a currency that reacts to their opinion faster than the political system can react with fiat monetary policies, they may begin to question the legitimacy of their monetary system and look to DASH as a solution. The cannabis industry is a receptive audience for more just alternatives to the unconstitutional banking coercion it struggles with every day.

Benefits to the Community


Recent genomic surveys on cannabis have demonstrated gene duplications in the genes of most medical interest. Single molecule sequencing of THCA synthase implies that 4-6 copies of THCA synthase exist in many Type I cannabis genomes [13-15]. Likewise, many of the genes in the terpenoid and cannabinoid synthesis pathway appear replicated and hyper-polymorphic. This is unsurprising given the intense selective pressure placed on these genes. Breeders have been selecting for increased terpenoid and cannabinoid expression since prohibition began prosecuting on total plant weight, and this selection has resulted in plants with replicated gene content in these pathways. A perfected cannabis reference genome would resolve these duplication events into their precise chromosomal locations and enable the classification of variants that drive higher chemical expression.

The main economic benefit of a better cannabis reference is related to the price of DNA sequencing. The long read technologies required to build the first map of cannabis can be 100X more expensive than shorter read resequencing platforms. Once the human genome was in fewer than 1,000 pieces, cheaper short read sequencers could be used to scan for most of the clinical utility. This is reported to have created 310,000 jobs and over $796B of economic gain [16]. While this report suffers from some Keynesian logic, a more conservative estimate is the anticipated annual $26B DNA sequencing market in 2025, growing with a 20% CAGR. The current DNA sequencing market is mostly human sequencing, with oncology, clinical investigation, reproductive health and organ transplant HLA typing taking over 75% of the total DNA sequencing market [17].

This makes the much more affordable target capture and resequencing efforts available on the human genome, far more challenging and less effective in the cannabis genome today. Building a better public reference would bring all researchers in the cannabis space the capacity to more effectively utilize much cheaper short read sequencers and help to decentralize cannabis genetics. Likewise, resolving these medically relevant copy number alterations would help link genotypes to predicted chemotypes and enable many more researchers to better annotate these important medicinal pathways.

Strategy and Methods

High Molecular Weight (HMW) DNA purification is critical to enabling long read sequencers. Your DNA read length is limited to the size of the molecules you present to the long read sequencers. We have perfected DNA purification methods in collaboration with New England Biolabs, and they have already run ~10Gb of Pacific Biosciences sequence on the Blueberry Cheesecake genome (link below). This was a voluntary contribution to the project and demonstrates collaborators with substantial “skin in the game”. Another 35Gb of Pacific Biosciences sequence is needed, and the DNA is ready to go once funding is in place. These methods usually produce 10-30kb reads that need to be assembled into a draft assembly.
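Sequencing depth is commonly estimated as total bases sequenced divided by genome size. The sketch below applies that to the figures in this document: roughly 10Gb already generated plus 35Gb requested, against the 750Mb to 800Mb genome size estimates mentioned earlier. It assumes "Gb" means gigabases and that all sequenced bases are usable, which real data will not fully satisfy. The function name is illustrative.

```python
def sequencing_depth(total_bases: int, genome_size_bases: int) -> float:
    """Average depth (X coverage) = total sequenced bases / genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bases


if __name__ == "__main__":
    existing = 10_000_000_000   # ~10Gb already run (from the text)
    requested = 35_000_000_000  # 35Gb still needed (from the text)
    total = existing + requested
    for genome_size in (750_000_000, 800_000_000):
        depth = sequencing_depth(total, genome_size)
        print(f"{genome_size / 1e6:.0f}Mb genome: {depth:.2f}X")
```

Under these assumptions the combined 45Gb works out to between about 56X and 60X, which is consistent with the 50X depth target stated earlier once some loss to filtering is allowed for.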

The Blueberry Cheesecake genome was selected as it is a hermaphrodite-prone Type II cannabis plant that is Terpinolene and Beta caryophyllene dominant. To date, cannabis sequencing efforts have primarily focused on THCA dominant Type I plants (Chemdawg, LA Confidential, Purple Kush) and CBDA dominant Type III plants (Cannatonic, USO-31, Finola), while Type II plants have been the subject of controversial cannabis patents. Terpinolene has published benefits in anxiety and cancer [18, 19]. Beta caryophyllene is often characterized as a cannabinoid that is not exclusive to cannabis, with CB2 receptor affinity [20, 21]. For more information regarding Cannabis plant type nomenclature, see “The inheritance of chemical phenotype in Cannabis sativa L.” [22].

The other critical factor in sequencing a plant genome is access to milligram quantities of purified DNA. We have this DNA already purified. The moment you run out of DNA, your sequencing project is over. As a result, it is critical to have access to excess plant DNA to make multiple long read libraries.

Additionally, small amounts of DNA are available from multiple siblings of this plant. These have been preliminarily sequenced with the Medicinal Genomics StrainSEEK system. This test sequences 820,000bp – 3.2Mb in these plants to establish a fingerprint for phylogenetic comparison. By doing so we can see that these lines are more homozygous than the average sample we tend to sequence for customers. Assemblies are most successful with less heterozygous genomes.


Draft Assembly

DNA assembly is a RAM-intensive process. For complex plant genomes, over 512GB of RAM is usually required. To achieve this we utilize an AWS x1.32xlarge memory optimized compute cluster at $13.38/hour. Each assembly attempt is expected to take 336 hours on this platform, or approximately $4,500 per attempt. The Canu assembler was chosen based on its published success, but also for its simplicity of installation and its assembly monitoring features. It is important to have real-time feedback from Canu on compute jobs that run for days or weeks at $13/hour. Canu also enables “restart from” command line control of the process, greatly limiting compute costs. Dr. Harkins is listed as an assembly consultant at $500/hour. We estimate 3 hours of work for every assembly setup and attempt. Five incremental assemblies are budgeted for.
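The per-attempt cost quoted above follows directly from the hourly rate and the expected run time. The sketch below reproduces it and adds the consultant time. The five-attempt total is only an upper-bound illustration that assumes every attempt runs the full 336 hours on AWS with no restarts and no cheaper hosting; it is not a figure taken from the proposal's budget. Function names are hypothetical.

```python
AWS_RATE_USD_PER_HOUR = 13.38    # x1.32xlarge rate quoted in the text
HOURS_PER_ATTEMPT = 336          # expected run time per assembly attempt
CONSULTANT_RATE_USD_PER_HOUR = 500
CONSULTANT_HOURS_PER_ATTEMPT = 3
ATTEMPTS = 5


def compute_cost(hours: float, rate: float) -> float:
    """Cloud compute cost for one assembly attempt."""
    return hours * rate


def attempt_cost() -> float:
    """Compute plus consultant cost for one attempt, under the stated inputs."""
    compute = compute_cost(HOURS_PER_ATTEMPT, AWS_RATE_USD_PER_HOUR)
    consulting = CONSULTANT_HOURS_PER_ATTEMPT * CONSULTANT_RATE_USD_PER_HOUR
    return compute + consulting


if __name__ == "__main__":
    per_attempt_compute = compute_cost(HOURS_PER_ATTEMPT, AWS_RATE_USD_PER_HOUR)
    print(f"compute per attempt: ${per_attempt_compute:,.2f}")
    print(f"compute + consulting per attempt: ${attempt_cost():,.2f}")
    print(f"upper bound for {ATTEMPTS} attempts: ${ATTEMPTS * attempt_cost():,.2f}")
```

The compute figure lands a few dollars under $4,500, matching the rounded number in the text. Restarting a failed run from a checkpoint rather than from scratch would reduce the real total below the upper bound.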


The University of Florida has also offered a cheaper alternative to AWS, provided they are credited for the project. We will attempt the more affordable offer first to keep costs low. Dr. Harkins can accept DASH directly. Dovetail will also be attempting an assembly improvement with their HiRise software.

Dr. Harkins' resume:
• https://mega.nz/#!cBx0FTBZ!h7TJ5MsU8qzp1lrONBInKbC6iiEETy3kSLiw2YM4FwI
• http://canu.readthedocs.io/en/latest/quick-start.html
• https://aws.amazon.com/ec2/pricing/on-demand/

Assembly polishing

Single molecule sequencers often produce single nucleotide insertion or deletion errors (indels) that can be polished with a highly accurate short read sequencer like the Illumina HiSeq. Medicinal Genomics will provide HiSeq data to help error correct the Pacific Biosciences long read data. In addition to indel polishing, repeat structures longer than the 30Kb read length will leave branches or forks in the assembly graph. To resolve these repeat structures, Dovetail libraries will be made. These libraries leverage chromatin cross-linking and can scaffold genomes into Megabase sizes, often providing full chromosome assemblies. These libraries are expensive to make and will be approached incrementally to constrain costs.

• https://dovetailgenomics.com/wp-content/uploads/2018/02/Hi-C-kit_productHighlight.pdf

Project success and monitoring

Monitoring the success of DASH proposals can be challenging in highly specialized fields. To address this we propose a mid-project, crypto-bounty-incentivized peer review of the project. These RAID reviews (http://raidreviews.org) can be completed in under a week and provide valuable independent insight to the DASH community on the status of a given project. In the case of the pilot review for the Kratom genome project, 0.1 BTC was used to incentivize a 4-day peer review of genome assembly status. In addition to a mid-project peer review, we plan to follow the spirit of the Bermuda Principles by adopting rapid public release of all raw DNA sequence generated, thus enabling even our competitors to pick up the data and compete with us. Pre-print publications may be pursued in addition to full publication in a respected journal. Only open access journals will be considered, and the crypto-bounty peer review process can be compared to the more traditional methods. Traditional methods may take 4-5 months compared to the weeks seen in the crypto-bounty projects. I will also be speaking at 2 or more conferences before CannMed and plan to speak on this topic. One talk is at The Broad Institute in Cambridge, MA. Another is at the Society for In Vitro Biology.

• https://www.gexcsymposia.com
• https://sivb.org/meetings/home/scientific-program-3.html

Risks

There is a risk that this genome assembly is harder than we anticipate, but we have significantly derisked this with 7 years of experience assembling various cannabis genomes and other model genomes. We have more experience in Cannabis genomics than any team on earth. We believe this is the right strategy and that the technology is finally here to do it right.

There is risk others will publish an assembly first. There is evidence of 3-4 other small private laboratories attempting to improve the genome reference sequence (Phylos, Steephill labs, Anandia labs, SunRise genetics). All are placing their DNA sequencing efforts in government databases that go offline every time there is a government budget crisis. None of these labs have government funding. Very few own their own DNA sequencers. As a result, their efforts are likely limited by the capacity to move plant tissue to various service providers required to make this project mature. These labs are mostly focused on sequencing either Type I or Type III plants. No one to date is sequencing a Type II plant (expresses both THCA and CBDA).

Even if these labs succeed, the field will require more than one Cannabis reference genome and the cross comparisons of each dataset will be highly valuable. It is likely Type I-V cultivars will each need their own reference sequence. We suspect the industry will also sequence hemp cultivars for seed and fibre production. Ruderalis and landrace sativa cultivars will likely be sequenced just to understand the historical genetics that may have been lost due to prohibition. Likewise, male, female and hermaphroditic genomes will eventually be sequenced. In summary, the field is so young and desperate for data that there is limited risk of being ‘scooped’. A collaborative comparison study is more likely to emerge than a zero sum game.

The process of having this funded, reviewed and stored by a cryptocurrency blockchain is unique and will make a resounding impression in public opinion regarding cryptocurrencies delivering the truth in scientific fields where the political process has failed us.

Pilot Goals

• Improve the Cannabis Genome Reference contiguity 50-fold (N50s of 4Kb to 200Kb).
• Pacific Biosciences sequence coverage of 50X should deliver 50-100Kb N50s.
• Dovetail Chicago libraries should increase contiguity to 200Kb.
• Presentation of project status at CannMed 2018, Oct 22-24 (DASH is a sponsor).
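For readers outside genomics, N50 is the contig length at which contigs of that length or longer contain at least half of all assembled bases. The sketch below is a minimal illustration of how the metric is computed. The function name `n50` and the toy contig lengths are ours and are not project data.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths in bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


if __name__ == "__main__":
    # Toy lengths only, to show the calculation.
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print("toy N50:", n50(toy_contigs))

    # Fold improvement targeted by the pilot goal (4Kb -> 200Kb).
    print("target fold improvement:", 200_000 / 4_000)
```

Moving the N50 from 4Kb to 200Kb is the 50-fold contiguity improvement stated in the first goal.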

Costs

This project will take time and resources from Medicinal Genomics. We have not placed any of these management fees into the proposal as we believe in having Skin in the Game. For major items we have provided links to the quotes from various providers.

50Gb (10 Chips) of Pacific Biosciences Sequel data ($17,300)

HMW DNA Purification Amplicon Express ($3,550)
• https://mega.nz/#!kYRBFBZT!X4xYQ6WXvuXxwsUOPwwZEi8jEk_JisxGBjYN2e1Ubu8

Dovetail Chicago and HiC Library generation & HiRise Assembly ($11,450)
• https://mega.nz/#!IBQAQIjB!A4Tsxz8vXPJxc20B-qa1qt6_676coPAluCbaV8ngjFM
• https://mega.nz/#!sIpQCCLA!8peWWhKFRFPyY5aqS44oEGtFW2EaJblgq8CgCiNYIas

Dovetail Illumina Library Sequencing at Medicinal Genomics ($25,000) ($10,000)
DNA Assembly using Canu at AWS ($30,000 or $6,000 x 5 attempts) ($12,000)
Optional Oxford Nanopore scaffolding with MinION 1D^2 reads ($10,000)
RAID review (2x $1,000)
Final Peer Review and Publication ($3,500)
Contingency Data Hosting Fees in the event DASH Drive isn’t ready ($1,500)
5 DASH Proposal fee ($2,550)
Travel and consulting ($550)

Total: $71,275. At $475/DASH this is 150 DASH, or 30 DASH per month for 5 months.
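The conversion from the USD total to the DASH request can be sketched as follows. The total, the assumed price and the number of months are taken from the proposal. The function name `dash_request` and the extra prices in the loop are illustrative, included only to show how sensitive the request is to the exchange rate.

```python
TOTAL_USD = 71_275     # proposal total in USD
USD_PER_DASH = 475     # price assumed in the proposal
MONTHS = 5             # number of monthly payments requested


def dash_request(total_usd: float, usd_per_dash: float, months: int):
    """Return (total DASH, DASH per month) for a USD budget."""
    total_dash = total_usd / usd_per_dash
    return total_dash, total_dash / months


if __name__ == "__main__":
    # 375 and 600 are the low and high prices discussed in this section.
    for price in (375, USD_PER_DASH, 600):
        total, monthly = dash_request(TOTAL_USD, price, MONTHS)
        print(f"${price}/DASH: {total:.1f} DASH total, {monthly:.1f} DASH/month")
```

At $475/DASH the request rounds to 150 DASH, or 30 DASH per month. The same budget would need about 190 DASH at $375 and about 119 DASH at $600.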

We propose a DASH price that is roughly halfway between the DASH Mayer Multiple price of $600 USD/DASH and the current price of $375, to derisk currency volatility.

https://cryptovoices.com/200-day-moving-average-price-multiple/

Future Directions

We envision a future proposal to build a more scalable genome browser than Kannapedia.net. Upgrades would include more decentralized storage of the data, better searchability and better genome annotation viewers. Until more information on the finished genome size and DASH Drive's architecture is in hand, this is difficult to price into today's work. Preliminary estimates are in the $60K range, but we are still scoping and quoting this aspect of the work.

Donations already made by collaborators with Skin in the Game

This proposal marks the first time this Cannabis genomic data has been made public. We have also attached a Case Study where this strategy was used to improve a more complex genome than Cannabis (the Plantain genome). This case study implies a very high likelihood of success for the strategy selected.

10Gb of Pacific Biosciences sequence from NEB
• https://mega.nz/#!dZpHECbQ!MQGCf0AcQ3g-LyzlnmWsjdbLWgLeg_X0x1Q8CAvB2B0
• https://mega.nz/#!4Mx0mDLb!_FQ2CA13nmzu8w4ieK7MEgbtcMqwMp8MYkWCCAcF4Y0

6 StrainSEEKs of Blueberry Cheesecake siblings
• http://www.kannapedia.net/strains/rsp10573/
• http://www.kannapedia.net/strains/rsp10684/
• http://www.kannapedia.net/strains/rsp10682/
• http://www.kannapedia.net/strains/rsp10683/
• http://www.kannapedia.net/strains/rsp10670/
• http://www.kannapedia.net/strains/rsp10671/

1 flow cell of MinION (1Gb) of Oxford Nanopore data
• https://s3.amazonaws.com/nanopore-cannabis/B2C2_34dir.fastq

Dovetail Genomics provided a 7.2% discount after reading the proposal
• https://mega.nz/#!sIpQCCLA!8peWWhKFRFPyY5aqS44oEGtFW2EaJblgq8CgCiNYIas

Case Study on the Plantain genome project
• https://mega.nz/#!ZIJywYAa!b2-qmYsEoDRMAlfQNubINxxGVBxk3sWCw0cR2O7cBlc

Recipient track record

Medicinal Genomics was the fortunate recipient of a small DASH grant that has been used successfully to migrate Kannapedia.net's genomics notarization service to the DASH blockchain. This was achieved with Stampd.io within 30 days of funding. The other DASH funds are being used for CannMed 2018 sponsorship. These funds have enabled the organization of an outstanding speaker list and further recruitment of interested sponsors. We currently have a track record of being very responsive to DASH inquiries and delivering on our goals.

• http://www.cannmedevents.com/#speakers

The lead organizer of this proposal was the Team Leader for R&D at the Whitehead Institute/MIT Center for Genome Research. He is an author on the Human Genome Project [23] and is well published in the DNA sequencing field, having invented and developed two of the Next Generation Sequencing platforms (SOLiD and Ion Torrent) [24-26]. Tim Harkins, PhD, is an assembly adviser to the project and is equally well published in the genomics field, with a focus on long-read DNA sequencing technology development and utilization.

My biography can be found here:
• https://mega.nz/#!4V5FTKSA!i5wYrDI9TTXXjMMjyZ8ZirWCdjBw5susPg8TdAeepTQ

1. Stafford. Sequencing the Cannabis Genome: Impact, History, and Future. American Botanical Council. 2011;8(10).
2. van Bakel H, Stout JM, Cote AG, Tallon CM, Sharpe AG, Hughes TR, et al. The draft genome and transcriptome of Cannabis sativa. Genome biology. 2011 Oct 20;12(10):R102. PubMed PMID: 22014239. Pubmed Central PMCID: 3359589.
3. McKernan K, Spangler J, Helbert Y, Lynch RC, Devitt-Lee A, Zhang L, et al. Metagenomic analysis of medicinal Cannabis samples; pathogenic bacteria, toxigenic fungi, and beneficial microbes grow in culture-based yeast and mold tests. F1000Research. 2016;5:2471. PubMed PMID: 27853518. Pubmed Central PMCID: 5089129.
4. McKernan K, Spangler J, Zhang L, Tadigotla V, Helbert Y, Foss T, et al. Cannabis microbiome sequencing reveals several mycotoxic fungi native to dispensary grade Cannabis flowers. F1000Research. 2015;4:1422. PubMed PMID: 27303623. Pubmed Central PMCID: 4897766.
5. McKernan KJ. The chloroplast genome hidden in plain sight, open access publishing and anti-fragile distributed data sources. Mitochondrial DNA Part A, DNA mapping, sequencing, and analysis. 2016 Nov;27(6):4518-9. PubMed PMID: 26486305.
6. Booth JK, Page JE, Bohlmann J. Terpene synthases from Cannabis sativa. PloS one. 2017;12(3):e0173911. PubMed PMID: 28355238. Pubmed Central PMCID: 5371325.
7. Lynch. Genomic and chemical diversity of cannabis. Critical Reviews in Plant Sciences. 2016;35(5-6).
8. Research signals arrival of a complete human genome. PHYSORG. https://phys-org.cdn.ampproject.org/c/s/phys.org/news/2018-03-human-genome.amp.
9. Jain M, Olsen HE, Turner DJ, Stoddart D, Bulazel KV, Paten B, et al. Linear assembly of a human centromere on the Y chromosome. Nature biotechnology. 2018 Mar 19. PubMed PMID: 29553574.
10. Valouev A, Ichikawa J, Tonthat T, Stuart J, Ranade S, Peckham H, et al. A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome research. 2008 Jul;18(7):1051-63. PubMed PMID: 18477713. Pubmed Central PMCID: 2493394.
11. Amirault. Blockchain technology is a modern requirement for the cannabis industry. Growers Network. 2018; http://growersnetwork.org/market-projections/blockchain-technology-modern-requirement-cannabis-industry/.
12. Scott. Bitcoin 'Too Slow' for cannabis industry until block size changes. Bitcoinist.com. 2017; http://bitcoinist.com/bitcoin-slow-cannabis-block-size/.
13. Weiblen GD, Wenger JP, Craft KJ, ElSohly MA, Mehmedic Z, Treiber EL, et al. Gene duplication and divergence affecting drug content in Cannabis sativa. The New phytologist. 2015 Dec;208(4):1241-50. PubMed PMID: 26189495.
14. Onofri C, de Meijer EPM, Mandolino G. Sequence heterogeneity of cannabidiolic- and tetrahydrocannabinolic acid-synthase in Cannabis sativa L. and its relationship with chemical phenotype. Phytochemistry. 2015 Aug;116:57-68. PubMed PMID: 25865737.
15. McKernan KJ, Helbert Y, Tadigotla V, McLaughlin S, Spangler J, Zhang L, Smith D. Single molecule sequencing of THCA synthase reveals copy number variation in modern drug-type Cannabis sativa L. BioRXIV. 2015.
16. Tripp. Economic Impact of the Human Genome Project. https://www.battelle.org/docs/default-source/misc/battelle-2011-misc-economic-impact-human-genome-project.pdf?sfvrsn=6. 2011.
17. Research GV. Next Generation Sequencing (NGS) Market Size & Forecast By Application. https://www.grandviewresearch.com/industry-analysis/next-generation-sequencing-market. Jan, 2017.
18. Aydin E, Turkez H, Tasdemir S. Anticancer and antioxidant properties of terpinolene in rat brain cells. Arhiv za higijenu rada i toksikologiju. 2013 Sep;64(3):415-24. PubMed PMID: 24084350.
19. Okumura N, Yoshida H, Nishimura Y, Kitagishi Y, Matsuda S. Terpinolene, a component of herbal sage, downregulates AKT1 expression in K562 cells. Oncology letters. 2012 Feb;3(2):321-4. PubMed PMID: 22740904. Pubmed Central PMCID: 3362481.
20. Legault J, Pichette A. Potentiating effect of beta-caryophyllene on anticancer activity of alpha-humulene, isocaryophyllene and paclitaxel. The Journal of pharmacy and pharmacology. 2007 Dec;59(12):1643-7. PubMed PMID: 18053325.
21. Gertsch J, Leonti M, Raduner S, Racz I, Chen JZ, Xie XQ, et al. Beta-caryophyllene is a dietary cannabinoid. Proceedings of the National Academy of Sciences of the United States of America. 2008 Jul 1;105(26):9099-104. PubMed PMID: 18574142. Pubmed Central PMCID: 2449371.
22. de Meijer EP, Bagatta M, Carboni A, Crucitti P, Moliterni VM, Ranalli P, et al. The inheritance of chemical phenotype in Cannabis sativa L. Genetics. 2003 Jan;163(1):335-46. PubMed PMID: 12586720. Pubmed Central PMCID: 1462421.
23. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921. PubMed PMID: 11237011.
24. McKernan KJ, Spangler J, Helbert Y, Zhang L, Tadigotla V. DREAMing of a patent-free human genome for clinical sequencing. Nature biotechnology. 2013 Oct;31(10):884-7. PubMed PMID: 24104751.
25. McKernan KJ, Peckham HE, Costa GL, McLaughlin SF, Fu Y, Tsung EF, et al. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome research. 2009 Sep;19(9):1527-41. PubMed PMID: 19546169. Pubmed Central PMCID: 2752135.
26. Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature. 2011 Jul 20;475(7356):348-52. PubMed PMID: 21776081.