I have had deep concerns about PCR since the start. Unfortunately, when I presented the available evidence, in the form of the Instand report, which showed that positives for SARS-CoV-2 were being triggered by common cold samples, few would listen. Now it is 2023 and not much has changed: they still deny the results of this report and defend PCR. The same happens when I show that much of the UK's COVID PCR testing was testing for a single gene; I get barely a shrug. The whole edifice rests on sand.

This work seems very important to me... I hope to have time to study it soon. Bravo for going in this direction! Genomic technology and its limitations must be unpacked. It has captured too much territory.

Jul 27·edited Jul 27

Why are the ends of the genome important, Ben? There are a lot of really great biological answers to this question. They are anchor points for viral proteins. The length of the ends determines the stability of the genome. There are tertiary structures that interact with the rest of the genome.

But you never answered this, because you have no grounding in, or understanding of, biology.

I showed you a paper that clearly mapped out the 5' and 3' regions. Of course, it's going to take quite a bit more bioinformatics to tackle that paper than you are capable of, so you move the goalposts to "everything has to be sequenced in a single read".

When I show you a paper with a single read spanning 95% of the genome, that's not good enough, because virus denier reasons. Never mind that within that single read, the alignment to the assembled reads is exactly the same (invalidating, what, three of your posts casting doubt on assembly algorithms?).

Your next Substack post should repeat the analysis from that entire paper:


Then at the end apologize for being so wrong.

Neil Ferguson can show you the value of modeling with fabricated data.

Your article lacks citation to work that is already done in this field. It leaves your readers with the impression that you haven’t read it and are likely unversed in the stew you are brewing.

The manufacturers are likely referring to the fact that heat- and magnesium-based fragmentation is more random than RNase III-based methods, but it is not purely random.

Also, fragmentation isn’t the only step in that long process that can introduce bias.

Had you read the vast literature on RNA-seq instead of asking a leading question of a kit manufacturer (who will of course tell you it’s random), you would have gotten the answer shown in this paper: there are many sources of bias outside the fragmentation step.

This doesn’t address the fact that you claimed those end sequences didn’t exist, yet they were readily found just by using BWA-MEM.
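
For anyone who wants to check the end-sequence claim themselves, here is a minimal sketch that counts primary, mapped alignments overlapping the first or last `margin` bases of the reference, reading SAM text lines such as those written by `bwa mem` or `samtools view`. The function names `cigarRefSpan` and `countEndReads` are hypothetical helpers, and the margin is a parameter you choose; for Wuhan-Hu-1 (MN908947.3) the reference length is 29,903 bases.

```typescript
// Hypothetical sketch: count primary, mapped alignments that overlap either
// margin of a linear reference, from SAM text lines (header lines start with "@").

// Reference bases consumed by a CIGAR string: ops M, D, N, = and X.
function cigarRefSpan(cigar: string): number {
  const re = /(\d+)([MIDNSHP=X])/g;
  let len = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(cigar)) !== null) {
    if ("MDN=X".indexOf(m[2]) >= 0) len += parseInt(m[1], 10);
  }
  return len;
}

function countEndReads(samLines: string[], refLen: number, margin: number): { start: number; end: number } {
  let start = 0;
  let end = 0;
  for (const line of samLines) {
    if (line === "" || line.charAt(0) === "@") continue; // header or blank line
    const f = line.split("\t");
    const flag = parseInt(f[1], 10);
    // Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
    if (flag & 0x904) continue;
    const first = parseInt(f[3], 10);              // POS: 1-based leftmost mapped base
    const last = first + cigarRefSpan(f[5]) - 1;   // rightmost mapped base
    if (first <= margin) start++;                  // overlaps the first `margin` bases
    if (last > refLen - margin) end++;             // overlaps the last `margin` bases
  }
  return { start, end };
}
```

For example, `countEndReads(lines, 29903, 50)` on the lines of a SAM file gives the number of alignments touching each 50-base margin of Wuhan-Hu-1.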

This is an elaborate smoke screen for the fact that you didn’t know how to map reads and made bold claims in a Substack post with no supporting citations and without double-checking your own work.

I think you'll find the tail will always mutate to ensure transmission and to find its spillover host.

In the TypeScript code you used to simulate the reads, the reverse read is always the exact reverse complement of the forward read, so the reverse read always starts at exactly the same reference position as the forward read. That is usually not the case in Wu et al.'s real reads: when I used Bowtie2 to align them against Wuhan-Hu-1, the average absolute difference between the starting positions of the paired reads was about 50 bases:

brew install bowtie2 samtools

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR109/081/SRR10971381/SRR10971381_{1,2}.fastq.gz

curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id=MN908947.3'>sars2.fa

bowtie2-build sars2.fa sars2.fa

bowtie2 -p4 --no-unal -x sars2.fa -1 SRR10971381_1.fastq.gz -2 SRR10971381_2.fastq.gz|samtools sort ->sorted26.bam

samtools view sorted26.bam|awk '{x=$4-$8;if(x<0)x=-x;s+=x}END{print s/NR}'
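
The gap between the simulated and the real read pairs described above comes from how each pair is drawn. A minimal sketch of the more usual approach: sample a fragment longer than the read, take one mate from each end of the fragment, and reverse-complement the second mate, so the two mates start at different reference positions. `parkMiller`, `simulatePair`, and the uniform fragment-length range are assumptions for illustration, not the original simulator; which mate is sense or antisense depends on the library protocol, and this sketch always draws fragments from the forward strand.

```typescript
// Hypothetical sketch of paired-end read simulation; not the simulator from the post.

// Minimal Park-Miller generator so runs are reproducible (returns values in [0, 1)).
function parkMiller(seed: number): () => number {
  let state = seed % 2147483647;
  if (state <= 0) state += 2147483646;
  return () => {
    state = (state * 48271) % 2147483647;
    return (state - 1) / 2147483646;
  };
}

const COMPLEMENT: Record<string, string> = { A: "T", C: "G", G: "C", T: "A" };

function reverseComplement(seq: string): string {
  let out = "";
  for (let i = seq.length - 1; i >= 0; i--) out += COMPLEMENT[seq.charAt(i)] || "N";
  return out;
}

interface ReadPair {
  fragmentStart: number;  // 0-based reference start of the fragment
  fragmentLength: number;
  read1: string;          // read from the fragment's left end, forward strand
  read2: string;          // reverse complement of the fragment's right end
  read2Start: number;     // 0-based reference start of the bases read2 covers
}

// Fragment length is drawn uniformly from [minFrag, maxFrag] (a simplifying assumption).
function simulatePair(ref: string, readLen: number, minFrag: number, maxFrag: number, rand: () => number): ReadPair {
  const fragLen = Math.min(ref.length, minFrag + Math.floor(rand() * (maxFrag - minFrag + 1)));
  const start = Math.floor(rand() * (ref.length - fragLen + 1));
  const fragment = ref.slice(start, start + fragLen);
  const len = Math.min(readLen, fragLen);
  return {
    fragmentStart: start,
    fragmentLength: fragLen,
    read1: fragment.slice(0, len),
    read2: reverseComplement(fragment.slice(fragLen - len)),
    read2Start: start + fragLen - len,
  };
}
```

With fragments of 180 to 400 bases and 150-base reads, the mate start positions differ by 30 to 250 bases instead of always being identical.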

In the SAM format, the 4th field is the starting position of the main read and the 8th field is the starting position of its paired mate. The 9th field is the template length, which runs from the leftmost mapped base of the pair to the rightmost one: it is positive for the leftmost mate (where it equals the length from the start of the main read to the end of its mate) and negative for the rightmost mate. The Samtools manual uses the term template length as synonymous with the absolute value of the insert size, although a distinction is sometimes made in which insert size includes the adapters and fragment size does not, and the aligned reads in a SAM file can also include the adapters: https://www.biostars.org/p/95803/.
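
The sign convention can be made concrete with a minimal sketch of the SAM specification's TLEN rule for two mates on the same reference: the unsigned length spans from the leftmost mapped base to the rightmost mapped base, the leftmost mate gets a plus sign, and the rightmost gets a minus sign. `cigarRefLength`, `Mate`, and `templateLengths` are hypothetical names; real aligners differ in edge cases such as mates starting at the same position.

```typescript
// Hypothetical sketch of the SAM TLEN rule for a pair mapped to the same reference.

// Reference bases consumed by a CIGAR string: ops M, D, N, = and X.
function cigarRefLength(cigar: string): number {
  const re = /(\d+)([MIDNSHP=X])/g;
  let len = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(cigar)) !== null) {
    if ("MDN=X".indexOf(m[2]) >= 0) len += parseInt(m[1], 10);
  }
  return len;
}

interface Mate {
  pos: number;   // SAM field 4: 1-based leftmost mapped base
  cigar: string; // SAM field 6
}

// Returns [TLEN of a, TLEN of b]: + for the leftmost mate, - for the rightmost.
function templateLengths(a: Mate, b: Mate): [number, number] {
  const aEnd = a.pos + cigarRefLength(a.cigar) - 1;
  const bEnd = b.pos + cigarRefLength(b.cigar) - 1;
  const len = Math.max(aEnd, bEnd) - Math.min(a.pos, b.pos) + 1;
  // Tie handling (both mates at the same POS) varies between aligners; here `a` gets the + sign.
  return a.pos <= b.pos ? [len, -len] : [-len, len];
}
```

For example, two 150-base mates whose starts are 36 bases apart get template lengths of 186 and -186.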

When I aligned Wu et al.'s reads against Wuhan-Hu-1 and excluded template lengths of zero, the modal template length was -186 for the forward reads but 186 for the reverse reads: https://media.discordapp.net/attachments/1093243194231246934/1127505846037921832/1.png. The forward reads usually had a negative template length because, in the sequencing protocol used by Wu et al., the forward reads are sense for the cDNA and therefore antisense for the RNA, so the reverse reads are actually what people would intuitively consider the forward reads. In your simulated reads, however, the template length always equals either the read length or its negative. And in Wu et al.'s reads, the reads that match the beginning of the SARS2 genome all come from the reverse reads, while the reads that match the longest part of the poly(A) tail come from the forward reads. This is another aspect of the real reads that your TypeScript code does not simulate accurately.
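
A minimal sketch of the tally described above, assuming SAM text lines such as the output of `samtools view sorted26.bam`. "Forward" and "reverse" reads are taken here to mean first-in-pair (flag 0x40, the _1 file) and second-in-pair (flag 0x80, the _2 file) records. `modalTlen` is a hypothetical name, and this is not necessarily how the linked plot was produced.

```typescript
// Hypothetical sketch: modal non-zero TLEN (SAM field 9), tallied separately for
// first-in-pair (flag 0x40) and second-in-pair (flag 0x80) records.
function modalTlen(samLines: string[]): { read1: number | null; read2: number | null } {
  const counts: Array<Record<string, number>> = [{}, {}]; // [first-in-pair, second-in-pair]
  for (const line of samLines) {
    if (line === "" || line.charAt(0) === "@") continue; // header or blank line
    const f = line.split("\t");
    const flag = parseInt(f[1], 10);
    if (flag & 0x900) continue; // skip secondary (0x100) and supplementary (0x800)
    const tlen = parseInt(f[8], 10);
    if (tlen === 0) continue; // template lengths of zero are excluded
    const group = flag & 0x40 ? 0 : flag & 0x80 ? 1 : -1;
    if (group < 0) continue; // not part of a pair
    counts[group][tlen] = (counts[group][tlen] || 0) + 1;
  }
  // Most frequent key; ties go to whichever key object iteration visits first.
  const mode = (c: Record<string, number>): number | null => {
    let best: number | null = null;
    let bestCount = 0;
    for (const k in c) {
      if (c[k] > bestCount) {
        bestCount = c[k];
        best = Number(k);
      }
    }
    return best;
  };
  return { read1: mode(counts[0]), read2: mode(counts[1]) };
}
```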

Your simulated reads also don't reproduce these sequencing artifacts that McKernan mentioned: "Reads don't map to the margin of a reference because the given insert size you have selected in library construction won't exist. If you have 200bp inserts, you won't get the 1st ~50bases of the reference. This is known by all in space except Ben who thinks it's a discovery. [...] Secondly, there are biological reasons why those pieces of DNA aren't captured. The 5' end has a 5'cap which can't be cloned unless it's decapped. The 3' end is a variable poly A tail which won't sequence or amplify without polymerase slippage. reduced end seq is expected." (https://twitter.com/Kevin_McKernan/status/1674041259931979777)
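
The insert-size part of the quoted explanation can be illustrated with a minimal sketch that assumes a single fixed fragment length and counts every placement of a fragment lying fully within a linear template once. Both are simplifying assumptions, and the sketch ignores the 5' cap and poly(A) effects also mentioned in the quote. `edgeCoverage` is a hypothetical name; `leftEnd` and `rightEnd` stand for reads taken from each end of a fragment, since in a real library either mate can sit at either end.

```typescript
// Hypothetical sketch: with one fixed fragment length and every possible fragment
// placement on a linear template counted once, how many reads cover each position?
function edgeCoverage(genomeLen: number, fragLen: number, readLen: number): { leftEnd: number[]; rightEnd: number[] } {
  const leftEnd: number[] = [];
  const rightEnd: number[] = [];
  for (let i = 0; i < genomeLen; i++) {
    leftEnd.push(0);
    rightEnd.push(0);
  }
  const len = Math.min(readLen, fragLen);
  // s is the 0-based start of a fragment that lies entirely within the template.
  for (let s = 0; s + fragLen <= genomeLen; s++) {
    for (let i = s; i < s + len; i++) leftEnd[i]++;                     // read from the left end
    for (let i = s + fragLen - len; i < s + fragLen; i++) rightEnd[i]++; // read from the right end
  }
  return { leftEnd, rightEnd };
}
```

With 200-base fragments and 150-base reads, reads from the right end of a fragment never cover the first 50 positions, and coverage from left-end reads climbs from 1 at the first position to its full value only after 150 positions; the mirror image happens at the other end of the template.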