I have a table of patient data with bacterial isolates at BL (baseline) and FU (follow-up) time points. Can someone teach me how to retrieve the patients who have isolates at BOTH baseline and follow-up time points. In this example, it will be patients 5 and 6.
Patient Time_point isolate
1 FU A3147327
2 FU 6058fd73
3 BL 9c46dfec
4 FU 3e1e8c80
5 FU1 f1c7ef1b
5 BL 3ec00a86
6 BL de156dd6
7 FU 7fd176c3
8 BL F0401825
5 FU2 358b642e
6 FU fdc68092
10 BL 03a5a342
It would help to know more about your schema, your data and the problem to solve to come up with a probably better solution but this should do the job:
SELECT patient FROM isolates a WHERE time_point = 'BL'
AND EXISTS (SELECT patient FROM isolates b WHERE b.patient = a.patient AND time_point like "FU%");
First we’re looking for all the patients that have a “BL” time point and return the patients, that also have any time point beginning with “FL”.
Thank you for your prompt response and the codes. It seemed to work but I have to check a little further.
My scheme actually contains 19 tables (https://hcvdb.med.ubc.ca/about/database-schema/) but I simplified the question so that I won’t confuse other people. By the way, the isolates are actually hepatitis C virus, not bacteria
I adapted your codes and retrieved the patients and their baseline isolates. Although it did not give me exactly what I wanted, it did give me a path for further optimization. What I want is to retrieve all of the followings:
(1) study_participant_id (same as patient)
(2) isolates from BL
(3) isolates from FU (FU, FU1, FU2)
The codes I wrote below only generated the patients with the isolates from BL, not from both BL and FU. I need to further optimize the codes but it is a start.
Thanks a lot for your help - it made my day :). It is extremely helpful.
Kind regards
Anita
My codes:
use shared;
SELECT study_participant_id, seq_kind, case_id
from cases a
inner join clinicalisolate
using (case_id)
where seq_kind = ‘BL’ and
exists (
select study_participant_id
from cases b
inner join clinicalisolate
using (case_id)
where a.study_participant_id = b.study_participant_id and
seq_kind = ‘FU’
);
When asking for your schema I was just asking for the CREATE TABLE statements of the tables you need to query to see which indexes we could use.
Your query would just consider seq_kinds of "FU"not “FU1”, “FU2”,… That’s why I used the like "FU%" condition. And I don’t think you need all that inner joins in your sub-queries. Concentrate on retrieving the correctcase_ids for your problem. You can retrieve additional data in a second step.
Thanks, Sam.
That’s basically the path I am going to take - adapt your script, then further modify it to retrieve both the FU and BL isolates.
Your FU% is actually what I need but I was just playing around this morning to see if it works.
I need to inner join to my other tables because the patient ID and the isolate ID are actually in two different tables in my database. In addition, I also need to retrieve the actual sequences (yet from another table) not just the isolate_ID. I still have a lot to optimize but I feel that your script inspired me to start in the right direction.