Karczewski lab

Interpreting genetic variation using large genomic datasets.

Research

At the Karczewski lab at Massachusetts General Hospital and the Broad Institute, we assemble and analyze massive public datasets of genetic variation and develop novel strategies that use them to aid in the interpretation of putative disease variants. Our goal is to better distinguish causal disease variants and to improve our understanding of human biology.

Broadly speaking, we are interested in genome sequencing and its future role in our daily lives. With sequencing costs decreasing rapidly, it is not difficult to imagine an age where personal genetic information plays an important role in medicine and daily life. We are constantly discovering more of the genetic basis of diseases, but much work remains in fully explaining the genetic components of disease and other phenotypes.

A non-exhaustive list of projects

Interpreting genetic variants:
- leveraging the demographic history of human populations
- using first principles of the genome (LOFTEE)
Understanding the function of genes and the genome:
- through the lens of natural selection
- using rare variant associations
- using functional genomics data
- using directional perturbation of genes

People

Konrad Karczewski

Assistant Professor

I am a computational biologist working on interpreting genetic variation from large-scale datasets. As exome and genome data grow to massive scales, the methods and analytical frameworks need to scale at the same rate, and interpreting results from large-scale analyses is an exciting challenge. My group builds methods to interpret genetic variation, to learn about the function of human genes and the regulation of the genome as a whole. I earned a B.A. in Molecular Biology from Princeton University and a Ph.D. in Biomedical Informatics from Stanford University.

@konradjk

Computational Scientists

Bram Gorissen

Bram is a computational scientist in the Karczewski Lab at the Broad Institute. He received his PhD in operations research from Tilburg University. He was an assistant professor at VU Amsterdam, where he created and taught courses on big data and on convex optimization. He developed Nymph, the fastest exact inverse planning algorithm for radiation therapy, which is currently used by Massachusetts General Hospital. Previously he worked in the Fishell lab on, among other things, discovering enhancers.

Fellows

Jeremy Guez

Jeremy is a Postdoctoral Research Fellow in the Karczewski Lab at ATGU and the Broad Institute of MIT and Harvard. He holds a BS degree in Biology from Sorbonne University, an MS in Genetics from Paris-Saclay University, and a PhD in Population Genetics from the French National Museum of Natural History. During his PhD, he studied the impacts of cultural transmission of reproductive success (CTRS) on population genetics and worked on its inference from human genomics data using machine learning methods. He is interested in research at the intersection of medical genomics, population genetics, and machine learning.

Mohamed El-Brolosy

Mohamed El-Brolosy is a junior fellow of the Harvard Society of Fellows, and a visiting scientist at Jonathan Weissman's lab at the Whitehead Institute. Prior to that, he obtained his PhD at the Max Planck Institute for Heart and Lung Research with Didier Stainier. Mohamed is broadly interested in genetic robustness. Having discovered the phenomenon of Transcriptional Adaptation to mutations in his graduate studies, he is hoping to understand its potential therapeutic implications, how it affects the landscape of genetic diseases, and how different kinds of mutations within a given gene can lead to different phenotypic outcomes.

Alejandro Martinez-Carrasco

Alejandro is a Postdoctoral Research Fellow in the Karczewski Lab at the ATGU and the Broad Institute of MIT and Harvard. He earned his PhD in Bioinformatics from University College London, where he studied the genetic factors influencing longitudinal traits in Parkinson’s disease. His research involved developing frameworks to perform downstream functional analyses by integrating multi-omic data. Currently, his work focuses on the intersection of bioinformatics and deep learning. He is particularly interested in leveraging deep learning models to predict pathogenicity and understand how genetic and multi-omic variations contribute to disease etiology.

Henry Taylor

Henry obtained his PhD from the University of Cambridge as an NIH-Cambridge scholar, where he studied the genetic underpinnings of type 2 diabetes using both molecular studies of pancreatic beta cells and large-scale analyses in diverse biobanks. Prior to his PhD, Henry graduated from Duke University with a degree in computational biology. Jointly mentored by Drs. Melina Claussnitzer, Ben Neale, and Konrad Karczewski, Henry continues to explore the regulatory mechanisms of diabetes-associated loci using natural variation screens of adipocyte differentiation. Outside the lab, Henry will take any excuse to be outdoors and, as a result, is remarkably average at many activities. He particularly enjoys hiking and traveling.

Graduate Students

Raunak Kundagrami

Raunak is a PhD student in the Harvard Biophysics Program, in the Karczewski lab at the Broad Institute, and in the Sunyaev lab at Harvard Medical School. He holds bachelor's degrees in mathematics and biochemistry from the University of Chicago. While at Chicago, he worked on population genetic models of two-locus evolution and also studied the co-evolution across taxa of the neuronal Robo-NELL receptor-ligand pair. He is interested in using theory and mathematics to answer questions in genetics and evolution, and during his PhD plans to study the functional consequences of natural selection on non-coding DNA.

Sophie Parsa

Sophie is a computational associate in the Karczewski lab and a medical student in the Health Sciences and Technology program at Harvard Medical School. She is interested in creating multimodal disease prediction models, integrating multi-omics with routine clinical data, to enhance patient care. Before coming to Boston, she completed a BS in Computer Science and an MS in Biomedical Informatics, both at Stanford University.

Associate Computational Scientists

Wenhan Lu

Wenhan is an Associate Computational Biologist in the Karczewski and Neale Labs at the Broad Institute. She earned her BS Degree in Mathematics at Nankai University and MS Degree in Biostatistics at Yale University. She is interested in developing statistical methods to reveal the underlying messages from large-scale biomedical data, as well as building pipelines for the quality control of large data and data virtualization.

Thomas Opsomer

Thomas is a Computational Associate in the Karczewski Lab at the Broad Institute. He holds an engineering degree from École des Ponts et Chaussées and a Master's in Mathematics and Machine Learning (MVA) from ENS Paris-Saclay. Before joining the Broad, he co-founded an NLP-focused AI startup and later transitioned into biology as a research engineer at the Muséum National d'Histoire Naturelle in Paris, working on sequence-to-function modeling and epigenomics. His current interests lie at the intersection of deep learning and genomics, spanning gene regulation and functional variant interpretation.

Greg Rohlicek

Greg is a Computational Associate in the Karczewski lab at the Broad Institute. Greg holds a B.Eng in Software Engineering from McGill University and an MS in Operations Research from Northeastern University. He is passionate about applying optimization and classical statistics (and their software implementations) to enhance machine learning methods used to assess biomedical data. Prior to joining Broad, Greg worked as a research associate at a startup specializing in LLM-based document analysis, and in between his undergrad and master's degrees Greg worked as a Patent Technology Specialist writing patents for software systems and medical devices.

Trisha Karani

Trisha is a bioinformatics specialist in the Karczewski lab at MGH and Broad. She holds a BS degree in Biomedical Engineering and Computer Science (and is currently completing her MS in Computer Science) from Johns Hopkins University. She is passionate about fostering collaboration between engineers and physicians and is broadly interested in applying machine learning to develop better tools and insights on problems in biomedicine. Previously, she interned at Mayo Clinic and AbbVie, developing deep learning models that improve medical research workflows. In her free time, Trisha enjoys playing tennis, watching good shows, and exploring cafes in Boston.

Associate Software Engineers

Riley Grant

Riley is an associate software engineer working on the Genome Aggregation Database’s (gnomAD’s) Web Browser features and UI. His interests lie in the usage of technology to benefit the public, through sharing of information and functionality while protecting user privacy and data. He received his M.S. in Computer Science from Northeastern University.

Alumni

Julia Sealock (friend of the lab)
Siwei Chen (postdoctoral fellow, co-advised with Ben Neale)
Hannah Jacobs (graduate student, co-advised with Chris Burge)
Rahul Gupta (friend of the lab, Harvard-MIT Health Sciences and Technology program)
Ulrik Stoltze (visiting PhD student, University Hospital Copenhagen)
Adrian Janucik (visiting Masters student, Medical Univ of Bialystok)
Hannah Nicholls (visiting Masters student, QMUL)

Core tenets

We are committed to training the next generation of computational scientists. A crucial component of this training is building a collaborative spirit in the team, promoting a happy and healthy environment where everyone can excel, and ensuring that we move the science forward together in a rigorous fashion.

What we do

Our core mission involves the use of massive datasets to learn about human disease and the biology of the genome. We value high-quality data and code, reproducibility, and openness to advance human genetics.

Big data

The onslaught of genetic data has arrived, and we are fortunate to work in a time when massive data sizes enable rigorous approaches and robust statistics. However, these data volumes also require special handling: at the Karczewski lab, we take a "cloud-first" approach to computational biology. As we are managing datasets in the 1TiB range (and some crossing the 100TiB threshold), scalable computation is a must. We value our mutually beneficial relationship with the Hail team: we build our pipelines in Hail and feed back our progress or issues, and they enable our ideas for massive-scale analysis.

High quality data

One of the cornerstones of efficient scientific progress in a field is the veracity of the literature, which creates confidence and builds trust. On the other hand, high dimensional datasets that represent millions to quadrillions of measurements have an inherent error rate. Understanding the error modes for each dataset, developing methods to address them, and faithfully reporting our results, including all raw data and code (see Reproducibility below), is a crucial step in ensuring high quality. This is an iterative process that occasionally takes longer than we'd like, but the end result is something we can all be proud of.

Reproducibility

Similarly, as our data grows and computational pipelines become more complex, having a foundation of public code creates a record of our analytical approach. Partially for the community, and partially for ourselves, the ability to reproduce each step of an analysis promotes code reusability and pays dividends on the inevitable need to rerun code after peer review, dataset updates, or that one additional QC step upstream. Additionally, public reproducible code leads to fewer mistakes due to out-of-sync intermediate files, and ensures that even when mistakes do occur, it is easy to document what steps are affected and to what degree, allowing for rapid correction of the scientific record if needed.

Open science

The publication policy at ATGU pledges that in the spirit of rapid open science, we will submit all manuscripts to a preprint server at the time of journal submission. Further to this, the large-scale datasets that we manage are property of the community, not the individual researcher that happened to aggregate them or run a particular analysis on them. As stewards of these data, we commit to release intermediate products of the data on completion of quality control rather than the time of publication, if allowed by regulatory bodies. This may include hosting datasets, primarily for initial releases of data, or browser frameworks to enable exploration of the data.

How we do it

Equally important to the science we advance is the manner in which we perform the research and the development of each lab member. As every career level and trajectory will be different, I encourage each lab member to reach out to define a set of goals, so that we can work together to achieve them. Lab members are expected to pursue rigorous research and effective dissemination of this research, which often involves publications, though other forms of communication, including code release and browser development, are also encouraged.

Collaboration/new directions

The ATGU and the Broad Institute are highly dynamic and collaborative environments, where we work with experts in computational biology, statistical genetics, population genetics, scalable computing, medical genetics, and more. As all trainees build a research program, we encourage discussion of the ideas, methods, and results with those in the local and broader environments, in order to advance the science as effectively as possible. Credit is infinitely divisible, and collaborations lead to new ideas and additional publications for all involved. Trainees should feel empowered to form collaborations, and I am happy to advise how to navigate these. Similarly, pursuing new directions that may or may not be related to your primary projects can be a valuable endeavor, and I encourage spending 10-20% of your time on average on exploring new avenues, which may also include consulting for other companies. Feel free to discuss these directions with me so that we can ensure your time is spent effectively.

Work-life balance

While our work is important, I believe that a happy and healthy team is an effective team. To this end, I encourage each member of the team to identify and adopt their most efficient style, and to respect the choices of others. I expect an amount of work commensurate with what is written in your contract, but understand that different stages of work and life may require different considerations. Accordingly, I am flexible on working hours, and though Slack messages may come at all times or on all days, no one should feel pressured to respond to messages outside of their working hours. Occasional exceptions may arise, such as the lead-up to conferences like ASHG, but in these cases, taking a much-needed break after the event is encouraged.

Communicating science

A critical aspect of being a successful scientist in the modern era is the ability to communicate our research to others in our field, other scientists, and the public. I will provide opportunities for lab members to present their research internally and externally, and I commit to helping each individual craft a narrative for their research as they present to the public. I also encourage public discussion of research before publication (see Open Science, above), as well as teaching opportunities as desired.

Mentoring

For individuals further along in their training, mentoring junior researchers is a key component of well-rounded training. I am happy to discuss an arrangement that suits everyone's research questions and career trajectories.

Selected Publications

Full list available at Google Scholar.

Karczewski KJ*, Gupta R*, Kanai M*, Lu W, Tsuo K, Wang Y, et al., "Pan-UK Biobank GWAS improves discovery, analysis of genetic architecture, and resolution into ancestry-enriched effects." Nature Genetics. 2025 Sep 18. doi: 10.1038/s41588-025-02335-7.

Chen S*, Francioli LC*, Goodrich JK, Collins RL, Wang Q, Alföldi J, ..., Karczewski KJ. "A genomic mutational constraint map using variation in 76,156 human genomes." Nature. 2023 Dec 6. doi: 10.1038/s41586-023-06045-0.

Karczewski KJ*, Solomonson M*, Chao KR*, Goodrich JK, Tiao G, Lu W, et al., "Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,854 UK Biobank exomes." Cell Genomics. 15 Aug 2022. doi: 10.1016/j.xgen.2022.100168.

Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al., "The mutational constraint spectrum quantified from variation in 141,456 humans." Nature. 2020 May 27. doi: 10.1038/s41586-020-2308-7. Flagship article for the gnomAD package. See Press below.

Karczewski KJ and Martin AR. "Analytic and Translational Genetics." Annu Rev Biomedical Data Sci. 2020. doi: 10.1146/annurev-biodatasci-072018-021148.

Karczewski KJ and Snyder M. "Integrative Omics for Health and Disease." Nat Rev Genet. 2018 May; 19(5):299-310. doi: 10.1038/nrg.2018.4.

Karczewski KJ, Weisburd B, Thomas B, Ruderfer DM, Kavanagh D, Hamamsy T, et al., "The ExAC Browser: Displaying reference data information from over 60,000 exomes." Nucleic Acids Res. 2017 Jan 4; 45(D1):D840-D845. doi: 10.1093/nar/gkw971. Epub 2016 Nov 28. (bioRxiv. doi: 10.1101/070581. 2016 Aug 19.)

Lek M, Karczewski KJ*, Minikel EV*, Samocha KE*, Banks E, Fennell T, et al., "Analysis of protein-coding genetic variation in 60,706 humans." Nature. 2016 Aug 17; 536(7616):285-91. doi: 10.1038/nature19057. (bioRxiv. doi: 10.1101/030338. 2015 Oct 30).

Karczewski KJ, Snyder M, Altman RB, Tatonetti NP. "Coherent functional modules improve transcription factor target identification, cooperativity prediction, and disease association." PLoS Genetics. 10(2): e1004122. doi: 10.1371/journal.pgen.1004122.

Karczewski KJ*, Dudley JT*, Kukurba KR, Chen R, Butte AJ, Montgomery SB, Snyder M. "Systematic functional regulatory assessment of disease-associated variants." Proc Natl Acad Sci U S A. Epub 2013 May 20. doi: 10.1073/pnas.1219099110.

Dudley JT and Karczewski KJ. Exploring Personal Genomics. January 2013. Oxford University Press.

Karczewski KJ*, Tirrell RP*, Tatonetti NP, Dudley JT, Cordero P, Salari K, et al., "Interpretome: A Freely Available, Modular, and Secure Personal Genome Interpretation Engine." Pac Symp Biocomput. Epub 2011 Oct 25. 17:339-350(2012).

Karczewski KJ, Tatonetti NP, Landt SG, Yang X, Slifer T, Altman RB, Snyder M. "Cooperative Transcription Factor Associations Discovered using Regulatory Variation." Proc Natl Acad Sci U S A. 2011 Aug 9;108(32):13353-8. doi: 10.1073/pnas.1103105108. Epub 2011 Jul 26.

Press

gnomAD

Full collection of gnomAD papers in Nature, Nature Medicine, and Nature Communications.
Coverage in Cosmos magazine, El Mundo, Medical Xpress, and GenomeWeb.
Other press coverage from Estonia, Spain, and France.
News and Views and Editorials in Nature.
Research highlight in Nature Reviews Genetics.

Public Projects

gnomAD: The Genome Aggregation Database, dataset and browser.

Genebass: Rare variant associations for >3k phenotypes in the UK Biobank.

LOFTEE: loss-of-function variation annotation.

Hail (contributor): open-source library for large-scale data analysis.

Contact

Email: konradjkarczewski@gmail.com

Follow me on Twitter or GitHub, or see my Amazon Author Page.