As a scientist, I identify as a “research software engineer” (RSE). An RSE is a scientist who focuses on the software tools for research.
My main scientific project is the software CrystFEL, which addresses the data-handling needs of the experiments described below. It does semi-automatic processing of large numbers of scattering patterns, and allows the positions and intensities of the Bragg peaks to be boiled down to something which can easily be used to work out the structure. I'm very happy to be able to say that CrystFEL is free and open source software! If you want, you can even download some real data and take it for a spin yourself. Start here if you want to try, though you may run out of disk space…
My research is about using beams of radiation, usually X-rays, to probe the structure of matter at atomic lengthscales. I develop methods and software for analysing data from scattering experiments, where radiation is scattered by tiny crystals of matter into geometric patterns of spots.
One of the biggest applications for this is to determine the structure of biological molecules such as proteins. These are interesting because their properties are determined not so much by their composition (which elements they contain, and in what proportions), but rather by their structure. Take this protein, for example:
This is one of the proteins involved in the infection cycle of HIV. The different coloured spheres represent different atoms - mainly carbon, oxygen and nitrogen. When the virus replicates, all of the proteins in the virus are made in one long chain of atoms which gets cut up into smaller chains. This protein is the “molecular scissors” which does that cutting. See that hole in the middle? When it's in action, the long protein chain fits in there, and the surrounding atoms cut the pieces apart into separate molecules. The biological details are not the focus of my work, so don't ask me for more details than that - but you can find more information here and here if you're interested.
It's possible to reduce or prevent the replication of the virus by plugging up that hole with a specially designed molecule. These molecules, known as protease inhibitors, are a major part of our incredible progress in defeating the virus over recent decades.
So, that's why knowing biological molecular structures is important. But how is it done? There are lots of methods used in modern science, with various degrees of “directness”. Perhaps the most direct is electron cryo-microscopy, where an electron microscope is used to take pictures of individual molecules. This technique has developed a lot in the last few years.
The most common way, however, is X-ray crystallography. Here, you first make a crystal out of the protein. To a scientist, the word “crystal” has a specific meaning. It means that the molecules are arranged in a regular, repeating lattice. A crystal must have translational symmetry, which means that if you were to shift it by a certain amount in certain directions, without changing its orientation, it would look the same (apart from the edges moving a little). Like this:
This picture shows many copies of a human receptor protein forming a crystal. The protein is one of the molecules involved in embryonic development, part of making your body the shape it is. Note that in addition to the “translational” symmetry, there are other types of symmetry present as well. Technical info here.
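If you prefer to see ideas in code, translational symmetry can be checked with a small Python sketch. It is only a toy: the two-dimensional lattice, the basis vectors `a` and `b` and the helper names are made up for illustration and have nothing to do with the real protein crystal in the picture.

```python
def lattice_points(a, b, n):
    """Return the points i*a + j*b for 0 <= i, j < n as a set of (x, y) tuples."""
    return {(i * a[0] + j * b[0], i * a[1] + j * b[1])
            for i in range(n) for j in range(n)}


def overlap_fraction(points, shift):
    """Fraction of the original points that are still occupied after shifting."""
    shifted = {(x + shift[0], y + shift[1]) for (x, y) in points}
    return len(points & shifted) / len(points)


# Hypothetical basis vectors (integers, so the comparison is exact)
a = (3, 0)
b = (1, 2)
crystal = lattice_points(a, b, 20)

# Shifting by a lattice vector maps the crystal onto itself, apart from the edges
print(overlap_fraction(crystal, a))

# Shifting by something that is not a lattice vector gives no overlap at all
print(overlap_fraction(crystal, (1, 0)))
```

The first shift leaves 95% of the points in place, and the missing 5% is just the edge of this small 20 by 20 crystal. The second shift, which is not a combination of `a` and `b`, leaves none.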
Ordered matter can scatter X-rays (and other radiation - see later) very efficiently, to form a pattern of small dots (called “Bragg peaks”) which can be easily recorded by a suitable X-ray detector. Analysing the positions of the dots reveals the directions of the translational symmetry of the lattice. Measuring the intensities of the peaks allows the other types of symmetry to be discovered, and eventually leads to working out the entire structure.
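To give a flavour of how the position of a peak turns into information about the lattice, here is a minimal sketch based on Bragg's law, λ = 2d sin θ (first order), which relates the scattering angle of a peak to the spacing d between planes of molecules in the crystal. It assumes a flat detector perpendicular to the beam. The function name and the numbers are illustrative assumptions, and this is the textbook relation only, not a description of how any particular program works.

```python
import math


def plane_spacing(wavelength, detector_distance, peak_radius):
    """Spacing d between lattice planes, from Bragg's law: wavelength = 2 d sin(theta).

    peak_radius is the distance of the peak from the point where the direct
    beam hits the detector, in the same units as detector_distance.
    The result is in the same units as wavelength.
    """
    if peak_radius <= 0 or detector_distance <= 0:
        raise ValueError("distances must be positive")
    # The beam is deflected by the angle 2*theta
    two_theta = math.atan2(peak_radius, detector_distance)
    return wavelength / (2.0 * math.sin(two_theta / 2.0))


# Illustrative numbers only: wavelength in angstroms, distances in millimetres
d = plane_spacing(wavelength=1.0, detector_distance=100.0, peak_radius=100.0)
print(f"plane spacing: {d:.3f} angstroms")
```

Peaks further from the centre of the detector correspond to more closely spaced planes, which is why the outer peaks carry the finest details of the structure.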
While the technique of X-ray crystallography is quite old, in recent years there's been a lot of interest in using high-powered X-ray sources called free-electron lasers. These are really huge machines based on linear particle accelerators. In this video, you can see the entire length of the European XFEL, a new machine which was installed in Hamburg a few years ago:
This is where I come in! The particular “quirks” of the crystallographic data measured with such machines require new approaches to handling the data. Not least of these is the sheer volume of it, which can exceed a hundred terabytes for one experiment running over only a few days.
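A rough back-of-envelope sketch shows how quickly the data piles up. Every number below is a hypothetical, round-figure assumption chosen for illustration, not the specification of any real detector or experiment.

```python
def experiment_data_volume_tb(pixels_per_frame, bytes_per_pixel,
                              frames_per_second, hours):
    """Uncompressed data volume in terabytes (1 TB = 10**12 bytes)."""
    seconds = hours * 3600
    total_bytes = pixels_per_frame * bytes_per_pixel * frames_per_second * seconds
    return total_bytes / 1e12


# Hypothetical values: a one-megapixel detector storing 2 bytes per pixel,
# recording 100 frames per second for three days without a break
volume = experiment_data_volume_tb(
    pixels_per_frame=1_000_000,
    bytes_per_pixel=2,
    frames_per_second=100,
    hours=72,
)
print(f"{volume:.2f} TB")
```

With these made-up figures the total comes to about 52 TB, so a larger detector or a higher frame rate soon pushes it past a hundred.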