The first image of a black hole was published on April 10, 2019. The black hole, M87*, sits at the center of the Messier 87 galaxy, about 53 million light-years from Earth. NASA describes a black hole as an extremely dense object from which no light can escape. Anything that crosses a black hole’s “event horizon” is consumed by its unimaginably strong gravity.

By its very nature, a black hole itself cannot be seen. The bright ring in the picture is superheated material glowing just outside the event horizon, the boundary beyond which nothing approaching the black hole can escape its gravitational pull. Objects that cross the event horizon undergo spaghettification, a term popularized by Stephen Hawking for the way tidal gravitational forces stretch an object out like a piece of pasta. The M87* image shows the black hole’s silhouette against that glow, captured by researchers at the Event Horizon Telescope (EHT).
The EHT is the brainchild of Shep Doeleman, its director and an astronomer at the Harvard-Smithsonian Center for Astrophysics. It is a virtual global array of eight ground-based radio telescopes. The EHT captured around 3.5 PB of data for the black hole image during its April 2017 observing run, and it then took two years to correlate that data into the image. The EHT team had to figure out not only intergalactic science but also massive information technology problems. The researchers had to solve IT problems pretty typical for enterprise IT professionals, only bigger.
According to an article at SearchDataBackup, each EHT telescope can record data at a rate of 64 Gbps, and each observation period can last more than 10 hours. The author calculated that each site generated around half a petabyte of data per run. The distributed locations included volcanoes in Hawaii and Mexico, mountains in Arizona and the Spanish Sierra Nevada, the Chilean Atacama Desert, and Antarctica. The sites were kept in sync using precise atomic clocks and GPS to time the observations carefully.
At each site, each recording unit captured data at 16 Gbps, striping it across 32 hard disk drives grouped into four modules of eight disks each. Running four units in tandem let each site record at a total rate of 64 Gbps.
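For a rough sense of what those figures imply per run and per drive, here is some back-of-the-envelope arithmetic in Python. It is purely illustrative and not anything the EHT published; it only plugs in the numbers quoted above.

```python
# Back-of-the-envelope numbers for one EHT observing site,
# using the figures quoted above (illustrative only).

RECORD_RATE_GBPS = 64          # total recording rate per site
RUN_HOURS = 10                 # a long observation run
UNITS_PER_SITE = 4             # recording units running in tandem
DRIVES_PER_UNIT = 32           # 4 modules x 8 disks

# Total data captured in one run, in terabytes (1 Gb = 1e9 bits).
bits = RECORD_RATE_GBPS * 1e9 * RUN_HOURS * 3600
terabytes = bits / 8 / 1e12
print(f"Data per site per run: ~{terabytes:.0f} TB")   # ~288 TB at these round numbers

# Sustained write rate each individual drive has to handle.
per_drive_gbps = RECORD_RATE_GBPS / (UNITS_PER_SITE * DRIVES_PER_UNIT)
print(f"Per-drive write rate: ~{per_drive_gbps * 1000 / 8:.0f} MB/s")  # ~62 MB/s, well within HDD range
```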

One problem the EHT ran into was the failure rate of traditional hard drives at its extreme telescope locations. ComputerWorld reports that 28 of 32 conventional hard drives failed at the Sierra Negra telescope, atop an extinct volcano in Mexico.
SearchDataBackup says the solution was helium hard drives. Hermetically sealed helium drives are self-contained environments, so they can tolerate the thin air and harsh conditions at the EHT’s high-altitude sites. The EHT first deployed helium hard drives in 2015. EHT data scientist Lindy Blackburn told SearchDataBackup that the project now uses about 1,000 helium drives, with capacities of up to 10 TB, from Western Digital, Seagate, and Toshiba:
The move to helium-sealed drives was a major advancement for the EHT … Not only do they perform well at altitude and run cooler, but there have been very few failures over the years. For example, no drives failed during the EHT’s 2017 observing campaign.
The amount of data collected by the EHT was too much to send over the Internet, so the researchers went old-school and shipped the drives via FedEx, sneakernet-style, to be processed. Geoffrey Bower, an EHT astronomer based in Hawaii, told ScienceNews that mailing the disks is always a little nerve-wracking, though so far there have been no major shipping mishaps. The cost and logistics of tracking and maintaining a multi-petabyte disk inventory are also challenging, so the EHT is always on the lookout for another way to move petabyte-scale data.

SearchDataBackup points out that the cloud would normally be a good option for long-term storage of data consolidated from multiple, globally distributed endpoints. However, Mr. Blackburn told them the cloud was not even a cold storage option for the project: the high recording speed and the sheer volume of data captured made it impractical to upload to a cloud. He explained, “At the moment, parallel recording to massive banks of hard drives, then physically shipping those drives somewhere is still the most practical solution.”
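To get a feel for why uploading is impractical, here is a rough, hypothetical estimate. The link speeds are assumptions for illustration only; the 3.5 PB figure is the one quoted earlier.

```python
# Rough, illustrative estimate of how long uploading the 2017 campaign's
# ~3.5 PB of raw data would take over WAN links of assumed speeds.
# The link speeds are hypothetical; only the 3.5 PB figure comes from the article.
DATA_PB = 3.5
data_bits = DATA_PB * 1e15 * 8          # petabytes -> bits (decimal units)

for link_gbps in (1, 10, 100):
    seconds = data_bits / (link_gbps * 1e9)
    print(f"{link_gbps:>3} Gbps link: ~{seconds / 86400:.0f} days of sustained transfer")

# Roughly 324 days at 1 Gbps, 32 days at 10 Gbps, 3 days at 100 Gbps,
# and that assumes such a link even exists at a remote telescope site.
```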
The data collected on the helium hard disk drive packs was processed by grid computers of about 800 CPUs connected through a 40 Gbps network, located at the MIT Haystack Observatory in Massachusetts and at the Max Planck Institute for Radio Astronomy in Germany.
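The “correlation” step pairs up the signal streams recorded at every two telescopes and searches for the tiny time offsets at which they match. The sketch below is a toy illustration of that single idea; it is not EHT code, real correlation involves far more (fringe rotation, clock and geometry models, polarization), and every number in it is invented.

```python
# Toy illustration of the core idea behind a VLBI correlator: line up the
# data streams recorded at two telescopes and find the relative delay at
# which they agree best. All numbers are made up for the demo.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

# A common "sky" signal buried in independent receiver noise at each station.
sky = rng.normal(size=n_samples)
true_delay = 137                                   # samples, arbitrary
station_a = sky + 5.0 * rng.normal(size=n_samples)
station_b = np.roll(sky, true_delay) + 5.0 * rng.normal(size=n_samples)

# Circular cross-correlation via FFT; its peak sits at the relative delay.
spectrum = np.conj(np.fft.fft(station_a)) * np.fft.fft(station_b)
xcorr = np.fft.ifft(spectrum).real
print("recovered delay:", int(np.argmax(xcorr)), "samples (true:", true_delay, ")")
```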

Geoff Crew, co-leader of the EHT correlation working group at Haystack Observatory, told SearchDataBackup that it is also impractical to use the cloud for the computing. Mr. Crew said:
Cloud computing does not make sense today, as the volume of data would be prohibitively expensive to load into the cloud and, once there, might not be physically placed to be efficiently computed.
The EHT scientists built algorithms that convert sparse data into images. They developed a way to cut down the number of candidate images by sorting out which results were physically plausible and which were wildly unlikely, making it easier to reconstruct the final image.
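The underlying mathematical task is to recover an image from a sparse set of Fourier-domain measurements, with a prior deciding which of the many compatible images is physically plausible. The sketch below illustrates that general idea on a tiny 1-D toy problem; it is not the EHT team’s algorithm, and every parameter in it is invented.

```python
# Minimal sketch of regularized reconstruction from sparse Fourier-domain
# samples, the same mathematical problem the EHT (and MRI) faces. Many
# images fit the sparse data; a prior (here non-negativity plus smoothness)
# selects a plausible one. Not the EHT's actual code.
import numpy as np

rng = np.random.default_rng(1)
N = 64

# "True" 1-D brightness profile: two smooth blobs.
grid = np.arange(N)
truth = np.exp(-0.5 * ((grid - 20) / 3) ** 2) + 0.7 * np.exp(-0.5 * ((grid - 45) / 4) ** 2)

# Sparse sampling: observe only a handful of Fourier coefficients,
# the analogue of an interferometer measuring a few "visibilities".
measured_freqs = rng.choice(N, size=12, replace=False)
F = np.fft.fft(np.eye(N), axis=0)[measured_freqs]       # partial DFT matrix
visibilities = F @ truth

# Gradient descent on  ||F x - v||^2 + lam * ||grad x||^2,
# clipping to keep the image non-negative (a plausibility constraint).
lam, step = 1.0, 2e-3
x = np.zeros(N)
D = np.eye(N) - np.roll(np.eye(N), 1, axis=1)           # finite-difference operator
for _ in range(5000):
    grad = 2 * np.real(F.conj().T @ (F @ x - visibilities)) + 2 * lam * (D.T @ D @ x)
    x = np.clip(x - step * grad, 0, None)

print("relative reconstruction error:", np.linalg.norm(x - truth) / np.linalg.norm(truth))
```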

Converting sparse data into images matters beyond astronomy. Mr. Blackburn told FiveThirtyEight that the problem comes up in other areas as well: it occurs in medical imaging, when doctors use MRI machines to convert radio waves into pictures of your body, and it is also a key part of self-driving cars, which rely on computer vision to “see” everything from potholes to people.
Just like any enterprise, the EHT had to find a workable method of data protection, which includes deciding what won’t be protected. The EHT has not found a cost-effective way to replicate or protect the raw radio-signal data from the telescope sites. However, once the data has been processed and reduced to a far smaller volume, it is backed up on-site on several different RAID systems and on Google Cloud Storage. Mr. Crew told SearchDataBackup:
The reduced data is archived and replicated to a number of internal EHT sites for the use of the team, and eventually, it will all be publicly archived. The raw data isn’t saved; we presently do not have any efficient and cost-effective means to back it up.
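For the cloud leg of that reduced-data archive, pushing a processed file to object storage can be as simple as the sketch below. This is not the EHT’s actual tooling, just a minimal illustration using Google’s official Python client; the bucket, object, and file names are hypothetical.

```python
# Minimal sketch: copy a reduced data product to a Google Cloud Storage
# bucket with the official google-cloud-storage client. The bucket and
# object names are hypothetical, not the EHT's.
from google.cloud import storage

def archive_reduced_data(local_path: str, bucket_name: str, object_name: str) -> None:
    """Upload one processed data file to a GCS bucket."""
    client = storage.Client()                      # uses default credentials
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(object_name)
    blob.upload_from_filename(local_path)          # streams the file to GCS
    print(f"uploaded {local_path} -> gs://{bucket_name}/{object_name}")

if __name__ == "__main__":
    archive_reduced_data("m87_2017_reduced.tar", "eht-reduced-archive", "2017/m87_reduced.tar")
```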
Mr. Blackburn said the raw data isn’t worth backing up: because of the complexity of protecting such a large amount of data, it would be simpler to run another observation and gather a new set of data. As he put it, “Backing up original raw data to preserve every bit is not so important.”
Mr. Blackburn said he can’t seriously consider implementing a backup process unless it is “sufficiently straightforward and economical.”
Instead, he said he’s looking at where technology might be in the next five or 10 years to find the best method of handling petabyte-scale raw data from the telescopes. Mr. Blackburn told SearchDataBackup:
Right now, it is not clear if that will be continuing to record to hard drives and using special-purpose correlation clusters, recording to hard drives and getting the data as quickly as possible to the cloud, or if SSD or even tape technology will progress to a point to where they are competitive in both cost and speed to hard disks.
The image of the black hole further validated Einstein’s general theory of relativity and proved that enterprise-class IT can solve intergalactic problems.
The EHT team had to figure out how to save, move, and back up massive quantities of data and, of course, do more with less. EHT’s Geoff Crew summed up the problem most IT pros have: “Most of our challenges are related to insufficient money, rather than technical hurdles.”
Related articles
- Trolls hijacked a scientist’s image to attack Katie Bouman. They picked the wrong astrophysicist. (MSN)
Ralph Bach has been in IT long enough to know better and has blogged from his Bach Seat about IT, careers, and anything else that catches his attention since 2005. You can follow him on LinkedIn, Facebook, and Twitter. Email the Bach Seat here.