Hadoop Soup: Compression and Decompression in MapReduce

Friday, 10 January 2014

File compression plays a very important role because it brings two benefits: it reduces the space needed to store files, and it speeds up data transfer across the network and to or from disk. When dealing with large volumes of data, both of these savings can be significant, so it pays to carefully consider how to use compression in Hadoop. There are many different compression formats, tools and algorithms, each with different characteristics.

Compression format	CompressionCodec
DEFLATE	org.apache.hadoop.io.compress.DefaultCodec
gzip	org.apache.hadoop.io.compress.GzipCodec
bzip2	org.apache.hadoop.io.compress.BZip2Codec
LZO	com.hadoop.compression.lzo.LzopCodec
LZ4	org.apache.hadoop.io.compress.Lz4Codec
Snappy	org.apache.hadoop.io.compress.SnappyCodec

All compression algorithms exhibit a space/time trade-off: faster compression and decompression speeds usually come at the expense of smaller space savings. The different tools have very different compression characteristics.

Gzip is a general-purpose compressor and sits in the middle of the space/time trade-off.
Bzip2 compresses more effectively than gzip, but is slower. Bzip2's decompression speed is faster than its compression speed, but it is still slower than the other formats.
LZO, LZ4 and Snappy, on the other hand, all optimize for speed and are around an order of magnitude faster than gzip, but compress less effectively. Snappy and LZ4 are also significantly faster than LZO for decompression.

The tools listed above typically give some control over this trade-off at compression time by offering nine different options: -1 means optimize for speed, and -9 means optimize for space. For example, the following command creates a compressed file file.gz using the fastest compression method: gzip -1 file
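To get a feel for the -1 versus -9 trade-off without leaving Java, here is a minimal sketch that uses only the JDK's java.util.zip classes. It is not Hadoop code: the class name CompressionLevels and the sample data are made up for this illustration, and the JDK Deflater writes zlib-wrapped DEFLATE rather than the gzip file format, although the underlying algorithm and the 1 to 9 level scale are the same.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class, not part of Hadoop.
final class CompressionLevels {

    /** Compresses data with DEFLATE at the given level (1 = fastest, 9 = smallest). */
    static byte[] deflate(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reverses deflate(); the level used for compression does not matter here. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("Truncated or unsupported stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt DEFLATE stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Builds some repetitive, log-like sample text (made up for the demo). */
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("record ").append(i % 97).append(" hadoop compression demo\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            long start = System.nanoTime();
            byte[] compressed = deflate(data, level);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.printf("level %d: %d -> %d bytes in %d us%n",
                    level, data.length, compressed.length, micros);
        }
    }
}
```

Running it prints the compressed size and elapsed time for levels 1 and 9. The exact numbers depend on the machine and the input, so treat them as a rough indication rather than a benchmark.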

Simplest program for compression.

The code above has no mapper or reducer. Notice the two lines that do the job of compression. Instead of DEFLATE, any other compression format can be used.
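The idea that the compression format can be swapped by changing one setting can be illustrated outside Hadoop as well. The following is only a sketch built on the JDK's java.util.zip streams: the StreamCodec interface and the PluggableCompression class are hypothetical names invented for this example, and treating a codec as nothing more than a stream wrapper is a simplifying assumption, loosely modelled on the createOutputStream method of Hadoop's CompressionCodec interface.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of Hadoop.
final class PluggableCompression {

    /** Hypothetical stand-in for a codec: wraps a raw stream in a compressing one. */
    interface StreamCodec {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /** zlib-wrapped DEFLATE, the JDK counterpart of the DEFLATE format. */
    static final StreamCodec DEFLATE = DeflaterOutputStream::new;

    /** gzip, which is DEFLATE plus gzip headers and a checksum trailer. */
    static final StreamCodec GZIP = GZIPOutputStream::new;

    /** Compresses data with whichever codec is passed in. */
    static byte[] compress(byte[] data, StreamCodec codec) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapped stream flushes and finishes the compressed data.
        try (OutputStream out = codec.wrap(sink)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Repetitive sample text, made up for the demo. */
    static byte[] sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("hadoop compression demo line\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sample();
        // Switching formats is a one-argument change.
        System.out.println("deflate: " + compress(data, DEFLATE).length + " bytes");
        System.out.println("gzip:    " + compress(data, GZIP).length + " bytes");
    }
}
```

The rest of the program stays the same whichever codec is chosen, which is the same property the text describes for the MapReduce job.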

Simplest program for decompression.

In the decompression code we are not using any special expression to do the job of decompression. In fact, we are not doing anything at all. But if you give a compressed file as the input file to the above code and run it, it will decompress the file. This works because Hadoop's input formats infer the codec from the file name extension (for example .gz or .deflate) and decompress the input automatically. The decompressed output is produced as a plain file.
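Why does a job with no decompression logic still decompress its input? The sketch below imitates the mechanism with plain JDK classes: a reader picks a decompressor purely from the file name. It is an illustration under stated assumptions, not Hadoop's implementation. The class ExtensionBasedReader and its methods are hypothetical names, and only the .gz and .deflate extensions are handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical demo class, not part of Hadoop.
final class ExtensionBasedReader {

    /** Wraps the raw stream in a decompressor chosen purely from the file name. */
    static InputStream open(String fileName, InputStream raw) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        if (fileName.endsWith(".deflate")) {
            return new InflaterInputStream(raw);
        }
        return raw; // No known extension: treat the bytes as uncompressed.
    }

    /** Reads the stored bytes as UTF-8 text, decompressing if the name calls for it. */
    static String readText(String fileName, byte[] stored) {
        try (InputStream in = open(fileName, new ByteArrayInputStream(stored))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: gzip-compresses text in memory. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "line one\nline two\n";
        // The caller never mentions gzip; the ".gz" suffix alone triggers decompression.
        System.out.print(readText("input.gz", gzip(text)));
        System.out.print(readText("input.txt", text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Both calls in main go through the same readText method, and only the file name differs. A file whose name does not carry a recognised extension would be read as raw bytes, which is worth keeping in mind when naming compressed inputs.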

Comments:

Unknown, 22 September 2014 at 00:16
Aman, please also have a look at https://github.com/carlomedas/4mc : splittable LZ4 power unleashed in Hadoop at any stage of M/R.
Unknown, 2 November 2015 at 20:54
Good article. I have a question. Assume we have data in compressed form on HDFS and we are using a splittable codec (bzip2). When exactly does the decompression take place? Is it during getSplit() on the client side? Are the input splits compressed, and does the RecordReader decompress them?