This article summarizes how to use the ZSTD algorithm for string compression in C. It goes through the steps in detail and is intended as a practical reference.
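Preface
String compression usually comes with three requirements: a high compression ratio, fast compression, and fast decompression. A high compression ratio and high compression speed pull against each other, so good algorithms generally settle for a trade-off between ratio and performance. Judged on compression ratio, compression speed, and decompression speed, zstd and lz4 both perform well, so these two were selected for investigation.
zstd is a fast compression algorithm open-sourced by Facebook that offers high compression ratios. This article looks at how it actually performs at compression and decompression.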
1. zstd compression and decompression
ZSTD_compress belongs to zstd's Simple API; the compression level is the only setting it exposes.
The prototype of ZSTD_compress is:
size_t ZSTD_compress(void* dst, size_t dstCapacity, const void* src, size_t srcSize, int compressionLevel);
The prototype of ZSTD_decompress is:
size_t ZSTD_decompress(void* dst, size_t dstCapacity, const void* src, size_t compressedSize);
Let's start with a compression and decompression example.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>
#include <iostream>

using namespace std;

int main()
{
    // compress
    size_t com_space_size;
    size_t peppa_pig_text_size;
    char *com_ptr = NULL;
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot play outside."
        "Peppa: Daddy, it's stopped raining. Can we go out to play?"
        "Daddy: Alright, run along you two."
        "Narrator: Peppa loves jumping in muddy puddles."
        "Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots."
        "Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too."
        "Peppa: George. If you jump in muddy puddles, you must wear your boots."
        "Narrator: Peppa likes to look after her little brother, George."
        "Peppa: George, let's find some more pud dles."
        "Narrator: Peppa and George are having a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle."
        "Peppa: Look, George. There's a really big puddle."
        "Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. Sorry, George. It'sonly mud."
        "Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy."
        "Daddy: Goodness me."
        "Peppa: Daddy. Daddy. Guess what we' ve been doing."
        "Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy."
        "Daddy: Have you just had a bath?"
        "Peppa: No. No."
        "Daddy: | know. You've been jumping in muddy puddles."
        "Peppa: Yes. Yes. Daddy. We've been jumping in muddy puddles."
        "Daddy: Ho. Ho. And look at the mess you're in."
        "Peppa: Oooh...."
        "Daddy: Oh, well, it's only mud. Let's clean up quickly before Mummy sees the mess."
        "Peppa: Daddy, when we've cleaned up, will you and Mummy Come and play, too?"
        "Daddy: Yes, we can all play in the garden."
        "Narrator: Peppa and George are wearing their boots. Mummy and Daddy are wearing their boots."
        "Peppa loves jumping up and down in muddy puddles. Everyone loves jumping up and down inmuddy puddles."
        "Mummy: Oh, Daddy pig, look at the mess you're in. ."
        "Peppa: It's only mud.";

    peppa_pig_text_size = strlen(peppa_pig_buf);
    // ZSTD_compressBound returns the worst-case compressed size for this input
    com_space_size = ZSTD_compressBound(peppa_pig_text_size);
    com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }

    size_t com_size;
    // the last argument is a compression level; 1 is the fastest regular level
    com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
    if (ZSTD_isError(com_size)) {
        cout << "compress failed: " << ZSTD_getErrorName(com_size) << endl;
        free(com_ptr);
        return -1;
    }
    cout << "peppa pig text size:" << peppa_pig_text_size << endl;
    cout << "compress text size:" << com_size << endl;
    cout << "compress ratio:" << (float)peppa_pig_text_size / (float)com_size << endl << endl;

    // decompress
    char *decom_ptr = NULL;
    unsigned long long decom_buf_size;
    // the frame header stores the original size when it was known at compression time
    decom_buf_size = ZSTD_getFrameContentSize(com_ptr, com_size);
    if (decom_buf_size == ZSTD_CONTENTSIZE_ERROR || decom_buf_size == ZSTD_CONTENTSIZE_UNKNOWN) {
        cout << "cannot determine decompressed size" << endl;
        free(com_ptr);
        return -1;
    }
    decom_ptr = (char *)malloc((size_t)decom_buf_size);
    if (NULL == decom_ptr) {
        cout << "decompress malloc failed" << endl;
        free(com_ptr);
        return -1;
    }

    size_t decom_size;
    decom_size = ZSTD_decompress(decom_ptr, decom_buf_size, com_ptr, com_size);
    if (ZSTD_isError(decom_size)) {
        cout << "decompress failed: " << ZSTD_getErrorName(decom_size) << endl;
    } else {
        cout << "decompress text size:" << decom_size << endl;
        if (decom_size != peppa_pig_text_size || strncmp(peppa_pig_buf, decom_ptr, peppa_pig_text_size)) {
            cout << "decompress text is not equal peppa pig text" << endl;
        }
    }

    free(com_ptr);
    free(decom_ptr);
    return 0;
}
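Execution result: the Peppa Pig text is 1827 bytes long before compression and 759 bytes after compression, a compression ratio of about 2.4, and the decompressed length equals the original length.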
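As mentioned above, the compression level passed to ZSTD_compress can be adjusted. The default level is ZSTD_CLEVEL_DEFAULT = 3 and the maximum is ZSTD_MAX_CLEVEL = 22. Passing 0 selects the default level, and negative levels (down to ZSTD_minCLevel()) trade compression ratio for additional speed. zstd also defines strategies such as ZSTD_fast, ZSTD_greedy, ZSTD_lazy, ZSTD_lazy2, and ZSTD_btlazy2. These are not compression levels; they are values for the separate ZSTD_c_strategy parameter of the advanced API. The higher the compression level, the higher the compression ratio, but the lower the compression speed.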
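2. Exploring zstd compression and decompression performance
The previous section covered basic compression and decompression with zstd. Next, let's take a rough look at its compression and decompression performance.
The test method is to call ZSTD_compress repeatedly on the same text for 10 seconds and then compute the average number of compressions per second. The code for the compression test is as follows: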
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <zstd.h>
#include <iostream>

using namespace std;

int main()
{
    int cnt = 0;
    size_t com_size;
    size_t com_space_size;
    size_t peppa_pig_text_size;
    char *com_ptr = NULL;
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot play outside."
        "Peppa: Daddy, it's stopped raining. Can we go out to play?"
        "Daddy: Alright, run along you two."
        "Narrator: Peppa loves jumping in muddy puddles."
        "Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots."
        "Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too."
        "Peppa: George. If you jump in muddy puddles, you must wear your boots."
        "Narrator: Peppa likes to look after her little brother, George."
        "Peppa: George, let's find some more pud dles."
        "Narrator: Peppa and George are having a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle."
        "Peppa: Look, George. There's a really big puddle."
        "Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. Sorry, George. It'sonly mud."
        "Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy."
        "Daddy: Goodness me."
        "Peppa: Daddy. Daddy. Guess what we' ve been doing."
        "Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy."
        "Daddy: Have you just had a bath?"
        "Peppa: No. No."
        "Daddy: | know. You've been jumping in muddy puddles."
        "Peppa: Yes. Yes. Daddy. We've been jumping in muddy puddles."
        "Daddy: Ho. Ho. And look at the mess you're in."
        "Peppa: Oooh...."
        "Daddy: Oh, well, it's only mud. Let's clean up quickly before Mummy sees the mess."
        "Peppa: Daddy, when we've cleaned up, will you and Mummy Come and play, too?"
        "Daddy: Yes, we can all play in the garden."
        "Narrator: Peppa and George are wearing their boots. Mummy and Daddy are wearing their boots."
        "Peppa loves jumping up and down in muddy puddles. Everyone loves jumping up and down inmuddy puddles."
        "Mummy: Oh, Daddy pig, look at the mess you're in. ."
        "Peppa: It's only mud.";
    struct timeval st, et;

    peppa_pig_text_size = strlen(peppa_pig_buf);
    com_space_size = ZSTD_compressBound(peppa_pig_text_size);

    gettimeofday(&st, NULL);
    while (1) {
        // the buffer is allocated and freed on every iteration,
        // so allocation cost is part of what is measured
        com_ptr = (char *)malloc(com_space_size);
        if (NULL == com_ptr) {
            cout << "compress malloc failed" << endl;
            return -1;
        }
        // the last argument is a compression level; 1 is the fastest regular level
        com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
        free(com_ptr);
        if (ZSTD_isError(com_size)) {
            cout << "compress failed: " << ZSTD_getErrorName(com_size) << endl;
            return -1;
        }
        cnt++;

        // stop once 10 whole seconds have elapsed
        gettimeofday(&et, NULL);
        if (et.tv_sec - st.tv_sec >= 10) {
            break;
        }
    }

    cout << "compress per second:" << cnt / 10 << " times" << endl;
    return 0;
}
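Execution result: zstd manages roughly 60,000 to 70,000 compressions per second on this text, which is not especially impressive. Note that compression performance also depends on the length and content of the text being compressed.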
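The test loop compares only the whole-second field returned by gettimeofday, so the measured window can be off by close to a second. A finer-grained harness can be sketched with the C++11 monotonic clock. This is only an illustration: calls_per_second is a hypothetical helper, not part of zstd, and it assumes a non-zero measurement window.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: call `work` repeatedly until at least `window` has
// elapsed on a monotonic clock, then return the average calls per second.
// Assumes `window` is greater than zero.
template <typename Work>
double calls_per_second(Work work, std::chrono::milliseconds window)
{
    using clock = std::chrono::steady_clock;
    std::size_t calls = 0;
    const clock::time_point start = clock::now();
    clock::time_point now = start;
    do {
        work();
        ++calls;
        now = clock::now();
    } while (now - start < window);
    // divide by the time that actually elapsed, not by the nominal window
    const std::chrono::duration<double> elapsed = now - start;
    return static_cast<double>(calls) / elapsed.count();
}
```

The work argument can be any callable, for example a lambda that wraps a single ZSTD_compress call, and std::chrono::seconds(10) converts implicitly to the millisecond window. Dividing by the elapsed time rather than a fixed 10 keeps the average honest when the last call overruns the window.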
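Next, let's look at zstd's decompression performance. The method is similar: compress the text first, then repeatedly decompress the same compressed data for 10 seconds and compute the average number of decompressions per second. The code for the decompression test is as follows: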
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <zstd.h>
#include <iostream>

using namespace std;

int main()
{
    int cnt = 0;
    size_t com_size;
    size_t com_space_size;
    size_t peppa_pig_text_size;
    struct timeval st, et;
    char *com_ptr = NULL;
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot play outside."
        "Peppa: Daddy, it's stopped raining. Can we go out to play?"
        "Daddy: Alright, run along you two."
        "Narrator: Peppa loves jumping in muddy puddles."
        "Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots."
        "Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too."
        "Peppa: George. If you jump in muddy puddles, you must wear your boots."
        "Narrator: Peppa likes to look after her little brother, George."
        "Peppa: George, let's find some more pud dles."
        "Narrator: Peppa and George are having a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle."
        "Peppa: Look, George. There's a really big puddle."
        "Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. Sorry, George. It'sonly mud."
        "Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy."
        "Daddy: Goodness me."
        "Peppa: Daddy. Daddy. Guess what we' ve been doing."
        "Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy."
        "Daddy: Have you just had a bath?"
        "Peppa: No. No."
        "Daddy: | know. You've been jumping in muddy puddles."
        "Peppa: Yes. Yes. Daddy. We've been jumping in muddy puddles."
        "Daddy: Ho. Ho. And look at the mess you're in."
        "Peppa: Oooh...."
        "Daddy: Oh, well, it's only mud. Let's clean up quickly before Mummy sees the mess."
        "Peppa: Daddy, when we've cleaned up, will you and Mummy Come and play, too?"
        "Daddy: Yes, we can all play in the garden."
        "Narrator: Peppa and George are wearing their boots. Mummy and Daddy are wearing their boots."
        "Peppa loves jumping up and down in muddy puddles. Everyone loves jumping up and down inmuddy puddles."
        "Mummy: Oh, Daddy pig, look at the mess you're in. ."
        "Peppa: It's only mud.";
    size_t decom_size;
    char *decom_ptr = NULL;
    unsigned long long decom_buf_size;

    // compress once up front; only decompression is timed
    peppa_pig_text_size = strlen(peppa_pig_buf);
    com_space_size = ZSTD_compressBound(peppa_pig_text_size);
    com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }
    com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
    if (ZSTD_isError(com_size)) {
        cout << "compress failed: " << ZSTD_getErrorName(com_size) << endl;
        free(com_ptr);
        return -1;
    }

    gettimeofday(&st, NULL);
    decom_buf_size = ZSTD_getFrameContentSize(com_ptr, com_size);
    if (decom_buf_size == ZSTD_CONTENTSIZE_ERROR || decom_buf_size == ZSTD_CONTENTSIZE_UNKNOWN) {
        cout << "cannot determine decompressed size" << endl;
        free(com_ptr);
        return -1;
    }

    while (1) {
        decom_ptr = (char *)malloc((size_t)decom_buf_size);
        if (NULL == decom_ptr) {
            cout << "decompress malloc failed" << endl;
            break;
        }
        decom_size = ZSTD_decompress(decom_ptr, decom_buf_size, com_ptr, com_size);
        free(decom_ptr);
        // an error code never equals the original size, so this also catches failures
        if (decom_size != peppa_pig_text_size) {
            cout << "decompress error" << endl;
            break;
        }
        cnt++;

        // stop once 10 whole seconds have elapsed
        gettimeofday(&et, NULL);
        if (et.tv_sec - st.tv_sec >= 10) {
            break;
        }
    }

    cout << "decompress per second:" << cnt / 10 << " times" << endl;
    free(com_ptr);
    return 0;
}
Execution result: zstd manages roughly 120,000 decompressions per second on this text, so decompression is faster than compression.
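3. Advanced zstd usage
zstd ships a compression and decompression tool called PZSTD. PZSTD (parallel zstd) is a command-line tool that splits the input into chunks and compresses them in parallel using multiple threads.
Newer versions of zstd (v1.4.0 and later) also provide API functions for multithreaded parallel compression, which is the advanced API usage covered in this section. Let's look at how to use zstd's multithreaded compression.
Multithreaded compression relies on two key APIs: one sets parameters and the other compresses.
The prototype of the parameter-setting API is:
size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, int value);
The prototype of the compression API is:
size_t ZSTD_compress2(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize);
Below is a demo of parallel compression with zstd. ZSTD_CCtx_setParameter sets the number of worker threads to 3 by setting the ZSTD_c_nbWorkers parameter to 3, and ZSTD_compress2 then compresses the text. To make the multithreading visible, the demo first reads a very large file to use as the compression source, so that zstd runs for a reasonably long time.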
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>
#include <iostream>

using namespace std;

int main()
{
    size_t com_size;
    size_t com_space_size;
    FILE *fp = NULL;
    long file_len;
    char *com_ptr = NULL;
    char *file_text_ptr = NULL;

    fp = fopen("xxxxxx", "rb");
    if (NULL == fp) {
        cout << "file open failed" << endl;
        return -1;
    }

    fseek(fp, 0, SEEK_END);
    file_len = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    if (file_len < 0) {
        cout << "file length unknown" << endl;
        fclose(fp);
        return -1;
    }
    cout << "file length:" << file_len << endl;

    // malloc space for file content
    file_text_ptr = (char *)malloc(file_len);
    // malloc space for compress space
    com_space_size = ZSTD_compressBound(file_len);
    com_ptr = (char *)malloc(com_space_size);
    if (NULL == file_text_ptr || NULL == com_ptr) {
        cout << "malloc failed" << endl;
        free(file_text_ptr);
        free(com_ptr);
        fclose(fp);
        return -1;
    }

    // read text from source file
    size_t read_len = fread(file_text_ptr, 1, file_len, fp);
    fclose(fp);

    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    if (NULL != cctx && read_len == (size_t)file_len) {
        // set multi-thread parameter; a library built without
        // ZSTD_MULTITHREAD reports an error here
        size_t ret = ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 3);
        if (ZSTD_isError(ret)) {
            cout << "set nbWorkers failed: " << ZSTD_getErrorName(ret) << endl;
        }
        // ZSTD_c_compressionLevel takes a level, not a strategy constant
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 6);

        com_size = ZSTD_compress2(cctx, com_ptr, com_space_size, file_text_ptr, file_len);
        if (ZSTD_isError(com_size)) {
            cout << "compress failed: " << ZSTD_getErrorName(com_size) << endl;
        } else {
            cout << "compress size:" << com_size << endl;
        }
    }

    ZSTD_freeCCtx(cctx);
    free(com_ptr);
    free(file_text_ptr);
    return 0;
}
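Running the demo shows that zstd does start 3 threads and compresses the text in parallel. Configuring more threads generally shortens the compression time on large inputs; the details are not shown here, and readers can experiment for themselves.
Note that zstd currently builds a single-threaded library by default. To use the multithreaded API, the library must be built with the ZSTD_MULTITHREAD build macro defined at make time.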
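zstd also supports a thread pool. The prototype of the thread-pool creation function is:
POOL_ctx* ZSTD_createThreadPool(size_t numThreads);
A thread pool avoids the unnecessary overhead of repeatedly creating and destroying threads across many consecutive compressions, so that computing power is spent mainly on compressing the text.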