--- /n/sources/plan9/sys/man/1/gzip Sun Dec 2 23:42:15 2007 +++ /sys/man/1/gzip Sat May 1 00:00:00 2021 @@ -1,12 +1,12 @@ .TH GZIP 1 .SH NAME -gzip, gunzip, bzip2, bunzip2, compress, uncompress, zip, unzip \- compress and expand data +gzip, gunzip, bzip2, bunzip2, lzip, lunzip, compress, uncompress, zip, unzip \- compress and expand data .SH SYNOPSIS .B gzip .RB [ -cvD [ 1-9 ]] .RI [ file .BR ... ] -.PP +.br .B gunzip .RB [ -ctTvD ] .RI [ file @@ -16,12 +16,22 @@ .RB [ -cvD [ 1-9 ]] .RI [ file .BR ... ] -.PP +.br .B bunzip2 .RB [ -cvD ] .RI [ file .BR ... ] .PP +.B lzip +.RB [ -cvD [ 1-9 ]] +.RI [ file +.BR ... ] +.br +.B lunzip +.RB [ -cvD ] +.RI [ file +.BR ... ] +.PP .B compress [ .B -cv @@ -29,7 +39,7 @@ .I file .B ... ] -.PP +.br .B uncompress [ .B -cv @@ -44,7 +54,7 @@ .IR zipfile ] .I file .RB [ ... ] -.PP +.br .B unzip .RB [ -cistTvD ] .RB [ -f @@ -86,7 +96,9 @@ and .IR gunzip , but use a modified Burrows-Wheeler block sorting -compression algorithm. +compression algorithm, +which often produces smaller compressed files than +.IR gzip . The default suffix for output files is .BR .bz2 , with @@ -99,6 +111,32 @@ as a synonym for .BR .tbz . .PP +.I Lzip +and +.I lunzip +are also similar in interface to +.I gzip +and +.IR gunzip , +but use a specific LZMA (Lempel-Ziv-Markov) compression algorithm, +which often produces smaller compressed files than +.IR bzip2 . +The default suffix for output files is +.BR .lz , +with +.B .tar.lz +becoming +.BR .tlz . +Note that the popular +.I xz +compression program uses different LZMA compression algorithms +and so files compressed by it will not be understood by +.I lunzip +and vice versa +(and may not even be understood by other +.I xz +implementations). +.PP .I Compress and .I uncompress @@ -130,7 +168,8 @@ If the process fails, the faulty output files are removed. 
.PP The options are: -.TP 0.6i +.\" .TP 0.6i +.TP 0.3i .B -a Automaticialy creates directories as needed, needed for zip files created by broken implementations which omit directories. @@ -183,9 +222,7 @@ .B -D Produce debugging output. .SH SOURCE -.B /sys/src/cmd/gzip -.br -.B /sys/src/cmd/bzip2 +.B /sys/src/cmd/*zip* .br .B /sys/src/cmd/compress .SH SEE ALSO diff -Nru /sys/src/cmd/lzip/AUTHORS /sys/src/cmd/lzip/AUTHORS --- /sys/src/cmd/lzip/AUTHORS Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/AUTHORS Sat May 1 00:00:00 2021 @@ -0,0 +1,7 @@ +Clzip was written by Antonio Diaz Diaz. + +The ideas embodied in clzip are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for +the definition of Markov chains), G.N.N. Martin (for the definition of +range encoding), Igor Pavlov (for putting all the above together in +LZMA), and Julian Seward (for bzip2's CLI). diff -Nru /sys/src/cmd/lzip/COPYING /sys/src/cmd/lzip/COPYING --- /sys/src/cmd/lzip/COPYING Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/COPYING Sat May 1 00:00:00 2021 @@ -0,0 +1,338 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. 
+ + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. 
+ + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. 
You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. 
+ +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. 
However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. 
+You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + <one line to give the program's name and a brief idea of what it does.> + Copyright (C) <year> <name of author> + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. 
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) <year> <name of author> + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + <signature of Ty Coon>, 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. 
diff -Nru /sys/src/cmd/lzip/ChangeLog /sys/src/cmd/lzip/ChangeLog --- /sys/src/cmd/lzip/ChangeLog Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/ChangeLog Sat May 1 00:00:00 2021 @@ -0,0 +1,115 @@ +2017-04-13 Antonio Diaz Diaz + + * Version 1.9 released. + * The option '-l, --list' has been ported from lziprecover. + * Don't allow mixing different operations (-d, -l or -t). + * Compression time of option '-0' has been reduced by 6%. + * Compression time of options -1 to -9 has been reduced by 1%. + * Decompression time has been reduced by 7%. + * main.c: Continue testing if any input file is a terminal. + * main.c: Show trailing data in both hexadecimal and ASCII. + * file_index.c: Improve detection of bad dict and trailing data. + * lzip.h: Unified messages for bad magic, trailing data, etc. + * clzip.texi: Added missing chapters from lzip.texi. + +2016-05-13 Antonio Diaz Diaz + + * Version 1.8 released. + * main.c: Added new option '-a, --trailing-error'. + * main.c (decompress): Print up to 6 bytes of trailing data + when '-vvvv' is specified. + * decoder.c (LZd_verify_trailer): Removed test of final code. + * main.c (main): Delete '--output' file if infd is a terminal. + * main.c (main): Don't use stdin more than once. + * clzip.texi: Added chapter 'Trailing data'. + * configure: Avoid warning on some shells when testing for gcc. + * Makefile.in: Detect the existence of install-info. + * testsuite/check.sh: A POSIX shell is required to run the tests. + * testsuite/check.sh: Don't check error messages. + +2015-07-07 Antonio Diaz Diaz + + * Version 1.7 released. + * Ported fast encoder and option '-0' from lzip. + * Makefile.in: Added new targets 'install*-compress'. + +2014-08-28 Antonio Diaz Diaz + + * Version 1.6 released. + * Compression ratio of option '-9' has been slightly increased. + * main.c (close_and_set_permissions): Behave like 'cp -p'. + * clzip.texinfo: Renamed to clzip.texi. + * License changed to GPL version 2 or later. 
+ +2013-09-17 Antonio Diaz Diaz + + * Version 1.5 released. + * Show progress of compression at verbosity level 2 (-vv). + * main.c (show_header): Don't show header version. + * Ignore option '-n, --threads' for compatibility with plzip. + * configure: Options now accept a separate argument. + +2013-02-18 Antonio Diaz Diaz + + * Version 1.4 released. + * Multi-step trials have been implemented. + * Compression ratio has been slightly increased. + * Compression time has been reduced by 10%. + * Decompression time has been reduced by 8%. + * Makefile.in: Added new target 'install-as-lzip'. + * Makefile.in: Added new target 'install-bin'. + * main.c: Use 'setmode' instead of '_setmode' on Windows and OS/2. + * main.c: Define 'strtoull' to 'strtoul' on Windows. + +2012-02-25 Antonio Diaz Diaz + + * Version 1.3 released. + * main.c (close_and_set_permissions): Inability to change output + file attributes has been downgraded from error to warning. + * encoder.c (Mf_init): Return false if out of memory instead of + calling cleanup_and_fail. + * Small change in '--help' output and man page. + * Changed quote characters in messages as advised by GNU Standards. + * configure: 'datadir' renamed to 'datarootdir'. + +2011-05-18 Antonio Diaz Diaz + + * Version 1.2 released. + * main.c: Added new option '-F, --recompress'. + * main.c (decompress): Print only one status line for each + multimember file when only one '-v' is specified. + * encoder.h (Lee_update_prices): Update high length symbol prices + independently of the value of 'pos_state'. This gives better + compression for large values of '--match-length' without being + slower. + * encoder.h encoder.c: Optimize pair price calculations. This + reduces compression time for large values of '--match-length' + by up to 6%. + +2011-01-11 Antonio Diaz Diaz + + * Version 1.1 released. + * Code has been converted to 'C89 + long long' from C99. + * main.c: Fixed warning about fchown return value being ignored. 
+ * decoder.c: '-tvvvv' now shows compression ratio. + * main.c: Match length limit set by options -1 to -8 has been + reduced to extend range of use towards gzip. Lower numbers now + compress less but faster. (-1 now takes 43% less time for only + 20% larger compressed size). + * Compression ratio of option '-9' has been slightly increased. + * main.c (open_instream): Don't show the message + " and '--stdout' was not specified" for directories, etc. + * New examples have been added to the manual. + +2010-04-05 Antonio Diaz Diaz + + * Version 1.0 released. + * Initial release. + * Translated to C from the C++ source of lzip 1.10. + + +Copyright (C) 2010-2017 Antonio Diaz Diaz. + +This file is a collection of facts, and thus it is not copyrightable, +but just in case, you have unlimited permission to copy, distribute and +modify it. diff -Nru /sys/src/cmd/lzip/NEWS /sys/src/cmd/lzip/NEWS --- /sys/src/cmd/lzip/NEWS Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/NEWS Sat May 1 00:00:00 2021 @@ -0,0 +1,21 @@ +Changes in version 1.9: + +The option '-l, --list' has been ported from lziprecover. + +It is now an error to specify two or more different operations in the +command line (--decompress, --list or --test). + +Compression time of option '-0' has been reduced by 6%. + +Compression time of options '-1' to '-9' has been reduced by 1%. + +Decompression time has been reduced by 7%. + +In test mode, clzip now continues checking the rest of the files if any +input file is a terminal. + +Trailing data are now shown both in hexadecimal and as a string of +printable ASCII characters. + +Three missing chapters have been added to the manual, which now contains +all the chapters of the lzip manual. diff -Nru /sys/src/cmd/lzip/README /sys/src/cmd/lzip/README --- /sys/src/cmd/lzip/README Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/README Sat May 1 00:00:00 2021 @@ -0,0 +1,126 @@ +Description + +Clzip is a C language version of lzip, fully compatible with lzip-1.4 or +newer. 
As clzip is written in C, it may be easier to integrate in +applications like package managers, embedded devices, or systems lacking +a C++ compiler. + +Lzip is a lossless data compressor with a user interface similar to the +one of gzip or bzip2. Lzip can compress about as fast as gzip (lzip -0), +or compress most files more than bzip2 (lzip -9). Decompression speed is +intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 +from a data recovery perspective. + +The lzip file format is designed for data sharing and long-term +archiving, taking into account both data integrity and decoder +availability: + + * The lzip format provides very safe integrity checking and some data + recovery means. The lziprecover program can repair bit-flip errors + (one of the most common forms of data corruption) in lzip files, + and provides data recovery capabilities, including error-checked + merging of damaged copies of a file. + + * The lzip format is as simple as possible (but not simpler). The + lzip manual provides the source code of a simple decompressor along + with a detailed explanation of how it works, so that with the only + help of the lzip manual it would be possible for a digital + archaeologist to extract the data from a lzip file long after + quantum computers eventually render LZMA obsolete. + + * Additionally the lzip reference implementation is copylefted, which + guarantees that it will remain free forever. + +A nice feature of the lzip format is that a corrupt byte is easier to +repair the nearer it is from the beginning of the file. Therefore, with +the help of lziprecover, losing an entire archive just because of a +corrupt byte near the beginning is a thing of the past. + +Clzip uses the same well-defined exit status values used by lzip and +bzip2, which makes it safer than compressors returning ambiguous warning +values (like gzip) when it is used as a back end for other programs like +tar or zutils. 
+ +Clzip will automatically use the smallest possible dictionary size for +each file without exceeding the given limit. Keep in mind that the +decompression memory requirement is affected at compression time by the +choice of dictionary size limit. + +The amount of memory required for compression is about 1 or 2 times the +dictionary size limit (1 if input file size is less than dictionary size +limit, else 2) plus 9 times the dictionary size really used. The option +'-0' is special and only requires about 1.5 MiB at most. The amount of +memory required for decompression is about 46 kB larger than the +dictionary size really used. + +When compressing, clzip replaces every file given in the command line +with a compressed version of itself, with the name "original_name.lz". +When decompressing, clzip attempts to guess the name for the decompressed +file from that of the compressed file as follows: + +filename.lz becomes filename +filename.tlz becomes filename.tar +anyothername becomes anyothername.out + +(De)compressing a file is much like copying or moving it; therefore clzip +preserves the access and modification dates, permissions, and, when +possible, ownership of the file just as "cp -p" does. (If the user ID or +the group ID can't be duplicated, the file permission bits S_ISUID and +S_ISGID are cleared). + +Clzip is able to read from some types of non regular files if the +"--stdout" option is specified. + +If no file names are specified, clzip compresses (or decompresses) from +standard input to standard output. In this case, clzip will decline to +write compressed output to a terminal, as this would be entirely +incomprehensible and therefore pointless. + +Clzip will correctly decompress a file which is the concatenation of two +or more compressed files. The result is the concatenation of the +corresponding uncompressed files. Integrity testing of concatenated +compressed files is also supported. 
+ +Clzip can produce multimember files, and lziprecover can safely recover +the undamaged members in case of file damage. Clzip can also split the +compressed output in volumes of a given size, even when reading from +standard input. This allows the direct creation of multivolume +compressed tar archives. + +Clzip is able to compress and decompress streams of unlimited size by +automatically creating multimember output. The members so created are +large, about 2 PiB each. + +In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a +concrete algorithm; it is more like "any algorithm using the LZMA coding +scheme". For example, the option '-0' of lzip uses the scheme in almost +the simplest way possible; issuing the longest match it can find, or a +literal byte if it can't find a match. Inversely, a much more elaborated +way of finding coding sequences of minimum size than the one currently +used by lzip could be developed, and the resulting sequence could also +be coded using the LZMA coding scheme. + +Clzip currently implements two variants of the LZMA algorithm; fast +(used by option '-0') and normal (used by all other compression levels). + +The high compression of LZMA comes from combining two basic, well-proven +compression ideas: sliding dictionaries (LZ77/78) and markov models (the +thing used by every compression algorithm that uses a range encoder or +similar order-0 entropy coder as its last stage) with segregation of +contexts according to what the bits are used for. + +The ideas embodied in clzip are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for +the definition of Markov chains), G.N.N. Martin (for the definition of +range encoding), Igor Pavlov (for putting all the above together in +LZMA), and Julian Seward (for bzip2's CLI). + + +Copyright (C) 2010-2017 Antonio Diaz Diaz. 
+ +This file is free documentation: you have unlimited permission to copy, +distribute and modify it. + +The file Makefile.in is a data file used by configure to produce the +Makefile. It has the same copyright owner and permissions that configure +itself. diff -Nru /sys/src/cmd/lzip/README.plan9 /sys/src/cmd/lzip/README.plan9 --- /sys/src/cmd/lzip/README.plan9 Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/README.plan9 Sat May 1 00:00:00 2021 @@ -0,0 +1,2 @@ +This is clzip 1.9, tuned and somewhat beautified. +It's still not pretty but it's legible. diff -Nru /sys/src/cmd/lzip/decoder.c /sys/src/cmd/lzip/decoder.c --- /sys/src/cmd/lzip/decoder.c Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/decoder.c Sat May 1 00:00:00 2021 @@ -0,0 +1,294 @@ +/* Clzip - LZMA lossless data compressor + Copyright (C) 2010-2017 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. */ + +#include "lzip.h" +#include "decoder.h" + +void +Pp_show_msg(Pretty_print *pp, char *msg) +{ + if (verbosity >= 0) { + if (pp->first_post) { + unsigned i; + + pp->first_post = false; + fprintf(stderr, "%s: ", pp->name); + for (i = strlen(pp->name); i < pp->longest_name; ++i) + fputc(' ', stderr); + if (!msg) + fflush(stderr); + } + if (msg) + fprintf(stderr, "%s\n", msg); + } +} + +/* Returns the number of bytes really read. + If returned value < size and no read error, means EOF was reached.
+ */ +int +readblock(int fd, uchar *buf, int size) +{ + int n, sz; + + for (sz = 0; sz < size; sz += n) { + n = read(fd, buf + sz, size - sz); + if (n <= 0) + break; + } + return sz; +} + +/* Returns the number of bytes really written. + If (returned value < size), it is always an error. + */ +int +writeblock(int fd, uchar *buf, int size) +{ + int n, sz; + + for (sz = 0; sz < size; sz += n) { + n = write(fd, buf + sz, size - sz); + if (n != size - sz) + break; + } + return sz; +} + +bool +Rd_read_block(Range_decoder *rdec) +{ + if (!rdec->at_stream_end) { + rdec->stream_pos = readblock(rdec->infd, rdec->buffer, rd_buffer_size); + if (rdec->stream_pos != rd_buffer_size && errno) { + show_error( "Read error", errno, false ); + cleanup_and_fail(1); + } + rdec->at_stream_end = (rdec->stream_pos < rd_buffer_size); + rdec->partial_member_pos += rdec->pos; + rdec->pos = 0; + } + return rdec->pos < rdec->stream_pos; +} + +void +LZd_flush_data(LZ_decoder *d) +{ + if (d->pos > d->stream_pos) { + int size = d->pos - d->stream_pos; + CRC32_update_buf(&d->crc, d->buffer + d->stream_pos, size); + if (d->outfd >= 0 && + writeblock(d->outfd, d->buffer + d->stream_pos, size) != size) { + show_error( "Write error", errno, false ); + cleanup_and_fail(1); + } + if (d->pos >= d->dict_size) { + d->partial_data_pos += d->pos; + d->pos = 0; + d->pos_wrapped = true; + } + d->stream_pos = d->pos; + } +} + +static bool +LZd_verify_trailer(LZ_decoder *d, Pretty_print *pp) +{ + File_trailer trailer; + int size = Rd_read_data(d->rdec, trailer, Ft_size); + uvlong data_size = LZd_data_position(d); + uvlong member_size = Rd_member_position(d->rdec); + bool error = false; + + if (size < Ft_size) { + error = true; + if (verbosity >= 0) { + Pp_show_msg(pp, 0); + fprintf( stderr, "Trailer truncated at trailer position %d;" + " some checks may fail.\n", size ); + } + while (size < Ft_size) + trailer[size++] = 0; + } + + if (Ft_get_data_crc(trailer) != LZd_crc(d)) { + error = true; + if (verbosity >= 0) 
{ + Pp_show_msg(pp, 0); + fprintf( stderr, "CRC mismatch; trailer says %08X, data CRC is %08X\n", + Ft_get_data_crc(trailer), LZd_crc(d)); + } + } + if (Ft_get_data_size(trailer) != data_size) { + error = true; + if (verbosity >= 0) { + Pp_show_msg(pp, 0); + fprintf( stderr, "Data size mismatch; trailer says %llud, data size is %llud (0x%lluX)\n", + Ft_get_data_size(trailer), data_size, data_size); + } + } + if (Ft_get_member_size(trailer) != member_size) { + error = true; + if (verbosity >= 0) { + Pp_show_msg(pp, 0); + fprintf(stderr, "Member size mismatch; trailer says %llud, member size is %llud (0x%lluX)\n", + Ft_get_member_size(trailer), member_size, member_size); + } + } + if (0 && !error && verbosity >= 2 && data_size > 0 && member_size > 0) + fprintf(stderr, "%6.3f:1, %6.3f bits/byte, %5.2f%% saved. ", + (double)data_size / member_size, + (8.0 * member_size) / data_size, + 100.0 * (1.0 - (double)member_size / data_size)); + if (!error && verbosity >= 4) + fprintf( stderr, "CRC %08X, decompressed %9llud, compressed %8llud. ", + LZd_crc(d), data_size, member_size); + return !error; +} + +/* Return value: 0 = OK, 1 = decoder error, 2 = unexpected EOF, + 3 = trailer error, 4 = unknown marker found. 
*/ +int +LZd_decode_member(LZ_decoder *d, Pretty_print *pp) +{ + Range_decoder *rdec = d->rdec; + Bit_model bm_literal[1<= start_dis_model) { + unsigned dis_slot = distance; + int direct_bits = (dis_slot >> 1) - 1; + distance = (2 | (dis_slot & 1)) << direct_bits; + if (dis_slot < end_dis_model) + distance += Rd_decode_tree_reversed(rdec, + bm_dis + (distance - dis_slot), direct_bits); + else { + distance += + Rd_decode(rdec, direct_bits - dis_align_bits) << dis_align_bits; + distance += Rd_decode_tree_reversed4(rdec, bm_align); + if (distance == 0xFFFFFFFFU) /* marker found */ { + Rd_normalize(rdec); + LZd_flush_data(d); + if (len == min_match_len) /* End Of Stream marker */ { + if (LZd_verify_trailer(d, pp)) +/* code folded from here */ + return 0; +/* unfolding */ + else +/* code folded from here */ + return 3; +/* unfolding */ + } + if (len == min_match_len + 1) /* Sync Flush marker */ { + Rd_load(rdec); + continue; + } + if (verbosity >= 0) { + Pp_show_msg(pp, 0); + fprintf( stderr, "Unsupported marker code '%d'\n", len ); + } + return 4; + } + } + } + rep3 = rep2; + rep2 = rep1; + rep1 = rep0; + rep0 = distance; + state = St_set_match(state); + if (rep0 >= d->dict_size || (rep0 >= d->pos && !d->pos_wrapped)) { + LZd_flush_data(d); + return 1; + } + } + LZd_copy_block(d, rep0, len); + } + } + LZd_flush_data(d); + return 2; +} + diff -Nru /sys/src/cmd/lzip/decoder.h /sys/src/cmd/lzip/decoder.h --- /sys/src/cmd/lzip/decoder.h Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/decoder.h Sat May 1 00:00:00 2021 @@ -0,0 +1,354 @@ +/* Clzip - LZMA lossless data compressor + Copyright (C) 2010-2017 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. 
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. */ + +enum { rd_buffer_size = 16384 }; + +typedef struct LZ_decoder LZ_decoder; +typedef struct Range_decoder Range_decoder; + +struct Range_decoder { + uvlong partial_member_pos; + uchar * buffer; /* input buffer */ + int pos; /* current pos in buffer */ + int stream_pos; /* when reached, a new block must be read */ + uint32_t code; + uint32_t range; + int infd; /* input file descriptor */ + bool at_stream_end; +}; + +bool Rd_read_block(Range_decoder *rdec); + +static bool +Rd_init(Range_decoder *rdec, int ifd) +{ + rdec->partial_member_pos = 0; + rdec->buffer = (uchar *)malloc(rd_buffer_size); + if (!rdec->buffer) + return false; + rdec->pos = 0; + rdec->stream_pos = 0; + rdec->code = 0; + rdec->range = 0xFFFFFFFFU; + rdec->infd = ifd; + rdec->at_stream_end = false; + return true; +} + +static void +Rd_free(Range_decoder *rdec) +{ + free(rdec->buffer); +} + +static bool +Rd_finished(Range_decoder *rdec) +{ + return rdec->pos >= rdec->stream_pos && !Rd_read_block(rdec); +} + +static uvlong +Rd_member_position(Range_decoder *rdec) +{ + return rdec->partial_member_pos + rdec->pos; +} + +static void +Rd_reset_member_position(Range_decoder *rdec) +{ + rdec->partial_member_pos = 0; + rdec->partial_member_pos -= rdec->pos; +} + +static uchar +Rd_get_byte(Range_decoder *rdec) +{ + /* 0xFF avoids decoder error if member is truncated at EOS marker */ + if (Rd_finished(rdec)) + return 0xFF; + return rdec->buffer[rdec->pos++]; +} + +static int Rd_read_data(Range_decoder *rdec, uchar *outbuf, int size) +{ + int sz = 0; + + while (sz < size && !Rd_finished(rdec)) { + int rd, rsz = size - sz; + int rpos = rdec->stream_pos -
rdec->pos; + + if (rsz < rpos) + rd = rsz; + else + rd = rpos; + memcpy(outbuf + sz, rdec->buffer + rdec->pos, rd); + rdec->pos += rd; + sz += rd; + } + return sz; +} + +static void +Rd_load(Range_decoder *rdec) +{ + int i; + rdec->code = 0; + for (i = 0; i < 5; ++i) + rdec->code = (rdec->code << 8) | Rd_get_byte(rdec); + rdec->range = 0xFFFFFFFFU; + rdec->code &= rdec->range; /* make sure that first byte is discarded */ +} + +static void +Rd_normalize(Range_decoder *rdec) +{ + if (rdec->range <= 0x00FFFFFFU) { + rdec->range <<= 8; + rdec->code = (rdec->code << 8) | Rd_get_byte(rdec); + } +} + +static unsigned +Rd_decode(Range_decoder *rdec, int num_bits) +{ + unsigned symbol = 0; + int i; + for (i = num_bits; i > 0; --i) { + bool bit; + Rd_normalize(rdec); + rdec->range >>= 1; + /* symbol <<= 1; */ + /* if(rdec->code >= rdec->range) { rdec->code -= rdec->range; symbol |= 1; } */ + bit = (rdec->code >= rdec->range); + symbol = (symbol << 1) + bit; + rdec->code -= rdec->range & (0U - bit); + } + return symbol; +} + +static unsigned +Rd_decode_bit(Range_decoder *rdec, Bit_model *probability) +{ + uint32_t bound; + Rd_normalize(rdec); + bound = (rdec->range >> bit_model_total_bits) * *probability; + if (rdec->code < bound) { + rdec->range = bound; + *probability += (bit_model_total - *probability) >> bit_model_move_bits; + return 0; + } else { + rdec->range -= bound; + rdec->code -= bound; + *probability -= *probability >> bit_model_move_bits; + return 1; + } +} + +static unsigned +Rd_decode_tree3(Range_decoder *rdec, Bit_model bm[]) +{ + unsigned symbol = 1; + symbol = (symbol << 1) | Rd_decode_bit(rdec, &bm[symbol]); + symbol = (symbol << 1) | Rd_decode_bit(rdec, &bm[symbol]); + symbol = (symbol << 1) | Rd_decode_bit(rdec, &bm[symbol]); + return symbol & 7; +} + +static unsigned +Rd_decode_tree6(Range_decoder *rdec, Bit_model bm[]) +{ + unsigned symbol = 1; + symbol = (symbol << 1) | Rd_decode_bit(rdec, &bm[symbol]); + symbol = (symbol << 1) | Rd_decode_bit(rdec, 
&bm[symbol]); + symbol = (symbol << 1) | Rd_decode_bit(rdec, &bm[symbol]); + symbol = (symbol << 1) | Rd_decode_bit(rdec, &bm[symbol]); + symbol = (symbol << 1) | Rd_decode_bit(rdec, &bm[symbol]); + symbol = (symbol << 1) | Rd_decode_bit(rdec, &bm[symbol]); + return symbol & 0x3F; +} + +static unsigned +Rd_decode_tree8(Range_decoder *rdec, Bit_model bm[]) +{ + unsigned symbol = 1; + int i; + for (i = 0; i < 8; ++i) + symbol = (symbol << 1) | Rd_decode_bit(rdec, &bm[symbol]); + return symbol & 0xFF; +} + +static unsigned +Rd_decode_tree_reversed(Range_decoder *rdec, Bit_model bm[], int num_bits) +{ + unsigned model = 1; + unsigned symbol = 0; + int i; + for (i = 0; i < num_bits; ++i) { + unsigned bit = Rd_decode_bit(rdec, &bm[model]); + model = (model << 1) + bit; + symbol |= (bit << i); + } + return symbol; +} + +static unsigned +Rd_decode_tree_reversed4(Range_decoder *rdec, Bit_model bm[]) +{ + unsigned symbol = Rd_decode_bit(rdec, &bm[1]); + unsigned model = 2 + symbol; + unsigned bit = Rd_decode_bit(rdec, &bm[model]); + model = (model << 1) + bit; + symbol |= (bit << 1); + bit = Rd_decode_bit(rdec, &bm[model]); + model = (model << 1) + bit; + symbol |= (bit << 2); + symbol |= (Rd_decode_bit(rdec, &bm[model]) << 3); + return symbol; +} + +static unsigned +Rd_decode_matched(Range_decoder *rdec, Bit_model bm[], unsigned match_byte) +{ + unsigned symbol = 1; + unsigned mask = 0x100; + while (true) { + unsigned match_bit = (match_byte <<= 1) & mask; + unsigned bit = Rd_decode_bit(rdec, &bm[symbol+match_bit+mask]); + symbol = (symbol << 1) + bit; + if (symbol > 0xFF) + return symbol & 0xFF; + mask &= ~(match_bit ^ (bit << 8)); /* if(match_bit != bit) mask = 0; */ + } +} + +static unsigned +Rd_decode_len(struct Range_decoder *rdec, Len_model *lm, int pos_state) +{ + if (Rd_decode_bit(rdec, &lm->choice1) == 0) + return Rd_decode_tree3(rdec, lm->bm_low[pos_state]); + if (Rd_decode_bit(rdec, &lm->choice2) == 0) + return len_low_syms + Rd_decode_tree3(rdec, 
lm->bm_mid[pos_state]); + return len_low_syms + len_mid_syms + Rd_decode_tree8(rdec, lm->bm_high); +} + +struct LZ_decoder { + uvlong partial_data_pos; + struct Range_decoder *rdec; + unsigned dict_size; + uchar * buffer; /* output buffer */ + unsigned pos; /* current pos in buffer */ + unsigned stream_pos; /* first byte not yet written to file */ + uint32_t crc; + int outfd; /* output file descriptor */ + bool pos_wrapped; +}; + +void LZd_flush_data(LZ_decoder *d); + +static uchar +LZd_peek_prev(LZ_decoder *d) +{ + if (d->pos > 0) + return d->buffer[d->pos-1]; + if (d->pos_wrapped) + return d->buffer[d->dict_size-1]; + return 0; /* prev_byte of first byte */ +} + +static uchar +LZd_peek(LZ_decoder *d, +unsigned distance) +{ + unsigned i = ((d->pos > distance) ? 0 : d->dict_size) + + d->pos - distance - 1; + return d->buffer[i]; +} + +static void +LZd_put_byte(LZ_decoder *d, uchar b) +{ + d->buffer[d->pos] = b; + if (++d->pos >= d->dict_size) + LZd_flush_data(d); +} + +static void +LZd_copy_block(LZ_decoder *d, unsigned distance, unsigned len) +{ + unsigned lpos = d->pos, i = lpos -distance -1; + bool fast, fast2; + + if (lpos > distance) { + fast = (len < d->dict_size - lpos); + fast2 = (fast && len <= lpos - i); + } else { + i += d->dict_size; + fast = (len < d->dict_size - i); /* (i == pos) may happen */ + fast2 = (fast && len <= i - lpos); + } + if (fast) /* no wrap */ { + d->pos += len; + if (fast2) /* no wrap, no overlap */ + memcpy(d->buffer + lpos, d->buffer + i, len); + else + for (; len > 0; --len) + d->buffer[lpos++] = d->buffer[i++]; + } else + for (; len > 0; --len) { + d->buffer[d->pos] = d->buffer[i]; + if (++d->pos >= d->dict_size) + LZd_flush_data(d); + if (++i >= d->dict_size) + i = 0; + } +} + +static bool +LZd_init(struct LZ_decoder *d, Range_decoder *rde, unsigned dict_size, int ofd) +{ + d->partial_data_pos = 0; + d->rdec = rde; + d->dict_size = dict_size; + d->buffer = (uchar *)malloc(d->dict_size); + if (!d->buffer) + return false; + d->pos 
= 0; + d->stream_pos = 0; + d->crc = 0xFFFFFFFFU; + d->outfd = ofd; + d->pos_wrapped = false; + return true; +} + +static void +LZd_free(LZ_decoder *d) +{ + free(d->buffer); +} + +static unsigned +LZd_crc(LZ_decoder *d) +{ + return d->crc ^ 0xFFFFFFFFU; +} + +static uvlong +LZd_data_position(LZ_decoder *d) +{ + return d->partial_data_pos + d->pos; +} + +int LZd_decode_member(struct LZ_decoder *d, Pretty_print *pp); diff -Nru /sys/src/cmd/lzip/encoder.c /sys/src/cmd/lzip/encoder.c --- /sys/src/cmd/lzip/encoder.c Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/encoder.c Sat May 1 00:00:00 2021 @@ -0,0 +1,735 @@ +/* Clzip - LZMA lossless data compressor + Copyright (C) 2010-2017 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. */ + +#include "lzip.h" +#include "encoder_base.h" +#include "encoder.h" + +CRC32 crc32; + +/* + * starting at data[len] and data[len-delta], what's the longest match + * up to len_limit?
+ */ +int +maxmatch(uchar *data, int delta, int len, int len_limit) +{ + uchar *pdel, *p; + + p = &data[len]; + pdel = p - delta; + while (*pdel++ == *p++ && len < len_limit) + ++len; + return len; +} + +static int +findpairmaxlen(LZ_encoder *e, Pair **pairsp, int *npairsp, int maxlen, int pos1, + int len_limit, int min_pos, uchar *data, int np2, int np3) +{ + int num_pairs; + Pair *pairs; + + pairs = *pairsp; + num_pairs = *npairsp; + if (np2 > min_pos && e->eb.mb.buffer[np2-1] == data[0]) { + pairs[0].dis = e->eb.mb.pos - np2; + pairs[0].len = maxlen = 2; + num_pairs = 1; + } + if (np2 != np3 && np3 > min_pos && e->eb.mb.buffer[np3-1] == data[0]) { + maxlen = 3; + np2 = np3; + pairs[num_pairs].dis = e->eb.mb.pos - np2; + ++num_pairs; + } + if (num_pairs > 0) { + maxlen = maxmatch(data, pos1 - np2, maxlen, len_limit); + pairs[num_pairs-1].len = maxlen; + if (maxlen >= len_limit) + *pairsp = nil; /* done. now just skip */ + } + if (maxlen < 3) + maxlen = 3; + *npairsp = num_pairs; + return maxlen; +} + +int +LZe_get_match_pairs(LZ_encoder *e, Pair *pairs) +{ + int len = 0, len0, len1, maxlen, num_pairs, len_limit, avail; + int pos1, min_pos, cyclic_pos, delta, count, key2, key3, key4, newpos1; + int32_t *ptr0, *ptr1, *newptr, *prevpos; + uchar *data; + uchar *p; + unsigned tmp; + + len_limit = e->match_len_limit; + avail = Mb_avail_bytes(&e->eb.mb); + if (len_limit > avail) { + len_limit = avail; + if (len_limit < 4) + return 0; + } + + data = Mb_ptr_to_current_pos(&e->eb.mb); + tmp = crc32[data[0]] ^ data[1]; + key2 = tmp & (Nprevpos2 - 1); + tmp ^= (unsigned)data[2] << 8; + key3 = Nprevpos2 + (tmp & (Nprevpos3 - 1)); + key4 = Nprevpos2 + Nprevpos3 + + ((tmp ^ (crc32[data[3]] << 5)) & e->eb.mb.key4_mask); + + min_pos = (e->eb.mb.pos > e->eb.mb.dict_size) ? 
+ e->eb.mb.pos - e->eb.mb.dict_size : 0; + pos1 = e->eb.mb.pos + 1; + prevpos = e->eb.mb.prev_positions; + maxlen = 0; + num_pairs = 0; + if (pairs) + maxlen = findpairmaxlen(e, &pairs, &num_pairs, maxlen, pos1, + len_limit, min_pos, data, prevpos[key2], prevpos[key3]); + newpos1 = prevpos[key4]; + prevpos[key2] = prevpos[key3] = prevpos[key4] = pos1; + + cyclic_pos = e->eb.mb.cyclic_pos; + ptr0 = e->eb.mb.pos_array + (cyclic_pos << 1); + ptr1 = ptr0 + 1; + len0 = len1 = 0; + for (count = e->cycles; ;) { + if (newpos1 <= min_pos || --count < 0) { + *ptr0 = *ptr1 = 0; + break; + } + + delta = pos1 - newpos1; + newptr = e->eb.mb.pos_array + ((cyclic_pos - delta + + (cyclic_pos >= delta? 0: e->eb.mb.dict_size + 1)) << 1); + p = &data[len]; + if (p[-delta] == *p) { + len = maxmatch(data, delta, len + 1, len_limit); + if (pairs && maxlen < len) { + pairs[num_pairs].dis = delta - 1; + pairs[num_pairs].len = maxlen = len; + ++num_pairs; + } + if (len >= len_limit) { + *ptr0 = newptr[0]; + *ptr1 = newptr[1]; + break; + } + p = &data[len]; + } + if (p[-delta] < *p) { + *ptr0 = newpos1; + ptr0 = newptr + 1; + newpos1 = *ptr0; + len0 = len; + if (len1 < len) + len = len1; + } else { + *ptr1 = newpos1; + ptr1 = newptr; + newpos1 = *ptr1; + len1 = len; + if (len0 < len) + len = len0; + } + } + return num_pairs; +} + +static void +LZe_update_distance_prices(LZ_encoder *e) +{ + int dis, len_state; + + for (dis = start_dis_model; dis < modeled_distances; ++dis) { + int dis_slot = dis_slots[dis]; + int direct_bits = (dis_slot >> 1) - 1; + int base = (2 | (dis_slot & 1)) << direct_bits; + int price = price_symbol_reversed(e->eb.bm_dis + (base - dis_slot), + dis - base, direct_bits); + + for (len_state = 0; len_state < len_states; ++len_state) + e->dis_prices[len_state][dis] = price; + } + + for (len_state = 0; len_state < len_states; ++len_state) { + int *dsp = e->dis_slot_prices[len_state]; + int *dp = e->dis_prices[len_state]; + Bit_model * bmds = e->eb.bm_dis_slot[len_state]; + 
int slot = 0; + + for (; slot < end_dis_model; ++slot) + dsp[slot] = price_symbol6(bmds, slot); + for (; slot < e->num_dis_slots; ++slot) + dsp[slot] = price_symbol6(bmds, slot) + + ((((slot >> 1) - 1) - dis_align_bits) << price_shift_bits); + + for (dis = 0; dis < start_dis_model; ++dis) + dp[dis] = dsp[dis]; + for (; dis < modeled_distances; ++dis) + dp[dis] += dsp[dis_slots[dis]]; + } +} + +static int +pricestate2(LZ_encoder *e, int price, int *ps2p, State *st2p, int len2) +{ + int pos_state2; + State state2; + + state2 = *st2p; + pos_state2 = *ps2p; + + pos_state2 = (pos_state2 + 1) & pos_state_mask; + state2 = St_set_char(state2); + price += price1(e->eb.bm_match[state2][pos_state2]) + + price1(e->eb.bm_rep[state2]) + + LZe_price_rep0_len(e, len2, state2, pos_state2); + + *ps2p = pos_state2; + *st2p = state2; + return price; +} + +static int +encinit(LZ_encoder *e, int reps[num_rep_distances], + int replens[num_rep_distances], State state, int main_len, + int num_pairs, int rep_index, int *ntrialsp) +{ + int i, rep, num_trials, len; + int pos_state = Mb_data_position(&e->eb.mb) & pos_state_mask; + int match_price = price1(e->eb.bm_match[state][pos_state]); + int rep_match_price = match_price + price1(e->eb.bm_rep[state]); + uchar prev_byte = Mb_peek(&e->eb.mb, 1); + uchar cur_byte = Mb_peek(&e->eb.mb, 0); + uchar match_byte = Mb_peek(&e->eb.mb, reps[0] + 1); + + e->trials[1].price = price0(e->eb.bm_match[state][pos_state]); + if (St_is_char(state)) + e->trials[1].price += LZeb_price_literal(&e->eb, + prev_byte, cur_byte); + else + e->trials[1].price += LZeb_price_matched(&e->eb, + prev_byte, cur_byte, match_byte); + e->trials[1].dis4 = -1; /* literal */ + + if (match_byte == cur_byte) + Tr_update(&e->trials[1], rep_match_price + + LZeb_price_shortrep(&e->eb, state, pos_state), 0, 0); + num_trials = replens[rep_index]; + if (num_trials < main_len) + num_trials = main_len; + *ntrialsp = num_trials; + if (num_trials < min_match_len) { + e->trials[0].price = 1; + 
e->trials[0].dis4 = e->trials[1].dis4; + Mb_move_pos(&e->eb.mb); + return 1; + } + + e->trials[0].state = state; + for (i = 0; i < num_rep_distances; ++i) + e->trials[0].reps[i] = reps[i]; + + for (len = min_match_len; len <= num_trials; ++len) + e->trials[len].price = infinite_price; + + for (rep = 0; rep < num_rep_distances; ++rep) { + int price, replen; + + if (replens[rep] < min_match_len) + continue; + price = rep_match_price + LZeb_price_rep(&e->eb, rep, + state, pos_state); + replen = replens[rep]; + for (len = min_match_len; len <= replen; ++len) + Tr_update(&e->trials[len], price + + Lp_price(&e->rep_len_prices, len, pos_state), rep, 0); + } + + if (main_len > replens[0]) { + int dis, normal_match_price = match_price + + price0(e->eb.bm_rep[state]); + int replp1 = replens[0] + 1; + int i = 0, len = max(replp1, min_match_len); + + while (len > e->pairs[i].len) + ++i; + for (;;) { + dis = e->pairs[i].dis; + Tr_update(&e->trials[len], normal_match_price + + LZe_price_pair(e, dis, len, pos_state), + dis + num_rep_distances, 0); + if (++len > e->pairs[i].len && ++i >= num_pairs) + break; + } + } + return 0; +} + +static void +finalvalues(LZ_encoder *e, int cur, Trial *cur_trial, State *cstatep) +{ + int i; + int dis4 = cur_trial->dis4; + int prev_index = cur_trial->prev_index; + int prev_index2 = cur_trial->prev_index2; + State cur_state; + + if (prev_index2 == single_step_trial) { + cur_state = e->trials[prev_index].state; + if (prev_index + 1 == cur) { /* len == 1 */ + if (dis4 == 0) + cur_state = St_set_short_rep(cur_state); + else + cur_state = St_set_char(cur_state); /* literal */ + } else if (dis4 < num_rep_distances) + cur_state = St_set_rep(cur_state); + else + cur_state = St_set_match(cur_state); + } else { + if (prev_index2 == dual_step_trial) /* dis4 == 0 (rep0) */ + --prev_index; + else /* prev_index2 >= 0 */ + prev_index = prev_index2; + cur_state = 8; /* St_set_char_rep(); */ + } + cur_trial->state = cur_state; + for (i = 0; i < num_rep_distances; 
++i) + cur_trial->reps[i] = e->trials[prev_index].reps[i]; + mtf_reps(dis4, cur_trial->reps); /* literal is ignored */ + *cstatep = cur_state; +} + +static int +litrep0(LZ_encoder *e, State cur_state, int cur, Trial *cur_trial, + int num_trials, int triable_bytes, int pos_state, int next_price) +{ + int len = 1, endtrials, limit, mlpl1, dis; + uchar *data = Mb_ptr_to_current_pos(&e->eb.mb); + + dis = cur_trial->reps[0] + 1; + mlpl1 = e->match_len_limit + 1; + limit = min(mlpl1, triable_bytes); + len = maxmatch(data, dis, len, limit); + if (--len >= min_match_len) { + int pos_state2, price; + State state2; + + pos_state2 = (pos_state + 1) & pos_state_mask; + state2 = St_set_char(cur_state); + price = next_price + price1(e->eb.bm_match[state2][pos_state2])+ + price1(e->eb.bm_rep[state2]) + + LZe_price_rep0_len(e, len, state2, pos_state2); + endtrials = cur + 1 + len; + while (num_trials < endtrials) + e->trials[++num_trials].price = infinite_price; + Tr_update2(&e->trials[endtrials], price, cur + 1); + } + return num_trials; +} + +static int +repdists(LZ_encoder *e, State cur_state, int cur, Trial *cur_trial, + int num_trials, int triable_bytes, int pos_state, + int rep_match_price, int len_limit, int *stlenp) +{ + int i, rep, len, price, dis, start_len; + + start_len = *stlenp; + for (rep = 0; rep < num_rep_distances; ++rep) { + uchar *data = Mb_ptr_to_current_pos(&e->eb.mb); + + dis = cur_trial->reps[rep] + 1; + if (data[0-dis] != data[0] || data[1-dis] != data[1]) + continue; + len = maxmatch(data, dis, min_match_len, len_limit); + while (num_trials < cur + len) + e->trials[++num_trials].price = infinite_price; + price = rep_match_price + LZeb_price_rep(&e->eb, rep, + cur_state, pos_state); + for (i = min_match_len; i <= len; ++i) + Tr_update(&e->trials[cur+i], price + + Lp_price(&e->rep_len_prices, i, pos_state), rep, cur); + + if (rep == 0) + start_len = len + 1; /* discard shorter matches */ + + /* try rep + literal + rep0 */ + { + int pos_state2, endtrials, 
limit, mlpl2, len2; + State state2; + + len2 = len + 1; + mlpl2 = e->match_len_limit + len2; + limit = min(mlpl2, triable_bytes); + len2 = maxmatch(data, dis, len2, limit); + len2 -= len + 1; + if (len2 < min_match_len) + continue; + + pos_state2 = (pos_state + len) & pos_state_mask; + state2 = St_set_rep(cur_state); + price += Lp_price(&e->rep_len_prices, len, pos_state) + + price0(e->eb.bm_match[state2][pos_state2]) + + LZeb_price_matched(&e->eb, data[len-1], + data[len], data[len-dis]); + price = pricestate2(e, price, &pos_state2, + &state2, len2); + endtrials = cur + len + 1 + len2; + while (num_trials < endtrials) + e->trials[++num_trials].price = infinite_price; + Tr_update3(&e->trials[endtrials], price, rep, + endtrials - len2, cur); + } + } + *stlenp = start_len; + return num_trials; +} + +static int +trymatches(LZ_encoder *e, State cur_state, int cur, int num_trials, + int triable_bytes, int pos_state, int num_pairs, + int normal_match_price, int start_len) +{ + int i, dis, len, price; + + i = 0; + while (e->pairs[i].len < start_len) + ++i; + dis = e->pairs[i].dis; + for (len = start_len; ; ++len) { + price = normal_match_price + LZe_price_pair(e, dis, len, pos_state); + Tr_update(&e->trials[cur+len], price, dis + num_rep_distances, cur); + + /* try match + literal + rep0 */ + if (len == e->pairs[i].len) { + uchar *data = Mb_ptr_to_current_pos(&e->eb.mb); + int endtrials, mlpl2, limit; + int dis2 = dis + 1, len2 = len + 1; + + mlpl2 = e->match_len_limit + len2; + limit = min(mlpl2, triable_bytes); + len2 = maxmatch(data, dis2, len2, limit); + len2 -= len + 1; + if (len2 >= min_match_len) { + int pos_state2 = (pos_state + len) &pos_state_mask; + State state2 = St_set_match(cur_state); + + price += price0(e->eb.bm_match[state2][pos_state2]) + + LZeb_price_matched(&e->eb, data[len-1], data[len], data[len-dis2]); + price = pricestate2(e, price, + &pos_state2, &state2, len2); + endtrials = cur + len + 1 + len2; + while (num_trials < endtrials) + 
e->trials[++num_trials].price = infinite_price; + Tr_update3(&e->trials[endtrials], + price, dis + + num_rep_distances, + endtrials - len2, cur); + } + if (++i >= num_pairs) + break; + dis = e->pairs[i].dis; + } + } + return num_trials; +} + +/* + * Returns the number of bytes advanced (ahead). + trials[0]..trials[ahead-1] contain the steps to encode. + (trials[0].dis4 == -1) means literal. + A match/rep longer or equal than match_len_limit finishes the sequence. + */ +static int +LZe_sequence_optimizer(LZ_encoder *e, int reps[num_rep_distances], State state) +{ + int main_len, num_pairs, i, num_trials; + int rep_index = 0, cur = 0; + int replens[num_rep_distances]; + + if (e->pending_num_pairs > 0) { /* from previous call */ + num_pairs = e->pending_num_pairs; + e->pending_num_pairs = 0; + } else + num_pairs = LZe_read_match_distances(e); + main_len = (num_pairs > 0) ? e->pairs[num_pairs-1].len : 0; + + for (i = 0; i < num_rep_distances; ++i) { + replens[i] = Mb_true_match_len(&e->eb.mb, 0, reps[i] + 1); + if (replens[i] > replens[rep_index]) + rep_index = i; + } + if (replens[rep_index] >= e->match_len_limit) { + e->trials[0].price = replens[rep_index]; + e->trials[0].dis4 = rep_index; + LZe_move_and_update(e, replens[rep_index]); + return replens[rep_index]; + } + + if (main_len >= e->match_len_limit) { + e->trials[0].price = main_len; + e->trials[0].dis4 = e->pairs[num_pairs-1].dis + num_rep_distances; + LZe_move_and_update(e, main_len); + return main_len; + } + + if (encinit(e, reps, replens, state, main_len, num_pairs, rep_index, + &num_trials) > 0) + return 1; + + /* + * Optimize price. 
+ */ + for (;;) { + Trial *cur_trial, *next_trial; + int newlen, pos_state, triable_bytes, len_limit; + int next_price, match_price, rep_match_price; + int start_len = min_match_len; + State cur_state; + uchar prev_byte, cur_byte, match_byte; + + Mb_move_pos(&e->eb.mb); + if (++cur >= num_trials) { /* no more initialized trials */ + LZe_backward(e, cur); + return cur; + } + + num_pairs = LZe_read_match_distances(e); + newlen = num_pairs > 0? e->pairs[num_pairs-1].len: 0; + if (newlen >= e->match_len_limit) { + e->pending_num_pairs = num_pairs; + LZe_backward(e, cur); + return cur; + } + + /* give final values to current trial */ + cur_trial = &e->trials[cur]; + finalvalues(e, cur, cur_trial, &cur_state); + + pos_state = Mb_data_position(&e->eb.mb) & pos_state_mask; + prev_byte = Mb_peek(&e->eb.mb, 1); + cur_byte = Mb_peek(&e->eb.mb, 0); + match_byte = Mb_peek(&e->eb.mb, cur_trial->reps[0] + 1); + + next_price = cur_trial->price + + price0(e->eb.bm_match[cur_state][pos_state]); + if (St_is_char(cur_state)) + next_price += LZeb_price_literal(&e->eb, prev_byte, cur_byte); + else + next_price += LZeb_price_matched(&e->eb, prev_byte, + cur_byte, match_byte); + + /* try last updates to next trial */ + next_trial = &e->trials[cur+1]; + + Tr_update(next_trial, next_price, -1, cur); /* literal */ + + match_price = cur_trial->price + + price1(e->eb.bm_match[cur_state][pos_state]); + rep_match_price = match_price + price1(e->eb.bm_rep[cur_state]); + + if (match_byte == cur_byte && next_trial->dis4 != 0 && + next_trial->prev_index2 == single_step_trial) { + int price = rep_match_price + + LZeb_price_shortrep(&e->eb, cur_state, pos_state); + if (price <= next_trial->price) { + next_trial->price = price; + next_trial->dis4 = 0; /* rep0 */ + next_trial->prev_index = cur; + } + } + + int trm1mcur = max_num_trials - 1 - cur; + + triable_bytes = Mb_avail_bytes(&e->eb.mb); + if (triable_bytes > trm1mcur) + triable_bytes = trm1mcur; + if (triable_bytes < min_match_len) + continue; + + 
len_limit = min(e->match_len_limit, triable_bytes); + + /* try literal + rep0 */ + if (match_byte != cur_byte && next_trial->prev_index != cur) + num_trials = litrep0(e, cur_state, cur, cur_trial, + num_trials, triable_bytes, pos_state, next_price); + + /* try rep distances */ + num_trials = repdists(e, cur_state, cur, cur_trial, + num_trials, triable_bytes, pos_state, + rep_match_price, len_limit, &start_len); + + /* try matches */ + if (newlen >= start_len && newlen <= len_limit) { + int normal_match_price = match_price + + price0(e->eb.bm_rep[cur_state]); + + while (num_trials < cur + newlen) + e->trials[++num_trials].price = infinite_price; + + num_trials = trymatches(e, cur_state, cur, num_trials, + triable_bytes, pos_state, num_pairs, + normal_match_price, start_len); + } + } +} + +static int +encrepmatch(LZ_encoder *e, State state, int len, int dis, int pos_state) +{ + int bit = (dis == 0); + + Re_encode_bit(&e->eb.renc, &e->eb.bm_rep0[state], !bit); + if (bit) + Re_encode_bit(&e->eb.renc, &e->eb.bm_len[state][pos_state], + len > 1); + else { + Re_encode_bit(&e->eb.renc, &e->eb.bm_rep1[state], dis > 1); + if (dis > 1) + Re_encode_bit(&e->eb.renc, &e->eb.bm_rep2[state], + dis > 2); + } + if (len == 1) + state = St_set_short_rep(state); + else { + Re_encode_len(&e->eb.renc, &e->eb.rep_len_model, len, pos_state); + Lp_decr_counter(&e->rep_len_prices, pos_state); + state = St_set_rep(state); + } + return state; +} + +bool +LZe_encode_member(LZ_encoder *e, uvlong member_size) +{ + uvlong member_size_limit = member_size - Ft_size - max_marker_size; + bool best = (e->match_len_limit > 12); + int dis_price_count = best? 1: 512; + int align_price_count = best? 1: dis_align_size; + int price_count = (e->match_len_limit > 36? 
1013 : 4093); + int price_counter = 0; /* counters may decrement below 0 */ + int dis_price_counter = 0; + int align_price_counter = 0; + int ahead, i; + int reps[num_rep_distances]; + State state = 0; + + for (i = 0; i < num_rep_distances; ++i) + reps[i] = 0; + + if (Mb_data_position(&e->eb.mb) != 0 || + Re_member_position(&e->eb.renc) != Fh_size) + return false; /* can be called only once */ + + if (!Mb_data_finished(&e->eb.mb)) { /* encode first byte */ + uchar prev_byte = 0; + uchar cur_byte = Mb_peek(&e->eb.mb, 0); + + Re_encode_bit(&e->eb.renc, &e->eb.bm_match[state][0], 0); + LZeb_encode_literal(&e->eb, prev_byte, cur_byte); + CRC32_update_byte(&e->eb.crc, cur_byte); + LZe_get_match_pairs(e, 0); + Mb_move_pos(&e->eb.mb); + } + + while (!Mb_data_finished(&e->eb.mb)) { + if (price_counter <= 0 && e->pending_num_pairs == 0) { + /* recalculate prices every these many bytes */ + price_counter = price_count; + if (dis_price_counter <= 0) { + dis_price_counter = dis_price_count; + LZe_update_distance_prices(e); + } + if (align_price_counter <= 0) { + align_price_counter = align_price_count; + for (i = 0; i < dis_align_size; ++i) + e->align_prices[i] = price_symbol_reversed( + e->eb.bm_align, i, dis_align_bits); + } + Lp_update_prices(&e->match_len_prices); + Lp_update_prices(&e->rep_len_prices); + } + + ahead = LZe_sequence_optimizer(e, reps, state); + price_counter -= ahead; + + for (i = 0; ahead > 0;) { + int pos_state = (Mb_data_position(&e->eb.mb) - ahead) & + pos_state_mask; + int len = e->trials[i].price; + int dis = e->trials[i].dis4; + bool bit = (dis < 0); + + Re_encode_bit(&e->eb.renc, &e->eb.bm_match[state][pos_state], + !bit); + if (bit) { /* literal byte */ + uchar prev_byte = Mb_peek(&e->eb.mb, ahead+1); + uchar cur_byte = Mb_peek(&e->eb.mb, ahead); + + CRC32_update_byte(&e->eb.crc, cur_byte); + if (St_is_char(state)) + LZeb_encode_literal(&e->eb, prev_byte, + cur_byte); + else { + uchar match_byte = Mb_peek(&e->eb.mb, + ahead + reps[0] + 1); + + 
LZeb_encode_matched(&e->eb, prev_byte, + cur_byte, match_byte); + } + state = St_set_char(state); + } else { /* match or repeated match */ + CRC32_update_buf(&e->eb.crc, + Mb_ptr_to_current_pos(&e->eb.mb) - ahead, + len); + mtf_reps(dis, reps); + bit = (dis < num_rep_distances); + Re_encode_bit(&e->eb.renc, &e->eb.bm_rep[state], + bit); + if (bit) /* repeated match */ + state = encrepmatch(e, state, len, dis, + pos_state); + else { /* match */ + dis -= num_rep_distances; + LZeb_encode_pair(&e->eb, dis, len, + pos_state); + if (dis >= modeled_distances) + --align_price_counter; + --dis_price_counter; + Lp_decr_counter( + &e->match_len_prices, pos_state); + state = St_set_match(state); + } + } + ahead -= len; + i += len; + if (Re_member_position(&e->eb.renc) >= member_size_limit) { + if (!Mb_dec_pos(&e->eb.mb, ahead)) + return false; + LZeb_full_flush(&e->eb, state); + return true; + } + } + } + LZeb_full_flush(&e->eb, state); + return true; +} diff -Nru /sys/src/cmd/lzip/encoder.h /sys/src/cmd/lzip/encoder.h --- /sys/src/cmd/lzip/encoder.h Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/encoder.h Sat May 1 00:00:00 2021 @@ -0,0 +1,323 @@ +/* Clzip - LZMA lossless data compressor + Copyright (C) 2010-2017 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . 
*/ + +typedef struct Len_prices Len_prices; +struct Len_prices { + struct Len_model *lm; + int len_syms; + int count; + int prices[pos_states][max_len_syms]; + int counters[pos_states]; /* may decrement below 0 */ +}; + +static void +Lp_update_low_mid_prices(Len_prices *lp, int pos_state) +{ + int *pps = lp->prices[pos_state]; + int tmp = price0(lp->lm->choice1); + int len = 0; + for (; len < len_low_syms && len < lp->len_syms; ++len) + pps[len] = tmp + price_symbol3(lp->lm->bm_low[pos_state], len); + if (len >= lp->len_syms) + return; + tmp = price1(lp->lm->choice1) + price0(lp->lm->choice2); + for (; len < len_low_syms + len_mid_syms && len < lp->len_syms; ++len) + pps[len] = tmp + + price_symbol3(lp->lm->bm_mid[pos_state], len - len_low_syms); +} + +static void +Lp_update_high_prices(Len_prices *lp) +{ + int tmp = price1(lp->lm->choice1) + price1(lp->lm->choice2); + int len; + for (len = len_low_syms + len_mid_syms; len < lp->len_syms; ++len) + /* using 4 slots per value makes "Lp_price" faster */ + lp->prices[3][len] = lp->prices[2][len] = + lp->prices[1][len] = lp->prices[0][len] = tmp + + price_symbol8(lp->lm->bm_high, len - len_low_syms - len_mid_syms); +} + +static void +Lp_reset(Len_prices *lp) +{ + int i; + for (i = 0; i < pos_states; ++i) + lp->counters[i] = 0; +} + +static void +Lp_init(Len_prices *lp, Len_model *lm, int match_len_limit) +{ + lp->lm = lm; + lp->len_syms = match_len_limit + 1 - min_match_len; + lp->count = (match_len_limit > 12) ? 
1 : lp->len_syms; + Lp_reset(lp); +} + +static void +Lp_decr_counter(Len_prices *lp, int pos_state) +{ + --lp->counters[pos_state]; +} + +static void +Lp_update_prices(Len_prices *lp) +{ + int pos_state; + bool high_pending = false; + + for (pos_state = 0; pos_state < pos_states; ++pos_state) + if (lp->counters[pos_state] <= 0) { + lp->counters[pos_state] = lp->count; + Lp_update_low_mid_prices(lp, pos_state); + high_pending = true; + } + if (high_pending && lp->len_syms > len_low_syms + len_mid_syms) + Lp_update_high_prices(lp); +} + +typedef struct Pair Pair; +struct Pair { /* distance-length pair */ + int dis; + int len; +}; + +enum { + infinite_price = 0x0FFFFFFF, + max_num_trials = 1 << 13, + single_step_trial = -2, + dual_step_trial = -1 +}; + +typedef struct Trial Trial; +struct Trial { + State state; + int price; /* dual use var; cumulative price, match length */ + int dis4; /* -1 for literal, or rep, or match distance + 4 */ + int prev_index; /* index of prev trial in trials[] */ + int prev_index2; /* -2 trial is single step */ + /* -1 literal + rep0 */ + /* >= 0 (rep or match) + literal + rep0 */ + int reps[num_rep_distances]; +}; + +static void +Tr_update2(Trial *trial, int pr, int p_i) +{ + if (pr < trial->price) { + trial->price = pr; + trial->dis4 = 0; + trial->prev_index = p_i; + trial->prev_index2 = dual_step_trial; + } +} + +static void +Tr_update3(Trial *trial, int pr, int distance4, int p_i, int p_i2) +{ + if (pr < trial->price) { + trial->price = pr; + trial->dis4 = distance4; + trial->prev_index = p_i; + trial->prev_index2 = p_i2; + } +} + +typedef struct LZ_encoder LZ_encoder; +struct LZ_encoder { + LZ_encoder_base eb; + int cycles; + int match_len_limit; + Len_prices match_len_prices; + Len_prices rep_len_prices; + int pending_num_pairs; + Pair pairs[max_match_len+1]; + Trial trials[max_num_trials]; + + int dis_slot_prices[len_states][2*max_dict_bits]; + int dis_prices[len_states][modeled_distances]; + int align_prices[dis_align_size]; + int 
num_dis_slots; +}; + +static bool +Mb_dec_pos(struct Matchfinder_base *mb, int ahead) +{ + if (ahead < 0 || mb->pos < ahead) + return false; + mb->pos -= ahead; + if (mb->cyclic_pos < ahead) + mb->cyclic_pos += mb->dict_size + 1; + mb->cyclic_pos -= ahead; + return true; +} + +int LZe_get_match_pairs(struct LZ_encoder *e, struct Pair *pairs); + +/* move-to-front dis in/into reps; do nothing if(dis4 <= 0) */ +static void +mtf_reps(int dis4, int reps[num_rep_distances]) +{ + if (dis4 >= num_rep_distances) /* match */ { + reps[3] = reps[2]; + reps[2] = reps[1]; + reps[1] = reps[0]; + reps[0] = dis4 - num_rep_distances; + } else if (dis4 > 0) /* repeated match */ { + int distance = reps[dis4]; + int i; + for (i = dis4; i > 0; --i) + reps[i] = reps[i-1]; + reps[0] = distance; + } +} + +static int +LZeb_price_shortrep(struct LZ_encoder_base *eb, State state, int pos_state) +{ + return price0(eb->bm_rep0[state]) + price0(eb->bm_len[state][pos_state]); +} + +static int +LZeb_price_rep(struct LZ_encoder_base *eb, int rep, State state, int pos_state) +{ + int price; + if (rep == 0) + return price0(eb->bm_rep0[state]) + + price1(eb->bm_len[state][pos_state]); + price = price1(eb->bm_rep0[state]); + if (rep == 1) + price += price0(eb->bm_rep1[state]); + else { + price += price1(eb->bm_rep1[state]); + price += price_bit(eb->bm_rep2[state], rep - 2); + } + return price; +} + +static int +LZe_price_rep0_len(struct LZ_encoder *e, int len, State state, int pos_state) +{ + return LZeb_price_rep(&e->eb, 0, state, pos_state) + + Lp_price(&e->rep_len_prices, len, pos_state); +} + +static int +LZe_price_pair(struct LZ_encoder *e, int dis, int len, int pos_state) +{ + int price = Lp_price(&e->match_len_prices, len, pos_state); + int len_state = get_len_state(len); + if (dis < modeled_distances) + return price + e->dis_prices[len_state][dis]; + else + return price + e->dis_slot_prices[len_state][get_slot(dis)] + + e->align_prices[dis & (dis_align_size - 1)]; +} + +static int 
+LZe_read_match_distances(struct LZ_encoder *e) +{ + int num_pairs = LZe_get_match_pairs(e, e->pairs); + if (num_pairs > 0) { + int len = e->pairs[num_pairs-1].len; + if (len == e->match_len_limit && len < max_match_len) + e->pairs[num_pairs-1].len = + Mb_true_match_len(&e->eb.mb, len, e->pairs[num_pairs-1].dis + 1); + } + return num_pairs; +} + +static void +LZe_move_and_update(struct LZ_encoder *e, int n) +{ + while (true) { + Mb_move_pos(&e->eb.mb); + if (--n <= 0) + break; + LZe_get_match_pairs(e, 0); + } +} + +static void +LZe_backward(struct LZ_encoder *e, int cur) +{ + int dis4 = e->trials[cur].dis4; + while (cur > 0) { + int prev_index = e->trials[cur].prev_index; + struct Trial *prev_trial = &e->trials[prev_index]; + + if (e->trials[cur].prev_index2 != single_step_trial) { + prev_trial->dis4 = -1; /* literal */ + prev_trial->prev_index = prev_index - 1; + prev_trial->prev_index2 = single_step_trial; + if (e->trials[cur].prev_index2 >= 0) { + struct Trial *prev_trial2 = &e->trials[prev_index-1]; + prev_trial2->dis4 = dis4; + dis4 = 0; /* rep0 */ + prev_trial2->prev_index = e->trials[cur].prev_index2; + prev_trial2->prev_index2 = single_step_trial; + } + } + prev_trial->price = cur - prev_index; /* len */ + cur = dis4; + dis4 = prev_trial->dis4; + prev_trial->dis4 = cur; + cur = prev_index; + } +} + +enum { + Nprevpos3 = 1 << 16, + Nprevpos2 = 1 << 10 +}; + +static bool +LZe_init(struct LZ_encoder *e, int dict_size, int len_limit, int ifd, int outfd) +{ + enum { + before = max_num_trials, + /* bytes to keep in buffer after pos */ + after_size = (2 *max_match_len) + 1, + dict_factor = 2, + Nprevpos23 = Nprevpos2 + Nprevpos3, + pos_array_factor = 2 + }; + + if (!LZeb_init(&e->eb, before, dict_size, after_size, dict_factor, + Nprevpos23, pos_array_factor, ifd, outfd)) + return false; + e->cycles = (len_limit < max_match_len) ? 
16 + (len_limit / 2) : 256; + e->match_len_limit = len_limit; + Lp_init(&e->match_len_prices, &e->eb.match_len_model, e->match_len_limit); + Lp_init(&e->rep_len_prices, &e->eb.rep_len_model, e->match_len_limit); + e->pending_num_pairs = 0; + e->num_dis_slots = 2 * real_bits(e->eb.mb.dict_size - 1); + e->trials[1].prev_index = 0; + e->trials[1].prev_index2 = single_step_trial; + return true; +} + +static void +LZe_reset(struct LZ_encoder *e) +{ + LZeb_reset(&e->eb); + Lp_reset(&e->match_len_prices); + Lp_reset(&e->rep_len_prices); + e->pending_num_pairs = 0; +} + +bool LZe_encode_member(struct LZ_encoder *e, uvlong member_size); diff -Nru /sys/src/cmd/lzip/encoder_base.c /sys/src/cmd/lzip/encoder_base.c --- /sys/src/cmd/lzip/encoder_base.c Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/encoder_base.c Sat May 1 00:00:00 2021 @@ -0,0 +1,203 @@ +/* Clzip - LZMA lossless data compressor + Copyright (C) 2010-2017 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . 
*/ + +#include "lzip.h" +#include "encoder_base.h" + +Dis_slots dis_slots; +Prob_prices prob_prices; + +bool +Mb_read_block(Matchfinder_base *mb) +{ + if (!mb->at_stream_end && mb->stream_pos < mb->buffer_size) { + int size = mb->buffer_size - mb->stream_pos; + int rd = readblock(mb->infd, mb->buffer + mb->stream_pos, size); + + mb->stream_pos += rd; + if (rd != size && errno) { + show_error( "Read error", errno, false ); + cleanup_and_fail(1); + } + if (rd < size) { + mb->at_stream_end = true; + mb->pos_limit = mb->buffer_size; + } + } + return mb->pos < mb->stream_pos; +} + +void +Mb_normalize_pos(Matchfinder_base *mb) +{ + if (mb->pos > mb->stream_pos) + internal_error( "pos > stream_pos in Mb_normalize_pos." ); + if (!mb->at_stream_end) { + int i, offset = mb->pos - mb->before_size - mb->dict_size; + int size = mb->stream_pos - offset; + + memmove(mb->buffer, mb->buffer + offset, size); + mb->partial_data_pos += offset; + mb->pos -= offset; /* pos = before_size + dict_size */ + mb->stream_pos -= offset; + for (i = 0; i < mb->num_prev_positions; ++i) + if (mb->prev_positions[i] < offset) + mb->prev_positions[i] = 0; + else + mb->prev_positions[i] -= offset; + for (i = 0; i < mb->pos_array_size; ++i) + if (mb->pos_array[i] < offset) + mb->pos_array[i] = 0; + else + mb->pos_array[i] -= offset; + Mb_read_block(mb); + } +} + +bool +Mb_init(Matchfinder_base *mb, int before, int dict_size, int after_size, int dict_factor, int num_prev_positions23, int pos_array_factor, int ifd) +{ + int buffer_size_limit = (dict_factor * dict_size) + before + after_size; + unsigned size; + int i; + + mb->partial_data_pos = 0; + mb->before_size = before; + mb->pos = 0; + mb->cyclic_pos = 0; + mb->stream_pos = 0; + mb->infd = ifd; + mb->at_stream_end = false; + + mb->buffer_size = max(65536, dict_size); + mb->buffer = (uchar *)malloc(mb->buffer_size); + if (!mb->buffer) + return false; + if (Mb_read_block(mb) && !mb->at_stream_end && + mb->buffer_size < buffer_size_limit) { + uchar * 
tmp; + mb->buffer_size = buffer_size_limit; + tmp = (uchar *)realloc(mb->buffer, mb->buffer_size); + if (!tmp) { + free(mb->buffer); + return false; + } + mb->buffer = tmp; + Mb_read_block(mb); + } + if (mb->at_stream_end && mb->stream_pos < dict_size) + mb->dict_size = max(min_dict_size, mb->stream_pos); + else + mb->dict_size = dict_size; + mb->pos_limit = mb->buffer_size; + if (!mb->at_stream_end) + mb->pos_limit -= after_size; + size = real_bits(mb->dict_size - 1) - 2; + if (size < 16) + size = 16; + size = 1 << size; +// if (mb->dict_size > (1 << 26)) /* 64 MiB */ +// size >>= 1; + mb->key4_mask = size - 1; + size += num_prev_positions23; + + mb->num_prev_positions = size; + mb->pos_array_size = pos_array_factor * (mb->dict_size + 1); + size += mb->pos_array_size; + if (size * sizeof mb->prev_positions[0] <= size) + mb->prev_positions = 0; + else + mb->prev_positions = + (int32_t *)malloc(size * sizeof mb->prev_positions[0]); + if (!mb->prev_positions) { + free(mb->buffer); + return false; + } + mb->pos_array = mb->prev_positions + mb->num_prev_positions; + for (i = 0; i < mb->num_prev_positions; ++i) + mb->prev_positions[i] = 0; + return true; +} + +void +Mb_reset(Matchfinder_base *mb) +{ + int i; + + if (mb->stream_pos > mb->pos) + memmove(mb->buffer, mb->buffer + mb->pos, mb->stream_pos - mb->pos); + mb->partial_data_pos = 0; + mb->stream_pos -= mb->pos; + mb->pos = 0; + mb->cyclic_pos = 0; + for (i = 0; i < mb->num_prev_positions; ++i) + mb->prev_positions[i] = 0; + Mb_read_block(mb); +} + +void +Re_flush_data(Range_encoder *renc) +{ + if (renc->pos > 0) { + if (renc->outfd >= 0 && + writeblock(renc->outfd, renc->buffer, renc->pos) != renc->pos) { + show_error( "Write error", errno, false ); + cleanup_and_fail(1); + } + renc->partial_member_pos += renc->pos; + renc->pos = 0; + show_progress(0, 0, 0, 0); + } +} + +/* End Of Stream mark => (dis == 0xFFFFFFFFU, len == min_match_len) */ +void +LZeb_full_flush(LZ_encoder_base *eb, State state) +{ + int i; + int 
pos_state = Mb_data_position(&eb->mb) & pos_state_mask; + File_trailer trailer; + Re_encode_bit(&eb->renc, &eb->bm_match[state][pos_state], 1); + Re_encode_bit(&eb->renc, &eb->bm_rep[state], 0); + LZeb_encode_pair(eb, 0xFFFFFFFFU, min_match_len, pos_state); + Re_flush(&eb->renc); + Ft_set_data_crc(trailer, LZeb_crc(eb)); + Ft_set_data_size(trailer, Mb_data_position(&eb->mb)); + Ft_set_member_size(trailer, Re_member_position(&eb->renc) + Ft_size); + for (i = 0; i < Ft_size; ++i) + Re_put_byte(&eb->renc, trailer[i]); + Re_flush_data(&eb->renc); +} + +void +LZeb_reset(LZ_encoder_base *eb) +{ + Mb_reset(&eb->mb); + eb->crc = 0xFFFFFFFFU; + Bm_array_init(eb->bm_literal[0], (1 << literal_context_bits) * 0x300); + Bm_array_init(eb->bm_match[0], states * pos_states); + Bm_array_init(eb->bm_rep, states); + Bm_array_init(eb->bm_rep0, states); + Bm_array_init(eb->bm_rep1, states); + Bm_array_init(eb->bm_rep2, states); + Bm_array_init(eb->bm_len[0], states * pos_states); + Bm_array_init(eb->bm_dis_slot[0], len_states * (1 << dis_slot_bits)); + Bm_array_init(eb->bm_dis, modeled_distances - end_dis_model + 1); + Bm_array_init(eb->bm_align, dis_align_size); + Lm_init(&eb->match_len_model); + Lm_init(&eb->rep_len_model); + Re_reset(&eb->renc); +} diff -Nru /sys/src/cmd/lzip/encoder_base.h /sys/src/cmd/lzip/encoder_base.h --- /sys/src/cmd/lzip/encoder_base.h Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/encoder_base.h Sat May 1 00:00:00 2021 @@ -0,0 +1,559 @@ +/* Clzip - LZMA lossless data compressor + Copyright (C) 2010-2017 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . + */ + +#include "lzip.h" + +static void +Dis_slots_init(void) +{ + int i, size, slot; + for (slot = 0; slot < 4; ++slot) + dis_slots[slot] = slot; + for (i = 4, size = 2, slot = 4; slot < 20; slot += 2) { + memset(&dis_slots[i], slot, size); + memset(&dis_slots[i+size], slot + 1, size); + size <<= 1; + i += size; + } +} + +static uchar +get_slot(unsigned dis) +{ + if (dis < (1 << 10)) + return dis_slots[dis]; + if (dis < (1 << 19)) + return dis_slots[dis>> 9] + 18; + if (dis < (1 << 28)) + return dis_slots[dis>>18] + 36; + return dis_slots[dis>>27] + 54; +} + +static void +Prob_prices_init(void) +{ + int i, j; + for (i = 0; i < bit_model_total >> price_step_bits; ++i) { + unsigned val = (i * price_step) + (price_step / 2); + int bits = 0; /* base 2 logarithm of val */ + + for (j = 0; j < price_shift_bits; ++j) { + val = val * val; + bits <<= 1; + while (val >= (1 << 16)) { + val >>= 1; + ++bits; + } + } + bits += 15; /* remaining bits in val */ + prob_prices[i] = (bit_model_total_bits << price_shift_bits) - bits; + } +} + +static int +price_symbol3(Bit_model bm[], int symbol) +{ + int price; + bool bit = symbol & 1; + + symbol |= 8; + symbol >>= 1; + price = price_bit(bm[symbol], bit); + bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[symbol], bit); + return price + price_bit(bm[1], symbol & 1); +} + +static int +price_symbol6(Bit_model bm[], unsigned symbol) +{ + int price; + bool bit = symbol & 1; + + symbol |= 64; + symbol >>= 1; + price = price_bit(bm[symbol], bit); + bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[symbol], bit); + bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[symbol], bit); + bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[symbol], bit); + bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[symbol], bit); + return price + 
price_bit(bm[1], symbol & 1); +} + +static int +price_symbol8(Bit_model bm[], int symbol) +{ + int price; + bool bit = symbol & 1; + symbol |= 0x100; + symbol >>= 1; + price = price_bit(bm[symbol], bit); + bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[symbol], bit); + bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[symbol], bit); + bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[symbol], bit); + bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[symbol], bit); + bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[symbol], bit); + bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[symbol], bit); + return price + price_bit(bm[1], symbol & 1); +} + +static int +price_symbol_reversed(Bit_model bm[], int symbol, int num_bits) +{ + int price = 0; + int model = 1; + int i; + + for (i = num_bits; i > 0; --i) { + bool bit = symbol & 1; + symbol >>= 1; + price += price_bit(bm[model], bit); + model = (model << 1) | bit; + } + return price; +} + +static int +price_matched(Bit_model bm[], unsigned symbol, unsigned match_byte) +{ + int price = 0; + unsigned mask = 0x100; + + symbol |= mask; + for (;;) { + unsigned match_bit = (match_byte <<= 1) & mask; + bool bit = (symbol <<= 1) & 0x100; + + price += price_bit(bm[(symbol>>9) + match_bit + mask], bit); + if (symbol >= 0x10000) + return price; + mask &= ~(match_bit ^ symbol); + /* if(match_bit != bit) mask = 0; */ + } +} + +struct Matchfinder_base { + uvlong partial_data_pos; + uchar * buffer; /* input buffer */ + int32_t * prev_positions; /* 1 + last seen position of key. 
else 0 */ + int32_t * pos_array; /* may be tree or chain */ + int before_size; /* bytes to keep in buffer before dictionary */ + int buffer_size; + int dict_size; /* bytes to keep in buffer before pos */ + int pos; /* current pos in buffer */ + int cyclic_pos; /* cycles through [0, dict_size] */ + int stream_pos; /* first byte not yet read from file */ + int pos_limit; /* when reached, a new block must be read */ + int key4_mask; + int num_prev_positions; /* size of prev_positions */ + int pos_array_size; + int infd; /* input file descriptor */ + bool at_stream_end; /* stream_pos shows real end of file */ +}; + +bool Mb_read_block(Matchfinder_base *mb); +void Mb_normalize_pos(Matchfinder_base *mb); +bool Mb_init(Matchfinder_base *mb, int before, int dict_size, int after_size, int dict_factor, int num_prev_positions23, int pos_array_factor, int ifd); + +static void +Mb_free(Matchfinder_base *mb) +{ + free(mb->prev_positions); + free(mb->buffer); +} + +static int +Mb_avail_bytes(Matchfinder_base *mb) +{ + return mb->stream_pos - mb->pos; +} + +static uvlong +Mb_data_position(Matchfinder_base *mb) +{ + return mb->partial_data_pos + mb->pos; +} + +static bool +Mb_data_finished(Matchfinder_base *mb) +{ + return mb->at_stream_end && mb->pos >= mb->stream_pos; +} + +static int +Mb_true_match_len(Matchfinder_base *mb, int index, int distance) +{ + uchar * data = mb->buffer + mb->pos; + int i = index; + int len_limit = min(Mb_avail_bytes(mb), max_match_len); + while (i < len_limit && data[i-distance] == data[i]) + ++i; + return i; +} + +static void +Mb_move_pos(Matchfinder_base *mb) +{ + if (++mb->cyclic_pos > mb->dict_size) + mb->cyclic_pos = 0; + if (++mb->pos >= mb->pos_limit) + Mb_normalize_pos(mb); +} + +void Mb_reset(Matchfinder_base *mb); + +enum { re_buffer_size = 65536 }; + +typedef struct LZ_encoder_base LZ_encoder_base; +typedef struct Matchfinder_base Matchfinder_base; +typedef struct Range_encoder Range_encoder; + +struct Range_encoder { + uvlong low; + uvlong 
partial_member_pos; + uchar * buffer; /* output buffer */ + int pos; /* current pos in buffer */ + uint32_t range; + unsigned ff_count; + int outfd; /* output file descriptor */ + uchar cache; + File_header header; +}; + +void Re_flush_data(Range_encoder *renc); + +static void +Re_put_byte(Range_encoder *renc, uchar b) +{ + renc->buffer[renc->pos] = b; + if (++renc->pos >= re_buffer_size) + Re_flush_data(renc); +} + +static void +Re_shift_low(Range_encoder *renc) +{ + if (renc->low >> 24 != 0xFF) { + bool carry = (renc->low > 0xFFFFFFFFU); + Re_put_byte(renc, renc->cache + carry); + for (; renc->ff_count > 0; --renc->ff_count) + Re_put_byte(renc, 0xFF + carry); + renc->cache = renc->low >> 24; + } else + ++renc->ff_count; + renc->low = (renc->low & 0x00FFFFFFU) << 8; +} + +static void +Re_reset(Range_encoder *renc) +{ + int i; + renc->low = 0; + renc->partial_member_pos = 0; + renc->pos = 0; + renc->range = 0xFFFFFFFFU; + renc->ff_count = 0; + renc->cache = 0; + for (i = 0; i < Fh_size; ++i) + Re_put_byte(renc, renc->header[i]); +} + +static bool +Re_init(Range_encoder *renc, unsigned dict_size, int ofd) +{ + renc->buffer = (uchar *)malloc(re_buffer_size); + if (!renc->buffer) + return false; + renc->outfd = ofd; + Fh_set_magic(renc->header); + Fh_set_dict_size(renc->header, dict_size); + Re_reset(renc); + return true; +} + +static void +Re_free(Range_encoder *renc) +{ + free(renc->buffer); +} + +static uvlong +Re_member_position(Range_encoder *renc) +{ + return renc->partial_member_pos + renc->pos + renc->ff_count; +} + +static void +Re_flush(Range_encoder *renc) +{ + int i; + for (i = 0; i < 5; ++i) + Re_shift_low(renc); +} + +static void +Re_encode(Range_encoder *renc, int symbol, int num_bits) +{ + unsigned mask; + for (mask = 1 << (num_bits - 1); mask > 0; mask >>= 1) { + renc->range >>= 1; + if (symbol & mask) + renc->low += renc->range; + if (renc->range <= 0x00FFFFFFU) { + renc->range <<= 8; + Re_shift_low(renc); + } + } +} + +static void 
+Re_encode_bit(Range_encoder *renc, Bit_model *probability, bool bit) +{ + Bit_model prob = *probability; + uint32_t bound = (renc->range >> bit_model_total_bits) * prob; + + if (!bit) { + renc->range = bound; + *probability += (bit_model_total - prob) >> bit_model_move_bits; + } else { + renc->low += bound; + renc->range -= bound; + *probability -= prob >> bit_model_move_bits; + } + if (renc->range <= 0x00FFFFFFU) { + renc->range <<= 8; + Re_shift_low(renc); + } +} + +static void +Re_encode_tree3(Range_encoder *renc, Bit_model bm[], int symbol) +{ + int model = 1; + bool bit = (symbol >> 2) & 1; + + Re_encode_bit(renc, &bm[model], bit); + model = (model << 1) | bit; + bit = (symbol >> 1) & 1; + Re_encode_bit(renc, &bm[model], bit); + model = (model << 1) | bit; + Re_encode_bit(renc, &bm[model], symbol & 1); +} + +static void +Re_encode_tree6(Range_encoder *renc, Bit_model bm[], unsigned symbol) +{ + int model = 1; + bool bit = (symbol >> 5) & 1; + Re_encode_bit(renc, &bm[model], bit); + model = (model << 1) | bit; + bit = (symbol >> 4) & 1; + Re_encode_bit(renc, &bm[model], bit); + model = (model << 1) | bit; + bit = (symbol >> 3) & 1; + Re_encode_bit(renc, &bm[model], bit); + model = (model << 1) | bit; + bit = (symbol >> 2) & 1; + Re_encode_bit(renc, &bm[model], bit); + model = (model << 1) | bit; + bit = (symbol >> 1) & 1; + Re_encode_bit(renc, &bm[model], bit); + model = (model << 1) | bit; + Re_encode_bit(renc, &bm[model], symbol & 1); +} + +static void +Re_encode_tree8(Range_encoder *renc, Bit_model bm[], int symbol) +{ + int model = 1; + int i; + for (i = 7; i >= 0; --i) { + bool bit = (symbol >> i) & 1; + Re_encode_bit(renc, &bm[model], bit); + model = (model << 1) | bit; + } +} + +static void +Re_encode_tree_reversed(Range_encoder *renc, Bit_model bm[], int symbol, int num_bits) +{ + int model = 1; + int i; + for (i = num_bits; i > 0; --i) { + bool bit = symbol & 1; + symbol >>= 1; + Re_encode_bit(renc, &bm[model], bit); + model = (model << 1) | bit; + } 
+} + +static void +Re_encode_matched(Range_encoder *renc, Bit_model bm[], unsigned symbol, unsigned match_byte) +{ + unsigned mask = 0x100; + symbol |= mask; + while (true) { + unsigned match_bit = (match_byte <<= 1) & mask; + bool bit = (symbol <<= 1) & 0x100; + Re_encode_bit(renc, &bm[(symbol>>9)+match_bit+mask], bit); + if (symbol >= 0x10000) + break; + mask &= ~(match_bit ^ symbol); + /* if(match_bit != bit) mask = 0; */ + } +} + +static void +Re_encode_len(struct Range_encoder *renc, Len_model *lm, int symbol, int pos_state) +{ + bool bit = ((symbol -= min_match_len) >= len_low_syms); + Re_encode_bit(renc, &lm->choice1, bit); + if (!bit) + Re_encode_tree3(renc, lm->bm_low[pos_state], symbol); + else { + bit = ((symbol -= len_low_syms) >= len_mid_syms); + Re_encode_bit(renc, &lm->choice2, bit); + if (!bit) + Re_encode_tree3(renc, lm->bm_mid[pos_state], symbol); + else + Re_encode_tree8(renc, lm->bm_high, symbol - len_mid_syms); + } +} + +enum { + max_marker_size = 16, + num_rep_distances = 4 /* must be 4 */ +}; + +struct LZ_encoder_base { + struct Matchfinder_base mb; + uint32_t crc; + + Bit_model bm_literal[1<mb, before, dict_size, after_size, dict_factor, + num_prev_positions23, pos_array_factor, ifd)) + return false; + if (!Re_init(&eb->renc, eb->mb.dict_size, outfd)) + return false; + LZeb_reset(eb); + return true; +} + +static void +LZeb_free(LZ_encoder_base *eb) +{ + Re_free(&eb->renc); + Mb_free(&eb->mb); +} + +static unsigned +LZeb_crc(LZ_encoder_base *eb) +{ + return eb->crc ^ 0xFFFFFFFFU; +} + +static int +LZeb_price_literal(LZ_encoder_base *eb, uchar prev_byte, uchar symbol) +{ + return price_symbol8(eb->bm_literal[get_lit_state(prev_byte)], symbol); +} + +static int +LZeb_price_matched(LZ_encoder_base *eb, uchar prev_byte, uchar symbol, uchar match_byte) +{ + return price_matched(eb->bm_literal[get_lit_state(prev_byte)], symbol, + match_byte); +} + +static void +LZeb_encode_literal(LZ_encoder_base *eb, uchar prev_byte, uchar symbol) +{ + 
Re_encode_tree8(&eb->renc, eb->bm_literal[get_lit_state(prev_byte)], + symbol); +} + +static void +LZeb_encode_matched(LZ_encoder_base *eb, uchar prev_byte, uchar symbol, uchar match_byte) +{ + Re_encode_matched(&eb->renc, eb->bm_literal[get_lit_state(prev_byte)], + symbol, match_byte); +} + +static void +LZeb_encode_pair(LZ_encoder_base *eb, unsigned dis, int len, int pos_state) +{ + unsigned dis_slot = get_slot(dis); + Re_encode_len(&eb->renc, &eb->match_len_model, len, pos_state); + Re_encode_tree6(&eb->renc, eb->bm_dis_slot[get_len_state(len)], dis_slot); + + if (dis_slot >= start_dis_model) { + int direct_bits = (dis_slot >> 1) - 1; + unsigned base = (2 | (dis_slot & 1)) << direct_bits; + unsigned direct_dis = dis - base; + + if (dis_slot < end_dis_model) + Re_encode_tree_reversed(&eb->renc, eb->bm_dis + (base - dis_slot), + direct_dis, direct_bits); + else { + Re_encode(&eb->renc, direct_dis >> dis_align_bits, + direct_bits - dis_align_bits); + Re_encode_tree_reversed(&eb->renc, eb->bm_align, direct_dis, dis_align_bits); + } + } +} + +void LZeb_full_flush(LZ_encoder_base *eb, State state); diff -Nru /sys/src/cmd/lzip/fast_encoder.c /sys/src/cmd/lzip/fast_encoder.c --- /sys/src/cmd/lzip/fast_encoder.c Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/fast_encoder.c Sat May 1 00:00:00 2021 @@ -0,0 +1,188 @@ +/* Clzip - LZMA lossless data compressor + Copyright (C) 2010-2017 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with this program. If not, see . */ + +#include "lzip.h" +#include "encoder_base.h" +#include "fast_encoder.h" + +int +FLZe_longest_match_len(FLZ_encoder *fe, int *distance) +{ + enum { len_limit = 16 }; + uchar *data = Mb_ptr_to_current_pos(&fe->eb.mb); + int32_t * ptr0 = fe->eb.mb.pos_array + fe->eb.mb.cyclic_pos; + int pos1 = fe->eb.mb.pos + 1; + int maxlen = 0, newpos1, count; + int available = min(Mb_avail_bytes(&fe->eb.mb), max_match_len); + + if (available < len_limit) + return 0; + + fe->key4 = ((fe->key4 << 4) ^ data[3]) & fe->eb.mb.key4_mask; + newpos1 = fe->eb.mb.prev_positions[fe->key4]; + fe->eb.mb.prev_positions[fe->key4] = pos1; + + for (count = 4; ;) { + int32_t * newptr; + int delta; + + if (newpos1 <= 0 || --count < 0 || + (delta = pos1 - newpos1) > fe->eb.mb.dict_size) { + *ptr0 = 0; + break; + } + newptr = fe->eb.mb.pos_array + + (fe->eb.mb.cyclic_pos - delta + + ((fe->eb.mb.cyclic_pos >= delta) ? 
0 : fe->eb.mb.dict_size + 1)); + + if (data[maxlen-delta] == data[maxlen]) { + int len = 0; + while (len < available && data[len-delta] == data[len]) + ++len; + if (maxlen < len) { + maxlen = len; + *distance = delta - 1; + if (maxlen >= len_limit) { + *ptr0 = *newptr; + break; + } + } + } + + *ptr0 = newpos1; + ptr0 = newptr; + newpos1 = *ptr0; + } + return maxlen; +} + +bool +FLZe_encode_member(FLZ_encoder *fe, uvlong member_size) +{ + uvlong member_size_limit = member_size - Ft_size - max_marker_size; + int rep = 0, i; + int reps[num_rep_distances]; + State state = 0; + + for (i = 0; i < num_rep_distances; ++i) + reps[i] = 0; + + if (Mb_data_position(&fe->eb.mb) != 0 || + Re_member_position(&fe->eb.renc) != Fh_size) + return false; /* can be called only once */ + + if (!Mb_data_finished(&fe->eb.mb)) /* encode first byte */ { + uchar prev_byte = 0; + uchar cur_byte = Mb_peek(&fe->eb.mb, 0); + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_match[state][0], 0); + LZeb_encode_literal(&fe->eb, prev_byte, cur_byte); + CRC32_update_byte(&fe->eb.crc, cur_byte); + FLZe_reset_key4(fe); + FLZe_update_and_move(fe, 1); + } + + while (!Mb_data_finished(&fe->eb.mb) && + Re_member_position(&fe->eb.renc) < member_size_limit) { + int match_distance; + int main_len = FLZe_longest_match_len(fe, &match_distance); + int pos_state = Mb_data_position(&fe->eb.mb) & pos_state_mask; + int len = 0; + + for (i = 0; i < num_rep_distances; ++i) { + int tlen = Mb_true_match_len(&fe->eb.mb, 0, reps[i] + 1); + if (tlen > len) { + len = tlen; + rep = i; + } + } + if (len > min_match_len && len + 3 > main_len) { + CRC32_update_buf(&fe->eb.crc, Mb_ptr_to_current_pos(&fe->eb.mb), len); + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_match[state][pos_state], 1); + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_rep[state], 1); + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_rep0[state], rep != 0); + if (rep == 0) + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_len[state][pos_state], 1); + else { + int distance; + 
Re_encode_bit(&fe->eb.renc, &fe->eb.bm_rep1[state], rep > 1); + if (rep > 1) + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_rep2[state], rep > 2); + distance = reps[rep]; + for (i = rep; i > 0; --i) + reps[i] = reps[i-1]; + reps[0] = distance; + } + state = St_set_rep(state); + Re_encode_len(&fe->eb.renc, &fe->eb.rep_len_model, len, pos_state); + Mb_move_pos(&fe->eb.mb); + FLZe_update_and_move(fe, len - 1); + continue; + } + + if (main_len > min_match_len) { + CRC32_update_buf(&fe->eb.crc, Mb_ptr_to_current_pos(&fe->eb.mb), main_len); + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_match[state][pos_state], 1); + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_rep[state], 0); + state = St_set_match(state); + for (i = num_rep_distances - 1; i > 0; --i) + reps[i] = reps[i-1]; + reps[0] = match_distance; + LZeb_encode_pair(&fe->eb, match_distance, main_len, pos_state); + Mb_move_pos(&fe->eb.mb); + FLZe_update_and_move(fe, main_len - 1); + continue; + } + + { + uchar prev_byte = Mb_peek(&fe->eb.mb, 1); + uchar cur_byte = Mb_peek(&fe->eb.mb, 0); + uchar match_byte = Mb_peek(&fe->eb.mb, reps[0] + 1); + Mb_move_pos(&fe->eb.mb); + CRC32_update_byte(&fe->eb.crc, cur_byte); + + if (match_byte == cur_byte) { + int short_rep_price = price1(fe->eb.bm_match[state][pos_state]) + + price1(fe->eb.bm_rep[state]) + + price0(fe->eb.bm_rep0[state]) + + price0(fe->eb.bm_len[state][pos_state]); + int price = price0(fe->eb.bm_match[state][pos_state]); + if (St_is_char(state)) + price += LZeb_price_literal(&fe->eb, prev_byte, cur_byte); + else + price += LZeb_price_matched(&fe->eb, prev_byte, cur_byte, match_byte); + if (short_rep_price < price) { + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_match[state][pos_state], 1); + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_rep[state], 1); + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_rep0[state], 0); + Re_encode_bit(&fe->eb.renc, &fe->eb.bm_len[state][pos_state], 0); + state = St_set_short_rep(state); + continue; + } + } + + /* literal byte */ + Re_encode_bit(&fe->eb.renc, 
&fe->eb.bm_match[state][pos_state], 0); + if (St_is_char(state)) + LZeb_encode_literal(&fe->eb, prev_byte, cur_byte); + else + LZeb_encode_matched(&fe->eb, prev_byte, cur_byte, match_byte); + state = St_set_char(state); + } + } + + LZeb_full_flush(&fe->eb, state); + return true; +} diff -Nru /sys/src/cmd/lzip/fast_encoder.h /sys/src/cmd/lzip/fast_encoder.h --- /sys/src/cmd/lzip/fast_encoder.h Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/fast_encoder.h Sat May 1 00:00:00 2021 @@ -0,0 +1,71 @@ +/* Clzip - LZMA lossless data compressor + Copyright (C) 2010-2017 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . 
*/ + +typedef struct FLZ_encoder FLZ_encoder; +struct FLZ_encoder { + struct LZ_encoder_base eb; + unsigned key4; /* key made from latest 4 bytes */ +}; + +static void +FLZe_reset_key4(FLZ_encoder *fe) +{ + int i; + fe->key4 = 0; + for (i = 0; i < 3 && i < Mb_avail_bytes(&fe->eb.mb); ++i) + fe->key4 = (fe->key4 << 4) ^ fe->eb.mb.buffer[i]; +} + +int FLZe_longest_match_len(FLZ_encoder *fe, int *distance); + +static void +FLZe_update_and_move(FLZ_encoder *fe, int n) +{ + while (--n >= 0) { + if (Mb_avail_bytes(&fe->eb.mb) >= 4) { + fe->key4 = ((fe->key4 << 4) ^ fe->eb.mb.buffer[fe->eb.mb.pos+3]) & + fe->eb.mb.key4_mask; + fe->eb.mb.pos_array[fe->eb.mb.cyclic_pos] = fe->eb.mb.prev_positions[fe->key4]; + fe->eb.mb.prev_positions[fe->key4] = fe->eb.mb.pos + 1; + } + Mb_move_pos(&fe->eb.mb); + } +} + +static bool +FLZe_init(FLZ_encoder *fe, int ifd, int outfd) +{ + enum { + before = 0, + dict_size = 65536, + /* bytes to keep in buffer after pos */ + after_size = max_match_len, + dict_factor = 16, + num_prev_positions23 = 0, + pos_array_factor = 1 + }; + + return LZeb_init(&fe->eb, before, dict_size, after_size, dict_factor, + num_prev_positions23, pos_array_factor, ifd, outfd); +} + +static void +FLZe_reset(FLZ_encoder *fe) +{ + LZeb_reset(&fe->eb); +} + +bool FLZe_encode_member(FLZ_encoder *fe, uvlong member_size); diff -Nru /sys/src/cmd/lzip/lzip.h /sys/src/cmd/lzip/lzip.h --- /sys/src/cmd/lzip/lzip.h Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/lzip.h Sat May 1 00:00:00 2021 @@ -0,0 +1,497 @@ +/* Clzip - LZMA lossless data compressor + Copyright (C) 2010-2017 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. 
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . + */ + +#ifndef _LZIP_H +#define _LZIP_H + +#include +#include +#include +#include + +#define exit(n) exits((n) == 0? 0: "err") +#define isatty(fd) 0 +#define lseek seek + +#ifndef max +#define max(x,y) ((x) >= (y) ? (x) : (y)) +#endif +#ifndef min +#define min(x,y) ((x) <= (y) ? (x) : (y)) +#endif + +typedef int State; +typedef long int32_t; +typedef ulong uint32_t; +typedef int bool; + +enum { false, true }; + +enum { states = 12 }; +enum { + min_dict_bits = 12, + min_dict_size = 1 << min_dict_bits, /* >= modeled_distances */ + max_dict_bits = 29, + max_dict_size = 1 << max_dict_bits, + min_member_size = 36, + literal_context_bits = 3, + literal_pos_state_bits = 0, /* not used */ + pos_state_bits = 2, + pos_states = 1 << pos_state_bits, + pos_state_mask = pos_states -1, + + len_states = 4, + dis_slot_bits = 6, + start_dis_model = 4, + end_dis_model = 14, + modeled_distances = 1 << (end_dis_model / 2), /* 128 */ + dis_align_bits = 4, + dis_align_size = 1 << dis_align_bits, + + len_low_bits = 3, + len_mid_bits = 3, + len_high_bits = 8, + len_low_syms = 1 << len_low_bits, + len_mid_syms = 1 << len_mid_bits, + len_high_syms = 1 << len_high_bits, + max_len_syms = len_low_syms + len_mid_syms + len_high_syms, + + min_match_len = 2, /* must be 2 */ + max_match_len = min_match_len + max_len_syms - 1, /* 273 */ + min_match_len_limit = 5, + + bit_model_move_bits = 5, + bit_model_total_bits = 11, + bit_model_total = 1 << bit_model_total_bits, +}; + +typedef struct Len_model Len_model; +typedef struct Pretty_print Pretty_print; +typedef struct Matchfinder_base Matchfinder_base; +typedef int Bit_model; + +struct Len_model { + 
Bit_model choice1; + Bit_model choice2; + Bit_model bm_low[pos_states][len_low_syms]; + Bit_model bm_mid[pos_states][len_mid_syms]; + Bit_model bm_high[len_high_syms]; +}; +struct Pretty_print { + char *name; + char *stdin_name; + ulong longest_name; + bool first_post; +}; + +typedef ulong CRC32[256]; /* Table of CRCs of all 8-bit messages. */ + +extern CRC32 crc32; + +#define errno 0 + +static uchar magic_string[4] = { "LZIP" }; + +typedef uchar File_header[6]; /* 0-3 magic bytes */ +/* 4 version */ +/* 5 coded_dict_size */ +enum { Fh_size = 6 }; + +typedef uchar File_trailer[20]; +/* 0-3 CRC32 of the uncompressed data */ +/* 4-11 size of the uncompressed data */ +/* 12-19 member size including header and trailer */ + +enum { Ft_size = 20 }; + +enum { + price_shift_bits = 6, + price_step_bits = 2, + price_step = 1 << price_step_bits, +}; + +typedef uchar Dis_slots[1<<10]; +typedef short Prob_prices[bit_model_total >> price_step_bits]; + +extern Dis_slots dis_slots; +extern Prob_prices prob_prices; + +#define get_price(prob) prob_prices[(prob) >> price_step_bits] +#define price0(prob) get_price(prob) +#define price1(prob) get_price(bit_model_total - (prob)) +#define price_bit(bm, bit) ((bit)? 
price1(bm): price0(bm)) + +#define Mb_ptr_to_current_pos(mb) ((mb)->buffer + (mb)->pos) +#define Mb_peek(mb, distance) (mb)->buffer[(mb)->pos - (distance)] + +#define Lp_price(lp, len, pos_state) \ + (lp)->prices[pos_state][(len) - min_match_len] + +#define Tr_update(trial, pr, distance4, p_i) \ +{ \ + if ((pr) < (trial)->price) { \ + (trial)->price = pr; \ + (trial)->dis4 = distance4; \ + (trial)->prev_index = p_i; \ + (trial)->prev_index2 = single_step_trial; \ + } else { \ + } \ +} + +/* these functions are now extern and must be defined exactly once */ +#ifdef _DEFINE_INLINES +#define _INLINES_DEFINED + +int +get_len_state(int len) +{ + int lenstm1, lenmmm; + + lenmmm = len - min_match_len; + lenstm1 = len_states - 1; + if (lenmmm < lenstm1) + return lenmmm; + else + return lenstm1; +} + +State +St_set_char(State st) +{ + static State next[states] = { 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 4, 5 }; + + assert((unsigned)st < nelem(next)); + return next[st]; +} + +int +get_lit_state(uchar prev_byte) +{ + return prev_byte >> (8 - literal_context_bits); +} + +void +Bm_init(Bit_model *probability) +{ + *probability = bit_model_total / 2; +} + +void +Bm_array_init(Bit_model bm[], int size) +{ + int i; + + for (i = 0; i < size; ++i) + Bm_init(&bm[i]); +} + +void +Lm_init(Len_model *lm) +{ + Bm_init(&lm->choice1); + Bm_init(&lm->choice2); + Bm_array_init(lm->bm_low[0], pos_states * len_low_syms); + Bm_array_init(lm->bm_mid[0], pos_states * len_mid_syms); + Bm_array_init(lm->bm_high, len_high_syms); +} + +void +Pp_init(Pretty_print *pp, char *filenames[], int num_filenames, int verbosity) +{ + unsigned stdin_name_len; + int i; + + pp->name = 0; + pp->stdin_name = "(stdin)"; + pp->longest_name = 0; + pp->first_post = false; + + if (verbosity <= 0) + return; + stdin_name_len = strlen(pp->stdin_name); + for (i = 0; i < num_filenames; ++i) { + char *s = filenames[i]; + unsigned len = strcmp(s, "-") == 0? 
stdin_name_len: strlen(s); + + if (len > pp->longest_name) + pp->longest_name = len; + } + if (pp->longest_name == 0) + pp->longest_name = stdin_name_len; +} + +void +Pp_set_name(Pretty_print *pp, char *filename) +{ + if ( filename && filename[0] && strcmp( filename, "-" ) != 0 ) + pp->name = filename; + else + pp->name = pp->stdin_name; + pp->first_post = true; +} + +void +Pp_reset(Pretty_print *pp) +{ + if (pp->name && pp->name[0]) + pp->first_post = true; +} + +void +Pp_show_msg(Pretty_print *pp, char *msg); + +void +CRC32_init(void) +{ + unsigned n; + + for (n = 0; n < 256; ++n) { + unsigned c = n; + int k; + for (k = 0; k < 8; ++k) { + if (c & 1) + c = 0xEDB88320U ^ (c >> 1); + else + c >>= 1; + } + crc32[n] = c; + } +} + +void +CRC32_update_byte(uint32_t *crc, uchar byte) +{ + *crc = crc32[(*crc^byte)&0xFF] ^ (*crc >> 8); +} + +void +CRC32_update_buf(uint32_t *crc, uchar *buffer, int size) +{ + int i; + uint32_t c = *crc; + for (i = 0; i < size; ++i) + c = crc32[(c^buffer[i])&0xFF] ^ (c >> 8); + *crc = c; +} + +bool +isvalid_ds(unsigned dict_size) +{ + return (dict_size >= min_dict_size && + dict_size <= max_dict_size); +} + +int +real_bits(unsigned value) +{ + int bits = 0; + + while (value > 0) { + value >>= 1; + ++bits; + } + return bits; +} + +void +Fh_set_magic(File_header data) +{ + memcpy(data, magic_string, 4); + data[4] = 1; +} + +bool +Fh_verify_magic(File_header data) +{ + return (memcmp(data, magic_string, 4) == 0); +} + +/* detect truncated header */ +bool +Fh_verify_prefix(File_header data, int size) +{ + int i; + for (i = 0; i < size && i < 4; ++i) + if (data[i] != magic_string[i]) + return false; + return (size > 0); +} + +uchar +Fh_version(File_header data) +{ + return data[4]; +} + +bool +Fh_verify_version(File_header data) +{ + return (data[4] == 1); +} + +unsigned +Fh_get_dict_size(File_header data) +{ + unsigned sz = (1 << (data[5] &0x1F)); + if (sz > min_dict_size) + sz -= (sz / 16) * ((data[5] >> 5) & 7); + return sz; +} + +bool 
+Fh_set_dict_size(File_header data, unsigned sz) +{ + if (!isvalid_ds(sz)) + return false; + data[5] = real_bits(sz - 1); + if (sz > min_dict_size) { + unsigned base_size = 1 << data[5]; + unsigned fraction = base_size / 16; + unsigned i; + for (i = 7; i >= 1; --i) + if (base_size - (i * fraction) >= sz) { + data[5] |= (i << 5); + break; + } + } + return true; +} + +unsigned +Ft_get_data_crc(File_trailer data) +{ + unsigned tmp = 0; + int i; + for (i = 3; i >= 0; --i) { + tmp <<= 8; + tmp += data[i]; + } + return tmp; +} + +void +Ft_set_data_crc(File_trailer data, unsigned crc) +{ + int i; + for (i = 0; i <= 3; ++i) { + data[i] = (uchar)crc; + crc >>= 8; + } +} + +uvlong +Ft_get_data_size(File_trailer data) +{ + uvlong tmp = 0; + int i; + for (i = 11; i >= 4; --i) { + tmp <<= 8; + tmp += data[i]; + } + return tmp; +} + +void +Ft_set_data_size(File_trailer data, uvlong sz) +{ + int i; + for (i = 4; i <= 11; ++i) { + data[i] = (uchar)sz; + sz >>= 8; + } +} + +uvlong +Ft_get_member_size(File_trailer data) +{ + uvlong tmp = 0; + int i; + for (i = 19; i >= 12; --i) { + tmp <<= 8; + tmp += data[i]; + } + return tmp; +} + +void +Ft_set_member_size(File_trailer data, uvlong sz) +{ + int i; + for (i = 12; i <= 19; ++i) { + data[i] = (uchar)sz; + sz >>= 8; + } +} +#else /* _DEFINE_INLINES */ +void Bm_array_init(Bit_model bm[], int size); +void Bm_init(Bit_model *probability); +void CRC32_init(void); +void CRC32_update_buf(uint32_t *crc, uchar *buffer, int size); +void CRC32_update_byte(uint32_t *crc, uchar byte); +unsigned Fh_get_dict_size(File_header data); +bool Fh_set_dict_size(File_header data, unsigned sz); +void Fh_set_magic(File_header data); +bool Fh_verify_magic(File_header data); +bool Fh_verify_prefix(File_header data, int size); +bool Fh_verify_version(File_header data); +uchar Fh_version(File_header data); +unsigned Ft_get_data_crc(File_trailer data); +uvlong Ft_get_data_size(File_trailer data); +uvlong Ft_get_member_size(File_trailer data); +void 
Ft_set_data_crc(File_trailer data, unsigned crc); +void Ft_set_data_size(File_trailer data, uvlong sz); +void Ft_set_member_size(File_trailer data, uvlong sz); +void Lm_init(Len_model *lm); +void Pp_init(Pretty_print *pp, char *filenames[], int num_filenames, int verbosity); +void Pp_reset(Pretty_print *pp); +void Pp_set_name(Pretty_print *pp, char *filename); +void Pp_show_msg(Pretty_print *pp, char *msg); +State St_set_char(State st); +int get_lit_state(uchar prev_byte); +int get_len_state(int len); +bool isvalid_ds(unsigned dict_size); +int real_bits(unsigned value); +#endif /* _DEFINE_INLINES */ + +#define St_is_char(state) ((state) < 7) +#define St_set_match(state) ((state) < 7? 7: 10) +#define St_set_rep(state) ((state) < 7? 8: 11) +#define St_set_short_rep(state) ((state) < 7? 9: 11) + +static char *bad_magic_msg = "Bad magic number (file not in lzip format)."; +static char *bad_dict_msg = "Invalid dictionary size in member header."; +static char *trailing_msg = "Trailing data not allowed."; + +/* defined in decoder.c */ +int readblock(int fd, uchar *buf, int size); +int writeblock(int fd, uchar *buf, int size); + +/* defined in main.c */ +extern int verbosity; +Dir; +char *bad_version(unsigned version); +char *format_ds(unsigned dict_size); +int open_instream(char *name, Dir *in_statsp, bool no_ofile, bool reg_only); +void *resize_buffer(void *buf, unsigned min_size); +void cleanup_and_fail(int retval); +void show_error(char *msg, int errcode, bool help); +void show_file_error(char *filename, char *msg, int errcode); +void internal_error(char *msg); +struct Matchfinder_base; +void show_progress(uvlong partial_size, Matchfinder_base *m, Pretty_print *p, + uvlong cfile_size); +#endif diff -Nru /sys/src/cmd/lzip/main.c /sys/src/cmd/lzip/main.c --- /sys/src/cmd/lzip/main.c Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/main.c Sat May 1 00:00:00 2021 @@ -0,0 +1,883 @@ +/* + * Clzip - LZMA lossless data compressor + * Copyright (C) 2010-2017 Antonio Diaz Diaz. 
+ * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + */ +/* + * Exit status: 0 for a normal exit, 1 for environmental problems + * (file not found, invalid flags, I/O errors, etc), 2 to indicate a + * corrupt or invalid input file, 3 for an internal consistency error + * (eg, bug) which caused lzip to panic. + */ + +#define _DEFINE_INLINES +#include "lzip.h" +#include "decoder.h" +#include "encoder_base.h" +#include "encoder.h" +#include "fast_encoder.h" + +int verbosity = 0; + +char *argv0 = "lzip"; + +struct { + char * from; + char * to; +} known_extensions[] = { + { ".lz", "" }, + { ".tlz", ".tar" }, + { 0, 0 } +}; + +typedef struct Lzma_options Lzma_options; +struct Lzma_options { + int dict_size; /* 4 KiB .. 512 MiB */ + int match_len_limit; /* 5 .. 
273 */ +}; + +enum Mode { m_compress, m_decompress, }; + +char *output_filename = nil; +int outfd = -1; +bool delete_output_on_interrupt = false; + +static void +usage(void) +{ + fprintf(stderr, "Usage: %s [-[0-9]cdv] [file...]\n", argv0); + exit(2); +} + +char * +bad_version(unsigned version) +{ + static char buf[80]; + + snprintf(buf, sizeof buf, "Version %ud member format not supported.", + version); + return buf; +} + +char * +format_ds(unsigned dict_size) +{ + enum { bufsize = 16, factor = 1024 }; + char *prefix[8] = { "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi" }; + char *p = ""; + char *np = " "; + unsigned num = dict_size, i; + bool exact = (num % factor == 0); + static char buf[bufsize]; + + for (i = 0; i < 8 && (num > 9999 || (exact && num >= factor)); ++i) { + num /= factor; + if (num % factor != 0) + exact = false; + p = prefix[i]; + np = ""; + } + snprintf( buf, bufsize, "%s%4ud %sB", np, num, p ); + return buf; +} + +static void +show_header(unsigned dict_size) +{ + if (verbosity >= 3) + fprintf(stderr, "dictionary %s. ", format_ds( dict_size) ); +} + +static uvlong +getnum(char *ptr, uvlong llimit, uvlong ulimit) +{ + int bad; + uvlong result; + char *tail; + + bad = 0; + result = strtoull(ptr, &tail, 0); + if (tail == ptr) { + show_error( "Bad or missing numerical argument.", 0, true ); + exit(1); + } + + if (!errno && tail[0]) { + unsigned factor = (tail[1] == 'i') ? 
1024 : 1000; + int i, exponent = 0; /* 0 = bad multiplier */ + + switch (tail[0]) { + case 'Y': + exponent = 8; + break; + case 'Z': + exponent = 7; + break; + case 'E': + exponent = 6; + break; + case 'P': + exponent = 5; + break; + case 'T': + exponent = 4; + break; + case 'G': + exponent = 3; + break; + case 'M': + exponent = 2; + break; + case 'K': + if (factor == 1024) + exponent = 1; + break; + case 'k': + if (factor == 1000) + exponent = 1; + break; + } + if (exponent <= 0) { + show_error( "Bad multiplier in numerical argument.", 0, true ); + exit(1); + } + for (i = 0; i < exponent; ++i) { + if (ulimit / factor >= result) + result *= factor; + else { + bad++; + break; + } + } + } + if (bad || result < llimit || result > ulimit) { + show_error( "Numerical argument out of limits.", 0, false ); + exit(1); + } + return result; +} + +static int +get_dict_size(char *arg) +{ + char *tail; + long bits = strtol(arg, &tail, 0); + + if (bits >= min_dict_bits && + bits <= max_dict_bits && *tail == 0) + return (1 << bits); + return getnum(arg, min_dict_size, max_dict_size); +} + +void +set_mode(enum Mode *program_modep, enum Mode new_mode) +{ + if (*program_modep != m_compress && *program_modep != new_mode) { + show_error( "Only one operation can be specified.", 0, true ); + exit(1); + } + *program_modep = new_mode; +} + +static int +extension_index(char *name) +{ + int eindex; + + for (eindex = 0; known_extensions[eindex].from; ++eindex) { + char * ext = known_extensions[eindex].from; + unsigned name_len = strlen(name); + unsigned ext_len = strlen(ext); + + if (name_len > ext_len && + strncmp(name + name_len - ext_len, ext, ext_len) == 0) + return eindex; + } + return - 1; +} + +int +open_instream(char *name, Dir *, bool, bool) +{ + int infd = open(name, OREAD); + + if (infd < 0) + show_file_error( name, "Can't open input file", errno ); + return infd; +} + +static int +open_instream2(char *name, Dir *in_statsp, enum Mode program_mode, + int eindex, bool recompress, 
bool to_stdout) +{ + bool no_ofile = to_stdout; + + if (program_mode == m_compress && !recompress && eindex >= 0) { + if (verbosity >= 0) + fprintf( stderr, "%s: Input file '%s' already has '%s' suffix.\n", + argv0, name, known_extensions[eindex].from); + return - 1; + } + return open_instream(name, in_statsp, no_ofile, false); +} + +/* assure at least a minimum size for buffer 'buf' */ +void * +resize_buffer(void *buf, unsigned min_size) +{ + buf = realloc(buf, min_size); + if (!buf) { + show_error("Not enough memory.", 0, false); + cleanup_and_fail(1); + } + return buf; +} + +static void +set_c_outname(char *name, bool multifile) +{ + output_filename = resize_buffer(output_filename, strlen(name) + 5 + + strlen(known_extensions[0].from) + 1); + strcpy(output_filename, name); + if (multifile) + strcat( output_filename, "00001" ); + strcat(output_filename, known_extensions[0].from); +} + +static void +set_d_outname(char *name, int eindex) +{ + unsigned name_len = strlen(name); + if (eindex >= 0) { + char * from = known_extensions[eindex].from; + unsigned from_len = strlen(from); + + if (name_len > from_len) { + output_filename = resize_buffer(output_filename, name_len + + strlen(known_extensions[eindex].to) + 1); + strcpy(output_filename, name); + strcpy(output_filename + name_len - from_len, known_extensions[eindex].to); + return; + } + } + output_filename = resize_buffer(output_filename, name_len + 4 + 1); + strcpy(output_filename, name); + strcat(output_filename, ".out"); + if (verbosity >= 1) + fprintf( stderr, "%s: Can't guess original name for '%s' -- using '%s'\n", + argv0, name, output_filename); +} + +static bool +open_outstream(bool force, bool) +{ + int flags = OWRITE; + + if (force) + flags |= OTRUNC; + else + flags |= OEXCL; + + outfd = create(output_filename, flags, 0666); + if (outfd >= 0) + delete_output_on_interrupt = true; + else if (verbosity >= 0) + fprintf(stderr, "%s: Can't create output file '%s': %r\n", + argv0, output_filename); + return 
outfd >= 0; +} + +static bool +check_tty(int, enum Mode program_mode) +{ + if (program_mode == m_compress && isatty(outfd) || + program_mode == m_decompress && isatty(infd)) { + usage(); + return false; + } + return true; +} + +void +cleanup_and_fail(int retval) +{ + if (delete_output_on_interrupt) { + delete_output_on_interrupt = false; + if (verbosity >= 0) + fprintf(stderr, "%s: Deleting output file '%s', if it exists.\n", + argv0, output_filename); + if (outfd >= 0) { + close(outfd); + outfd = -1; + } + if (remove(output_filename) != 0) + fprintf(stderr, "%s: can't remove output file %s: %r\n", + argv0, output_filename); + } + exit(retval); +} + +/* Set permissions, owner and times. */ +static void +close_and_set_permissions(Dir *) +{ + if (close(outfd) != 0) { + show_error( "Error closing output file", errno, false ); + cleanup_and_fail(1); + } + outfd = -1; + delete_output_on_interrupt = false; +} + +static bool +next_filename(void) +{ + int i, j; + unsigned name_len = strlen(output_filename); + unsigned ext_len = strlen(known_extensions[0].from); + + if ( name_len >= ext_len + 5 ) /* "*00001.lz" */ + for (i = name_len - ext_len - 1, j = 0; j < 5; --i, ++j) { + if (output_filename[i] < '9') { + ++output_filename[i]; + return true; + } else + output_filename[i] = '0'; + } + return false; +} + +typedef struct Poly_encoder Poly_encoder; +struct Poly_encoder { + LZ_encoder_base *eb; + LZ_encoder *e; + FLZ_encoder *fe; +}; + +static int +compress(uvlong member_size, uvlong volume_size, + int infd, Lzma_options *encoder_options, Pretty_print *pp, + Dir *in_statsp, bool zero) +{ + int retval = 0; + uvlong in_size = 0, out_size = 0, partial_volume_size = 0; + uvlong cfile_size = in_statsp? 
in_statsp->length / 100: 0; + Poly_encoder encoder = { 0, 0, 0 }; /* polymorphic encoder */ + bool error = false; + + if (verbosity >= 1) + Pp_show_msg(pp, 0); + + if (zero) { + encoder.fe = (FLZ_encoder *)malloc(sizeof * encoder.fe); + if (!encoder.fe || !FLZe_init(encoder.fe, infd, outfd)) + error = true; + else + encoder.eb = &encoder.fe->eb; + } else { + File_header header; + + if (Fh_set_dict_size(header, encoder_options->dict_size) && + encoder_options->match_len_limit >= min_match_len_limit && + encoder_options->match_len_limit <= max_match_len) + encoder.e = (LZ_encoder *)malloc(sizeof * encoder.e); + else + internal_error( "invalid argument to encoder." ); + if (!encoder.e || !LZe_init(encoder.e, Fh_get_dict_size(header), + encoder_options->match_len_limit, infd, outfd)) + error = true; + else + encoder.eb = &encoder.e->eb; + } + if (error) { + Pp_show_msg( pp, "Not enough memory. Try a smaller dictionary size." ); + return 1; + } + + for(;;) { /* encode one member per iteration */ + uvlong size; + vlong freevolsz; + + size = member_size; + if (volume_size > 0) { + freevolsz = volume_size - partial_volume_size; + if (size > freevolsz) + size = freevolsz; /* limit size */ + } + show_progress(in_size, &encoder.eb->mb, pp, cfile_size); /* init */ + if ((zero && !FLZe_encode_member(encoder.fe, size)) || + (!zero && !LZe_encode_member(encoder.e, size))) { + Pp_show_msg( pp, "Encoder error." ); + retval = 1; + break; + } + in_size += Mb_data_position(&encoder.eb->mb); + out_size += Re_member_position(&encoder.eb->renc); + if (Mb_data_finished(&encoder.eb->mb)) + break; + if (volume_size > 0) { + partial_volume_size += Re_member_position(&encoder.eb->renc); + if (partial_volume_size >= volume_size - min_dict_size) { + partial_volume_size = 0; + if (delete_output_on_interrupt) { + close_and_set_permissions(in_statsp); + if (!next_filename()) { + Pp_show_msg( pp, "Too many volume files." 
); + retval = 1; + break; + } + if (!open_outstream(true, !in_statsp)) { + retval = 1; + break; + } + } + } + } + if (zero) + FLZe_reset(encoder.fe); + else + LZe_reset(encoder.e); + } + + if (retval == 0 && verbosity >= 1) + if (in_size == 0 || out_size == 0) + fputs( " no data compressed.\n", stderr ); + else { + if (0) + fprintf(stderr, + "%6.3f:1, %6.3f bits/byte, %5.2f%% saved, ", + (double)in_size / out_size, + (8.0 * out_size) / in_size, + 100.0 * (1.0 - (double)out_size/in_size)); + fprintf(stderr, "%llud in, %llud out.\n", + in_size, out_size); + } + LZeb_free(encoder.eb); + if (zero) + free(encoder.fe); + else + free(encoder.e); + return retval; +} + +static uchar +xdigit(unsigned value) +{ + if (value <= 9) + return '0' + value; + if (value <= 15) + return 'A' + value - 10; + return 0; +} + +static bool +show_trailing_data(uchar *data, int size, Pretty_print *pp, bool all, + bool ignore_trailing) +{ + if (verbosity >= 4 || !ignore_trailing) { + char buf[128]; + int i, len = snprintf(buf, sizeof buf, "%strailing data = ", + all? 
"": "first bytes of "); + + if (len < 0) + len = 0; + for (i = 0; i < size && len + 2 < sizeof buf; ++i) { + buf[len++] = xdigit(data[i] >> 4); + buf[len++] = xdigit(data[i] & 0x0F); + buf[len++] = ' '; + } + if (len < sizeof buf) + buf[len++] = '\''; + for (i = 0; i < size && len < sizeof buf; ++i) { + if (isprint(data[i])) + buf[len++] = data[i]; + else + buf[len++] = '.'; + } + if (len < sizeof buf) + buf[len++] = '\''; + if (len < sizeof buf) + buf[len] = 0; + else + buf[sizeof buf - 1] = 0; + Pp_show_msg(pp, buf); + if (!ignore_trailing) + show_file_error(pp->name, trailing_msg, 0); + } + return ignore_trailing; +} + +static int +decompress(int infd, Pretty_print *pp, bool ignore_trailing) +{ + uvlong partial_file_pos = 0; + Range_decoder rdec; + int retval = 0; + bool first_member; + + if (!Rd_init(&rdec, infd)) { + show_error( "Not enough memory.", 0, false ); + cleanup_and_fail(1); + } + + for (first_member = true; ; first_member = false) { + int result, size; + unsigned dict_size; + File_header header; + LZ_decoder decoder; + + Rd_reset_member_position(&rdec); + size = Rd_read_data(&rdec, header, Fh_size); + if (Rd_finished(&rdec)) /* End Of File */ { + if (first_member || Fh_verify_prefix(header, size)) { + Pp_show_msg( pp, "File ends unexpectedly at member header." 
); + retval = 2; + } else if (size > 0 && !show_trailing_data(header, size, pp, + true, ignore_trailing)) + retval = 2; + break; + } + if (!Fh_verify_magic(header)) { + if (first_member) { + show_file_error(pp->name, bad_magic_msg, 0); + retval = 2; + } else if (!show_trailing_data(header, size, pp, + false, ignore_trailing)) + retval = 2; + break; + } + if (!Fh_verify_version(header)) { + Pp_show_msg(pp, bad_version(Fh_version(header))); + retval = 2; + break; + } + dict_size = Fh_get_dict_size(header); + if (!isvalid_ds(dict_size)) { + Pp_show_msg(pp, bad_dict_msg); + retval = 2; + break; + } + + if (verbosity >= 2 || (verbosity == 1 && first_member)) { + Pp_show_msg(pp, 0); + show_header(dict_size); + } + + if (!LZd_init(&decoder, &rdec, dict_size, outfd)) { + Pp_show_msg( pp, "Not enough memory." ); + retval = 1; + break; + } + result = LZd_decode_member(&decoder, pp); + partial_file_pos += Rd_member_position(&rdec); + LZd_free(&decoder); + if (result != 0) { + if (verbosity >= 0 && result <= 2) { + Pp_show_msg(pp, 0); + fprintf(stderr, "%s: %s at pos %llud\n", + argv0, (result == 2? 
+ "file ends unexpectedly": + "decoder error"), partial_file_pos); + } + retval = 2; + break; + } + if (verbosity >= 2) { + fputs("done\n", stderr); + Pp_reset(pp); + } + } + Rd_free(&rdec); + if (verbosity == 1 && retval == 0) + fputs("done\n", stderr); + return retval; +} + +void +signal_handler(int sig) +{ + USED(sig); + show_error("interrupt caught, quitting.", 0, false); + cleanup_and_fail(1); +} + +static void +set_signals(void) +{ +} + +void +show_error(char *msg, int, bool help) +{ + if (verbosity < 0) + return; + if (msg && msg[0]) + fprintf(stderr, "%s: %s: %r\n", argv0, msg); + if (help) + fprintf(stderr, "Try '%s --help' for more information.\n", + argv0); +} + +void +show_file_error(char *filename, char *msg, int errcode) +{ + if (verbosity < 0) + return; + fprintf(stderr, "%s: %s: %s", argv0, filename, msg); + if (errcode > 0) + fprintf(stderr, ": %r"); + fputc('\n', stderr); +} + +void +internal_error(char *msg) +{ + if (verbosity >= 0) + fprintf( stderr, "%s: internal error: %s\n", argv0, msg ); + exit(3); +} + +void +show_progress(uvlong partial_size, Matchfinder_base *m, + Pretty_print *p, uvlong cfile_size) +{ + static uvlong psize = 0, csize = 0; /* csize=file_size/100 */ + static Matchfinder_base *mb = 0; + static Pretty_print *pp = 0; + + if (verbosity < 2) + return; + if (m) { /* initialize static vars */ + csize = cfile_size; + psize = partial_size; + mb = m; + pp = p; + } + if (mb && pp) { + uvlong pos = psize + Mb_data_position(mb); + + if (csize > 0) + fprintf( stderr, "%4llud%%", pos / csize ); + fprintf( stderr, " %.1f MB\r", pos / 1000000.0 ); + Pp_reset(pp); + Pp_show_msg(pp, 0); /* restore cursor position */ + } +} + +/* + * Mapping from gzip/bzip2 style 1..9 compression modes to the corresponding + * LZMA compression modes. 
+ */ +static Lzma_options option_mapping[] = { + { 1 << 16, 16 }, + { 1 << 20, 5 }, + { 3 << 19, 6 }, + { 1 << 21, 8 }, + { 3 << 20, 12 }, + { 1 << 22, 20 }, + { 1 << 23, 36 }, + { 1 << 24, 68 }, + { 3 << 23, 132 }, +// { 1 << 25, max_match_len }, // TODO + { 1 << 26, max_match_len }, +}; + +void +main(int argc, char *argv[]) +{ + int num_filenames, infd, i, retval = 0; + bool filenames_given = false, force = false, ignore_trailing = true, + recompress = false, + stdin_used = false, to_stdout = false, zero = false; + uvlong max_member_size = 0x0008000000000000ULL; + uvlong max_volume_size = 0x4000000000000000ULL; + uvlong member_size = max_member_size; + uvlong volume_size = 0; + char *default_output_filename = ""; + char **filenames = nil; + enum Mode program_mode = m_compress; + Lzma_options encoder_options = option_mapping[6]; /* default = "-6" */ + Pretty_print pp; + + CRC32_init(); + + ARGBEGIN { + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + zero = (ARGC() == '0'); + encoder_options = option_mapping[ARGC() - '0']; + break; + case 'a': + ignore_trailing = false; + break; + case 'b': + member_size = getnum(EARGF(usage()), 100000, max_member_size); + break; + case 'c': + to_stdout = true; + break; + case 'd': + set_mode(&program_mode, m_decompress); + break; + case 'f': + force = true; + break; + case 'F': + recompress = true; + break; + case 'm': + encoder_options.match_len_limit = + getnum(EARGF(usage()), min_match_len_limit, max_match_len); + zero = false; + break; + case 'o': + default_output_filename = EARGF(usage()); + break; + case 'q': + verbosity = -1; + break; + case 's': + encoder_options.dict_size = get_dict_size(EARGF(usage())); + zero = false; + break; + case 'S': + volume_size = getnum(EARGF(usage()), 100000, max_volume_size); + break; + case 'v': + if (verbosity < 4) + ++verbosity; + break; + default: + usage(); + } ARGEND + + num_filenames = max(1, argc); + filenames = 
resize_buffer(filenames, num_filenames * sizeof filenames[0]); + filenames[0] = "-"; + for (i = 0; i < argc; ++i) { + filenames[i] = argv[i]; + if (strcmp(filenames[i], "-") != 0) + filenames_given = true; + } + + if (program_mode == m_compress) { + Dis_slots_init(); + Prob_prices_init(); + } + + if (!to_stdout && (filenames_given || default_output_filename[0])) + set_signals(); + + Pp_init(&pp, filenames, num_filenames, verbosity); + + output_filename = resize_buffer(output_filename, 1); + for (i = 0; i < num_filenames; ++i) { + char *input_filename = ""; + int tmp, eindex; + Dir in_stats; + Dir *in_statsp; + + output_filename[0] = 0; + if ( !filenames[i][0] || strcmp( filenames[i], "-" ) == 0 ) { + if (stdin_used) + continue; + else + stdin_used = true; + infd = 0; + if (to_stdout || !default_output_filename[0]) + outfd = 1; + else { + if (program_mode == m_compress) + set_c_outname(default_output_filename, + volume_size > 0); + else { + output_filename = resize_buffer(output_filename, + strlen(default_output_filename)+1); + strcpy(output_filename, + default_output_filename); + } + if (!open_outstream(force, true)) { + if (retval < 1) + retval = 1; + close(infd); + continue; + } + } + } else { + eindex = extension_index(input_filename = filenames[i]); + infd = open_instream2(input_filename, &in_stats, + program_mode, eindex, recompress, to_stdout); + if (infd < 0) { + if (retval < 1) + retval = 1; + continue; + } + if (to_stdout) + outfd = 1; + else { + if (program_mode == m_compress) + set_c_outname(input_filename, + volume_size > 0); + else + set_d_outname(input_filename, eindex); + if (!open_outstream(force, false)) { + if (retval < 1) + retval = 1; + close(infd); + continue; + } + } + } + + Pp_set_name(&pp, input_filename); + if (!check_tty(infd, program_mode)) { + if (retval < 1) + retval = 1; + cleanup_and_fail(retval); + } + + in_statsp = input_filename[0]? 
&in_stats: nil; + if (program_mode == m_compress) + tmp = compress(member_size, volume_size, infd, + &encoder_options, &pp, in_statsp, zero); + else + tmp = decompress(infd, &pp, ignore_trailing); + if (tmp > retval) + retval = tmp; + if (tmp) + cleanup_and_fail(retval); + + if (delete_output_on_interrupt) + close_and_set_permissions(in_statsp); + if (input_filename[0]) + close(infd); + } + if (outfd >= 0 && close(outfd) != 0) { + show_error("Can't close stdout", errno, false); + if (retval < 1) + retval = 1; + } + free(output_filename); + free(filenames); + exit(retval); +} diff -Nru /sys/src/cmd/lzip/mkfile /sys/src/cmd/lzip/mkfile --- /sys/src/cmd/lzip/mkfile Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/mkfile Sat May 1 00:00:00 2021 @@ -0,0 +1,21 @@ +# mkfile for lzip - LZMA lossless data compressor + /dev/null || + { + echo "$0: a POSIX shell is required to run the tests" + echo "Try bash -c \"$0 $1 $2\"" + exit 1 + } + +if [ -d tmp ] ; then rm -rf tmp ; fi +mkdir tmp +cd "${objdir}"/tmp || framework_failure + +cat "${testdir}"/test.txt > in || framework_failure +in_lz="${testdir}"/test.txt.lz +fail=0 +test_failed() { fail=1 ; printf " $1" ; [ -z "$2" ] || printf "($2)" ; } + +printf "testing clzip-%s..." "$2" + +"${LZIP}" -fkqm4 in +{ [ $? = 1 ] && [ ! -e in.lz ] ; } || test_failed $LINENO +"${LZIP}" -fkqm274 in +{ [ $? = 1 ] && [ ! -e in.lz ] ; } || test_failed $LINENO +for i in bad_size -1 0 4095 513MiB 1G 1T 1P 1E 1Z 1Y 10KB ; do + "${LZIP}" -fkqs $i in + { [ $? = 1 ] && [ ! -e in.lz ] ; } || test_failed $LINENO $i +done +"${LZIP}" -lq in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -tq in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -tq < in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -cdq in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -cdq < in +[ $? = 2 ] || test_failed $LINENO +# these are for code coverage +"${LZIP}" -lt "${in_lz}" 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -cdl "${in_lz}" > out 2> /dev/null +[ $? 
= 1 ] || test_failed $LINENO +"${LZIP}" -cdt "${in_lz}" > out 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -t -- nx_file 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --help > /dev/null || test_failed $LINENO +"${LZIP}" -n1 -V > /dev/null || test_failed $LINENO +"${LZIP}" -m 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -z 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --bad_option 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --t 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --test=2 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --output= 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --output 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +printf "LZIP\001-.............................." | "${LZIP}" -t 2> /dev/null +printf "LZIP\002-.............................." | "${LZIP}" -t 2> /dev/null +printf "LZIP\001+.............................." | "${LZIP}" -t 2> /dev/null + +printf "\ntesting decompression..." + +"${LZIP}" -lq "${in_lz}" || test_failed $LINENO +"${LZIP}" -t "${in_lz}" || test_failed $LINENO +"${LZIP}" -cd "${in_lz}" > copy || test_failed $LINENO +cmp in copy || test_failed $LINENO + +rm -f copy +cat "${in_lz}" > copy.lz || framework_failure +"${LZIP}" -dk copy.lz || test_failed $LINENO +cmp in copy || test_failed $LINENO +printf "to be overwritten" > copy || framework_failure +"${LZIP}" -d copy.lz 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -df copy.lz +{ [ $? = 0 ] && [ ! -e copy.lz ] && cmp in copy ; } || test_failed $LINENO + +printf "to be overwritten" > copy || framework_failure +"${LZIP}" -df -o copy < "${in_lz}" || test_failed $LINENO +cmp in copy || test_failed $LINENO + +rm -f copy +"${LZIP}" < in > anyothername || test_failed $LINENO +"${LZIP}" -dv --output copy - anyothername - < "${in_lz}" 2> /dev/null +{ [ $? 
= 0 ] && cmp in copy && cmp in anyothername.out ; } || + test_failed $LINENO +rm -f copy anyothername.out + +"${LZIP}" -lq in "${in_lz}" +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -lq nx_file.lz "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -tq in "${in_lz}" +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -tq nx_file.lz "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -cdq in "${in_lz}" > copy +{ [ $? = 2 ] && cat copy in | cmp in - ; } || test_failed $LINENO +"${LZIP}" -cdq nx_file.lz "${in_lz}" > copy +{ [ $? = 1 ] && cmp in copy ; } || test_failed $LINENO +rm -f copy +cat "${in_lz}" > copy.lz || framework_failure +for i in 1 2 3 4 5 6 7 ; do + printf "g" >> copy.lz || framework_failure + "${LZIP}" -alvv copy.lz "${in_lz}" > /dev/null 2>&1 + [ $? = 2 ] || test_failed $LINENO $i + "${LZIP}" -atvvvv copy.lz "${in_lz}" 2> /dev/null + [ $? = 2 ] || test_failed $LINENO $i +done +"${LZIP}" -dq in copy.lz +{ [ $? = 2 ] && [ -e copy.lz ] && [ ! -e copy ] && [ ! -e in.out ] ; } || + test_failed $LINENO +"${LZIP}" -dq nx_file.lz copy.lz +{ [ $? = 1 ] && [ ! -e copy.lz ] && [ ! -e nx_file ] && cmp in copy ; } || + test_failed $LINENO + +cat in in > in2 || framework_failure +cat "${in_lz}" "${in_lz}" > in2.lz || framework_failure +"${LZIP}" -lq in2.lz || test_failed $LINENO +"${LZIP}" -t in2.lz || test_failed $LINENO +"${LZIP}" -cd in2.lz > copy2 || test_failed $LINENO +cmp in2 copy2 || test_failed $LINENO + +"${LZIP}" --output=copy2 < in2 || test_failed $LINENO +"${LZIP}" -lq copy2.lz || test_failed $LINENO +"${LZIP}" -t copy2.lz || test_failed $LINENO +"${LZIP}" -cd copy2.lz > copy2 || test_failed $LINENO +cmp in2 copy2 || test_failed $LINENO + +printf "\ngarbage" >> copy2.lz || framework_failure +"${LZIP}" -tvvvv copy2.lz 2> /dev/null || test_failed $LINENO +rm -f copy2 +"${LZIP}" -alq copy2.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -atq copy2.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -atq < copy2.lz +[ $? 
= 2 ] || test_failed $LINENO +"${LZIP}" -adkq copy2.lz +{ [ $? = 2 ] && [ ! -e copy2 ] ; } || test_failed $LINENO +"${LZIP}" -adkq -o copy2 < copy2.lz +{ [ $? = 2 ] && [ ! -e copy2 ] ; } || test_failed $LINENO +printf "to be overwritten" > copy2 || framework_failure +"${LZIP}" -df copy2.lz || test_failed $LINENO +cmp in2 copy2 || test_failed $LINENO + +printf "\ntesting compression..." + +"${LZIP}" -cf "${in_lz}" > out 2> /dev/null # /dev/null is a tty on OS/2 +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -cFvvm36 "${in_lz}" > out 2> /dev/null || test_failed $LINENO +"${LZIP}" -cd out | "${LZIP}" -d > copy || test_failed $LINENO +cmp in copy || test_failed $LINENO + +for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do + "${LZIP}" -k -$i in || test_failed $LINENO $i + mv -f in.lz copy.lz || test_failed $LINENO $i + printf "garbage" >> copy.lz || framework_failure + "${LZIP}" -df copy.lz || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i +done + +for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do + "${LZIP}" -c -$i in > out || test_failed $LINENO $i + printf "g" >> out || framework_failure + "${LZIP}" -cd out > copy || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i +done + +for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do + "${LZIP}" -$i < in > out || test_failed $LINENO $i + "${LZIP}" -d < out > copy || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i +done + +for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do + "${LZIP}" -f -$i -o out < in || test_failed $LINENO $i + "${LZIP}" -df -o copy < out.lz || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i +done + +cat in in in in in in in in > in8 || framework_failure +"${LZIP}" -1s12 -S100k -o out < in8 || test_failed $LINENO +"${LZIP}" -t out00001.lz out00002.lz || test_failed $LINENO +"${LZIP}" -cd out00001.lz out00002.lz | cmp in8 - || test_failed $LINENO +rm -f out00001.lz +"${LZIP}" -1ks4Ki -b100000 in8 || test_failed $LINENO +"${LZIP}" -t in8.lz || test_failed $LINENO +"${LZIP}" -cd in8.lz | cmp in8 - 
|| test_failed $LINENO +rm -f in8 +"${LZIP}" -0 -S100k -o out < in8.lz || test_failed $LINENO +"${LZIP}" -t out00001.lz out00002.lz || test_failed $LINENO +"${LZIP}" -cd out00001.lz out00002.lz | cmp in8.lz - || test_failed $LINENO +rm -f out00001.lz out00002.lz +"${LZIP}" -0kF -b100k in8.lz || test_failed $LINENO +"${LZIP}" -t in8.lz.lz || test_failed $LINENO +"${LZIP}" -cd in8.lz.lz | cmp in8.lz - || test_failed $LINENO +rm -f in8.lz in8.lz.lz + +printf "\ntesting bad input..." + +cat "${in_lz}" "${in_lz}" "${in_lz}" > in3.lz || framework_failure +if dd if=in3.lz of=trunc.lz bs=14752 count=1 2> /dev/null && + [ -e trunc.lz ] && cmp in2.lz trunc.lz > /dev/null 2>&1 ; then + for i in 6 20 14734 14753 14754 14755 14756 14757 14758 ; do + dd if=in3.lz of=trunc.lz bs=$i count=1 2> /dev/null + "${LZIP}" -lq trunc.lz + [ $? = 2 ] || test_failed $LINENO $i + "${LZIP}" -t trunc.lz 2> /dev/null + [ $? = 2 ] || test_failed $LINENO $i + "${LZIP}" -tq < trunc.lz + [ $? = 2 ] || test_failed $LINENO $i + "${LZIP}" -cdq trunc.lz > out + [ $? = 2 ] || test_failed $LINENO $i + "${LZIP}" -dq < trunc.lz > out + [ $? = 2 ] || test_failed $LINENO $i + done +else + printf "\nwarning: skipping truncation test: 'dd' does not work on your system." +fi + +cat "${in_lz}" > ingin.lz || framework_failure +printf "g" >> ingin.lz || framework_failure +cat "${in_lz}" >> ingin.lz || framework_failure +"${LZIP}" -lq ingin.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -t ingin.lz || test_failed $LINENO +"${LZIP}" -cd ingin.lz > copy || test_failed $LINENO +cmp in copy || test_failed $LINENO +"${LZIP}" -t < ingin.lz || test_failed $LINENO +"${LZIP}" -d < ingin.lz > copy || test_failed $LINENO +cmp in copy || test_failed $LINENO + +echo +if [ ${fail} = 0 ] ; then + echo "tests completed successfully." + cd "${objdir}" && rm -r tmp +else + echo "tests failed." 
+fi +exit ${fail} diff -Nru /sys/src/cmd/lzip/testsuite/test.txt /sys/src/cmd/lzip/testsuite/test.txt --- /sys/src/cmd/lzip/testsuite/test.txt Thu Jan 1 00:00:00 1970 +++ /sys/src/cmd/lzip/testsuite/test.txt Sat May 1 00:00:00 2021 @@ -0,0 +1,676 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. 
You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. 
The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. 
(Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. 
You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. 
+ +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + + 7. 
If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + + Copyright (C) + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. 
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + , 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. 
+ GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. 
+ + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. 
The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. 
(Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. 
You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. 
+ +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + + 7. 
If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + <one line to give the program's name and a brief idea of what it does.> + Copyright (C) <year> <name of author> + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version.
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) <year> <name of author> + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + <signature of Ty Coon>, 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. Binary files /sys/src/cmd/lzip/testsuite/test.txt.lz and /sys/src/cmd/lzip/testsuite/test.txt.lz differ