Builders denote sequences of bytes. They are Monoids where mempty is the zero-length sequence and mappend is concatenation, which runs in O(1).
A Builder is an efficient way to build lazy Text values. There are several functions for constructing builders, but only one to inspect them: to extract any data, you have to turn them into lazy Text values using toLazyText.
Internally, a builder constructs a lazy Text by filling arrays piece by piece. As each buffer is filled, it is 'popped' off, to become a new chunk of the resulting lazy Text. All this is hidden from the user of the Builder.
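As a minimal sketch of this workflow, the following builds a small lazy Text from pieces and extracts it with toLazyText. The helper keyValue is hypothetical and only serves as illustration; fromString, singleton and decimal are assumed to come from the text package's Data.Text.Lazy.Builder and Data.Text.Lazy.Builder.Int modules.

```haskell
import qualified Data.Text.Lazy.IO as TLIO
import Data.Text.Lazy.Builder (Builder, fromString, singleton, toLazyText)
import Data.Text.Lazy.Builder.Int (decimal)

-- Hypothetical helper: renders "key=value" followed by a newline.
keyValue :: String -> Int -> Builder
keyValue k v = mconcat [fromString k, singleton '=', decimal v, singleton '\n']

main :: IO ()
main =
  -- toLazyText is the only way to get data back out of a Builder.
  TLIO.putStr (toLazyText (keyValue "width" 80 `mappend` keyValue "height" 24))
```

The buffers that the builder fills never appear in this code; only the final lazy Text does.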
> data Cell = StringC String | IntC Int deriving( Eq, Ord, Show )
> type Row = [Cell]
> type Table = [Row]
We use the following imports and abbreviate mappend to simplify reading.
> import qualified Data.ByteString.Lazy as L
> import Data.ByteString.Builder
> import Data.ByteString.Builder.ASCII (intDec)
> import Data.Monoid
> import Data.Foldable (foldMap)
> import Data.List (intersperse)
> infixr 4 <>
> (<>) :: Monoid m => m -> m -> m
> (<>) = mappend
CSV is a character-based representation of tables. For maximal modularity, we could first render Tables as Strings and then encode this String using some Unicode character encoding. However, this sacrifices performance due to the intermediate String representation being built and thrown away right afterwards. We get rid of this intermediate String representation by fixing the character encoding to UTF-8 and using Builders to convert Tables directly to UTF-8 encoded CSV tables represented as lazy ByteStrings.
> encodeUtf8CSV :: Table -> L.ByteString
> encodeUtf8CSV = toLazyByteString . renderTable
> renderTable :: Table -> Builder
> renderTable rs = mconcat [renderRow r <> charUtf8 '\n' | r <- rs]
> renderRow :: Row -> Builder
> renderRow []     = mempty
> renderRow (c:cs) =
>   renderCell c <> mconcat [ charUtf8 ',' <> renderCell c' | c' <- cs ]
> renderCell :: Cell -> Builder
> renderCell (StringC cs) = renderString cs
> renderCell (IntC i) = intDec i
> renderString :: String -> Builder
> renderString cs = charUtf8 '"' <> foldMap escape cs <> charUtf8 '"'
>   where
>     escape '\\' = charUtf8 '\\' <> charUtf8 '\\'
>     escape '\"' = charUtf8 '\\' <> charUtf8 '\"'
>     escape c    = charUtf8 c
Note that the ASCII encoding is a subset of the UTF-8 encoding, which is why we can use the optimized function intDec to encode an Int as a decimal number with UTF-8 encoded digits. Using intDec is more efficient than stringUtf8 . show, as it avoids constructing an intermediate String. Avoiding this intermediate data structure significantly improves performance because encoding Cells is the core operation for rendering CSV-tables. See Data.ByteString.Builder.Prim for further information on how to improve the performance of renderString.
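To illustrate the claim about intDec, here is a small sketch that checks that both routes produce the same bytes for a range of Ints. The names viaString and direct are made up for this example. The sketch shows equivalence of output only; it does not measure the performance difference.

```haskell
import Data.ByteString.Builder (Builder, intDec, stringUtf8, toLazyByteString)

-- Goes through an intermediate String before encoding.
viaString :: Int -> Builder
viaString = stringUtf8 . show

-- Encodes the decimal digits directly.
direct :: Int -> Builder
direct = intDec

main :: IO ()
main = print (all sameBytes [-1000 .. 1000])
  where
    sameBytes i = toLazyByteString (viaString i) == toLazyByteString (direct i)
```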
We demonstrate our UTF-8 CSV encoding function on the following table.
> strings :: [String]
> strings = ["hello", "\"1\"", "λ-wörld"]
> table :: Table
> table = [map StringC strings, map IntC [-3..3]]
The expression encodeUtf8CSV table results in the following lazy ByteString.
> Chunk "\"hello\",\"\\\"1\\\"\",\"\206\187-w\195\182rld\"\n-3,-2,-1,0,1,2,3\n" Empty
We can clearly see that we are converting to a binary format. The 'λ' and 'ö' characters, which have Unicode code points above 127, are expanded to their corresponding UTF-8 multi-byte representations.
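The expansion can be checked in isolation. The sketch below uses a hypothetical helper utf8Bytes to list the bytes that charUtf8 emits for an ASCII character, for 'λ' (U+03BB) and for 'ö' (U+00F6); the non-ASCII characters are written as numeric escapes so that the example does not depend on the source file's encoding.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Builder (charUtf8, toLazyByteString)
import Data.Word (Word8)

-- Hypothetical helper: the UTF-8 bytes of a single character.
utf8Bytes :: Char -> [Word8]
utf8Bytes = L.unpack . toLazyByteString . charUtf8

main :: IO ()
main = print (map utf8Bytes ['a', '\955', '\246'])
-- prints [[97],[206,187],[195,182]]
```

The byte pairs 206,187 and 195,182 are the same ones that appear as \206\187 and \195\182 in the lazy ByteString shown above.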
We use the criterion library (http://hackage.haskell.org/package/criterion) to benchmark the efficiency of our encoding function on the following table.
> import Criterion.Main -- add this import to the ones above
> maxiTable :: Table
> maxiTable = take 1000 $ cycle table
> main :: IO ()
> main = defaultMain
>   [ bench "encodeUtf8CSV maxiTable (original)" $
>       whnf (L.length . encodeUtf8CSV) maxiTable
>   ]
On a 2.20 GHz Core2 Duo running 32-bit Linux, the above code takes 1 ms to generate the 22,500-byte lazy ByteString. Looking again at the definitions above, we see that we took care to avoid intermediate data structures, as otherwise we would sacrifice performance. For example, the following (arguably simpler) definition of renderRow is about 20% slower.
> renderRow :: Row -> Builder
> renderRow = mconcat . intersperse (charUtf8 ',') . map renderCell
Similarly, using O(n) concatenations like ++ or the equivalent concat operations on strict and lazy ByteStrings should be avoided. The following definition of renderString is also about 20% slower.
> renderString :: String -> Builder
> renderString cs = stringUtf8 $ "\"" ++ concatMap escape cs ++ "\""
>   where
>     escape '\\' = "\\\\"
>     escape '\"' = "\\\""
>     escape c    = return c
Apart from removing intermediate data-structures, encodings can be optimized further by fine-tuning their execution parameters using the functions in Data.ByteString.Builder.Extra and their "inner loops" using the functions in Data.ByteString.Builder.Prim.
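As one possible illustration of tuning execution parameters, the sketch below runs a builder with an explicit allocation strategy instead of the default one. It assumes that toLazyByteStringWith, untrimmedStrategy, smallChunkSize and defaultChunkSize are available from Data.ByteString.Builder.Extra; the builder numbers and the choice of buffer sizes are invented for the example and are not a recommendation.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Builder (Builder, charUtf8, intDec, toLazyByteString)
import Data.ByteString.Builder.Extra
  (defaultChunkSize, smallChunkSize, toLazyByteStringWith, untrimmedStrategy)

-- Hypothetical workload: the numbers 1 to 1000, one per line.
numbers :: Builder
numbers = foldMap line [1 .. 1000 :: Int]
  where
    line i = intDec i `mappend` charUtf8 '\n'

-- Start with a small first buffer, then use default-sized buffers.
tuned :: L.ByteString
tuned =
  toLazyByteStringWith
    (untrimmedStrategy smallChunkSize defaultChunkSize)
    L.empty
    numbers

main :: IO ()
main = do
  print (L.length tuned)
  -- The strategy changes chunking, not the bytes produced.
  print (tuned == toLazyByteString numbers)
```

Whether a different strategy actually helps depends on the workload, so any such change is best confirmed with a benchmark like the criterion setup above.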
We decided to rename the Builder modules. Sorry about that.
The old names will hang about for at least one release cycle before we deprecate them and then later remove them.
Efficient construction of lazy Text values. The principal operations on a Builder are singleton, fromText, and fromLazyText, which construct new builders, and mappend, which concatenates two builders.
To get maximum performance when building lazy Text values using a builder, associate mappend calls to the right. For example, prefer
> singleton 'a' `mappend` (singleton 'b' `mappend` singleton 'c')
to
> singleton 'a' `mappend` singleton 'b' `mappend` singleton 'c'
as the latter associates mappend to the left.
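When builders are combined in a loop rather than written out by hand, the association is determined by the fold. A minimal sketch, using the hypothetical names rightNested and leftNested: foldr yields the right-nested shape recommended here, foldl the left-nested one. Both denote the same Text; only the nesting of the mappend calls differs.

```haskell
import Data.Text.Lazy.Builder (Builder, singleton, toLazyText)

-- a `mappend` (b `mappend` (c `mappend` mempty)): the preferred shape.
rightNested :: String -> Builder
rightNested = foldr (\c b -> singleton c `mappend` b) mempty

-- ((mempty `mappend` a) `mappend` b) `mappend` c: associates to the left.
leftNested :: String -> Builder
leftNested = foldl (\b c -> b `mappend` singleton c) mempty

main :: IO ()
main = print (toLazyText (rightNested "abc") == toLazyText (leftNested "abc"))
```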
This library provides an abstraction of buffered output of byte streams and several convenience functions to exploit it. For example, it allows you to efficiently serialize Haskell values to lazy bytestrings with a large average chunk size. The large average chunk size makes good use of cache prefetching in later processing steps (e.g. compression) and reduces the system call overhead when writing the resulting lazy bytestring to a file or sending it over the network.
Convert streams of builders to streams of bytestrings.
This package integrates the builders from the blaze-builder package with the enumerator package. It provides infrastructure and enumeratees for incrementally executing builders and passing the filled chunks to a bytestring iteratee.
This is the bytestring builder that is debuting in bytestring-0.10.4.0, which should be shipping with GHC 7.8, probably late in 2013. This builder has several nice simplifications and improvements, and more out-of-box functionality than the older blaze-builder.
Note that this package detects which version of bytestring you are compiling against; if you are compiling against bytestring-0.10.4 or later, it will be an empty package.
This package lets the new interface and implementation be used with most older compilers without upgrading bytestring, which can be rather problematic. In conjunction with blaze-builder-0.4 or later, which offers an implementation of blaze-builder in terms of bytestring-builder, this should let most people try the new interface and implementation without causing undue compatibility problems with packages that depend on blaze-builder.
GHC 7.6 did debut an almost identical interface and implementation, but with slightly different module names and organization. Trying to re-export/rename the builder provided with 7.6 did not turn out to be very practical, because this interface includes new functions that rely on Builder internals, which are not exported in 7.6. Furthermore, these module names should be deprecated in 7.10.
diagrams-builder provides backend-agnostic tools for dynamically turning code into rendered diagrams, using the hint wrapper to the GHC API. It supports conditional recompilation using hashing of diagrams source code, to avoid recompiling code that has not changed. It is useful for creating tools which compile diagrams code embedded in other documents. For example, it is used by the BlogLiterately-diagrams package (a plugin for BlogLiterately) to compile diagrams embedded in Markdown-formatted blog posts.
Executables specific to the cairo, SVG, and postscript backends are included (more executables specific to other backends may be included in the future). All take an input file and an expression to render, and output an image file. If you want these executables you must explicitly enable the -fcairo, -fsvg, or -fps flags.
A LaTeX package, diagrams-latex.sty, is also provided in the latex/ directory of the source distribution, which renders diagrams code found within diagram environments. Note that diagrams-latex.sty is licensed under the GPL.
A declarative, monadic graph construction language for small graphs. See README.
This library builds a text xoj format file from a hoodle data structure.
Output a Builder to a Handle. The Builder is executed directly on the buffer of the Handle. If the buffer is too small (or not present), then it is replaced with a large enough buffer.
It is recommended that the Handle is set to binary and BlockBuffering mode. See hSetBinaryMode and hSetBuffering.
This function is more efficient than hPut . toLazyByteString because in many cases no buffer allocation has to be done. Moreover, the results of several executions of short Builders are concatenated in the Handle's buffer, therefore avoiding unnecessary buffer flushes.
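A minimal usage sketch, assuming the function described here is hPutBuilder from Data.ByteString.Builder. It follows the recommendation to put the Handle into binary and BlockBuffering mode first and then writes several short builders, which accumulate in the Handle's buffer. The helper line is hypothetical.

```haskell
import Data.ByteString.Builder (Builder, charUtf8, hPutBuilder, intDec)
import System.IO (BufferMode (BlockBuffering), hSetBinaryMode, hSetBuffering, stdout)

-- Hypothetical helper: one decimal number per line.
line :: Int -> Builder
line i = intDec i `mappend` charUtf8 '\n'

main :: IO ()
main = do
  hSetBinaryMode stdout True
  hSetBuffering stdout (BlockBuffering Nothing)
  -- Each short builder is executed directly on stdout's buffer.
  mapM_ (hPutBuilder stdout . line) [1 .. 5]
```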
HSlackBuilder automatically generates slackBuild scripts from a cabal package
Most json packages dictate a data structure that corresponds to json values. To serialize any other value to json, that value must first be marshalled into the specified structure.
This library avoids this marshalling step, and is thus potentially more efficient when serializing arbitrary data structures. Unfortunately json-builder cannot yet read or process json data, and it's not clear to me yet how to pull a similar kind of trick to avoid unnecessary data structures when parsing json data into arbitrary data types.
This library builds a text xoj format file from a xournal data structure.