[GHC] #9577: String literals are wasting space

GHC ghc-devs at haskell.org
Thu Sep 11 20:39:40 UTC 2014


#9577: String literals are wasting space
-------------------------------------+-------------------------------------
       Reporter:  xnyhps             |                   Owner:
           Type:  bug                |                  Status:  new
       Priority:  low                |               Milestone:
      Component:  Compiler (NCG)     |                 Version:  7.8.2
       Keywords:                     |        Operating System:
   Architecture:  Unknown/Multiple   |  Unknown/Multiple
     Difficulty:  Unknown            |         Type of failure:  Runtime
     Blocked By:                     |  performance bug
Related Tickets:                     |               Test Case:
                                     |                Blocking:
                                     |  Differential Revisions:
-------------------------------------+-------------------------------------
 For [https://phabricator.haskell.org/D199 D199] I looked into how string
 literals are compiled down by GHC.

 On 64-bit OS X, a simple string `"AAA"` compiles to the following assembly:

 {{{
 .const
 .align 3
 .align 0
 c38E_str:
         .byte   65
         .byte   65
         .byte   65
         .byte   0
 }}}

 (And also something that invokes `unpackCString#`, but that isn't relevant
 here.)

 (`MkCore.mkStringExprFS` -> `CmmUtils.mkByteStringCLit` ->
 `compiler/nativeGen/X86/Ppr.pprSectionHeader`.)
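 The listing above has a simple shape: a section directive, an alignment directive, a label, one `.byte` per character, and a terminating NUL. As a rough sketch of that shape, assuming ASCII contents and using a hypothetical `renderLiteral` function (it is not part of GHC, does not reproduce the real `pprSectionHeader`, and omits the extra `.align 0` line), the output could be generated like this:

```haskell
import Data.Char (ord)

-- Hypothetical sketch: render a NUL-terminated string literal in the
-- shape of the Mach-O listing above. This is NOT GHC's actual code path.
renderLiteral
  :: String  -- ^ section directive, e.g. ".const"
  -> Int     -- ^ alignment exponent; on Mach-O, ".align n" means 2^n bytes
  -> String  -- ^ label, e.g. "c38E_str"
  -> String  -- ^ literal contents, assumed to be ASCII
  -> [String]
renderLiteral section alignExp label contents =
  [ section
  , ".align " ++ show alignExp
  , label ++ ":"
  ]
  -- one .byte line per character, plus the trailing 0 that C strings need
  ++ map byteLine (map ord contents ++ [0])
  where
    byteLine b = "        .byte   " ++ show b

main :: IO ()
main = mapM_ putStrLn (renderLiteral ".const" 3 "c38E_str" "AAA")
```

 Passing a different section directive and alignment is all it would take to change how every literal is laid out, which is why the choice of section header matters for the rest of this ticket.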

 Note that this literal:

 * Is 8-byte aligned (`.align 3` is a power-of-two alignment on Mach-O, i.e. 8 bytes).
 * Is placed in a `.const` section.

 I can't find any reason why string literals would need to be 8-byte
 aligned on OS X. Reading data that starts on an 8-byte boundary might be
 slightly faster, but I doubt it makes a meaningful difference for string
 literals. Neither clang nor gcc aligns string literals in the assembly
 they generate.

 The trivial program:

 {{{#!hs
 main :: IO ()
 main = return ()
 }}}

 wastes almost 5kB on padding between the strings the Prelude brings in,
 when built with GHC HEAD.
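 To see where the padding comes from: each literal occupies its characters plus one NUL byte, and with `.align 3` the next literal must start at the next multiple of 8, so each literal can be followed by up to 7 bytes of padding. A minimal sketch of that arithmetic, assuming literals are laid out back to back and each one is followed by another 8-byte-aligned object (the helper names and example strings are made up for illustration, not taken from the Prelude's actual literals):

```haskell
-- Sketch: estimate padding wasted when every NUL-terminated literal is
-- followed by an object aligned to `a` bytes (".align 3" => a = 8).

-- | Bytes a literal occupies, including its terminating NUL.
literalSize :: String -> Int
literalSize s = length s + 1

-- | Padding needed after an object of size n so the next one is a-aligned.
paddingAfter :: Int -> Int -> Int
paddingAfter a n = (a - n `mod` a) `mod` a

-- | Total padding for a sequence of literals laid out with alignment a.
totalPadding :: Int -> [String] -> Int
totalPadding a = sum . map (paddingAfter a . literalSize)

-- | Made-up example literals.
exampleLiterals :: [String]
exampleLiterals = ["AAA", "base", "Prelude.head: empty list"]

main :: IO ()
main = do
  print (totalPadding 8 exampleLiterals)  -- 8-byte alignment: prints 14
  print (totalPadding 1 exampleLiterals)  -- unaligned: prints 0
```

 With no alignment requirement (`a = 1`) the padding vanishes entirely, which is the layout clang and gcc produce.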


 The fact that it is a `.const` section, instead of `.cstring`
 (https://developer.apple.com/library/mac/documentation/DeveloperTools/Reference/Assembler/040-Assembler_Directives/asm_directives.html#//apple_ref/doc/uid/TP30000823-TPXREF127)
 means duplicate strings aren't coalesced by the assembler or linker. GHC
 floats string literals out to the top level and uses CSE to eliminate
 duplicates, but that only works within a single module. Strings that appear
 in several modules end up duplicated in the executable.

 The same program also has ~4kB of wasted space due to duplicate Prelude
 strings (`"base"` occurs 16 times!). Relative to the total binary size (4MB
 after stripping), removing this redundant data wouldn't be a big
 improvement (0.2%), but I still think it is a worthwhile optimization.
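 The difference between deduplicating per module and deduplicating across the whole executable can be modelled with a small sketch. It assumes each module already keeps only one copy of each distinct literal (the effect of float-out plus CSE described above) and that a `.cstring` section would let the toolchain keep one copy program-wide; the module names and strings are invented example data, not measurements:

```haskell
import Data.List (nub)

-- Sketch: bytes taken by string literals when duplicates are removed only
-- within each module, versus once across the whole program.

type ModuleName = String

-- | Size of a NUL-terminated literal.
cSize :: String -> Int
cSize s = length s + 1

-- | Per-module dedup: each module keeps one copy of each distinct literal.
perModuleBytes :: [(ModuleName, [String])] -> Int
perModuleBytes mods = sum [ cSize s | (_, lits) <- mods, s <- nub lits ]

-- | Global dedup: one copy of each distinct literal in the whole program.
globalBytes :: [(ModuleName, [String])] -> Int
globalBytes mods = sum [ cSize s | s <- nub (concatMap snd mods) ]

-- | Invented example: "base" is used by every module.
exampleModules :: [(ModuleName, [String])]
exampleModules =
  [ ("A", ["base", "base", "foo"])
  , ("B", ["base", "bar"])
  , ("C", ["base"])
  ]

main :: IO ()
main = do
  print (perModuleBytes exampleModules)  -- prints 23
  print (globalBytes exampleModules)     -- prints 13
```

 The gap grows with the number of modules that repeat the same literal, which matches the pattern of `"base"` appearing many times in the final binary.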

 I think this can be solved quite easily by adding a new section type for
 string literals that is emitted unaligned, as a `.cstring` section.
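 A minimal sketch of what the proposal amounts to, assuming a hypothetical `SectionKind` type with a separate constructor for C string literals. GHC's real section types and printer are organised differently, and the names here are illustrative only:

```haskell
-- Hypothetical sketch of the proposal: a dedicated section kind for C
-- string literals, printed as an unaligned .cstring section on Mach-O.
-- The type, constructors and function name are illustrative, not GHC's.

data SectionKind
  = ReadOnlyData    -- ^ general constants: .const, 8-byte aligned
  | CStringLiteral  -- ^ NUL-terminated string literals: .cstring, no alignment
  deriving (Eq, Show)

-- | Directives emitted before the data in a section of the given kind.
machOSectionHeader :: SectionKind -> [String]
machOSectionHeader ReadOnlyData   = [".const", ".align 3"]
machOSectionHeader CStringLiteral = [".cstring"]

main :: IO ()
main = mapM_ (mapM_ putStrLn . machOSectionHeader) [ReadOnlyData, CStringLiteral]
```

 Keeping string literals in their own section kind means other read-only data keeps its alignment, while only the literals move to a section the toolchain is allowed to pack tightly and coalesce.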

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9577>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

